
Incorrect implementation of self-attention #23

Open
TrinitialChan opened this issue Nov 21, 2023 · 0 comments

Comments

@TrinitialChan

Your paper states that the decoder performs stacked multi-head self-attention; however, the behavior of the DecoderLayer class in the code is inconsistent with that description. By printing the attn_output_weights of the self_attn module, I found that the attention map has shape [L, 1, 1], so the attention is clearly not being computed over the sequence. See also #22.
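For reference, here is a minimal sketch (not taken from this repository) of one way torch.nn.MultiheadAttention can end up with an attention map of shape [L, 1, 1]: with the default batch_first=False the module expects inputs shaped (seq_len, batch, embed_dim), so if a (L, 1, d) tensor is passed with the sequence and batch dimensions swapped, every "sequence" has length 1 and each token can only attend to itself. The sizes L, d and num_heads below are arbitrary placeholders, not values from the paper or code.

```python
import torch
import torch.nn as nn

L, d, heads = 16, 64, 8
self_attn = nn.MultiheadAttention(embed_dim=d, num_heads=heads)  # batch_first=False by default

x = torch.randn(L, 1, d)  # (seq_len=L, batch=1, embed_dim=d)

# Proper self-attention over the whole sequence of length L:
_, w_ok = self_attn(x, x, x)
print(w_ok.shape)   # torch.Size([1, 16, 16])  -> (batch, tgt_len, src_len)

# Degenerate case: the same values treated as L sequences of length 1,
# which reproduces the reported [L, 1, 1] attention map.
x_swapped = x.transpose(0, 1)  # (seq_len=1, batch=L, embed_dim=d)
_, w_bad = self_attn(x_swapped, x_swapped, x_swapped)
print(w_bad.shape)  # torch.Size([16, 1, 1])  -> each token attends only to itself
```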
