fix the self_attn module of DecoderLayer #22
Your paper states that the decoder performs stacked multi-head self-attention, but the behavior of the DecoderLayer class in the code is inconsistent with that description. Printing the attn_output_weights of the self_attn module shows an attention map of shape ([L, 1, 1]), which means the attention is computed over sequences of length 1, so each position attends only to itself instead of to the whole sequence. I provide a quick fix in this PR.
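For illustration, here is a minimal sketch of what a ([L, 1, 1]) weight shape typically indicates, assuming self_attn is torch.nn.MultiheadAttention with the default sequence-first layout (seq_len, batch, embed_dim); the names L, d, and heads below are placeholders, not values from the repository:

```python
import torch
import torch.nn as nn

# Hypothetical reproduction (not the repository's actual code): if the decoder
# input arrives batch-first as (1, L, d) and is passed to a sequence-first
# MultiheadAttention unchanged, the module treats L as the batch dimension and
# each "sequence" has length 1, so attn_output_weights is (L, 1, 1).
L, d, heads = 10, 64, 8
self_attn = nn.MultiheadAttention(embed_dim=d, num_heads=heads)

x_batch_first = torch.randn(1, L, d)   # (batch=1, seq=L, embed=d)

# Buggy call: interpreted as (seq=1, batch=L, embed=d).
_, w_bug = self_attn(x_batch_first, x_batch_first, x_batch_first)
print(w_bug.shape)                      # torch.Size([10, 1, 1])

# Fix sketch: transpose to (seq=L, batch=1, embed=d) before self-attention,
# or construct the module with batch_first=True.
x_seq_first = x_batch_first.transpose(0, 1)
_, w_ok = self_attn(x_seq_first, x_seq_first, x_seq_first)
print(w_ok.shape)                       # torch.Size([1, 10, 10])
```

With the corrected layout, the attention weights have shape (batch, L, L), i.e. every position attends over the full sequence as the paper describes.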