A minor detail about positional embedding #133
-
Hi, I found an interesting phenomenon while using your MSA Transformer model: the learned positional embedding uses `self.padding_idx = 1` and allocates 1024 + 2 embedding vectors rather than 1024. Could you explain why?
Replies: 2 comments
-
This is borrowed directly from fairseq. From a quick skim, the crux is that a specific index in the learned embedding table (i.e. `self.padding_idx = 1`) learns the embedding for padding. The actual positions (in our case 0-1023) are shifted by +2 so they never re-use that index: indices 0 and 1 are reserved, real positions start at index 2, and the total number of learned embedding vectors is therefore 1024 + 2.
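For anyone else who lands here, a minimal PyTorch sketch of this kind of fairseq-style scheme (not the exact fairseq/ESM code; the class name, token ids, and embedding dimension below are illustrative). The table is sized `max_positions + padding_idx + 1`, and positions are derived from the padding mask so padded slots map to `padding_idx`:

```python
import torch
import torch.nn as nn


class LearnedPositionalEmbedding(nn.Embedding):
    """Sketch of a fairseq-style learned positional embedding.

    Index `padding_idx` is reserved for padding, so real positions start at
    `padding_idx + 1` and the table needs `max_positions + padding_idx + 1`
    rows (1024 + 2 when max_positions = 1024 and padding_idx = 1).
    """

    def __init__(self, max_positions: int, embedding_dim: int, padding_idx: int = 1):
        super().__init__(max_positions + padding_idx + 1, embedding_dim, padding_idx)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) token ids, where `padding_idx` marks padding.
        mask = tokens.ne(self.padding_idx).int()
        # cumsum numbers real tokens 1, 2, 3, ...; adding padding_idx shifts them
        # to 2, 3, 4, ..., while padded slots stay at padding_idx itself.
        positions = (torch.cumsum(mask, dim=1) * mask).long() + self.padding_idx
        return super().forward(positions)


# Illustrative usage: the second sequence is padded with token id 1 (= padding_idx).
emb = LearnedPositionalEmbedding(max_positions=1024, embedding_dim=8, padding_idx=1)
tokens = torch.tensor([[5, 6, 7, 8],
                       [5, 6, 1, 1]])
print(emb(tokens).shape)   # torch.Size([2, 4, 8])
print(emb.num_embeddings)  # 1026 == 1024 + 2
```

Sizing the table this way keeps the padding row (which `nn.Embedding` leaves zeroed and excludes from gradients) from ever colliding with a real position index.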
-
Got it! Thank you for your explanation :)