A minor detail about positional embedding #133
-
Hi, I found an interesting phenomenon while using your MSA Transformer model: the learned positional embedding uses `self.padding_idx = 1` and allocates 1024 + 2 embedding vectors rather than 1024. Could you explain why?
Replies: 2 comments
-
This is borrowed directly from fairseq. From a quick skim, the crux is that a specific index in the learned embedding table (i.e. `self.padding_idx = 1`) learns the embedding for padding. The actual positions (in our case 0-1023) are shifted by +2 so they never re-use that index: indices 0 and 1 are reserved, real positions start at index 2, and the total number of learned embedding vectors is therefore 1024 + 2.
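For anyone else who lands here, a minimal PyTorch sketch of this kind of fairseq-style scheme (not the exact fairseq/ESM code; the class name, token ids, and embedding dimension below are illustrative). The table is sized `max_positions + padding_idx + 1`, and positions are derived from the padding mask so padded slots map to `padding_idx`:

```python
import torch
import torch.nn as nn


class LearnedPositionalEmbedding(nn.Embedding):
    """Sketch of a fairseq-style learned positional embedding.

    Index `padding_idx` is reserved for padding, so real positions start at
    `padding_idx + 1` and the table needs `max_positions + padding_idx + 1`
    rows (1024 + 2 when max_positions = 1024 and padding_idx = 1).
    """

    def __init__(self, max_positions: int, embedding_dim: int, padding_idx: int = 1):
        super().__init__(max_positions + padding_idx + 1, embedding_dim, padding_idx)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) token ids, where `padding_idx` marks padding.
        mask = tokens.ne(self.padding_idx).int()
        # cumsum numbers real tokens 1, 2, 3, ...; adding padding_idx shifts them
        # to 2, 3, 4, ..., while padded slots stay at padding_idx itself.
        positions = (torch.cumsum(mask, dim=1) * mask).long() + self.padding_idx
        return super().forward(positions)


# Illustrative usage: the second sequence is padded with token id 1 (= padding_idx).
emb = LearnedPositionalEmbedding(max_positions=1024, embedding_dim=8, padding_idx=1)
tokens = torch.tensor([[5, 6, 7, 8],
                       [5, 6, 1, 1]])
print(emb(tokens).shape)   # torch.Size([2, 4, 8])
print(emb.num_embeddings)  # 1026 == 1024 + 2
```

Sizing the table this way keeps the padding row (which `nn.Embedding` leaves zeroed and excludes from gradients) from ever colliding with a real position index.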
-
Got it! Thank you for your explanation :)