This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

A minor detail about positional embedding #133

Answered by tomsercu
Yijia-Xiao asked this question in Q&A

This is borrowed directly from fairseq.
From a quick skim, the crux seems to be that a specific index in the learned embedding (i.e. self.padding_idx=1) learns the embedding for padding. The actual positions (in our case 0-1023) are shifted by +2 to avoid re-using that index, so the total number of learned embedding vectors is 1024 + 2.
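To make the shift concrete, here is a minimal PyTorch sketch of the scheme described above. The embedding dimension (768), the variable names, and the make_positions helper are illustrative stand-ins, not the exact fairseq/ESM code.

```python
import torch
import torch.nn as nn

max_positions = 1024  # real positions 0-1023, as in the answer above
padding_idx = 1       # dedicated index whose vector is learned for padding

# Table holds 1024 + 2 vectors: real positions are shifted up by +2 so they
# never collide with padding_idx (or index 0).
pos_embedding = nn.Embedding(
    max_positions + padding_idx + 1,  # 1026 = 1024 + 2
    768,
    padding_idx=padding_idx,
)

def make_positions(tokens: torch.Tensor, padding_idx: int) -> torch.Tensor:
    """Assign position padding_idx to pad tokens and padding_idx + 1, +2, ...
    to real tokens, i.e. real positions land at 2, 3, ... when padding_idx=1."""
    mask = tokens.ne(padding_idx).long()
    return torch.cumsum(mask, dim=1) * mask + padding_idx

# Example: a batch of token ids where 1 is the pad id.
tokens = torch.tensor([[5, 6, 7, 1, 1]])
positions = make_positions(tokens, padding_idx)  # -> [[2, 3, 4, 1, 1]]
pos_vectors = pos_embedding(positions)           # shape (1, 5, 768)
```

In this sketch the pad tokens all map to index 1, whose embedding row is reserved for padding, while real tokens start at index 2, which is why the table needs 1024 + 2 rows rather than 1024.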

Answer selected by tomsercu
This discussion was converted from issue #132 on October 06, 2021 01:38.