
<eos> token/depth of MSAs (MSA Transformer) #89

Answered by tomsercu
AlexanderKroll asked this question in Q&A

Hi Alexander!

  1. Correct, each individual sample contains 2^14 tokens and fills up the GPU. Batch size 512 is achieved with distributed data parallel and by accumulating the gradients of several forward/backward passes into a single update (fairseq flag `--update-freq`); see the sketch after this list.
  2. No particular reason; it should be completely irrelevant for the results.
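
A minimal sketch of the gradient-accumulation idea from point 1, written in plain PyTorch rather than fairseq. The model, dimensions, and `update_freq` value are illustrative assumptions, not the actual ESM/MSA Transformer training code; the point is only that one optimizer step aggregates several forward/backward passes, which is what fairseq's `--update-freq` does per GPU.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real model and data (assumptions, not ESM code).
model = nn.Linear(768, 33)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

update_freq = 8      # analogous to fairseq's --update-freq
per_gpu_batch = 1    # a single large sample already fills GPU memory
# effective batch size = per_gpu_batch * update_freq * number_of_GPUs

def fake_batch():
    x = torch.randn(per_gpu_batch, 768)
    y = torch.randint(0, 33, (per_gpu_batch,))
    return x, y

optimizer.zero_grad()
for step in range(32):
    x, y = fake_batch()
    # Scale the loss so the accumulated gradient is an average over the big batch.
    loss = loss_fn(model(x), y) / update_freq
    loss.backward()                      # gradients accumulate in .grad buffers
    if (step + 1) % update_freq == 0:
        optimizer.step()                 # one parameter update per accumulated batch
        optimizer.zero_grad()
```

With DistributedDataParallel across multiple GPUs, each rank runs this same loop and gradients are averaged across ranks, which is how a nominal batch size of 512 is reached even though each GPU holds only one sample at a time.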

Answer selected by AlexanderKroll