<eos> token/depth of MSAs (MSA Transformer) #89
-
Hi everyone,

Congratulations on your great paper and thanks for making the model/code publicly available!

(1) In the MSA paper you used "a batch size of 512 MSAs". Does this mean that you kept the number of MSAs fixed for all batches, independent of sequence length, and chose the depth of the MSAs (after subsampling) such that the maximum number of tokens per batch is not exceeded? If so, shorter sequences would be trained with a much greater depth (of the subsampled MSA) than longer sequences (see the sketch below). Did you also try/consider fixing the depth of the subsampled MSAs and instead choosing the number of MSAs per batch such that the maximum number of tokens is not exceeded?

(2) We noticed that you do not use the end-of-sequence (<eos>) token when encoding the input sequences for the MSA Transformer, in contrast to the encoding used for the ESM-1b model. Is there a particular reason why this token is not used for the MSA Transformer?

Thanks.
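To make the trade-off in (1) concrete, here is a minimal sketch, assuming a purely hypothetical per-MSA token budget (the number itself is not taken from the paper): if the total number of tokens per MSA is capped, the subsampled depth is simply the budget divided by the aligned length, so shorter proteins end up with much deeper subsampled MSAs.

```python
# Minimal sketch of the depth-vs-length trade-off in question (1).
# `max_tokens` is a hypothetical per-MSA token budget, not a value from the paper.

def subsampled_depth(seq_len: int, max_tokens: int = 16384) -> int:
    """Largest MSA depth (number of rows) that fits in the token budget."""
    return max(1, max_tokens // seq_len)

for seq_len in (64, 256, 1024):
    print(f"length {seq_len:>4} -> depth {subsampled_depth(seq_len):>4}")
# length   64 -> depth  256
# length  256 -> depth   64
# length 1024 -> depth   16
```

And to illustrate (2): assuming the released esm package exposes Alphabet.from_architecture with these architecture names (this is how we read the public repo, so treat it as an assumption), the encoding difference can be inspected directly without loading any weights.

```python
import esm

# Assumption: esm.Alphabet.from_architecture accepts these architecture names,
# as in the public facebookresearch/esm repository.
esm1b_alphabet = esm.Alphabet.from_architecture("ESM-1b")
msa_alphabet = esm.Alphabet.from_architecture("MSA Transformer")

print("ESM-1b          appends <eos>:", esm1b_alphabet.append_eos)  # True
print("MSA Transformer appends <eos>:", msa_alphabet.append_eos)    # False
```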
Replies: 1 comment 1 reply
-
Hi Alexander!
--update-freq
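For context: --update-freq is fairseq's gradient-accumulation setting, i.e. gradients from several mini-batches are summed before a single optimizer step, which is one way to reach an effective batch size of 512 MSAs under a per-GPU token limit. A minimal sketch of the same idea in plain PyTorch follows; the model, optimizer and data loader here are hypothetical stand-ins, not the actual MSA Transformer training code.

```python
import torch

# Hedged sketch of gradient accumulation (the idea behind fairseq's --update-freq).
# `model`, `optimizer` and `data_loader` are placeholders for illustration only.
update_freq = 8  # accumulate gradients over 8 mini-batches per optimizer step

model = torch.nn.Linear(16, 1)                      # stand-in model
optimizer = torch.optim.Adam(model.parameters())
data_loader = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(32)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data_loader, start=1):
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / update_freq).backward()                 # scale so summed gradients match one large batch
    if step % update_freq == 0:
        optimizer.step()                            # one parameter update per `update_freq` batches
        optimizer.zero_grad()
```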