Hi,
I was curious about some of the training details for the ESM model "esm2_t33_650M_UR50D".
What batch size was used during training per V100 GPU? The supplement states 2 million tokens per batch across 512 V100s, which works out to ~4 sequences padded to length 1024 per GPU. I ask because I cannot fit more than 1 sequence (1024 tokens) per V100 for the "esm2_t33_650M_UR50D" model when using the Hugging Face Trainer with fp16. Is this expected?
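For reference, here is a minimal sketch of the kind of setup I mean (a toy dataset and hypothetical hyperparameters, not my actual training script): masked-LM fine-tuning of esm2_t33_650M_UR50D with the Hugging Face Trainer in fp16, one 1024-token sequence per device, with gradient accumulation standing in for the ~4 sequences/GPU implied by the supplement.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Toy stand-in for the real training data: a few protein sequences
# padded/truncated to 1024 tokens, matching the sequence length in the paper.
sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"] * 8
train_dataset = Dataset.from_dict(
    dict(tokenizer(sequences, truncation=True, padding="max_length", max_length=1024))
)

training_args = TrainingArguments(
    output_dir="esm2_650M_mlm",        # hypothetical output directory
    per_device_train_batch_size=1,     # anything above 1 runs out of memory on a 16 GB V100 in my runs
    gradient_accumulation_steps=4,     # approximate the ~4 sequences/GPU implied by the supplement
    fp16=True,                         # requires a CUDA GPU
    max_steps=10,
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=0.15
    ),
)
trainer.train()
```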
Thanks!