Hi,
I was curious about some of the training details for the ESM model "esm2_t33_650M_UR50D".
What batch size was used during training per V100 GPU? The supplement states 2 million tokens per batch across 512 V100s, which works out to ~4 sequences padded to length 1024 per GPU. I ask because I cannot fit more than 1 sequence (1024 tokens) per V100 for the "esm2_t33_650M_UR50D" model when using the Hugging Face Trainer with fp16. Is this expected?
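For reference, here is a minimal sketch of the kind of setup I mean (a toy dataset and hypothetical hyperparameters, not my actual training script): masked-LM fine-tuning of esm2_t33_650M_UR50D with the Hugging Face Trainer in fp16, one 1024-token sequence per device, with gradient accumulation standing in for the ~4 sequences/GPU implied by the supplement.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Toy stand-in for the real training data: a few protein sequences
# padded/truncated to 1024 tokens, matching the sequence length in the paper.
sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"] * 8
train_dataset = Dataset.from_dict(
    dict(tokenizer(sequences, truncation=True, padding="max_length", max_length=1024))
)

training_args = TrainingArguments(
    output_dir="esm2_650M_mlm",        # hypothetical output directory
    per_device_train_batch_size=1,     # anything above 1 runs out of memory on a 16 GB V100 in my runs
    gradient_accumulation_steps=4,     # approximate the ~4 sequences/GPU implied by the supplement
    fp16=True,                         # requires a CUDA GPU
    max_steps=10,
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=0.15
    ),
)
trainer.train()
```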
Thanks!