Some questions associated with pre-training #160
-
Hi, thank you for your amazing work. I have some questions regarding pre-training that I hope you can help with.
Thank you for your time and consideration. Regards,
Replies: 1 comment
-
The following answer is from @rmrao, via email:
I don’t think the esm repo has this code, but it was trained using the fairseq repo at github.com/pytorch/fairseq. I also have a version in a personal repo: https://github.com/rmrao/evo/blob/ac86e2b8a6f78d5e3a0b8bf47a978a12735ca8c4/evo/dataset.py#L653.
Yes, max_positions is the maximum length of a sequence.
It was trained for a total of ~500,000 updates, with an approximate batch size of 512 sequences (batch size was actually based on the number of tokens, with a maximum of 1024 tokens per sequence, so a total of 512 * 1024 tokens per update).
Hope that helps!
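
For readers unfamiliar with token-based batching, the sketch below shows the general idea: sequences are truncated to `max_positions` and grouped into batches whose total token count stays under a budget. This is only an illustration of the concept, not the fairseq implementation (the actual logic lives in fairseq's dataset/batching utilities); the function and parameter names here are made up for the example.

```python
from typing import Iterable, List


def batch_by_tokens(
    sequences: Iterable[str],
    max_positions: int = 1024,     # maximum length of a single sequence
    max_tokens: int = 512 * 1024,  # rough token budget per update (~512 seqs * 1024 tokens)
) -> List[List[str]]:
    """Group sequences into batches whose total token count stays under max_tokens."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0

    for seq in sequences:
        seq = seq[:max_positions]  # truncate anything longer than max_positions
        if current and current_tokens + len(seq) > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(seq)
        current_tokens += len(seq)

    if current:
        batches.append(current)
    return batches
```

Under this scheme the number of sequences per batch varies, but the total number of tokens per update stays roughly constant, which is what the "512 * 1024 tokens per update" figure above refers to.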