
Some questions associated with pre-training #160


The following answer is from @rmrao, via email:

I don’t think the esm repo has this code, but it was trained using the fairseq repo at github.com/pytorch/fairseq. I also have a version in a personal repo: https://github.com/rmrao/evo/blob/ac86e2b8a6f78d5e3a0b8bf47a978a12735ca8c4/evo/dataset.py#L653.

Yes, max_positions is the maximum length of a sequence.
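
For illustration, here is a minimal sketch of how a length cap like `max_positions` is typically applied when preparing data; the helper below is hypothetical, not the actual fairseq or esm code:

```python
MAX_POSITIONS = 1024  # assumed value, matching the 1024-token cap mentioned below

def fits_model(token_ids, max_positions=MAX_POSITIONS):
    """Return True if a tokenized sequence fits within the model's maximum length."""
    return len(token_ids) <= max_positions

# Example: keep only sequences the model can consume in full.
dataset = [[1, 2, 3], list(range(2000))]
usable = [seq for seq in dataset if fits_model(seq)]  # drops the 2000-token sequence
```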

It was trained for a total of ~500,000 updates, with an approximate batch size of 512 sequences (the batch size was actually based on the number of tokens, with a maximum of 1024 tokens per sequence, so a total of 512 * 1024 tokens per update).
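
As a back-of-the-envelope check, and to show what batching by token count rather than by sequence count means in practice, here is a hedged Python sketch; the batching function is a simplified stand-in, not fairseq's actual implementation:

```python
# Training-scale arithmetic from the numbers quoted above.
updates = 500_000
tokens_per_update = 512 * 1024            # ~512 sequences x 1024 tokens each
print(f"{tokens_per_update:,} tokens per update")      # 524,288
print(f"{updates * tokens_per_update:,} tokens total") # 262,144,000,000 (~2.6e11)

def token_batches(sequences, max_tokens=512 * 1024):
    """Greedily group sequences into batches whose combined token count
    stays under max_tokens (batching by tokens, not by sequence count)."""
    batch, batch_tokens = [], 0
    for seq in sequences:
        if batch and batch_tokens + len(seq) > max_tokens:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(seq)
        batch_tokens += len(seq)
    if batch:
        yield batch
```

In fairseq this kind of token budget corresponds to the `--max-tokens` option; real implementations also account for padding, so a batch's effective cost is roughly its longest sequence times the number of sequences in it.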

Hope that helps!
