Hi team,
First, I would like to express my gratitude for the hard work involved in developing OLMo and making everything publicly available.
I have a question about the 1B model. The paper says it was trained on 2 trillion tokens, but the GitHub repository indicates it was trained to 3 trillion tokens. Since checkpoints up to 3 trillion tokens are publicly available, should the training beyond 2 trillion tokens be considered a second epoch? The paper describes the Dolma dataset as having 2 trillion tokens, yet on GitHub there is only a data order file for epoch 1, which is confusing.
Additionally, I would like to know the exact step at which the first epoch ends. The train.pt file for the 7B model explicitly indicates when it reaches the second epoch, but this is not specified for the 1B models. Could you clarify this?
Thank you in advance!
Hi,
The 2T in the paper is a typo. Thank you for pointing it out!
I'm not as familiar with the data side of things, but according to https://huggingface.co/datasets/allenai/dolma, we used a 2T-token sample of Dolma for the 7B and a 3T-token version for the 1B. Thus there is no second epoch for OLMo 1B.
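For anyone still wanting to locate where a given token boundary falls, a rough way is to work from tokens per optimizer step (global batch size × sequence length) and, if you have a trainer-state file, to peek at its contents. The snippet below is a minimal sketch, not OLMo's actual tooling: the batch size and sequence length are placeholder values that should be read from the published 1B training config, and the assumption that train.pt is a torch-serialized dict with an epoch-related key is exactly that, an assumption.

```python
import torch

# --- 1) Estimate the step at which a token budget is crossed ---------------
# tokens per optimizer step = global batch size (sequences) * sequence length.
# These are placeholder values; read the real ones from the released
# OLMo 1B training config rather than trusting these numbers.
GLOBAL_BATCH_SIZE = 2048      # assumption: sequences per optimizer step
SEQUENCE_LENGTH = 2048        # assumption: tokens per sequence

def step_for_tokens(target_tokens: int,
                    batch_size: int = GLOBAL_BATCH_SIZE,
                    seq_len: int = SEQUENCE_LENGTH) -> int:
    """Return the first optimizer step at which `target_tokens` have been seen."""
    tokens_per_step = batch_size * seq_len
    return -(-target_tokens // tokens_per_step)  # ceiling division

print(step_for_tokens(2_000_000_000_000))  # roughly where a 2T boundary would fall
print(step_for_tokens(3_000_000_000_000))  # roughly where training ends at 3T

# --- 2) Peek at a trainer-state file ---------------------------------------
# Assuming train.pt is a torch-serialized dict (an assumption about its
# layout, not documented behavior), list its top-level keys and look for
# anything epoch-related.
state = torch.load("train.pt", map_location="cpu")
print(sorted(state.keys()))
print({k: v for k, v in state.items() if "epoch" in str(k).lower()})
```

If the key layout differs, the same idea applies: look for whichever counter increments or resets at an epoch boundary.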