Hi team,
First, I would like to express my gratitude for the hard work involved in developing OLMo and making everything publicly available.
I have a question about the 1B model. The paper says it was trained on 2 trillion tokens, but the GitHub repository indicates it was trained to 3 trillion tokens. Since checkpoints up to 3 trillion tokens are publicly available, should the training beyond 2 trillion tokens be considered a second epoch? The paper describes the Dolma dataset as having 2 trillion tokens, yet on GitHub there is only a data order file for epoch 1, which is confusing.
Additionally, I would like to know the exact step at which the first epoch ends. The train.pt file for the 7B model explicitly indicates when it reaches the second epoch, but this is not specified for the 1B models. Could you clarify this?
Thank you in advance!
Hi,
The 2T in the paper is a typo. Thank you for pointing it out!
I'm not as familiar with the data side of things, but according to https://huggingface.co/datasets/allenai/dolma, we used a 2T-token sample of Dolma for the 7B and a 3T-token version for the 1B. Thus there is no second epoch for OLMo 1B.
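For anyone still wanting to locate where a given token boundary falls, a rough way is to work from tokens per optimizer step (global batch size × sequence length) and, if you have a trainer-state file, to peek at its contents. The snippet below is a minimal sketch, not OLMo's actual tooling: the batch size and sequence length are placeholder values that should be read from the published 1B training config, and the assumption that train.pt is a torch-serialized dict with an epoch-related key is exactly that, an assumption.

```python
import torch

# --- 1) Estimate the step at which a token budget is crossed ---------------
# tokens per optimizer step = global batch size (sequences) * sequence length.
# These are placeholder values; read the real ones from the released
# OLMo 1B training config rather than trusting these numbers.
GLOBAL_BATCH_SIZE = 2048      # assumption: sequences per optimizer step
SEQUENCE_LENGTH = 2048        # assumption: tokens per sequence

def step_for_tokens(target_tokens: int,
                    batch_size: int = GLOBAL_BATCH_SIZE,
                    seq_len: int = SEQUENCE_LENGTH) -> int:
    """Return the first optimizer step at which `target_tokens` have been seen."""
    tokens_per_step = batch_size * seq_len
    return -(-target_tokens // tokens_per_step)  # ceiling division

print(step_for_tokens(2_000_000_000_000))  # roughly where a 2T boundary would fall
print(step_for_tokens(3_000_000_000_000))  # roughly where training ends at 3T

# --- 2) Peek at a trainer-state file ---------------------------------------
# Assuming train.pt is a torch-serialized dict (an assumption about its
# layout, not documented behavior), list its top-level keys and look for
# anything epoch-related.
state = torch.load("train.pt", map_location="cpu")
print(sorted(state.keys()))
print({k: v for k, v in state.items() if "epoch" in str(k).lower()})
```

If the key layout differs, the same idea applies: look for whichever counter increments or resets at an epoch boundary.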