I was looking at the config files and noticed that they sometimes point to `.npy` files for the dataset. Is there a script to generate these from a set of text files or from another format?
You can either download the dataset directly from Hugging Face or use the Dolma toolkit. The Hugging Face repository provides easy access to the dataset, and the Dolma toolkit offers utilities for handling different data formats. If you need further help, feel free to follow up.
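For anyone who wants to see the general idea before reaching for a toolkit, here is a minimal sketch of turning text files into a flat array of token IDs with NumPy. It is not the project's own script. It assumes the data loader reads a raw memory-mapped buffer of `uint16` IDs (rather than a file written with `np.save`), which you should check against the loader you are using. `toy_tokenize`, `EOS_ID`, and `text_files_to_tokens` are hypothetical names made up for this illustration; in practice you would swap in the tokenizer named in your config.

```python
from pathlib import Path

import numpy as np

# Hypothetical end-of-document ID, chosen one past the byte range used below.
EOS_ID = 256


def toy_tokenize(text):
    """Stand-in tokenizer: one token per UTF-8 byte (IDs 0-255)."""
    return list(text.encode("utf-8"))


def text_files_to_tokens(paths, out_path, tokenize=toy_tokenize, dtype=np.uint16):
    """Tokenize each file, append EOS_ID after it, and write all IDs to out_path.

    The output is a raw buffer of `dtype` values with no header (an assumption
    about what the training code expects). Returns the number of IDs written.
    """
    ids = []
    for path in paths:
        ids.extend(tokenize(Path(path).read_text(encoding="utf-8")))
        ids.append(EOS_ID)
    if not ids:
        raise ValueError("no input files given")

    # mode="w+" creates (or overwrites) the file at the requested size.
    arr = np.memmap(out_path, dtype=dtype, mode="w+", shape=(len(ids),))
    arr[:] = ids
    arr.flush()
    return len(ids)
```

The result can be read back with `np.memmap(out_path, dtype=np.uint16, mode="r")`. Note that `uint16` only holds IDs up to 65535, so a tokenizer with a larger vocabulary would need a wider dtype.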