Usage example #1
Comments
Hi! The code is written in a simple format so anyone can read, edit, and modify it to suit their needs.
If you are building it locally, make sure you adjust the model's hyperparameters as needed (see the sketch below). Cheers! 🎉 Good luck training LLaMA 2 on BitNet. If you think the documentation needs improvement, or you can improve the code quality, please raise an issue and open a pull request.
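A minimal, hypothetical sketch of what adjusting those hyperparameters could look like for a local run, assuming the script builds the model from a Hugging Face `LlamaConfig`; the reduced values below are illustrative placeholders, not the repo's defaults.

```python
# Hypothetical local-run configuration: shrink the model so it fits on a single GPU.
# None of these values come from the repo; they are placeholders to edit.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=768,               # width of the transformer
    intermediate_size=3072,        # MLP size, typically ~4x hidden_size
    num_hidden_layers=12,          # depth
    num_attention_heads=12,
    max_position_embeddings=1024,  # context length
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```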
Hi, thank you very much for your help. I managed to train the model as per your script documentation. I was wondering if there's a way to use your code to train on the togethercomputer/RedPajama-Data-1T-Sample or togethercomputer/RedPajama-Data-1T dataset? I have been trying, but I keep getting errors related to the "train"/"test" split. Thank you!
Hi. I wonder what percentage of the original dataset defined in the script you used to train the BitNet architecture. If you can share the model weights on HF and tag me (nirajandhakal), I would love to check your model out. Also, about the RedPajama dataset, can you show me what kind of errors you are facing? I will test out the script when I am free.
Hi, I'm not sure what you mean by training percentage; I just ran the script exactly as it appears in the repo on the openwebtext-tokenized-small dataset. About RedPajama, I get an error because there is no "test" split, so the script fails since there's no tokenized_data["test"] entry. What should I pass as eval_dataset? Or is there a preprocessing step I have to do beforehand? Thanks in advance.
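A minimal sketch of one way to create the missing "test" split before tokenization, assuming the Hugging Face `datasets` library is used; the 1% test fraction is an arbitrary choice, not a value from the repo.

```python
# RedPajama only ships a "train" split, so carve out a small evaluation set
# before tokenization. The dataset name is from the discussion above.
from datasets import load_dataset

raw = load_dataset("togethercomputer/RedPajama-Data-1T-Sample")
splits = raw["train"].train_test_split(test_size=0.01, seed=42)

train_dataset = splits["train"]
eval_dataset = splits["test"]  # use this where the script expects tokenized_data["test"]
```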
Can you provide an example of how to launch a training instance? How can one choose the LLaMA model size (350M, 750M, ..., 7B, etc.)? Thanks in advance.
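A hedged sketch of how a model size could be chosen and a run launched, assuming the script builds a Hugging Face `LlamaConfig` and trains with `Trainer`; the size table, the tiny demo dataset, and all training arguments below are illustrative assumptions, not the repo's presets or CLI.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          LlamaConfig, LlamaForCausalLM, Trainer, TrainingArguments)

# Rough (unofficial) dimensions for a few common sizes.
SIZES = {
    "350M": dict(hidden_size=1024, num_hidden_layers=24, num_attention_heads=16),
    "750M": dict(hidden_size=1536, num_hidden_layers=24, num_attention_heads=16),
    "7B":   dict(hidden_size=4096, num_hidden_layers=32, num_attention_heads=32),
}
size = "350M"

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer for the demo
tokenizer.pad_token = tokenizer.eos_token

# Tiny placeholder corpus so the example runs end to end; swap in a real
# tokenized dataset (openwebtext, RedPajama, ...) for an actual run.
ds = Dataset.from_dict({"text": ["hello world"] * 64}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64), batched=True)

config = LlamaConfig(**SIZES[size],
                     intermediate_size=4 * SIZES[size]["hidden_size"],
                     vocab_size=len(tokenizer))
model = LlamaForCausalLM(config)

args = TrainingArguments(output_dir=f"bitnet-llama-{size}",
                         per_device_train_batch_size=4,
                         num_train_epochs=1,
                         report_to="none")
trainer = Trainer(model=model, args=args, train_dataset=ds,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
```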