This repository contains the training scripts and configuration files for DMaS-LLaMa-Lite, a 1.7B-parameter model inspired by the LLaMa architecture and trained from scratch on approximately 20 billion tokens of curated high-quality data.
The overall training code is modified from BuildNanoGPT, while the LLaMa implementation is adapted from hengjiUSTC's learn-llm.
DMaS-LLaMa-Lite is a 1.7B-parameter, LLaMa-based language model. Key highlights of its pretraining are:
- High-Quality Training Data: Curated from the FineWeb-Edu dataset, emphasizing educational and coherent content.
- Training Stability: Key lessons include restoring optimizer state when resuming training and carefully managing hardware transitions (see the sketch after this list).
- Efficient Performance: Competitive downstream task results achieved with significantly fewer tokens than comparable models.
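
The training-stability point above concerns how runs are resumed after interruptions or hardware changes. As a rough, hypothetical sketch (not the repository's actual checkpointing code; the dictionary keys and file name are assumptions), restoring both model and optimizer state on resume could look like this:

```python
import torch

def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
    # Persist model weights *and* optimizer state (e.g. AdamW moment estimates),
    # so a resumed run continues along an equivalent optimization trajectory.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
        },
        path,
    )

def load_checkpoint(model, optimizer, path="checkpoint.pt", device="cpu"):
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    # Restoring the optimizer state is the key step: reinitializing the optimizer
    # instead can destabilize training after a restart.
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```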
For more details, refer to our paper:
Experience of Training a 1.7B-Parameter LLaMa Model From Scratch
Miles Q Li, Benjamin Fung, Shih-Chia Huang (2024)
arXiv preprint arXiv:2412.13335: https://arxiv.org/abs/2412.13335
If you use this repository or the pre-trained model, please cite our work:
@article{li2024effectiveness,
  title={Experience of Training a 1.7B-Parameter LLaMa Model From Scratch},
  author={Li, Miles Q and Fung, Benjamin and Huang, Shih-Chia},
  journal={arXiv preprint arXiv:2412.13335},
  year={2024}
}
- Training Code: GitHub Repository
- Pre-trained Checkpoints: HuggingFace Collection (see the usage sketch below)
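
If the released checkpoints follow the standard Hugging Face transformers format (an assumption; check the collection page for the exact model ids), loading one for quick inference might look roughly like this. The model id below is a placeholder, not the real repository name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id for illustration only -- replace with the actual model id
# from the HuggingFace collection linked above.
model_id = "your-org/DMaS-LLaMa-Lite-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Education is important because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```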
This work builds on open-source projects, including BuildNanoGPT and hengjiUSTC's learn-llm.
We thank the community for their valuable contributions and tools that make this research possible.