This repository contains the training scripts and configuration files for DMaS-LLaMa-Lite, a 1.7B-parameter model inspired by the LLaMa architecture and trained from scratch on approximately 20 billion tokens of curated high-quality data.
The overall training code is modified from BuildNanoGPT, while the LLaMa implementation is adapted from hengjiUSTC's learn-llm.
DMaS-LLaMa-Lite is a 1.7B-parameter, LLaMa-based language model. Key highlights of its pretraining are:
- High-Quality Training Data: Curated from the FineWeb-Edu dataset, emphasizing educational and coherent content.
- Training Stability: Key lessons include restoring optimizer state when resuming training and carefully managing hardware transitions (see the sketch after this list).
- Efficient Performance: Competitive downstream task results achieved with significantly fewer tokens than comparable models.
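
The training-stability point above concerns how runs are resumed after interruptions or hardware changes. As a rough, hypothetical sketch (not the repository's actual checkpointing code; the dictionary keys and file name are assumptions), restoring both model and optimizer state on resume could look like this:

```python
import torch

def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
    # Persist model weights *and* optimizer state (e.g. AdamW moment estimates),
    # so a resumed run continues along an equivalent optimization trajectory.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
        },
        path,
    )

def load_checkpoint(model, optimizer, path="checkpoint.pt", device="cpu"):
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    # Restoring the optimizer state is the key step: reinitializing the optimizer
    # instead can destabilize training after a restart.
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```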
For more details, refer to our paper:
Experience of Training a 1.7B-Parameter LLaMa Model From Scratch
Miles Q Li, Benjamin Fung, Shih-Chia Huang (2024)
arXiv preprint arXiv:2412.13335: https://arxiv.org/abs/2412.13335
If you use this repository or the pre-trained model, please cite our work:
@article{li2024effectiveness,
  title={Experience of Training a 1.7B-Parameter LLaMa Model From Scratch},
  author={Li, Miles Q and Fung, Benjamin and Huang, Shih-Chia},
  journal={arXiv preprint arXiv:2412.13335},
  year={2024}
}
- Training Code: GitHub Repository
- Pre-trained Checkpoints: HuggingFace Collection (see the usage sketch below)
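
If the released checkpoints follow the standard Hugging Face transformers format (an assumption; check the collection page for the exact model ids), loading one for quick inference might look roughly like this. The model id below is a placeholder, not the real repository name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id for illustration only -- replace with the actual model id
# from the HuggingFace collection linked above.
model_id = "your-org/DMaS-LLaMa-Lite-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Education is important because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```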
This work builds on open-source projects, including BuildNanoGPT and hengjiUSTC's learn-llm.
We thank the community for their valuable contributions and tools that make this research possible.