v0.2.0
Overview
It’s been a while since our last release, and this one is packed with cool new features in the torchtune library, including distributed QLoRA support, new models, sample packing, and more! Check out #new-contributors for an exhaustive list of new contributors to the repo.
Enjoy the new release and happy tuning!
New Features
Here are some highlights of the new features in v0.2.0.
Recipes
- We added support for QLoRA with FSDP2! This means users can now fine-tune 70B+ models with QLoRA across multiple GPUs. We provide example configs for the Llama2 7B and 70B sizes; see the example launch command after this list. Note: this currently requires installing PyTorch nightlies to access the FSDP2 APIs. (#909)
- By leveraging FSDP2 we also see a 12% increase in tokens/sec and a 3.2x speedup in model initialization over FSDP1 with LoRA. (#855)
- We added support for additional variants of the Meta-Llama3 recipes.
- We introduced a quantization-aware training (QAT) recipe. Training with QAT shows significant improvements in model quality if you plan to quantize your model post-training. (#980)
- We also made several updates to the eval recipe.
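As a rough sketch, a distributed LoRA recipe can be launched with the tune CLI like this. The model, recipe, and config names below are illustrative, and the FSDP2-based QLoRA recipe may live under a different name, so run `tune ls` to see exactly what ships with this release:

```bash
# Download the base model weights (a Hugging Face token is needed for gated repos)
tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/llama2-7b --hf-token <TOKEN>

# Launch the distributed LoRA recipe on 2 GPUs.
# NOTE: recipe and config names are illustrative; check `tune ls` for the exact names.
tune run --nproc_per_node 2 lora_finetune_distributed --config llama2/7B_lora
```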
Models
- Phi-3 Mini-4K-Instruct from Microsoft (#876)
- Gemma 7B from Google (#971)
- Code Llama2 in 7B, 13B, and 70B sizes from Meta (#847)
- @salman designed and implemented reward modeling for Mistral models (#840, #991)
Perf, memory, and quantization
- We made improvements to our FSDP + Llama3 recipe, resulting in an additional 13% savings in allocated memory for the 8B model. (#865)
- Added int8 per-token dynamic activation + int4 grouped per-axis weight (8da4w) quantization (#884)
Data/Datasets
- We added support for a widely requested feature: sample packing! This feature drastically speeds up model training, e.g. 2x faster with the Alpaca dataset; see the example override after this list. (#875, #1109)
- In addition to instruction tuning, we now also support continued pretraining and include several example datasets such as WikiText and CNN DailyMail. (#868)
- Users can now train on multiple datasets at once via dataset concatenation (#889)
- We now support OpenAI conversation-style data (#890)
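For example, sample packing can be toggled with a config override on the command line. A minimal sketch, assuming the dataset builder in the config exposes a `packed` flag (verify the exact key against your config):

```bash
# Enable sample packing via a command-line config override.
# ASSUMPTION: the dataset builder in this config accepts a `packed` flag.
tune run full_finetune_single_device --config llama2/7B_full_low_memory dataset.packed=True
```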
Miscellaneous
- @jeromeku added a much more advanced profiler so users can understand the exact bottlenecks in their LLM training. (#1089)
- We made several improvements to metric logging.
- Users can now save models in the safetensors format. (#1096)
- Updated activation checkpointing to support selective-layer and selective-op activation checkpointing (#785)
- We worked with the Hugging Face team to support loading adapter weights fine-tuned via torchtune directly into the PEFT library; see the sketch below. (#933)
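A minimal sketch of what that looks like on the PEFT side; the base model ID and adapter directory are placeholders for your own artifacts:

```python
# Minimal sketch: load a torchtune-trained LoRA adapter with Hugging Face PEFT.
# The model ID and adapter path are placeholders for your own artifacts.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# Directory containing the adapter config and weights saved by torchtune
model = PeftModel.from_pretrained(base, "/path/to/torchtune/output_dir")
model.eval()
```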
Documentation
- We wrote a new tutorial for fine-tuning Llama3 with chat data (#823) and revamped the datasets tutorial (#994)
- Looooooooong overdue, but we added proper documentation for the tune CLI (#1052)
- Improved contributing guide (#896)
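For reference, the core workflow covered by the new CLI docs looks roughly like this:

```bash
tune ls                                                       # list built-in recipes and configs
tune cp llama2/7B_lora_single_device my_config.yaml           # copy a config for local editing
tune run lora_finetune_single_device --config my_config.yaml  # launch a recipe with your config
```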
Bug Fixes
- @Optimox found and fixed a bug to ensure that LoRA dropout was correctly applied (#996)
- Fixed a broken link in the Llama3 tutorial (#805)
- Fixed Gemma model generation (#1016)
- Bug workaround: to download the CNN DailyMail dataset, launch a single-device recipe first; once the dataset is downloaded, you can use it with distributed recipes.
New Contributors
- @supernovae made their first contribution in #803
- @eltociear made their first contribution in #814
- @Carolinabanana made their first contribution in #810
- @musab-mk made their first contribution in #818
- @apthagowda97 made their first contribution in #816
- @lessw2020 made their first contribution in #785
- @weifengpy made their first contribution in #843
- @musabgultekin made their first contribution in #857
- @xingyaoww made their first contribution in #890
- @vmoens made their first contribution in #902
- @andrewor14 made their first contribution in #884
- @kunal-mansukhani made their first contribution in #926
- @EvilFreelancer made their first contribution in #889
- @water-vapor made their first contribution in #950
- @Optimox made their first contribution in #995
- @tambulkar made their first contribution in #1011
- @christobill made their first contribution in #1004
- @j-dominguez9 made their first contribution in #1056
- @andyl98 made their first contribution in #1061
- @hmosousa made their first contribution in #1065
- @yasser-sulaiman made their first contribution in #1055
- @parthsarthi03 made their first contribution in #1081
- @mdeff made their first contribution in #1086
- @jeffrey-fong made their first contribution in #1096
- @jeromeku made their first contribution in #1089
- @man-shar made their first contribution in #1126
Full Changelog: v0.1.1...v0.2.0