Home
This project focuses on distilling one or more teacher language models into a single student model. The goal is to collect the general knowledge of the various teachers and encapsulate it in a more compact and efficient student model.
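Conceptually, the student is trained to match the token-level probability distributions of its teacher(s). The sketch below shows a generic logit-distillation loss in PyTorch; it is only an illustration of the idea, not necessarily the exact objective this pipeline uses.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the KL
    # divergence between the teacher's distribution and the student's.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# Toy example: 4 token positions over a 32k-entry vocabulary.
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher))
```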
Before you start, ensure you have the following installed:
- Python 3.10+ (the pipeline is developed on 3.10.11): https://www.python.org/downloads/
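A quick way to confirm your interpreter meets the requirement (just a convenience check, not part of the pipeline):

```python
import sys

# Fail loudly if the interpreter is older than 3.10.
assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version}"
print(sys.version)
```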
Clone the repository:
git clone https://github.com/golololologol/LLM-Distillery
cd LLM-Distillery
Inside the folder, run the setup script for your OS.
For Windows:
open_venv.bat
For Linux:
chmod +x open_venv.sh
./open_venv.sh
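If you prefer not to use the helper scripts, a rough stdlib-only equivalent of this setup step looks like the following (only an illustration; the real scripts and the venv folder name may differ):

```python
import venv

# Create a virtual environment named "venv" with pip available inside it.
venv.EnvBuilder(with_pip=True).create("venv")
print("Activate it with venv\\Scripts\\activate (Windows) or source venv/bin/activate (Linux).")
```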
Either way, the virtual environment is created and left activated so you can manually install the following packages into it:
- Exllamav2 0.0.19+ (0.2.7 is recommended): https://github.com/turboderp/exllamav2/releases
Choose the wheel matching your particular setup of CUDA Toolkit, PyTorch, Python, and OS (the snippet after this list prints the relevant versions).
- bitsandbytes 0.41.3+ (0.45.1 is recommended):
On Linux, you can skip this step and go straight to the next one.
On Windows, manually install bitsandbytes 0.45.1 from the bitsandbytes-foundation continuous-release builds using the following command:
pip install --no-deps https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-0.45.1-py3-none-win_amd64.whl
- Flash Attention 2 2.4.2+ (2.5.2 is recommended):
If you are not on Python 3.10.x, look through the releases page of the wheel below to find one that fits your setup.
pip install https://github.com/bdashore3/flash-attention/releases/download/v2.5.2/flash_attn-2.5.2+cu122torch2.2.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
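Picking the right Exllamav2 and Flash Attention wheels requires knowing your exact Python, PyTorch, and CUDA versions. A quick way to print them (this assumes PyTorch is already available to the interpreter you run it with):

```python
import sys
import torch

# Print the version details that the wheel filenames encode.
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.version.cuda)  # None on CPU-only builds
```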
Finally, install all other necessary packages into the venv:
pip install -r requirements.txt
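As an optional sanity check (not part of the project itself), you can confirm the manually installed packages are visible from inside the venv:

```python
from importlib.metadata import version, PackageNotFoundError

# Report the installed version of each key dependency, or flag it as missing.
for pkg in ("torch", "exllamav2", "bitsandbytes", "flash_attn"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```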
Check out this page to prepare everything for your first distillation run: Preparations for the first start
Contributions are welcome! Feel free to open issues or submit pull requests.
This project is licensed under Apache License 2.0. See the LICENSE file for more details.