This repo demonstrates fine-tuning an open-source LLM (Llama-3-8B) using different approaches and techniques. Fine-tuning was done with ORPO, a technique that combines SFT and preference alignment (the goal of RLHF) in a single stage. The work explores fine-tuning in a multi-GPU environment using distributed training methods such as DeepSpeed and FSDP via Hugging Face's `accelerate` library.
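ORPO's per-example objective can be sketched as follows. This is an illustrative scalar version, not the repo's implementation: in practice the probabilities are length-normalized sequence likelihoods produced by the model, and `lam` corresponds to the odds-ratio weight (`beta` in common trainer APIs).

```python
import math

def odds(p: float) -> float:
    # Odds of a probability: p / (1 - p)
    return p / (1.0 - p)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    # L_ORPO = L_SFT + lam * L_OR
    # L_SFT is the usual NLL on the chosen answer; L_OR penalizes the model
    # when the rejected answer's odds approach the chosen answer's odds.
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    l_or = -math.log(sigmoid(log_odds_ratio))
    return nll_chosen + lam * l_or
```

Because the odds-ratio term rides on top of the SFT loss, a single backward pass both fits the chosen responses and pushes probability mass away from the rejected ones — this is why ORPO needs no separate reward model or reference model.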
- LLM - `Meta-Llama-3-8B`
- Dataset (HF) - `mlabonne/orpo-dpo-mix-40k`
- Fine-tuning method - ORPO
- Accelerator technique - DeepSpeed ZeRO-3
- Trainer API - HuggingFace
- Runtime environment - multi-GPU (2x NVIDIA RTX 6000)
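For reference, a ZeRO-3 setup under `accelerate` can be expressed as a `DeepSpeedPlugin`. The values below are illustrative, not the repo's actual configuration:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Hypothetical ZeRO-3 plugin configuration; the notebook's actual values
# (accumulation steps, offload targets) may differ.
ds_plugin = DeepSpeedPlugin(
    zero_stage=3,                      # ZeRO-3: shard optimizer state, gradients, and parameters
    gradient_accumulation_steps=4,
    offload_optimizer_device="none",   # keep optimizer state on GPU
    offload_param_device="none",
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```

ZeRO-3 partitions parameters as well as optimizer state and gradients across the two GPUs, which is what makes an 8B-parameter model trainable on this hardware.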
- Create the conda environment from `llama.yml`: `conda env create -f llama.yml`
- Run `llm_llama3_fine_tuning_orpo.ipynb`
- Put the token issued by Hugging Face into the `HF_TOKEN` variable.
- Before you run the `notebook_launcher(main, num_processes=2)` cell, `torch.cuda.is_initialized()` must return `False`. If it returns `True`, the cell raises an error.
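The precondition can be checked with a small guard just before the launch cell. This is a sketch; `main` is the training entry point defined in the notebook, and `notebook_launcher` comes from `accelerate`:

```python
import torch

def assert_cuda_uninitialized() -> None:
    # notebook_launcher spawns one worker process per GPU; if the parent
    # (notebook) process has already initialized CUDA - e.g. by creating a
    # CUDA tensor in an earlier cell - the workers cannot initialize their
    # own CUDA contexts and the launch fails.
    if torch.cuda.is_initialized():
        raise RuntimeError(
            "CUDA was initialized before launching. Restart the kernel and "
            "avoid touching the GPU before the notebook_launcher cell."
        )

assert_cuda_uninitialized()
# Then, in the notebook:
# notebook_launcher(main, num_processes=2)
```

If the guard fires, restarting the kernel and re-running only the cells up to the launcher is usually enough to clear the state.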
Thanks to Maxime Labonne for the work shared in his blog here.