This project demonstrates how to fine-tune the pre-trained language model `unsloth/Llama-3.2-3B-Instruct` using the Hugging Face `transformers`, `trl`, and `datasets` libraries. The notebook walks through every step, from installing dependencies to fine-tuning the model and generating text. Detailed instructions and explanations for each step are given below.
## Table of Contents

- Overview
- Features
- Prerequisites
- Setup
- Steps
  1. Install Required Libraries
  2. Import Dependencies
  3. Load Pre-Trained Model
  4. Apply Parameter-Efficient Fine-Tuning (PEFT)
  5. Prepare Dataset and Tokenizer
  6. Configure the Fine-Tuning Trainer
  7. Train and Save the Model
  8. Reload Fine-Tuned Model
  9. Optimize Model for Inference
  10. Generate Text
- Results
- Customization
- Contributing
- License
## Overview

This notebook showcases the following:
- Fine-tuning a pre-trained language model for a specific dataset.
- Preparing datasets with conversational templates for instruction tuning.
- Using Parameter-Efficient Fine-Tuning (PEFT) to optimize resource usage.
- Saving, loading, and deploying a fine-tuned model for text generation.
## Features

- PEFT Fine-Tuning: Enables resource-efficient training by updating only a subset of model parameters.
- Custom Dataset Preparation: Prepares conversational data using ShareGPT templates.
- Text Generation: Demonstrates model inference with advanced sampling techniques like top-p sampling and temperature control.
- Modular Approach: Code is modular and easily customizable.
## Prerequisites

- Python 3.8 or later
- GPU with CUDA support (recommended for faster training and inference)
- Required Python libraries (see Setup)
## Setup

1. Clone this repository or download the notebook.

2. Install the required Python libraries by running the following command in your terminal:

   ```bash
   pip install unsloth torch transformers datasets trl
   ```

3. Ensure your system has sufficient memory and GPU resources to handle the model (a quick check is sketched below).
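For the resource check in step 3, a short PyTorch snippet (assuming PyTorch is already installed) confirms that a CUDA GPU is visible and reports its name and total memory:

```python
import torch

# Verify that a CUDA-capable GPU is available before attempting to fine-tune.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA GPU detected; training and inference will be very slow on CPU.")
```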
## Steps

### 1. Install Required Libraries

This step installs the necessary Python libraries:

```python
!pip install unsloth
!pip install torch
!pip install transformers
!pip install datasets
!pip install trl
```
### 2. Import Dependencies

Import the essential modules for model handling, dataset preparation, and fine-tuning.

```python
import torch
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth.chat_templates import get_chat_template, standardize_sharegpt
```
### 3. Load Pre-Trained Model

Load the `unsloth/Llama-3.2-3B-Instruct` model with 4-bit precision for efficient memory usage.

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
```
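To confirm the 4-bit weights fit in GPU memory, you can print the model's footprint. `get_memory_footprint()` is a standard `transformers` model method; this is an optional sanity check, not part of the original notebook flow.

```python
# Optional: report roughly how much memory the quantized model occupies.
print(f"Model memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
```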
### 4. Apply Parameter-Efficient Fine-Tuning (PEFT)

Enable PEFT by attaching LoRA adapters to the attention and MLP projection layers listed below; only these adapter weights are updated during fine-tuning, while the base model stays frozen.

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: size of the low-rank adapter matrices
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```
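To quantify the PEFT savings, a plain PyTorch parameter count works with any model; this sketch assumes nothing beyond the `model` object created above.

```python
# Count trainable vs. total parameters: only the LoRA adapters require gradients.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} of {total:,} parameters ({100 * trainable / total:.2f}%)")
```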
### 5. Prepare Dataset and Tokenizer

Load the dataset, standardize it to the ShareGPT schema, and render each conversation into a single text string with the Llama 3.1 chat template (tokenization itself happens later, inside the trainer).

```python
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(
    lambda examples: {
        "text": [
            tokenizer.apply_chat_template(convo, tokenize=False)
            for convo in examples["conversations"]
        ]
    },
    batched=True,
)
```
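Before training, it is worth eyeballing one rendered example to verify the template; this check simply prints the start of the first formatted conversation (the slice length is arbitrary).

```python
# Inspect the first templated training example to verify the chat formatting.
print(dataset[0]["text"][:500])
```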
### 6. Configure the Fine-Tuning Trainer

Initialize the trainer with the model, tokenizer, dataset, and training arguments.

```python
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # pass the tokenizer carrying the llama-3.1 chat template
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),  # use fp16 only when bf16 is unavailable
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        output_dir="outputs",
    ),
)
```
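With `per_device_train_batch_size=2` and `gradient_accumulation_steps=4`, gradients are accumulated over four forward passes, so each optimizer step sees an effective batch of 8 sequences per GPU. `max_steps=60` keeps this a short demonstration run; increase it for a more thorough fine-tune.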
### 7. Train and Save the Model

Fine-tune the model, then save the weights and the tokenizer (so the chat template is available when reloading) for later use.

```python
trainer.train()
model.save_pretrained("finetuned_model")
tokenizer.save_pretrained("finetuned_model")  # needed so the reload step can restore the tokenizer
```
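If you want to share the result, models and tokenizers expose the standard `push_to_hub` method from `transformers`. This is an optional sketch; the repository name is a hypothetical placeholder, and it assumes you are authenticated (e.g., via `huggingface-cli login`).

```python
# Optional: publish the fine-tuned weights and tokenizer to the Hugging Face Hub.
# "your-username/finetuned-llama-3.2-3b" is a placeholder repository name.
model.push_to_hub("your-username/finetuned-llama-3.2-3b")
tokenizer.push_to_hub("your-username/finetuned-llama-3.2-3b")
```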
### 8. Reload Fine-Tuned Model

Reload the fine-tuned model for inference.

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="finetuned_model",
    max_seq_length=2048,
    load_in_4bit=True,
)
```
### 9. Optimize Model for Inference

Enable faster inference with unsloth's built-in model optimizations.

```python
FastLanguageModel.for_inference(model)
```
### 10. Generate Text

Perform text generation using the fine-tuned model.

```python
device = "cuda" if torch.cuda.is_available() else "cpu"  # define the target device before use

input_prompt = "Once upon a time"
inputs = tokenizer(
    input_prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048
).to(device)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=200,
    num_return_sequences=1,
    do_sample=True,  # required for temperature and top-p sampling to take effect
    temperature=0.7,
    top_p=0.9,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
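Since training used the Llama 3.1 chat template, chat-formatted prompts will generally match the fine-tuned behavior better than raw text. Below is a minimal sketch using `tokenizer.apply_chat_template`; the message content is only an illustrative placeholder.

```python
# Chat-style generation: format the prompt with the same template used during training.
messages = [{"role": "user", "content": "Tell me a short story."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```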
## Results

After fine-tuning, the model generates text that follows the conversational structure and style of the training dataset. The output can be used for NLP tasks such as chatbot development and content creation.
## Customization

You can customize:

- Model: Replace `unsloth/Llama-3.2-3B-Instruct` with another pre-trained model.
- Dataset: Load a different dataset or preprocess it with your own templates.
- Training Parameters: Adjust the learning rate, batch size, and number of training steps for different results.
## Contributing

Contributions are welcome! Please submit issues or pull requests for improvements.
## License

This project is licensed under the MIT License. See the LICENSE file for details.