[Paper]
Large language models (LLMs) face critical memory challenges during on-device fine-tuning. While LoRA reduces trainable parameters, buffered activation memory remains a bottleneck. HyC-LoRA addresses this with:
✨ Key Innovations:
- Buffered Activation Mechanism Analysis: A systems-level view of the buffered activation problem in LoRA fine-tuning
- Two-tier Optimization (see the sketch after this list):
  - Intra-operator: Mitigates quantization errors via outlier detection
  - Inter-operator: Error compensation through LoRA adapter fusion
- System Integration: Full-stack implementation achieving 3.97× memory reduction with minimal accuracy loss
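To make the buffered-activation idea concrete, here is a minimal, generic sketch (not the repo's code; names such as `QuantBufferedLinear` are illustrative) of a custom autograd function that stores the activation needed for backward in low-bit form and dequantizes it only when the gradient is computed:

```python
import torch

def quantize_sym(x: torch.Tensor, n_bits: int = 4):
    """Symmetric per-tensor quantization to n_bits (illustrative, not the repo's kernels)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize_sym(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

class QuantBufferedLinear(torch.autograd.Function):
    """y = x @ W^T, but the copy of x buffered for backward is stored in low precision."""

    @staticmethod
    def forward(ctx, x, weight, n_bits):
        q, scale = quantize_sym(x, n_bits)       # compress the buffered activation
        ctx.save_for_backward(q, scale, weight)  # int8 buffer instead of an fp16/fp32 copy
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        q, scale, weight = ctx.saved_tensors
        x_hat = dequantize_sym(q, scale)         # approximate reconstruction for backward
        grad_x = grad_out @ weight
        grad_w = grad_out.transpose(-2, -1) @ x_hat
        return grad_x, grad_w, None

# Usage: the activation held between forward and backward is ~4x smaller at 4-bit.
x = torch.randn(8, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
y = QuantBufferedLinear.apply(x, w, 4)
y.sum().backward()
```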
- CUDA 12.0+
- Python 3.10+
- PyTorch 2.0+
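A quick sanity check that the environment meets these requirements (this snippet is just a convenience, not part of the repo):

```python
import torch

print("PyTorch:", torch.__version__)        # expect >= 2.0
print("CUDA (build):", torch.version.cuda)  # expect >= 12.0
print("CUDA available:", torch.cuda.is_available())
```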
# Clone repository
git clone https://github.com/HyC-LoRA-release
cd HyC-LoRA-release
# Create conda environment (recommended)
conda create -n hyclora python=3.10
conda activate hyclora
# Install dependencies
pip install -r requirements.txt
mkdir <your model dir>  # then download the Hugging Face models into this directory
# P.S. Using a local model directory is recommended to avoid network problems :-(
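For example, the weights can be fetched into a local directory with `huggingface_hub` (the model ID and target path below are placeholders; pick whatever matches your setup):

```python
from huggingface_hub import snapshot_download

# Download a checkpoint into a local directory (placeholder repo ID and path).
# Gated models may require `huggingface-cli login` first.
snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",   # example model ID; any HF model works
    local_dir="./models/llama-2-7b-hf",   # should live under <your model dir>
)
```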
GSM8K (llama/mistral):
$ bash run_gsm8k.sh
Wikitext-2 (llama/mistral):
$ bash run_wikitext2.sh
Arithmetic Datasets (llama/mistral):
$ bash download_datasets.sh # prepare the datasets
$ bash run_multitask.sh
GLUE (RoBERTa):
$ bash run_glue.sh
Long Sequence (llama):
download the data from:
put them into ./dataset, then run:
$ bash run_longseq.sh
The core parameters can be adjusted in these scripts:
# model config
model_name=llama-2-7b-hf
model_dir=<your model dir>
# hyclora hyperparameter config
use_hyclora=True
layer_type=intra_inter
iteration_threshold=5
softmax_outlier_ratio=0.05
layernorm_outlier_ratio=0.005
q_bit=4
| Parameter | Description | Options | Default |
|---|---|---|---|
| use_hyclora | Enable HyC-LoRA type training (forward+backward) code | [True, False] | True |
| layer_type | Compression strategy | [baseline, intra, intra_inter, intra_inter_full_fuse] | intra_inter |
| q_bit | Quantization bits for activations | [2, 4, 8] | 4 |
| softmax_outlier_ratio | Outlier threshold for attention maps | 0.0-1.0 | 0.05 |
| layernorm_outlier_ratio | Outlier threshold for layernorm/rmsnorm layers | 0.0-1.0 | 0.005 |
Explanation of `layer_type`:
- `baseline`: No compression
- `intra`: Intra-operator optimization
- `intra_inter`: Intra-operator optimization + inter-operator optimization
- `intra_inter_full_fuse`: Equivalent to `intra_inter` at the algorithmic level, but with kernel fusion for high-EMA (external memory access) operations such as LoRA, activation functions, Hadamard products, and quantization during the forward and backward passes
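The two outlier ratios control how much of each buffered tensor bypasses quantization. As a rough, generic illustration of outlier-aware quantization (assumed semantics and illustrative names, not the repo's actual kernels): the largest-magnitude fraction of values is kept in full precision, and only the remainder is quantized to `q_bit`.

```python
import torch

def outlier_aware_quantize(x: torch.Tensor, outlier_ratio: float = 0.05, n_bits: int = 4):
    """Keep the top `outlier_ratio` fraction of |x| in full precision,
    quantize the rest symmetrically to `n_bits` (generic sketch)."""
    flat = x.flatten()
    k = max(1, int(outlier_ratio * flat.numel()))
    outlier_idx = flat.abs().topk(k).indices      # indices of the largest-magnitude entries
    outlier_vals = flat[outlier_idx]

    inliers = flat.clone()
    inliers[outlier_idx] = 0.0                    # scale is no longer dominated by outliers
    qmax = 2 ** (n_bits - 1) - 1
    scale = inliers.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(inliers / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale, outlier_idx, outlier_vals

def outlier_aware_dequantize(q, scale, outlier_idx, outlier_vals, shape):
    flat = q.float() * scale
    flat[outlier_idx] = outlier_vals              # restore outliers exactly
    return flat.view(shape)

# Example: an attention-map-like tensor with a 5% outlier budget and 4-bit inliers.
attn = torch.randn(4, 64, 64).softmax(dim=-1)
packed = outlier_aware_quantize(attn, outlier_ratio=0.05, n_bits=4)
attn_hat = outlier_aware_dequantize(*packed, shape=attn.shape)
print("max abs error:", (attn - attn_hat).abs().max().item())
```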
Our code is built upon the following projects:
We thank the authors for their open-sourced code.
@article{hyclora2024,
title={HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression},
author={Anonymous Authors},
journal={Preprint},
year={2024},
url={xxxxxxxx}
}