This repository contains a reference PyTorch implementation of the LDAdam optimizer, as proposed in the paper LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.
Abstract: We introduce LDAdam, a memory-efficient optimizer for training large models that performs adaptive optimization steps within lower-dimensional subspaces, while consistently exploring the full parameter space during training. This strategy keeps the optimizer's memory footprint to a fraction of the model size. LDAdam relies on a new projection-aware update rule for the optimizer states that allows for transitioning between subspaces, i.e., for estimating the statistics of the projected gradients. To mitigate the errors due to low-rank projection, LDAdam integrates a new generalized error feedback mechanism, which explicitly accounts for both gradient and optimizer state compression. We prove the convergence of LDAdam under standard assumptions, and show that LDAdam allows for accurate and efficient fine-tuning and pre-training of language models.
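The abstract describes three ingredients: low-rank projection of the gradient, a projection-aware transition of the optimizer moments when the subspace changes, and a generalized error feedback term. The sketch below is a deliberately simplified illustration of how these pieces can fit together for a single 2-D parameter; the function name, the SVD-based choice of subspace, the handling of the second moment, and the omission of bias correction are all simplifying assumptions made here for brevity. Please refer to the paper and to the reference implementation for the exact update rules.

import torch

def ldadam_like_step(W, grad, state, lr=1e-3, betas=(0.908, 0.99), eps=1e-8, rank=16):
    """Illustrative low-rank adaptive step on a plain 2-D tensor W (not the reference method)."""
    # Generalized error feedback (simplified): fold the previously accumulated
    # compression residual back into the incoming gradient.
    acc = grad + state.get("error", torch.zeros_like(grad))

    # Pick a rank-r subspace from the error-corrected gradient. Here we take the
    # top-r left singular vectors; the paper uses a cheaper block power iteration.
    U, _, _ = torch.linalg.svd(acc, full_matrices=False)
    P = U[:, :rank]                                    # d x r projection

    # Projection-aware transition: re-express the old first moment in the new subspace.
    P_prev = state.get("P", P)
    m = P.T @ (P_prev @ state.get("m", torch.zeros(rank, W.shape[1])))
    v = state.get("v", torch.zeros(rank, W.shape[1]))  # second moment carried over (simplification)

    # Adam-style statistics on the projected gradient (r x n instead of d x n).
    g_low = P.T @ acc
    m = betas[0] * m + (1 - betas[0]) * g_low
    v = betas[1] * v + (1 - betas[1]) * g_low.pow(2)
    update_low = m / (v.sqrt() + eps)                  # bias correction omitted

    # Remember what the rank-r subspace could not represent (gradient part only here;
    # the paper's mechanism also accounts for optimizer state compression).
    state["error"] = acc - P @ g_low

    # Map the update back to the full space and apply it.
    W -= lr * (P @ update_low)
    state.update({"P": P, "m": m, "v": v})

# Toy usage on a random 64 x 32 parameter with stand-in gradients.
W, state = torch.randn(64, 32), {}
for _ in range(3):
    ldadam_like_step(W, torch.randn_like(W), state)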
To integrate the optimizer into your own pipeline, please use the following snippet:
from LowDimensionalAdam import LDAdamW

# define param groups as fullrank_params and lowrank_params
optimizer = LDAdamW(
    params=[{'params': fullrank_params, 'enable_lowrank': False},
            {'params': lowrank_params, 'enable_lowrank': True, 'rank': 16, 'rho': 0.908}],
    lr=0.001,
    betas=(0.908, 0.99),
    eps=1e-8,
    weight_decay=0.0,
)

# you can then use the variable `optimizer` as any other PyTorch optimizer
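For context, a minimal end-to-end sketch is shown below. The toy model and the convention used to split the parameters (2-D weight matrices go to the low-rank group, 1-D parameters stay full-rank) are illustrative assumptions, not requirements of the optimizer; only the LDAdamW constructor arguments are taken from the snippet above.

import torch
from LowDimensionalAdam import LDAdamW

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# One possible split: 2-D weight matrices get the low-rank treatment,
# 1-D parameters (biases, norms) stay full-rank.
lowrank_params = [p for p in model.parameters() if p.dim() >= 2]
fullrank_params = [p for p in model.parameters() if p.dim() < 2]

optimizer = LDAdamW(
    params=[{'params': fullrank_params, 'enable_lowrank': False},
            {'params': lowrank_params, 'enable_lowrank': True, 'rank': 16, 'rho': 0.908}],
    lr=1e-3,
)

# one standard PyTorch training step
inputs = torch.randn(32, 128)
targets = torch.randint(0, 10, (32,))
loss = torch.nn.functional.cross_entropy(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()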
To install the latest stable version of LDAdam from source, please run:
pip3 install git+https://github.com/IST-DASLab/LDAdam.git
To clone the project and install it as a Python package in a new conda environment named LDAdam, please run:
git clone https://github.com/IST-DASLab/LDAdam.git
cd LDAdam
source install.sh
conda activate LDAdam
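After activating the environment, a quick sanity check (assuming the import path used in the snippet above) is:

python3 -c "from LowDimensionalAdam import LDAdamW; print('LDAdam import OK')"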
To conduct experiments on fine-tuning the RoBERTa-base model on the GLUE benchmark, we rely on the Hugging Face Transformers project. The additional dependencies required can be found in the file glue_requirements.txt. To install them, please run:
pip3 install -r experiments/glue_finetuning/glue_requirements.txt
For reproducibility purposes, we provide the scripts we used to run our experiments.
To conduct our experiments on Llama pre-training on the C4 dataset, we follow the training procedure provided by the ReLoRA project, as adapted for the GaLore project. The additional dependencies required can be found in the file c4_requirements.txt. To install them, please run:
pip3 install -r experiments/c4_pretraining/c4_requirements.txt
For reproducibility purposes, we provide the scripts we used to run our experiments.
To run our experiments on fine-tuning the Llama2 7B model on the GSM8K dataset, we use the training pipeline provided by the MicroAdam project, which is based on MosaicML's LLM Foundry framework.
If you find LDAdam useful in your work, please consider citing the paper:

@misc{robert2024ldadamadaptiveoptimizationlowdimensional,
      title={LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics},
      author={Thomas Robert and Mher Safaryan and Ionut-Vlad Modoranu and Dan Alistarh},
      year={2024},
      eprint={2410.16103},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.16103},
}