Create a new conda environment:

```bash
conda env create -f environment.yml
```

Activate the environment:

```bash
conda activate hf-bench-env
```
Please include a `.env` file in the root directory with the following variables. The models and datasets are downloaded to the `HF_HOME` directory, unless they are already stored there.

```bash
HF_ACCESS_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
HF_HOME=/path/to/hf_home
```
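If you also want these variables available in your current shell (for example, when running ad-hoc commands), one common approach is to export them from the `.env` file. This is only a convenience sketch and is not required by `hf_bench` itself:

```bash
# Export every variable defined in .env into the current shell session
set -a
source .env
set +a
```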
To update the environment after changing the `environment.yml` file:

```bash
conda env update -f environment.yml --prune
```
> [!NOTE]
> To run the benchmark, you must log in to Weights & Biases and often also to Hugging Face.
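Both logins can be done from the command line, assuming the `wandb` and `huggingface_hub` packages installed by the environment provide their CLIs:

```bash
# Log in to Weights & Biases (prompts for your API key)
wandb login

# Log in to Hugging Face (prompts for your access token)
huggingface-cli login
```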
Below are commands for running the benchmark on:

- A machine with GPUs:

  ```bash
  python -m hf_bench.benchmark --num_of_examples=1
  ```

- A cluster (via the LSF submit script):

  ```bash
  ./hf_bench/submit/lsf.sh --num_of_examples=1
  ```
After running the above sanity check with one example and the default experiment config, you can run the benchmark with 30 examples and a custom experiment config:

```bash
python -m hf_bench.benchmark --experiment_config deepseek-r1-qwen-32b
```

or

```bash
./hf_bench/submit/lsf.sh --experiment_config deepseek-r1-qwen-32b
```
To adjust the hardware requested from the cluster, edit the submit script.
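LSF submit scripts typically declare their resource requests through `#BSUB` directives. The exact options in `hf_bench/submit/lsf.sh` may differ, so the lines below are only an illustrative sketch with hypothetical values:

```bash
#BSUB -q gpu_queue     # hypothetical queue name; use your cluster's GPU queue
#BSUB -n 8             # number of CPU cores
#BSUB -gpu "num=4"     # number of GPUs requested
#BSUB -W 04:00         # wall-clock time limit (hh:mm)
```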
Once the models have loaded, you can monitor the progress of the benchmark here: https://wandb.ai/generating-faster/hf-bench.
The results are stored in the `results` branch.
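To browse the results locally, you can check out that branch (assuming your remote is named `origin`):

```bash
git fetch origin results
git checkout results
```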
To add new results, add the results CSV to the `benchmark_results` directory. GitHub Actions will automatically update the `results_all.csv`, `results_summary.csv`, and `results_max_speedup.csv` files.
If you use our algorithms (or the code in this repo), please cite our paper (https://arxiv.org/abs/2502.05202):

```bibtex
@article{timor2025accelerating,
  title={Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies},
  author={Timor, Nadav and Mamou, Jonathan and Korat, Daniel and Berchansky, Moshe and Pereg, Oren and Jain, Gaurav and Schwartz, Roy and Wasserblat, Moshe and Harel, David},
  journal={arXiv preprint arXiv:2502.05202},
  year={2025}
}
```