GitHub - ServiceNow/Fast-LLM: Accelerating your LLM training to full speed

Accelerating your LLM training to full speed

Overview

Fast-LLM is a new open-source library for training large language models, built on PyTorch and Triton. It is extremely fast, scales to large clusters, supports a wide range of model architectures, and is easy to use. Unlike commercial frameworks like Megatron-LM, which are largely closed off and fragmented across forks, Fast-LLM is fully open-source and encourages community-driven development. Researchers can freely customize and optimize as needed, making it a flexible and hackable alternative that combines the speed of specialized tools with the openness of libraries like Hugging Face Transformers.

Note

Fast-LLM is not affiliated with Fast.AI, FastHTML, FastAPI, FastText, or other similarly named projects. Our library's name refers to its speed and efficiency in language model training.

Why Fast-LLM?

🚀 Fast-LLM is Blazingly Fast:
- ⚡️ Optimized kernel efficiency and reduced overheads.
- 🔋 Optimized memory usage for best performance.
- ⏳ Minimizes training time and cost.
📈 Fast-LLM is Highly Scalable:
- 📡 Distributed training across multiple GPUs and nodes using 3D parallelism (Data, Tensor, and Pipeline).
- 🔗 Supports sequence length parallelism to handle longer sequences effectively.
- 🧠 ZeRO-1, ZeRO-2, and ZeRO-3 implementations for improved memory efficiency.
- 🎛️ Mixed precision training support for better performance.
- 🏋️‍♂️ Large batch training and gradient accumulation support.
- 🔄 Reproducible training with deterministic behavior.
🎨 Fast-LLM is Incredibly Flexible:
- 🤖 Compatible with all common language model architectures in a unified class.
- ⚡ Efficient dropless Mixture-of-Experts (MoE) implementation with SoTA performance.
- 🧩 Customizable language model architectures, data loaders, loss functions, and optimizers (in progress).
- 🤗 Seamless integration with Hugging Face Transformers.
🎯 Fast-LLM is Super Easy to Use:
- 📦 Pre-built Docker images for quick deployment.
- 📝 Simple YAML configuration for hassle-free setup.
- 💻 Command-line interface for easy launches.
- 📊 Detailed logging and real-time monitoring features.
- 📚 Extensive documentation and practical tutorials (in progress).
🌐 Fast-LLM is Truly Open Source:
- ⚖️ Licensed under Apache 2.0 for maximum freedom to use Fast-LLM at work, in your projects, or for research.
- 💻 Fully developed on GitHub with a public roadmap and transparent issue tracking.
- 🤝 Contributions and collaboration are always welcome!
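
The 3D parallelism listed under scalability splits the available GPUs across data, tensor, and pipeline dimensions, so the three degrees multiply to the total GPU count. The following is a minimal sketch of that arithmetic only. The function and parameter names are hypothetical and are not Fast-LLM configuration keys.

```python
def data_parallel_degree(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel degree left over after model parallelism.

    Hypothetical helper for illustration: the names are not Fast-LLM options.
    Relies on world_size == data_parallel * tensor_parallel * pipeline_parallel.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by "
            f"tensor * pipeline = {model_parallel}"
        )
    return world_size // model_parallel


# Example: 4 nodes with 8 GPUs each, split 2-way tensor and 2-way pipeline.
replicas = data_parallel_degree(world_size=32, tensor_parallel=2, pipeline_parallel=2)
print(f"data-parallel replicas: {replicas}")
```

With no tensor or pipeline parallelism, every GPU holds a full model replica and the data-parallel degree equals the GPU count.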

Usage

We'll walk you through how to use Fast-LLM to train a large language model on a cluster with multiple nodes and GPUs. We'll show an example setup using a Slurm cluster and a Kubernetes cluster.

For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file examples/mistral-4-node-benchmark.yaml is pre-configured for a multi-node setup with 4 DGX nodes, each with 8 A100-80GB or H100-80GB GPUs.

Note

Fast-LLM scales from a single GPU to large clusters. You can start small and expand based on your resources.

Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of 9,800 tokens/s/H100 (batch size 32, sequence length 8k) on a 4-node cluster with 32 H100s.
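
To put the quoted throughput in perspective, the sketch below estimates the time per training step from the figures above. It assumes that "8k" means 8,192 tokens and that the batch size of 32 counts sequences across the whole cluster. Both are assumptions, and the estimate ignores startup, warm-up, and checkpointing time.

```python
TOKENS_PER_SECOND_PER_GPU = 9_800  # throughput quoted above for one H100
NUM_GPUS = 32                      # 4 nodes x 8 GPUs
BATCH_SIZE = 32                    # assumption: sequences per step, cluster-wide
SEQUENCE_LENGTH = 8 * 1024         # assumption: "8k" means 8,192 tokens
NUM_STEPS = 100                    # length of the demo run


def tokens_per_step(batch_size: int, sequence_length: int) -> int:
    """Tokens processed in one optimizer step."""
    return batch_size * sequence_length


def cluster_throughput(per_gpu: float, num_gpus: int) -> float:
    """Aggregate tokens per second across all GPUs."""
    return per_gpu * num_gpus


def seconds_per_step(batch_size: int, sequence_length: int, per_gpu: float, num_gpus: int) -> float:
    """Estimated steady-state duration of one step."""
    return tokens_per_step(batch_size, sequence_length) / cluster_throughput(per_gpu, num_gpus)


step_time = seconds_per_step(BATCH_SIZE, SEQUENCE_LENGTH, TOKENS_PER_SECOND_PER_GPU, NUM_GPUS)
print(f"tokens per step:    {tokens_per_step(BATCH_SIZE, SEQUENCE_LENGTH):,}")
print(f"cluster throughput: {cluster_throughput(TOKENS_PER_SECOND_PER_GPU, NUM_GPUS):,.0f} tokens/s")
print(f"estimated step:     {step_time:.2f} s, {NUM_STEPS} steps: {NUM_STEPS * step_time:.0f} s")
```

Under these assumptions a step takes a little under one second, so the 100-step demo should spend roughly a minute and a half in steady-state training.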

Running Fast-LLM on a Slurm Cluster

Prerequisites

- A Slurm cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- CUDA 12.1 or higher.
- Dependencies: PyTorch, Triton, and Apex installed on all nodes.
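
A quick way to confirm these prerequisites on a node is to check that the three packages are importable and that the CUDA version is recent enough. This is a minimal sketch, not part of Fast-LLM: the helper names are hypothetical, and it assumes the packages use their usual import names `torch`, `triton`, and `apex`.

```python
import importlib.util

REQUIRED_MODULES = ("torch", "triton", "apex")  # usual import names (assumption)
MINIMUM_CUDA = (12, 1)                          # from the prerequisites above


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found on this node."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_version_ok(version, minimum=MINIMUM_CUDA):
    """Check a version string such as "12.4" against the minimum.

    On a node with PyTorch installed, `torch.version.cuda` provides such a
    string (or None for CPU-only builds). It reports the CUDA version that
    PyTorch was built with, which is not necessarily the driver's version.
    """
    if not version:
        return False
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum


missing = missing_modules()
if missing:
    print("missing dependencies:", ", ".join(missing))
else:
    print("all required modules found")
```

Running this on every node, for example through `srun`, surfaces a missing dependency before a multi-node job is submitted.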

Steps

Deploy the nvcr.io/nvidia/pytorch:24.07-py3 Docker image on all nodes (recommended). It contains all the necessary dependencies.

Install Fast-LLM on all nodes:

```
sbatch <<EOF
#!/bin/bash
#SBATCH --nodes=$(scontrol show node | grep -c NodeName)
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks=$(scontrol show node | grep -c NodeName)
#SBATCH --exclusive

srun bash -c 'pip install --no-cache-dir -e "git+https://github.com/ServiceNow/Fast-LLM.git#egg=llm[CORE,OPTIONAL,DEV]"'
EOF
```

Use the example Slurm job script examples/fast-llm.sbat to submit the job to the cluster:
```
sbatch examples/fast-llm.sbat
```
Monitor the job's progress:
- Logs: Follow job_output.log and job_error.log in your working directory for logs.
- Status: Use squeue -u $USER to see the job status.
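
If you prefer to check the job status from a script, the sketch below summarizes the output of `squeue`. It assumes `squeue` is called with `-h` (no header) and `-o "%i %T"` (job ID and state), and the helper names are hypothetical rather than part of Fast-LLM.

```python
import getpass
import subprocess
from collections import Counter


def query_squeue(user: str) -> str:
    """Run squeue for one user and return job IDs and states as text."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_job_states(squeue_output: str) -> Counter:
    """Count jobs per state from lines of the form "<job id> <state>"."""
    states = Counter()
    for line in squeue_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            states[parts[1]] += 1
    return states


def summarize_my_jobs() -> Counter:
    """Convenience wrapper for the current user. Requires Slurm on the PATH."""
    return count_job_states(query_squeue(getpass.getuser()))
```

Calling `summarize_my_jobs()` on a login node returns a counter of job states, which is convenient for a simple polling loop.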

Now, you can sit back and relax while Fast-LLM trains your model at full speed! ☕

Running Fast-LLM on a Kubernetes Cluster

Prerequisites

- A Kubernetes cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- KubeFlow installed.
- Locked memory limit set to unlimited at the host level on all nodes. Ask your cluster admin to do this if needed.
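
The locked memory requirement can be checked from inside a container or on a host with Python's standard `resource` module, which is available on Unix-like systems. This is a minimal sketch with hypothetical helper names. The shell equivalent is `ulimit -l`.

```python
import resource


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def is_unlimited(limits) -> bool:
    """True when both the soft and the hard limit are unlimited."""
    soft, hard = limits
    return soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY


limits = memlock_limits()
if is_unlimited(limits):
    print("locked memory limit: unlimited")
else:
    print(f"locked memory limit is restricted (soft={limits[0]}, hard={limits[1]})")
```

If the check reports a restricted limit, the setting has to be changed at the host level, as noted above.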

Steps

Create a Kubernetes PersistentVolumeClaim (PVC) named fast-llm-home that will be mounted to /home/fast-llm in the container using examples/fast-llm-pvc.yaml:
```
kubectl apply -f examples/fast-llm-pvc.yaml
```
Create a PyTorchJob resource using the example configuration file examples/fast-llm.pytorchjob.yaml:
```
kubectl apply -f examples/fast-llm.pytorchjob.yaml
```
Monitor the job status:
- Use kubectl get pytorchjobs to see the job status.
- Use kubectl logs -f fast-llm-master-0 -c pytorch to follow the logs.

That's it! You're now up and running with Fast-LLM on Kubernetes. 🚀

Next Steps

📖 Want to learn more? Check out our documentation for more information on how to use Fast-LLM.

🔨 We welcome contributions to Fast-LLM! Have a look at our contribution guidelines.

🐞 Something doesn't work? Open an issue!

License

Fast-LLM is licensed by ServiceNow, Inc. under the Apache 2.0 License. See LICENSE for more information.

Vulnerability Reporting

For security issues, email [email protected]. See our security policy.

