Improve document #126

Merged
merged 3 commits into from
Nov 7, 2023
32 changes: 28 additions & 4 deletions docs/getting-started/installation.mdx
@@ -18,23 +18,47 @@ Here're the system requirements for MS-AMP.
* CUDA version 11 or later (which can be checked by running `nvcc --version`).
* PyTorch version 1.14 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).
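
The two version checks above can be combined into one script. This is a minimal sketch, not part of MS-AMP: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MS-AMP): succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# CUDA: parse the "release X.Y" field printed by `nvcc --version`.
if command -v nvcc >/dev/null 2>&1; then
  cuda_ver="$(nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p')"
  if version_ge "$cuda_ver" "11"; then
    echo "CUDA $cuda_ver: OK"
  else
    echo "CUDA $cuda_ver: older than 11"
  fi
else
  echo "nvcc not found on PATH"
fi

# PyTorch: drop any local build suffix such as "+cu121" before comparing.
if torch_ver="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null)"; then
  torch_ver="${torch_ver%%+*}"
  if version_ge "$torch_ver" "1.14"; then
    echo "PyTorch $torch_ver: OK"
  else
    echo "PyTorch $torch_ver: older than 1.14"
  fi
else
  echo "PyTorch not importable with 'python'"
fi
```

The script only reports what it finds; it does not install or change anything.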

We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). For example, to start PyTorch 2.1 container, run the following command:
You can try MS-AMP in two ways: using Docker or installing from source.

* Using Docker is a convenient way to get started with MS-AMP. You can use the pre-built Docker image to quickly set up an environment for running MS-AMP.
* On the other hand, installing from source gives you more control over the installation process and allows you to customize the installation to your needs.

## Use Docker

You can try the latest MS-AMP Docker container with the following commands:

```bash
sudo docker run -it -d --name=msampcu121 --privileged --net=host --ipc=host --gpus=all -v /:/hostroot ghcr.io/azure/msamp:main-cuda12.1 bash
sudo docker exec -it msampcu121 bash
```

MS-AMP is pre-installed in the Docker container. You can verify the installation by running:

```bash
python -c 'import msamp;print(msamp.__version__)'
```

We also provide stable Docker images [here](../user-tutorial/container-images.mdx).

## Install from source

We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) to avoid disturbing your local environment.
For example, to start a PyTorch 2.1 container, run the following commands:

```bash
sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:23.04-py3 bash
sudo docker exec -it msamp bash
```

## Install MS-AMP
You can clone the source from GitHub.
Then, you can clone the source from GitHub.

```bash
git clone https://github.com/Azure/MS-AMP.git
cd MS-AMP
git submodule update --init --recursive
```

If you want to train model with multiple GPU, you need to install MSCCL to support FP8.
If you want to train a model on multiple GPUs, you need to install MSCCL to support FP8. Note that compiling MSCCL may take about 40 minutes on A100 nodes and about 7 minutes on H100 nodes.

```bash
cd third_party/msccl
4 changes: 2 additions & 2 deletions docs/getting-started/run-msamp.md
@@ -14,10 +14,10 @@ After installing MS-AMP, you can run several simple examples using MS-AMP. Pleas
python mnist.py --enable-msamp --opt-level=O2
```

### 2. Run mnist using multi GPUS in single node
### 2. Run mnist using multiple GPUs on a single node

```bash
torchrun --nproc_per_node=$GPUS mnist_ddp.py --enable-msamp --opt-level=O2
torchrun --nproc_per_node=8 mnist_ddp.py --enable-msamp --opt-level=O2
```
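
If the node does not have exactly 8 GPUs, the process count can be derived instead of hard-coded. This is a sketch under assumptions: `count_gpus` is a hypothetical helper (not part of MS-AMP), and it relies on `nvidia-smi -L` printing one line starting with `GPU ` per device.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MS-AMP): print the number of visible NVIDIA GPUs.
# Prints 0 when nvidia-smi is missing or reports no devices.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # grep -c exits non-zero when the count is 0, so ignore its status.
    nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
  else
    echo 0
  fi
}

GPUS="$(count_gpus)"

# Launch one process per GPU, using the same flags as the command above.
if [ "$GPUS" -gt 0 ] && [ -f mnist_ddp.py ]; then
  torchrun --nproc_per_node="$GPUS" mnist_ddp.py --enable-msamp --opt-level=O2
else
  echo "Detected $GPUS GPU(s); run this on a GPU node from the directory containing mnist_ddp.py."
fi
```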

## CIFAR10