correct usage
tocean committed Oct 18, 2023
1 parent 625bcfb commit 71d0a94
Showing 5 changed files with 7 additions and 23 deletions.
6 changes: 3 additions & 3 deletions docs/getting-started/installation.mdx
@@ -16,12 +16,12 @@ Here're the system requirements for MS-AMP.
* Python version 3.7 or later (which can be checked by running `python3 --version`).
* Pip version 18.0 or later (which can be checked by running `python3 -m pip --version`).
* CUDA version 11 or later (which can be checked by running `nvcc --version`).
-* PyTorch version 1.13 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).
+* PyTorch version 1.14 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).

-We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). For example, to start PyTorch 1.14 container, run the following command:
+We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). For example, to start PyTorch 2.1 container, run the following command:

```bash
-sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:22.12-py3 bash
+sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:23.04-py3 bash
sudo docker exec -it msamp bash
```
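
Once the container is up, a quick sanity check — a minimal sketch using only standard `torch` attributes, not part of MS-AMP itself — can confirm that the requirements listed above are met:

```python
# Sanity-check sketch: verify the container satisfies the stated requirements
# (PyTorch >= 1.14, CUDA 11 or later, and a visible GPU) before installing MS-AMP.
import torch

print("PyTorch version:", torch.__version__)   # expect 1.14 or later
print("CUDA version:", torch.version.cuda)     # expect 11 or later
print("GPU available:", torch.cuda.is_available())
```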

4 changes: 3 additions & 1 deletion docs/getting-started/run-msamp.md
@@ -3,7 +3,7 @@ id: run-msamp
---

# Run examples
-After installing MS-AMP, you can run several simple examples using MS-AMP.
+After installing MS-AMP, you can run several simple examples using MS-AMP. Please note that before running these commands, you need to change work directory to [examples](https://github.com/Azure/MS-AMP/tree/main/examples).

## MNIST
### 1. Run mnist using single GPU
@@ -37,3 +37,5 @@ deepspeed cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config_msamp.js
```bash
deepspeed cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config_zero_msamp.json
```

+For more comprehensive examples, please go to [MS-AMP-Examples](https://github.com/Azure/MS-AMP-Examples).
2 changes: 1 addition & 1 deletion docs/introduction.md
@@ -6,7 +6,7 @@ id: introduction

## Features

-__MS-AMP__ is an automatic mixed precision package for deep learning developed by Microsoft:
+__MS-AMP__ is an automatic mixed precision package for deep learning developed by Microsoft.

Features:

16 changes: 0 additions & 16 deletions docs/user-tutorial/usage.md
@@ -20,22 +20,6 @@ model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")
...
```

-For distributed training job, you need to add `optimizer.all_reduce_grads(model)` after backward to reduce gradients in process group.
-
-Example:
-
-```python
-scaler = torch.cuda.amp.GradScaler()
-for batch_idx, (data, target) in enumerate(train_loader):
-    data, target = data.to(device), target.to(device)
-    optimizer.zero_grad()
-    output = model(data)
-    loss = loss(output, target)
-    scaler.scale(loss).backward()
-    optimizer.all_reduce_grads(model)
-    scaler.step(optimizer)
-```
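
With that snippet removed, the corrected distributed usage needs no explicit gradient reduction. A minimal sketch — assuming `model`, `optimizer`, `train_loader`, and `device` are already set up, the process group is initialized, and the DDP wrapping order follows `examples/mnist_ddp.py` — might look like:

```python
# Sketch of the corrected usage: no optimizer.all_reduce_grads(model) call;
# DistributedDataParallel reduces gradients during backward().
import torch
import torch.nn.functional as F
import msamp

model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")
model = torch.nn.parallel.DistributedDataParallel(model)

scaler = torch.cuda.amp.GradScaler()
for data, target in train_loader:
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, target)   # loss function assumed for illustration
    scaler.scale(loss).backward()       # gradients are all-reduced by DDP here
    scaler.step(optimizer)
    scaler.update()
```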

For applying MS-AMP to DeepSpeed ZeRO, add a "msamp" section in deepspeed config file:

```json
2 changes: 0 additions & 2 deletions examples/mnist_ddp.py
@@ -71,8 +71,6 @@ def train(args, model, device, train_loader, optimizer, epoch):
output = model(data)
loss = F.nll_loss(output, target)
scaler.scale(loss).backward()
-if hasattr(optimizer, 'all_reduce_grads'):
-    optimizer.all_reduce_grads(model)
scaler.step(optimizer)
scaler.update()
if dist.get_rank() == 0:
