[Bug Fixed] Install nccl to system path (#57)
**Description**
Closes #53.
`make install` installs NCCL to /usr/local/lib. For PyTorch or an extension
to link against the NCCL we build, users would then have to change
LD_LIBRARY_PATH and persist that change. The alternative taken here is to
install NCCL to the system path and recommend that users work in Docker.
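
For context, the LD_LIBRARY_PATH route that this change avoids might look roughly like the sketch below. `prepend_ld_path` is a hypothetical helper written for illustration, not something in this repository; the only detail taken from the description is that `make install` places NCCL under /usr/local/lib.

```shell
# Hypothetical helper (not part of this repo): print LD_LIBRARY_PATH with
# the given directory prepended, leaving the value unchanged if the
# directory is already listed.
prepend_ld_path() {
  dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${dir}:"*) printf '%s\n' "${LD_LIBRARY_PATH}" ;;
    *) printf '%s\n' "${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
  esac
}

# Make the NCCL built with `make install` visible in the current shell.
export LD_LIBRARY_PATH="$(prepend_ld_path /usr/local/lib)"
```

The export only affects the current shell. Persisting it means adding the same line to a shell startup file, which is the extra step that installing to the system path removes.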

**Major Revision**
- Install NCCL to the system path.
- Recommend using the PyTorch container.

---------

Co-authored-by: JackieWu <[email protected]>
tocean and wkcn authored Apr 21, 2023
1 parent 14fe8e2 commit ac659d5
Showing 1 changed file with 13 additions and 9 deletions.
22 changes: 13 additions & 9 deletions README.md
@@ -22,6 +22,13 @@ Features:
- CUDA version 11 or later (which can be checked by running `nvcc --version`).
- PyTorch version 1.13 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).

We strongly recommend using the [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). For example, to start a PyTorch 1.13 container, run the following commands:

```
sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:22.09-py3 bash
sudo docker exec -it msamp bash
```

### Install MS-AMP

You can clone the source from GitHub.
@@ -44,13 +51,18 @@ make -j src.build NVCC_GENCODE="-gencode=arch=compute_80,code=sm_80"
# H100
make -j src.build NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90"

sudo make install
apt-get update
apt install build-essential devscripts debhelper fakeroot
make pkg.debian.build
dpkg -i build/pkg/deb/libnccl2_*.deb

cd -
```
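
After the `dpkg -i` step above, it can be useful to confirm that the dynamic linker actually sees an NCCL library. The following is a minimal sketch, assuming a glibc-based system where `ldconfig -p` prints the linker cache; `find_nccl` is a hypothetical helper name, not something provided by MS-AMP or NCCL.

```shell
# Hypothetical check (not part of MS-AMP): list the NCCL libraries that the
# dynamic linker cache knows about, or print a message if there are none.
find_nccl() {
  if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p 2>/dev/null | grep 'libnccl' \
      || echo "libnccl not found in linker cache"
  else
    echo "ldconfig not available"
  fi
}

find_nccl
```

On Debian-based systems, `dpkg -L libnccl2` should additionally list the files that the installed package placed on disk.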

Then, you can install MS-AMP from source.

```
python3 -m pip install --upgrade pip
python3 -m pip install .
make postinstall
```
@@ -61,14 +73,6 @@ After that, you can verify the installation by running:
python3 -c "import msamp; print(msamp.__version__)"
```

### Run unit tests

You can execute the following command to run unit tests.

```
python3 setup.py test
```

### Usage

Enabling MS-AMP is very simple when training a model on a single GPU: you only need to add one line of code, `msamp.initialize(model, optimizer, opt_level)`, after defining the model and optimizer.