Horovod Installation
afiaka87 edited this page Apr 13, 2021
# Download and unpack the Open MPI 4.1.0 source
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz
tar -xf openmpi-4.1.0.tar.gz
cd openmpi-4.1.0
# Configure, build, and install under /usr/local (the install step may need sudo)
./configure --prefix=/usr/local
# <...lots of output...>
make all install
If the installation went well, you should now be able to install Horovod:
pip install horovod
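Before launching a training run, it can help to confirm that both launchers are actually on your PATH. The following is a minimal sketch; `check_tools` is a hypothetical helper name introduced here for illustration, while `horovodrun --check-build` is Horovod's own option for listing the frameworks and controllers it was built with.

```shell
# Hypothetical helper: report whether each named command is on PATH.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Look for the Open MPI launcher and the Horovod launcher.
check_tools mpirun horovodrun

# If horovodrun is available, print its build summary.
if command -v horovodrun >/dev/null 2>&1; then
    horovodrun --check-build
fi
```

If `mpirun` is reported missing, the Open MPI install prefix (here `/usr/local`) is probably not on your PATH.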
- Run on a single machine with 4 GPUs:
$ horovodrun -np 4 python train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod
- Run on 4 machines with 4 GPUs each:
$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod
- Horovod autotuning:
$ mpirun -x HOROVOD_AUTOTUNE=1 -x HOROVOD_AUTOTUNE_LOG=/tmp/autotune_log.csv ... train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod
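In the multi-machine example, the value passed to `-np` is the number of hosts times the GPUs per host, and `-H` lists each host with its slot count. As a rough sketch, the two values can be derived from a host list rather than typed by hand. `build_host_list` is a hypothetical helper, and the server names are the placeholders used above; the script only prints the resulting command instead of running it.

```shell
# Hypothetical helper: build the -H value "host:slots,host:slots,...".
# Usage: build_host_list SLOTS_PER_HOST HOST [HOST...]
build_host_list() {
    slots=$1
    shift
    hosts=""
    for host in "$@"; do
        # Prepend a comma only when the list is already non-empty.
        hosts="${hosts:+$hosts,}$host:$slots"
    done
    echo "$hosts"
}

GPUS_PER_HOST=4
NUM_HOSTS=4
HOSTS=$(build_host_list "$GPUS_PER_HOST" server1 server2 server3 server4)
NP=$((GPUS_PER_HOST * NUM_HOSTS))

# Print the launch command for inspection.
echo "horovodrun -np $NP -H $HOSTS python train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod"
```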
If you are running inside a Docker container, check whether you have a docker0 network interface. If you do, you will need to follow specific instructions to ensure that this interface is ignored. See https://horovod.readthedocs.io/en/stable/mpi.html for further details.
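One way to check for the docker0 interface on Linux is to look it up under `/sys/class/net`. The sketch below assumes a Linux system with sysfs mounted; `has_interface` is a hypothetical helper name. The flags in the printed hint follow the pattern shown in the Horovod MPI documentation linked above, so treat that page as the authority for your setup.

```shell
# Hypothetical helper: succeeds if the named network interface exists (Linux sysfs).
has_interface() {
    [ -e "/sys/class/net/$1" ]
}

if has_interface docker0; then
    echo "docker0 present: exclude it from MPI and NCCL traffic, for example:"
    echo "  mpirun -mca btl_tcp_if_exclude lo,docker0 -x NCCL_SOCKET_IFNAME=^lo,docker0 ..."
else
    echo "docker0 not found: no interface exclusion needed"
fi
```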