DeepSpeed ZeRO Infinity

Offload parameters and optimizer state to CPU and/or an NVMe drive

Warning: This is experimental. If you run into problems, let us know in the Issues section so we can help you fix them or figure them out.

Also, many of these options work poorly, or not at all, on anything other than DeepSpeed "stage 3". DeepSpeed can be a tough install, and stage 3 is often unsupported on GPUs other than the V100 and A100. Cards with a similar enough architecture, such as the RTX 2000 and RTX 3000 series, could work, but currently have a tough time with it.
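
To see which of the groups above your card falls into, you can compare its CUDA compute capability with those of the cards mentioned. The following is a minimal sketch, not part of DALLE-pytorch or DeepSpeed: `classify_gpu` and the two lookup tables are hypothetical names, and the capability numbers are the ones NVIDIA publishes for these cards (other cards can share the same numbers). A "similar" result does not mean stage 3 will work.

```python
# Compute capabilities NVIDIA publishes for the cards named above.
# Other GPUs can share these numbers, so treat the result as a hint only.
EXPECTED = {(7, 0): "V100", (8, 0): "A100"}
SIMILAR = {(7, 5): "RTX 2000 series", (8, 6): "RTX 3000 series"}


def classify_gpu(capability):
    """Return 'expected', 'similar', or 'unknown' for a (major, minor) pair."""
    capability = tuple(capability)
    if capability in EXPECTED:
        return "expected"
    if capability in SIMILAR:
        return "similar"
    return "unknown"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(torch.cuda.get_device_name(0), cap, classify_gpu(cap))
    else:
        print("No CUDA-capable PyTorch install found")
```

Run it once after installing PyTorch. An "unknown" result just means the card is not one of those discussed here.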

Dependencies:

llvm-9-dev
cmake
gcc
python3.8.x
deepspeed
libaio-dev
cudatoolkit=10.2 or 11.1 # Doesn't work on 11.2 unfortunately.
pytorch=1.8.*

Debian

apt install -y libaio-dev gcc cmake llvm-9-dev
python -V # Check your version

# For CUDA 11.1 - change if you have a different version. CUDA 11.2 not supported.
pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install deepspeed
pip3 install dalle-pytorch
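
After installing, it can help to confirm that the PyTorch build matches the versions listed in the dependencies. This is a rough sketch under those stated constraints (PyTorch 1.8.x with CUDA 10.2 or 11.1); `check_versions` is a hypothetical helper, not something provided by DeepSpeed or DALLE-pytorch.

```python
# CUDA versions listed in the dependencies above; 11.2 is stated not to work.
SUPPORTED_CUDA = {"10.2", "11.1"}


def check_versions(torch_version, cuda_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not torch_version.startswith("1.8."):
        problems.append(f"PyTorch {torch_version} found, expected 1.8.x")
    if cuda_version is None:
        problems.append("PyTorch was built without CUDA support")
    elif cuda_version not in SUPPORTED_CUDA:
        problems.append(f"CUDA {cuda_version} found, expected 10.2 or 11.1")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        # torch.version.cuda is None for CPU-only builds.
        found = check_versions(torch.__version__, torch.version.cuda)
        for line in found or ["PyTorch and CUDA versions look OK"]:
            print(line)
```

The check only inspects version strings. It does not confirm that DeepSpeed itself built correctly.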

Pop!_OS 20.04 (see notes about 20.10)

At the time of this writing, 20.04 still uses system76-cuda-10.2 and system76-cudnn-10.2 in its "latest" release. On 20.10, system76-cuda-latest will give you cuda-toolkit-11.2. So if you're on Pop!_OS 20.10 (not 20.04), be sure to install system76-cuda-11.1 and system76-cudnn-11.1 instead.

sudo apt install system76-cuda-latest
sudo apt install system76-cudnn-latest
sudo update-alternatives --config cuda
# Choose the most recent version of cuda-toolkit you see here.

# After you're done - to switch back to your original cuda-toolkit version, just run:
sudo update-alternatives --config cuda

Stage 3 Barebones configuration template

In your train_dalle.py there is a dictionary, "deepspeed_config", which you need to change. The template here is the bare minimum; there are far more parameters to tinker with, which you can find in the DeepSpeed ZeRO JSON config documentation.

deepspeed_config = {
    "zero_optimization": {
        "stage": 3,
        # Offload the model parameters. If you have an NVMe drive, you should use the nvme option.
        # Otherwise, use 'cpu' and remove the `nvme_path` line
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/path/to/nvme/folder",
        },
        # Offload the optimizer of choice. If you have an NVMe drive, you should use the nvme option.
        # Otherwise, use 'cpu' and remove the `nvme_path` line
        "offload_optimizer": {
            "device": "nvme", # options are 'none', 'cpu', 'nvme'
            "nvme_path": "/path/to/nvme/folder",
        },
    },
    # Override PyTorch's Adam optimizer with `FusedAdam` (just called Adam here).
    "optimizer": {
        "type": "Adam",  # You can also use AdamW here
        "params": {
            "lr": LEARNING_RATE,
        },
    },
    'train_batch_size': BATCH_SIZE,
    'gradient_clipping': GRAD_CLIP_NORM,
    'fp16': {
        'enabled': args.fp16,
    },
}
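
The comments in the template say to switch "device" to 'cpu' and drop the `nvme_path` line when no NVMe drive is available. One way to avoid editing both offload blocks by hand is a small helper like the sketch below. `offload_block` and `zero3_section` are hypothetical names, not part of DeepSpeed, and the sketch assumes both offload blocks accept the same 'none', 'cpu', and 'nvme' values that the template lists for the optimizer.

```python
def offload_block(device, nvme_path=None):
    """Build one offload block; nvme_path is only kept for the nvme device."""
    if device not in ("none", "cpu", "nvme"):
        raise ValueError(f"unknown offload device: {device!r}")
    block = {"device": device}
    if device == "nvme":
        if not nvme_path:
            raise ValueError("nvme_path is required when device is 'nvme'")
        block["nvme_path"] = nvme_path
    return block


def zero3_section(device, nvme_path=None):
    """Build the "zero_optimization" section with both offloads on one device."""
    return {
        "stage": 3,
        "offload_param": offload_block(device, nvme_path),
        "offload_optimizer": offload_block(device, nvme_path),
    }


if __name__ == "__main__":
    # CPU offload: no nvme_path key appears in either block.
    print(zero3_section("cpu"))
    # NVMe offload: the path is a placeholder, as in the template above.
    print(zero3_section("nvme", "/path/to/nvme/folder"))
```

The result can be assigned to the "zero_optimization" key of "deepspeed_config" in place of the hand-written block.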
