diff --git a/README.md b/README.md
index 0b13e385..b1f8668e 100644
--- a/README.md
+++ b/README.md
@@ -47,6 +47,19 @@ Here are the system settings we recommend to start training your own diffusion m
 - Ubuntu Version: 20.04
 - Use a system with NVIDIA GPUs
 
+- For running on NVIDIA H100s, use a Docker image with PyTorch 1.13+, e.g. [MosaicML's PyTorch base image](https://hub.docker.com/r/mosaicml/pytorch/tags)
+  - Recommended tag: `mosaicml/pytorch_vision:2.0.1_cu118-python3.10-ubuntu20.04`
+  - This image comes pre-configured with the following dependencies:
+    - PyTorch Version: 2.0.1
+    - CUDA Version: 11.8
+    - Python Version: 3.10
+    - Ubuntu Version: 20.04
+  - Depending on the training config, an additional install of `xformers` may be needed:
+    ```
+    pip install -U ninja
+    pip install -U git+https://github.com/facebookresearch/xformers
+    ```
+
 # How many GPUs do I need?
 
 We benchmarked the U-Net training throughput as we scale the number of A100 GPUs from 8 to 128. Our time estimates are based on training Stable Diffusion 2.0 base on 1,126,400,000 images at 256x256 resolution and 1,740,800,000 images at 512x512 resolution. Our cost estimates are based on $2 / A100-hour. Since the time and cost estimates are for the U-Net only, these only hold if the VAE and CLIP latents are computed before training. It took 3,784 A100-hours (cost of $7,600) to pre-compute the VAE and CLIP latents offline. If you are computing VAE and CLIP latents while training, expect a 1.4x increase in time and cost.
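
For readers who want to plug in their own benchmark numbers, below is a minimal Python sketch of the cost arithmetic the README paragraph describes ($2 / A100-hour, the fixed image counts for each resolution phase, plus the one-off latent pre-computation). The `unet_cost` helper and the 25 images/sec/GPU throughput are hypothetical placeholders for illustration, not measured values from the benchmark table.

```python
# Sketch of the time/cost estimate described above. The throughput value used
# in the example call is a placeholder, not a measured number from this repo.

A100_PRICE_PER_HOUR = 2.00             # $ / A100-hour, as assumed in the README
IMAGES_256 = 1_126_400_000             # images seen during the 256x256 phase
IMAGES_512 = 1_740_800_000             # images seen during the 512x512 phase
LATENT_PRECOMPUTE_A100_HOURS = 3_784   # offline VAE + CLIP latent pre-computation

def unet_cost(num_gpus: int,
              throughput_images_per_sec_per_gpu: float,
              num_images: int) -> tuple[float, float]:
    """Return (wall-clock hours, dollar cost) for one U-Net training phase.

    `throughput_images_per_sec_per_gpu` should come from your own benchmark;
    the value passed below is only an assumed example.
    """
    images_per_hour = throughput_images_per_sec_per_gpu * num_gpus * 3600
    wall_clock_hours = num_images / images_per_hour
    cost = wall_clock_hours * num_gpus * A100_PRICE_PER_HOUR
    return wall_clock_hours, cost

# Example: 128 A100s at an assumed 25 images/sec/GPU for the 256x256 phase.
hours, dollars = unet_cost(128, 25.0, IMAGES_256)
print(f"256x256 phase: ~{hours:,.0f} h wall clock, ~${dollars:,.0f}")

# The offline latent pre-computation is a fixed cost on top of the U-Net phases.
print(f"Latent pre-compute: ~${LATENT_PRECOMPUTE_A100_HOURS * A100_PRICE_PER_HOUR:,.0f}")
```

If you compute the VAE and CLIP latents during training instead of offline, multiply the U-Net phase estimates by roughly 1.4x, per the note above.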