Add pytorch/training/gpu/2.3.1/transformers/4.48.0/py311/Dockerfile (#134)

* Add latest PyTorch DLC with bumped dependencies

* Fix `Dockerfile` due to extra `&&`

* Lower `flash-attn` dependency version

* Add `uv` to install `pip` dependencies faster

This commit also includes formatting changes that make the `Dockerfile`
easier to debug: continuation lines of a multi-line command are now
indented, so it is clear they belong to the unindented command above.
It also sets bash as the default shell and fixes the `gcloud` CLI
installation.

* Bump `transformers` to 4.48.0 and fix `Dockerfile` formatting

Bump the `transformers` dependency to 4.48.0 to support the ModernBERT
architecture, and bump `diffusers` to pick up new video and image
generation pipelines along with other features, improvements, and bug
fixes. Additionally, the `Dockerfile` formatting has been fixed.

* Update `containers/pytorch/training/README.md`

* Fix `containers/pytorch/training/README.md`

* Set `transformers` version to 4.47.1 instead

* Remove `--upgrade` flag from `torch` and `transformers` install

* Bump `torch` to 2.3.1 and move `Dockerfile`

* Remove `uv` from `Dockerfile`

* Upgrade `transformers` to 4.48.0

* Remove strict version pinning on `protobuf`
alvarobartt authored Jan 14, 2025
1 parent 847a657 commit e570d07
Showing 2 changed files with 103 additions and 2 deletions.
4 changes: 2 additions & 2 deletions containers/pytorch/training/README.md
@@ -29,7 +29,7 @@ The PyTorch Training containers will start a training job that will start on `do
docker run --gpus all -ti \
-v $(pwd)/artifact:/artifact \
-e HF_TOKEN=$(cat ~/.cache/huggingface/token) \
- us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310 \
+ us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-47.ubuntu2204.py311 \
trl sft \
--model_name_or_path google/gemma-2b \
--attn_implementation "flash_attention_2" \
@@ -76,7 +76,7 @@ The PyTorch Training containers come with two different containers depending on
- **GPU**: To build the PyTorch Training container for GPU, an instance with at least one NVIDIA GPU available is required to install `flash-attn` (used to speed up the attention layers during training and inference).

```bash
- docker build -t us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310 -f containers/pytorch/training/gpu/2.3.0/transformers/4.42.3/py310/Dockerfile .
+ docker build -t us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-47.ubuntu2204.py311 -f containers/pytorch/training/gpu/2.3.0/transformers/4.47.1/py311/Dockerfile .
```

- **TPU**: You can build PyTorch Training container for Google Cloud TPUs on any machine with docker build, you do not need to build it on a TPU VM
@@ -0,0 +1,101 @@
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04
SHELL ["/bin/bash", "-c"]

LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive

# Versions
ARG CUDA="cu121"
ARG PYTORCH="2.3.1"
ARG FLASH_ATTN="2.6.3"
ARG TRANSFORMERS="4.48.0"
ARG HUGGINGFACE_HUB="0.27.0"
ARG DIFFUSERS="0.32.1"
ARG PEFT="0.14.0"
ARG TRL="0.13.0"
ARG BITSANDBYTES="0.45.0"
ARG DATASETS="3.2.0"
ARG ACCELERATE="1.2.1"
ARG EVALUATE="0.4.3"
ARG SENTENCE_TRANSFORMERS="3.3.1"
ARG DEEPSPEED="0.16.1"
ARG MAX_JOBS=4

RUN apt-get update -y && \
    apt-get install software-properties-common -y && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get -y upgrade --only-upgrade systemd openssl cryptsetup && \
    apt-get install -y \
        build-essential \
        bzip2 \
        curl \
        git \
        git-lfs \
        tar \
        gcc \
        g++ \
        cmake \
        gnupg \
        libprotobuf-dev \
        libaio-dev \
        protobuf-compiler \
        python3.11 \
        python3.11-dev \
        libsndfile1-dev \
        ffmpeg && \
    apt-get clean autoremove --yes && \
    rm -rf /var/lib/apt/lists/*

# Set Python 3.11 as the default python version
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 && \
    ln -sf /usr/bin/python3.11 /usr/bin/python

# Install pip from source and upgrade it
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
    python get-pip.py && \
    rm get-pip.py && \
    pip install --upgrade pip

# Install latest release PyTorch (PyTorch must be installed before any DeepSpeed C++/CUDA ops.)
RUN pip install --no-cache-dir --index-url https://download.pytorch.org/whl/${CUDA} "torch==${PYTORCH}" torchvision torchaudio

# Install and upgrade Flash Attention 2
RUN pip install --no-cache-dir packaging ninja
RUN MAX_JOBS=${MAX_JOBS} pip install --no-build-isolation flash-attn==${FLASH_ATTN}

# Install Hugging Face Libraries
RUN pip install --no-cache-dir \
    "transformers[sklearn,sentencepiece,vision]==${TRANSFORMERS}" \
    "huggingface_hub[hf_transfer]==${HUGGINGFACE_HUB}" \
    "diffusers==${DIFFUSERS}" \
    "datasets==${DATASETS}" \
    "accelerate==${ACCELERATE}" \
    "evaluate==${EVALUATE}" \
    "peft==${PEFT}" \
    "trl==${TRL}" \
    "sentence-transformers==${SENTENCE_TRANSFORMERS}" \
    "deepspeed==${DEEPSPEED}" \
    "bitsandbytes==${BITSANDBYTES}" \
    tensorboard \
    jupyter notebook

ENV HF_HUB_ENABLE_HF_TRANSFER="1"

# Install Google Cloud Dependencies
RUN pip install --upgrade --no-cache-dir \
    google-cloud-storage \
    google-cloud-bigquery \
    google-cloud-aiplatform \
    google-cloud-pubsub \
    google-cloud-logging

# Install Google CLI single command
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" \
        | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg \
        | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - && \
    touch /var/lib/dpkg/status && \
    apt-get update -y && \
    apt-get install google-cloud-sdk -y && \
    apt-get clean autoremove --yes && \
    rm -rf /var/lib/{apt,dpkg,cache,log}
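
The image names in the README hunks appear to be built from the same values this `Dockerfile` pins as `ARG`s: the CUDA tag, the `torch` major and minor version, the `transformers` major and minor version, the Ubuntu release, and the Python tag. The sketch below shows how such a name could be derived. The `image_tag` function is hypothetical and not part of the repository, the naming scheme is inferred only from the two README tags, and `ubuntu2204` is hard-coded on the assumption that it mirrors the `nvidia/cuda:12.1.1-devel-ubuntu22.04` base image.

```bash
# Hypothetical helper, not part of this commit: derives an image name from
# the pinned versions. The naming scheme is an assumption inferred from the
# tags shown in the README diff.
image_tag() {
    # $1 = CUDA tag, $2 = torch version, $3 = transformers version, $4 = Python tag
    # Keep MAJOR.MINOR only, then turn the dot into a dash (2.3.1 -> 2-3).
    torch_mm=$(printf '%s' "$2" | cut -d. -f1,2 | tr . -)
    transformers_mm=$(printf '%s' "$3" | cut -d. -f1,2 | tr . -)
    # "ubuntu2204" is hard-coded here (assumption: matches the base image).
    printf 'huggingface-pytorch-training-%s.%s.transformers.%s.ubuntu2204.%s\n' \
        "$1" "$torch_mm" "$transformers_mm" "$4"
}

# Values taken from the ARG block of the Dockerfile above.
image_tag cu121 2.3.1 4.48.0 py311
```

Under this assumed scheme, the versions pinned in the `Dockerfile` would yield a name ending in `transformers.4-48.ubuntu2204.py311`, whereas the README hunks in this commit reference `transformers.4-47` and the `2.3.0/transformers/4.47.1` path.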
