Import the training dir from the ai-lab-recipes repository
This commit imports a specific commit ref (current `main`) of the `training` directory of the github.com/containers/ai-lab-recipes repository. To update the contents to a newer commit ref, update the ref in the `Makefile` and then run `make update-training-dir`.

Note that there is currently a discrepancy between the `README.md` and this content: the `README.md` needs updates to reflect that all operations are to be done from within the `training` directory.

Signed-off-by: Russell Bryant <[email protected]>
Showing 34 changed files with 1,468 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,3 +13,4 @@ ignores: | |
- "**/node_modules/**" | ||
- ".tox/**" | ||
- "venv/**" | ||
- "training/**" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
default: help | ||
|
||
help: | ||
@echo "To build a bootable container image you first need to create instructlab container images for a particular vendor " | ||
@echo | ||
@echo " - make instruct-amd" | ||
@echo " - make instruct-intel" | ||
@echo " - make instruct-nvidia" | ||
@echo " - make instruct-vllm" | ||
@echo | ||
@echo "Once instruct images are created, create advanced training containers (deepspeed is only for nvidia)" | ||
@echo | ||
@echo " - make deepspeed" | ||
@echo " - make vllm" | ||
@echo | ||
@echo "Once instruct images are created, create bootc container images" | ||
@echo | ||
@echo " - make amd" | ||
@echo " - make intel" | ||
@echo " - make nvidia" | ||
@echo | ||
@echo "If these images are going to be used on a cloud, you might want to add cloud-init." | ||
@echo | ||
@echo " - make cloud-amd" | ||
@echo " - make cloud-intel" | ||
@echo " - make cloud-nvidia" | ||
@echo " - make cloud-vllm" | ||
@echo | ||
@echo "make prune removes any buildah containers left behind by podman build, then prunes all unused container images. Useful if you are running out of space." | ||
@echo | ||
@echo " - make prune" | ||
@echo | ||
@echo "To create a disk image" | ||
@echo | ||
@echo " - make disk-amd" | ||
@echo " - make disk-intel" | ||
@echo " - make disk-nvidia" | ||
|
||
# | ||
# Create instructlab AI container images | ||
# | ||
.PHONY: instruct-amd | ||
instruct-amd: | ||
make -C instructlab amd | ||
|
||
.PHONY: instruct-nvidia | ||
instruct-nvidia: | ||
make -C instructlab nvidia | ||
|
||
.PHONY: instruct | ||
instruct: instruct-amd instruct-nvidia | ||
|
||
.PHONY: deepspeed | ||
deepspeed: | ||
make -C deepspeed/ image | ||
|
||
.PHONY: vllm | ||
vllm: | ||
make -C vllm/ image | ||
|
||
# | ||
# Create bootc container images prepared for AI | ||
# | ||
.PHONY: amd nvidia intel vllm | ||
amd: | ||
make -C amd-bootc/ bootc | ||
intel: | ||
make -C intel-bootc/ bootc | ||
nvidia: | ||
make -C nvidia-bootc/ dtk bootc | ||
|
||
# | ||
# Make Bootc container images preinstalled with cloud-init | ||
# | ||
.PHONY: cloud-amd | ||
cloud-amd: | ||
make VENDOR=amd -C cloud | ||
|
||
.PHONY: cloud-intel | ||
cloud-intel: | ||
make VENDOR=intel -C cloud | ||
|
||
.PHONY: cloud-nvidia | ||
cloud-nvidia: | ||
make VENDOR=nvidia -C cloud | ||
|
||
.PHONY: cloud | ||
cloud: cloud-amd cloud-intel cloud-nvidia | ||
|
||
# | ||
# We often see users running out of space. These commands are useful for freeing wasted space. | ||
# Note: be careful not to run this target while a podman build is in progress. | ||
# | ||
.PHONY: prune | ||
prune: | ||
buildah rm --all | ||
podman image prune -f | ||
|
||
# Create disk images with bootc-image-builder | ||
# | ||
.PHONY: disk-amd | ||
disk-amd: | ||
make -C amd-bootc/ bootc-image-builder | ||
.PHONY: disk-intel | ||
disk-intel: | ||
make -C intel-bootc/ bootc-image-builder | ||
.PHONY: disk-nvidia | ||
disk-nvidia: | ||
make -C nvidia-bootc/ bootc-image-builder | ||
|
||
.PHONY: clean | ||
clean: | ||
rm -rf build |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,144 @@ | ||
Linux Operating System Bootable containers enabled for AI Training | ||
=== | ||
|
||
In order to run accelerated AI workloads, we've prepared [bootc](https://github.com/containers/bootc) container images for the major AI platforms. | ||
|
||
# Makefile targets | ||
|
||
| Target | Description | | ||
|-----------------|---------------------------------------------------------------------| | ||
| amd | Create bootable container for AMD platform | | ||
| deepspeed | DeepSpeed container for optimizing deep learning | | ||
| cloud-amd | Add cloud-init to bootable container for AMD platform | | ||
| cloud-intel | Add cloud-init to bootable container for Intel platform | | ||
| cloud-nvidia | Add cloud-init to bootable container for Nvidia platform | | ||
| disk-amd | Create disk image from bootable container for AMD platform | | ||
| disk-intel | Create disk image from bootable container for Intel platform | | ||
| disk-nvidia | Create disk image from bootable container for Nvidia platform | | ||
| instruct-amd | Create instruct lab image for bootable container for AMD platform | | ||
| instruct-intel | Create instruct lab image for bootable container for Intel platform | | ||
| instruct-nvidia | Create instruct lab image for bootable container for Nvidia platform | | ||
| intel | Create bootable container for Intel Habana Labs platform | | ||
| nvidia | Create bootable container for Nvidia platform | | ||
| vllm | Containerized inference/serving engine for LLMs | | ||
|
||
# Makefile variables | ||
|
||
| Variable | Description | Default | | ||
|---------------------------|-------------------------------------------------|---------------------------------------------| | ||
| FROM | Overrides the base image for the Containerfiles | `quay.io/centos-bootc/centos-bootc:stream9` | | ||
| REGISTRY | Container Registry for storing container images | `quay.io` | | ||
| REGISTRY_ORG | Container Registry organization | `ai-lab` | | ||
| IMAGE_NAME | Container image name | platform (e.g. `amd`) | | ||
| IMAGE_TAG | Container image tag | `latest` | | ||
| CONTAINER_TOOL | Container tool used for build | `podman` | | ||
| CONTAINER_TOOL_EXTRA_ARGS | Container tool extra arguments | ` ` | | ||
|
||
|
||
Note: AI content is huge; building requires more than 200GB of free disk space. | ||
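
As a rough pre-flight check, the free space on the build filesystem can be compared against that threshold before starting. The following is a minimal sketch, not part of the imported Makefiles: `BUILD_DIR` is a hypothetical variable, and which filesystem actually holds the container storage depends on your setup.

```shell
#!/bin/sh
# Minimum free space in GB, taken from the note above (">200GB").
REQUIRED_GB=200

# has_enough_space AVAILABLE_GB: succeeds when AVAILABLE_GB exceeds REQUIRED_GB.
has_enough_space() {
    [ "$1" -gt "$REQUIRED_GB" ]
}

# free_gb DIR: print the free space of the filesystem holding DIR, in whole GB.
# `df -Pk` is the POSIX output format: 1024-byte blocks, available space in column 4.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# BUILD_DIR is a hypothetical name; point it at the filesystem used for builds.
BUILD_DIR="${BUILD_DIR:-.}"
if has_enough_space "$(free_gb "$BUILD_DIR")"; then
    echo "enough free space in $BUILD_DIR"
else
    echo "${REQUIRED_GB}GB or less free in $BUILD_DIR; consider 'make prune'" >&2
fi
```

If the check fails, `make prune` from the top-level Makefile may recover space, provided no podman build is in progress.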
|
||
# How to build InstructLab containers | ||
|
||
To do AI training, you first need to build the InstructLab container images. | ||
|
||
Simply execute `make instruct-<platform>`. For example: | ||
|
||
* make instruct-amd | ||
* make instruct-intel | ||
* make instruct-nvidia | ||
|
||
Once these container images are built, it is time to build vllm. | ||
|
||
# How to build the vllm inference engine | ||
|
||
* make vllm | ||
|
||
# On nvidia systems, you need to build the deepspeed container | ||
|
||
* make deepspeed | ||
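
The container build steps above can be summarized per platform. This is a dry-run sketch that only prints the `make` commands in the order the README describes them; the `container_targets` helper and the `PLATFORM` variable are hypothetical names used for illustration.

```shell
#!/bin/sh
# container_targets PLATFORM: print the make targets for the container
# builds described above, in order. Nothing is executed here.
container_targets() {
    platform="$1"
    echo "instruct-$platform"
    echo "vllm"
    # Per the README, deepspeed is only built on nvidia systems.
    if [ "$platform" = "nvidia" ]; then
        echo "deepspeed"
    fi
}

# Dry run: show the commands rather than running them.
for target in $(container_targets "${PLATFORM:-nvidia}"); do
    echo "make $target"
done
```

Printing instead of executing keeps the sketch safe to run on a machine without the required disk space.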
|
||
# How to build bootc container images | ||
|
||
To build the images (based on CentOS Stream by default), a simple `make <platform>` should be enough. For example, to build the `nvidia`, `amd` and `intel` bootc containers, respectively: | ||
|
||
``` | ||
make nvidia | ||
make amd | ||
make intel | ||
``` | ||
|
||
## How to build bootc container images based on Red Hat Enterprise Linux | ||
|
||
To build the training images based on Red Hat Enterprise Linux bootc images, the appropriate base container image must be set in the `FROM` variable, and the build must run on an *entitled Red Hat Enterprise Linux 9.x* system with a valid subscription. | ||
|
||
For example: | ||
|
||
``` | ||
make nvidia FROM=registry.redhat.io/rhel9/rhel-bootc:9.4 | ||
make amd FROM=registry.redhat.io/rhel9/rhel-bootc:9.4 | ||
make intel FROM=registry.redhat.io/rhel9/rhel-bootc:9.4 | ||
``` | ||
|
||
Of course, the other Makefile variables are still available, so the following is a valid build command: | ||
|
||
``` | ||
make nvidia REGISTRY=myregistry.com REGISTRY_ORG=ai-training IMAGE_NAME=nvidia IMAGE_TAG=v1 FROM=registry.redhat.io/rhel9/rhel-bootc:9.4 | ||
``` | ||
|
||
# How to build Cloud ready images | ||
|
||
Bootc container images can be installed on physical machines, virtual machines and in the cloud. Often it is useful to add the cloud-init package when running the operating systems in the cloud. | ||
|
||
To add cloud-init to your existing bootc container image, executing `make cloud-<platform>` should be enough. For example, to build the `cloud-nvidia`, `cloud-amd` and `cloud-intel` bootc containers, respectively: | ||
|
||
``` | ||
make cloud-nvidia | ||
make cloud-amd | ||
make cloud-intel | ||
``` | ||
|
||
# How to build disk images | ||
`bootc-image-builder` produces disk images using a bootable container as input. Disk images can be used to directly provision a host. | ||
The process writes the disk image to `<platform>-bootc/build`. | ||
|
||
To invoke `bootc-image-builder`, execute `make disk-<platform>`: | ||
``` | ||
make disk-nvidia | ||
``` | ||
or | ||
``` | ||
make disk-nvidia DISK_TYPE=ami BOOTC_IMAGE=quay.io/ai-lab/nvidia-bootc-cloud:latest | ||
``` | ||
|
||
In addition to the variables common to all targets, a few extra can be defined to customize disk image creation: | ||
|
||
| Variable | Description | Default | | ||
|-----------------------|-----------------------------------|--------------------------------------------------| | ||
| BOOTC_IMAGE | Image to use as input | `$REGISTRY/$REGISTRY_ORG/$IMAGE_NAME:$IMAGE_TAG` | | ||
| DISK_TYPE | Type of image to build | `qcow2` | | ||
| IMAGE_BUILDER_CONFIG | Path to a build-config file | `EMPTY` | | ||
|
||
The image builder config file is documented in the [bootc-image-builder README](https://github.com/osbuild/bootc-image-builder?tab=readme-ov-file#-build-config). | ||
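
To make the `BOOTC_IMAGE` default concrete, here is a small sketch of how the reference is assembled from the common variables. The `bootc_image_ref` helper is hypothetical; the defaults are copied from the variables table, and the image names passed in follow the `nvidia-bootc-cloud` example above.

```shell
#!/bin/sh
# Defaults copied from the "Makefile variables" table.
REGISTRY="${REGISTRY:-quay.io}"
REGISTRY_ORG="${REGISTRY_ORG:-ai-lab}"
IMAGE_TAG="${IMAGE_TAG:-latest}"

# bootc_image_ref IMAGE_NAME: print $REGISTRY/$REGISTRY_ORG/$IMAGE_NAME:$IMAGE_TAG,
# the documented default form of BOOTC_IMAGE.
bootc_image_ref() {
    echo "${REGISTRY}/${REGISTRY_ORG}/$1:${IMAGE_TAG}"
}

bootc_image_ref nvidia-bootc
bootc_image_ref nvidia-bootc-cloud
```

Overriding any of the three variables in the environment changes the printed reference in the same way it would change the default `BOOTC_IMAGE`.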
|
||
The following image disk types are currently available: | ||
| Disk type | Target environment | | ||
|-----------------------|---------------------------------------------------------------------------------------| | ||
| `ami` | [Amazon Machine Image](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) | | ||
| `qcow2` **(default)** | [QEMU](https://www.qemu.org/) | | ||
| `vmdk` | [VMDK](https://en.wikipedia.org/wiki/VMDK) usable in vSphere, among others | | ||
| `anaconda-iso` | An unattended Anaconda installer that installs to the first disk found. | | ||
| `raw` | Unformatted [raw disk](https://en.wikipedia.org/wiki/Rawdisk). | | ||
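
Since a mistyped `DISK_TYPE` would only surface once the build is under way, it can be checked up front. The sketch below validates a value against the table above; the `is_supported_disk_type` helper is a hypothetical name, and the list is only as current as this README.

```shell
#!/bin/sh
# is_supported_disk_type TYPE: succeed if TYPE is one of the disk types
# listed in the table above.
is_supported_disk_type() {
    case "$1" in
        ami|qcow2|vmdk|anaconda-iso|raw) return 0 ;;
        *) return 1 ;;
    esac
}

DISK_TYPE="${DISK_TYPE:-qcow2}"   # qcow2 is the documented default
if is_supported_disk_type "$DISK_TYPE"; then
    # Print the command instead of running it.
    echo "make disk-nvidia DISK_TYPE=$DISK_TYPE"
else
    echo "unsupported DISK_TYPE: $DISK_TYPE" >&2
fi
```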
|
||
# Troubleshooting | ||
|
||
After an interrupted build, you may want to restart the process from scratch. In that case, `podman` can be told to discard the cached layers by passing the `--no-cache` parameter to the build through the `CONTAINER_TOOL_EXTRA_ARGS` variable: | ||
|
||
``` | ||
make <platform> CONTAINER_TOOL_EXTRA_ARGS="--no-cache" | ||
``` | ||
|
||
Building accelerated images requires a lot of temporary disk space. If you need to specify a directory for temporary storage, set the `TMPDIR` environment variable: | ||
|
||
``` | ||
make <platform> TMPDIR=/path/to/tmp | ||
``` |
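
The two troubleshooting options can be combined. The following sketch prints a from-scratch build command after checking that the temporary directory is usable; the `rebuild_command` helper is a hypothetical name, and the command it prints simply joins the two documented forms.

```shell
#!/bin/sh
# rebuild_command PLATFORM TMP_DIR: print a from-scratch build command that
# combines the two options described above. Fails if TMP_DIR is unusable.
rebuild_command() {
    platform="$1"
    tmp_dir="$2"
    if [ ! -d "$tmp_dir" ] || [ ! -w "$tmp_dir" ]; then
        echo "not a writable directory: $tmp_dir" >&2
        return 1
    fi
    echo "make $platform CONTAINER_TOOL_EXTRA_ARGS=\"--no-cache\" TMPDIR=$tmp_dir"
}

# Print the command for nvidia, falling back to /tmp when TMPDIR is unset.
rebuild_command nvidia "${TMPDIR:-/tmp}" || echo "choose another directory" >&2
```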
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
FROM quay.io/centos-bootc/centos-bootc:stream9 | ||
|
||
ADD rocm.repo /etc/yum.repos.d/rocm.repo | ||
|
||
# Include growfs service | ||
COPY build/usr /usr | ||
|
||
ARG EXTRA_RPM_PACKAGES='' | ||
RUN dnf install -y \ | ||
rocm-smi \ | ||
${EXTRA_RPM_PACKAGES} \ | ||
&& dnf clean all | ||
|
||
# Setup /usr/lib/containers/storage as an additional store for images. | ||
# Remove once the base images have this set by default. | ||
RUN sed -i -e '/additionalimage.*/a "/usr/lib/containers/storage",' \ | ||
/etc/containers/storage.conf && \ | ||
cp /run/.input/ilab /usr/local/bin/ilab | ||
|
||
ARG INSTRUCTLAB_IMAGE="quay.io/ai-lab/instructlab-amd:latest" | ||
ARG VLLM_IMAGE | ||
|
||
RUN sed -i 's/__REPLACE_TRAIN_DEVICE__/cuda/' /usr/local/bin/ilab | ||
RUN sed -i 's/__REPLACE_CONTAINER_DEVICE__/nvidia.com\/gpu=all/' /usr/local/bin/ilab | ||
RUN sed -i "s%__REPLACE_CONTAINER_NAME__%${INSTRUCTLAB_IMAGE}%" /usr/local/bin/ilab | ||
|
||
# Added for running as an OCI Container to prevent Overlay on Overlay issues. | ||
VOLUME /var/lib/containers | ||
|
||
# Prepull the vllm and instructlab images | ||
RUN IID=$(podman --root /usr/lib/containers/storage pull oci:/run/.input/vllm) && \ | ||
podman --root /usr/lib/containers/storage image tag ${IID} ${VLLM_IMAGE} | ||
RUN IID=$(podman --root /usr/lib/containers/storage pull oci:/run/.input/instructlab-amd) && \ | ||
podman --root /usr/lib/containers/storage image tag ${IID} ${INSTRUCTLAB_IMAGE} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
|
||
VENDOR ?= amd | ||
IMAGE_NAME ?= $(VENDOR)-bootc | ||
|
||
include ../common/Makefile.common | ||
|
||
default: bootc | ||
|
||
.PHONY: bootc | ||
bootc: prepare-files growfs | ||
"${CONTAINER_TOOL}" build \ | ||
$(ARCH:%=--platform linux/%) \ | ||
--security-opt label=disable \ | ||
--cap-add SYS_ADMIN \ | ||
--file Containerfile \ | ||
-v ${OUTDIR}:/run/.input:ro \ | ||
--tag "${BOOTC_IMAGE}" \ | ||
--build-arg "INSTRUCTLAB_IMAGE=$(INSTRUCTLAB_IMAGE)" \ | ||
--build-arg "VLLM_IMAGE=$(VLLM_IMAGE)" \ | ||
$(EXTRA_RPM_PACKAGES:%=--build-arg EXTRA_RPM_PACKAGES=%) \ | ||
$(FROM:%=--from=%) \ | ||
${CONTAINER_TOOL_EXTRA_ARGS} . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
[ROCm-6.0.2] | ||
name=ROCm6.0.2 | ||
baseurl=https://repo.radeon.com/rocm/rhel$releasever/6.0.2/main | ||
enabled=1 | ||
priority=50 | ||
gpgcheck=1 | ||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
default: cloud | ||
|
||
include ../common/Makefile.common | ||
|
||
REGISTRY ?= quay.io | ||
REGISTRY_ORG ?= ai-lab | ||
IMAGE_TAG ?= latest | ||
|
||
.PHONY: init | ||
init: | ||
git clone https://gitlab.com/bootc-org/examples.git 2> /dev/null || true | ||
(cd examples; git pull origin main) | ||
|
||
.PHONY: cloud | ||
cloud: init | ||
"${CONTAINER_TOOL}" build \ | ||
$(ARCH:%=--platform linux/%) \ | ||
--tag "${REGISTRY}/${REGISTRY_ORG}/${IMAGE_NAME}-cloud:${IMAGE_TAG}" \ | ||
--from="${BOOTC_IMAGE}" \ | ||
examples/cloud-init | ||
|
||
.PHONY: push | ||
push: push-amd push-nvidia |