[CI] Use precise terminology for image components #11157

Merged
merged 3 commits into from
Jan 11, 2025
6 changes: 3 additions & 3 deletions .github/workflows/jvm_tests.yml
@@ -247,10 +247,10 @@ jobs:
matrix:
variant:
- name: cpu
container_id: xgb-ci.jvm
image_repo: xgb-ci.jvm
artifact_from: build-test-jvm-packages
- name: gpu
container_id: xgb-ci.jvm_gpu_build
image_repo: xgb-ci.jvm_gpu_build
artifact_from: build-jvm-gpu
scala_version: ['2.12', '2.13']
steps:
@@ -272,4 +272,4 @@ jobs:
- name: Deploy JVM packages to S3
run: |
bash ops/pipeline/deploy-jvm-packages.sh ${{ matrix.variant.name }} \
${{ matrix.variant.container_id }} ${{ matrix.scala_version }}
${{ matrix.variant.image_repo }} ${{ matrix.scala_version }}
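GitHub Actions expands a `matrix` into the Cartesian product of its keys, so the two `variant` entries crossed with the two `scala_version` values should yield four deploy jobs. The Python sketch below is illustrative only and not part of the PR: it mimics that expansion with `itertools.product` and prints the `deploy-jvm-packages.sh` invocation each job would run after the rename from `container_id` to `image_repo`. The job ordering GitHub actually uses is not guaranteed to match.

```python
import itertools

# Matrix values copied from the new side of the jvm_tests.yml hunk.
variants = [
    {"name": "cpu", "image_repo": "xgb-ci.jvm", "artifact_from": "build-test-jvm-packages"},
    {"name": "gpu", "image_repo": "xgb-ci.jvm_gpu_build", "artifact_from": "build-jvm-gpu"},
]
scala_versions = ["2.12", "2.13"]


def deploy_commands() -> list[str]:
    """Mimic the Cartesian-product expansion of the `variant` x `scala_version` matrix."""
    commands = []
    for variant, scala_version in itertools.product(variants, scala_versions):
        # Mirrors: bash ops/pipeline/deploy-jvm-packages.sh NAME IMAGE_REPO SCALA_VERSION
        commands.append(
            "bash ops/pipeline/deploy-jvm-packages.sh "
            f"{variant['name']} {variant['image_repo']} {scala_version}"
        )
    return commands


if __name__ == "__main__":
    for cmd in deploy_commands():
        print(cmd)
```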
10 changes: 5 additions & 5 deletions .github/workflows/main.yml
@@ -220,22 +220,22 @@ jobs:
matrix:
include:
- description: single-gpu
container: xgb-ci.gpu
image_repo: xgb-ci.gpu
suite: gpu
runner: linux-amd64-gpu
artifact_from: build-cuda
- description: multiple-gpu
container: xgb-ci.gpu
image_repo: xgb-ci.gpu
suite: mgpu
runner: linux-amd64-mgpu
artifact_from: build-cuda
- description: cpu-amd64
container: xgb-ci.cpu
image_repo: xgb-ci.cpu
suite: cpu
runner: linux-amd64-cpu
artifact_from: build-cuda
- description: cpu-arm64
container: xgb-ci.aarch64
image_repo: xgb-ci.aarch64
suite: cpu-arm64
runner: linux-arm64-cpu
artifact_from: build-cpu-arm64
@@ -257,4 +257,4 @@ jobs:
mv -v wheelhouse/xgboost .
chmod +x ./xgboost
- name: Run Python tests, ${{ matrix.description }}
run: bash ops/pipeline/test-python-wheel.sh ${{ matrix.suite }} ${{ matrix.container }}
run: bash ops/pipeline/test-python-wheel.sh ${{ matrix.suite }} ${{ matrix.image_repo }}
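Unlike the JVM workflow's matrix, this one consists only of `include` entries, so GitHub Actions creates one job per entry rather than a cross product. The sketch below is illustrative only: it rebuilds the four `test-python-wheel.sh` invocations from those entries, showing that `image_repo` is now the second positional argument. The `runner` and `artifact_from` keys are left out because the test command does not use them.

```python
# Entries copied from the new side of the main.yml hunk (runner/artifact_from omitted).
include = [
    {"description": "single-gpu", "image_repo": "xgb-ci.gpu", "suite": "gpu"},
    {"description": "multiple-gpu", "image_repo": "xgb-ci.gpu", "suite": "mgpu"},
    {"description": "cpu-amd64", "image_repo": "xgb-ci.cpu", "suite": "cpu"},
    {"description": "cpu-arm64", "image_repo": "xgb-ci.aarch64", "suite": "cpu-arm64"},
]


def wheel_test_commands() -> dict[str, str]:
    """Map each job description to the command its 'Run Python tests' step runs."""
    return {
        entry["description"]: (
            f"bash ops/pipeline/test-python-wheel.sh {entry['suite']} {entry['image_repo']}"
        )
        for entry in include
    }


for description, command in wheel_test_commands().items():
    print(f"{description}: {command}")
```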
55 changes: 29 additions & 26 deletions doc/contrib/ci.rst
@@ -44,16 +44,17 @@ To make changes to the CI container, carry out the following steps:
4. Submit a pull request to `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_ with
the proposed changes to the Dockerfile. Make note of the pull request number. Example: ``#204``
5. Clone `dmlc/xgboost <https://github.com/dmlc/xgboost>`_ and update all references to the
old container to point to the new container. More specifically, all Docker tags of format
``492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main`` should have the last
component replaced with ``PR-#``, where ``#`` is the pull request number. For the example above,
old container to point to the new container. More specifically, all container image URIs of the form
``492475357299.dkr.ecr.us-west-2.amazonaws.com/[image_repo]:main`` should have their image tag
(last component) replaced with ``PR-#``, where ``#`` is the pull request number.
For the example above,
we'd replace ``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main`` with
``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:PR-204``.
6. Now submit a pull request to `dmlc/xgboost <https://github.com/dmlc/xgboost>`_. The CI will
run tests using the new container. Verify that all tests pass.
7. Merge the pull request in ``dmlc/xgboost-devops``. Wait until the CI completes on the ``main`` branch.
8. Go back to the the pull request for ``dmlc/xgboost`` and change the container references back
to ``:main``.
8. Go back to the pull request for ``dmlc/xgboost`` and revise all the container references to use
the old tag ``:main``.
9. Merge the pull request in ``dmlc/xgboost``.

.. _build_run_docker_locally:
@@ -83,11 +84,12 @@ and invoke ``containers/docker_build.sh`` as follows:
# For local testing, set them to "main"
export GITHUB_SHA="main"
export BRANCH_NAME="main"
bash containers/docker_build.sh CONTAINER_ID
bash containers/docker_build.sh IMAGE_REPO

where ``CONTAINER_ID`` identifies for the container. The wrapper script will look up the YAML file
``containers/ci_container.yml``. For example, when ``CONTAINER_ID`` is set to ``xgb-ci.gpu``,
the script will use the corresponding entry from ``containers/ci_container.yml``:
where ``IMAGE_REPO`` is the name of the container image. The wrapper script will look up the
YAML file ``containers/ci_container.yml``. For example, when ``IMAGE_REPO`` is set to
``xgb-ci.gpu``, the script will use the corresponding entry from
``containers/ci_container.yml``:

.. code-block:: yaml

@@ -113,10 +115,11 @@ the build arguments are:

The build arguments provide inputs to the ``ARG`` instructions in the Dockerfile.

When ``containers/docker_build.sh`` completes, you will have access to the container with tag
``492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main``. The prefix
``492475357299.dkr.ecr.us-west-2.amazonaws.com/`` was added so that the container could
later be uploaded to AWS Elastic Container Registry (ECR), a private Docker registry.
When ``containers/docker_build.sh`` completes, you will have access to the container with the
(fully qualified) URI ``492475357299.dkr.ecr.us-west-2.amazonaws.com/[image_repo]:main``.
The prefix ``492475357299.dkr.ecr.us-west-2.amazonaws.com/`` was added so that
the container could later be uploaded to AWS Elastic Container Registry (ECR),
a private Docker registry.

-----------------------------------------
To run commands within a Docker container
@@ -126,7 +129,7 @@ Invoke ``ops/docker_run.py`` from the main ``dmlc/xgboost`` repo as follows:
.. code-block:: bash

python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/[image_repo]:[image_tag] \
[--use-gpus] \
-- "command to run inside the container"

@@ -138,12 +141,12 @@ For example:

# Run without GPU
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
-- bash ops/pipeline/build-cpu-impl.sh cpu

# Run with NVIDIA GPU
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--use-gpus \
-- bash ops/pipeline/test-python-wheel-impl.sh gpu

@@ -154,7 +157,7 @@ Optionally, you can specify ``--run-args`` to pass extra arguments to ``docker r
# Allocate extra space in /dev/shm to enable NCCL
# Also run the container with elevated privileges
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--use-gpus \
--run-args='--shm-size=4g --privileged' \
-- bash ops/pipeline/test-python-wheel-impl.sh gpu
@@ -171,7 +174,7 @@ Examples: useful tasks for local development

export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu_build_rockylinux8:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.gpu_build_rockylinux8:main \
-- ops/pipeline/build-cuda-impl.sh

* Run Python tests
@@ -180,7 +183,7 @@ Examples: useful tasks for local development

export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.cpu:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.cpu:main \
-- ops/pipeline/test-python-wheel-impl.sh cpu

* Run Python tests with GPU algorithm
@@ -189,7 +192,7 @@ Examples: useful tasks for local development

export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--use-gpus \
-- ops/pipeline/test-python-wheel-impl.sh gpu

@@ -199,7 +202,7 @@ Examples: useful tasks for local development

export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--use-gpus \
--run-args='--shm-size=4g' \
-- ops/pipeline/test-python-wheel-impl.sh mgpu
@@ -212,7 +215,7 @@ Examples: useful tasks for local development
export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
export SCALA_VERSION=2.12 # Specify Scala version (2.12 or 2.13)
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.jvm:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.jvm:main \
--run-args "-e SCALA_VERSION" \
-- ops/pipeline/build-test-jvm-packages-impl.sh

@@ -224,7 +227,7 @@ Examples: useful tasks for local development
export SCALA_VERSION=2.12 # Specify Scala version (2.12 or 2.13)
export USE_CUDA=1
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.jvm_gpu_build:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.jvm_gpu_build:main \
--use-gpus \
--run-args "-e SCALA_VERSION -e USE_CUDA --shm-size=4g" \
-- ops/pipeline/build-test-jvm-packages-impl.sh
@@ -456,7 +459,7 @@ For example, when you run ``bash containers/docker_build.sh xgb-ci.gpu``, the lo

# docker_build.sh calls docker_build.py...
python3 containers/docker_build.py --container-def gpu \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--build-arg CUDA_VERSION_ARG=12.4.1 --build-arg NCCL_VERSION_ARG=2.23.4-1 \
--build-arg RAPIDS_VERSION_ARG=24.10

@@ -480,14 +483,14 @@ Here is an example with ``docker_run.py``:

# Run without GPU
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
-- bash ops/pipeline/build-cpu-impl.sh cpu

# Run with NVIDIA GPU
# Allocate extra space in /dev/shm to enable NCCL
# Also run the container with elevated privileges
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--use-gpus \
--run-args='--shm-size=4g --privileged' \
-- bash ops/pipeline/test-python-wheel-impl.sh gpu
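Step 5 of the updated guide asks contributors to replace the image tag (the last component) of each fully qualified image URI with `PR-#`. A minimal Python sketch of that substitution follows; `split_image_uri` and `retag_image_uri` are hypothetical helpers, not part of the XGBoost repository. Only a `:` that appears after the final `/` is treated as the tag separator, so a registry host with a port number is not mistaken for a tag. Digest references (`repo@sha256:...`) are deliberately out of scope.

```python
def split_image_uri(image_uri: str) -> tuple[str, str]:
    """Split 'registry/repo:tag' into ('registry/repo', 'tag').

    Only a ':' after the last '/' counts as the tag separator, so
    'localhost:5000/repo' (no tag) is rejected instead of misparsed.
    Digest references (repo@sha256:...) are not handled.
    """
    name_start = image_uri.rfind("/") + 1
    sep = image_uri.rfind(":")
    if sep < name_start:
        raise ValueError(f"image URI has no tag: {image_uri!r}")
    return image_uri[:sep], image_uri[sep + 1:]


def retag_image_uri(image_uri: str, pr_number: int) -> str:
    """Hypothetical helper: replace the image tag with 'PR-<pr_number>'."""
    repository, _old_tag = split_image_uri(image_uri)
    return f"{repository}:PR-{pr_number}"


print(retag_image_uri("492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main", 204))
```

With the guide's example pull request number (204), this prints the same `...xgb-ci.gpu:PR-204` URI the guide uses.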
12 changes: 6 additions & 6 deletions ops/docker_run.py
@@ -53,7 +53,7 @@ def fancy_print_cli_args(*, cli_args: list[str]) -> None:

def docker_run(
*,
container_tag: str,
image_uri: str,
command_args: list[str],
use_gpus: bool,
workdir: pathlib.Path,
@@ -71,7 +71,7 @@ def docker_run(
itertools.chain.from_iterable([["-e", f"{k}={v}"] for k, v in user_ids.items()])
)
docker_run_cli_args.extend(extra_args)
docker_run_cli_args.append(container_tag)
docker_run_cli_args.append(image_uri)
docker_run_cli_args.extend(command_args)

cli_args = ["docker", "run"] + docker_run_cli_args
@@ -90,7 +90,7 @@ def main(*, args: argparse.Namespace) -> None:
run_args.append("-it")

docker_run(
container_tag=args.container_tag,
image_uri=args.image_uri,
command_args=args.command_args,
use_gpus=args.use_gpus,
workdir=args.workdir,
@@ -102,18 +102,18 @@ def main(*, args: argparse.Namespace) -> None:
if __name__ == "__main__":
parser = argparse.ArgumentParser(
usage=(
f"{sys.argv[0]} --container-tag CONTAINER_TAG [--use-gpus] [--interactive] "
f"{sys.argv[0]} --image-uri IMAGE_URI [--use-gpus] [--interactive] "
"[--workdir WORKDIR] [--run-args RUN_ARGS] -- COMMAND_ARG "
"[COMMAND_ARG ...]"
),
description="Run tasks inside a Docker container",
)
parser.add_argument(
"--container-tag",
"--image-uri",
type=str,
required=True,
help=(
"Container tag to identify the container, e.g. "
"Fully qualified image URI to identify the container, e.g. "
"492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main"
),
)
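The hunks above show that `docker_run()` appends extra arguments, then the image URI, then the command arguments, and prefixes the result with `docker run`. This order matters because `docker run` treats its first positional argument as the image reference and everything after it as the command. The following is a simplified, hypothetical mirror of that assembly, not the real function: the user-ID environment variables and workdir handling are omitted, and exposing GPUs via Docker's `--gpus all` flag is an assumption, since the diff does not show how `use_gpus` is implemented.

```python
from typing import Optional


def build_docker_run_args(
    *,
    image_uri: str,
    command_args: list[str],
    use_gpus: bool = False,
    extra_args: Optional[list[str]] = None,
) -> list[str]:
    """Hypothetical, simplified mirror of docker_run(): assemble the argv.

    Options come first, then the fully qualified image URI, then the command.
    """
    cli_args = ["docker", "run"]
    if use_gpus:
        # Assumption: GPUs are exposed with Docker's --gpus flag.
        cli_args.extend(["--gpus", "all"])
    cli_args.extend(extra_args or [])
    cli_args.append(image_uri)  # previously named container_tag
    cli_args.extend(command_args)
    return cli_args


args = build_docker_run_args(
    image_uri="492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main",
    command_args=["bash", "ops/pipeline/test-python-wheel-impl.sh", "gpu"],
    use_gpus=True,
    extra_args=["--shm-size=4g"],
)
print(" ".join(args))
```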
6 changes: 3 additions & 3 deletions ops/pipeline/build-cpu-arm64.sh
@@ -13,17 +13,17 @@ source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

WHEEL_TAG=manylinux_2_28_aarch64
CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.aarch64:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.aarch64:main

echo "--- Build CPU code targeting ARM64"
set -x
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- ops/pipeline/build-cpu-arm64-impl.sh

echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard"
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- auditwheel repair --only-plat \
--plat ${WHEEL_TAG} python-package/dist/*.whl
python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \
6 changes: 3 additions & 3 deletions ops/pipeline/build-cpu.sh
@@ -6,7 +6,7 @@ set -euo pipefail
source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.cpu:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.cpu:main

echo "--- Build CPU code"
set -x
@@ -24,13 +24,13 @@ export UBSAN_OPTIONS='print_stacktrace=1:log_path=ubsan_error.log'
# Work around https://github.com/google/sanitizers/issues/1614
sudo sysctl vm.mmap_rnd_bits=28
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
--run-args '-e ASAN_SYMBOLIZER_PATH -e ASAN_OPTIONS -e UBSAN_OPTIONS
--cap-add SYS_PTRACE' \
-- bash ops/pipeline/build-cpu-impl.sh cpu-sanitizer

# Test without sanitizer
rm -rf build/
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- bash ops/pipeline/build-cpu-impl.sh cpu
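The sanitizer invocation passes a `--run-args` value that spans two lines inside single quotes, and Bash keeps that newline as part of the argument. Whether this is harmless depends on how `ops/docker_run.py` tokenizes the string, which this diff does not show. Assuming it uses Python's `shlex.split` (an assumption, not confirmed here), the embedded newline is treated as ordinary whitespace, as this sketch checks. Each bare `-e NAME` pair tells `docker run` to forward that variable's value from the host environment, which is why `UBSAN_OPTIONS` is exported earlier in the script.

```python
import shlex

# The --run-args value from build-cpu.sh, including the newline that Bash
# preserves inside the single-quoted string.
run_args = "-e ASAN_SYMBOLIZER_PATH -e ASAN_OPTIONS -e UBSAN_OPTIONS\n--cap-add SYS_PTRACE"

# Assumption: docker_run.py splits --run-args with shlex.split (not shown in this diff).
tokens = shlex.split(run_args)

# Names of host environment variables forwarded into the container via bare '-e NAME'.
forwarded = [tokens[i + 1] for i, tok in enumerate(tokens) if tok == "-e"]

print(tokens)
print(forwarded)
```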
12 changes: 6 additions & 6 deletions ops/pipeline/build-cuda.sh
@@ -11,10 +11,10 @@ fi

if [[ "$#" -lt 2 ]]
then
echo "Usage: $0 [container_id] {enable-rmm,disable-rmm}"
echo "Usage: $0 [image_repo] {enable-rmm,disable-rmm}"
exit 2
fi
container_id="$1"
image_repo="$1"
rmm_flag="$2"

# Validate RMM flag
@@ -35,8 +35,8 @@ source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

WHEEL_TAG=manylinux_2_28_x86_64
BUILD_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main"
MANYLINUX_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main"
BUILD_IMAGE_URI="${DOCKER_REGISTRY_URL}/${image_repo}:main"
MANYLINUX_IMAGE_URI="${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main"

echo "--- Build with CUDA"

@@ -57,13 +57,13 @@ fi
set -x

python3 ops/docker_run.py \
--container-tag ${BUILD_CONTAINER_TAG} \
--image-uri ${BUILD_IMAGE_URI} \
--run-args='-e BUILD_ONLY_SM75 -e USE_RMM' \
-- ops/pipeline/build-cuda-impl.sh

echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard"
python3 ops/docker_run.py \
--container-tag ${MANYLINUX_CONTAINER_TAG} \
--image-uri ${MANYLINUX_IMAGE_URI} \
-- auditwheel repair --only-plat \
--plat ${WHEEL_TAG} python-package/dist/*.whl
python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \
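After the rename, `build-cuda.sh` takes an image repository name instead of a container ID and derives two fully qualified image URIs from it. The Python sketch below restates that logic for illustration. The helper name `cuda_build_image_uris` is hypothetical. The RMM check assumes the only valid values are the two listed in the usage message, because the script's actual validation is elided from the diff. The registry URL is passed in rather than read from `get-docker-registry-details.sh`.

```python
import sys

VALID_RMM_FLAGS = {"enable-rmm", "disable-rmm"}
WHEEL_TAG = "manylinux_2_28_x86_64"
USAGE = "Usage: build-cuda.sh [image_repo] {enable-rmm,disable-rmm}"


def cuda_build_image_uris(argv: list[str], registry_url: str) -> tuple[str, str]:
    """Hypothetical restatement of build-cuda.sh's argument handling.

    Returns (BUILD_IMAGE_URI, MANYLINUX_IMAGE_URI).
    """
    if len(argv) < 2:
        # Like the script: print usage and exit with status 2.
        print(USAGE, file=sys.stderr)
        raise SystemExit(2)
    image_repo, rmm_flag = argv[0], argv[1]
    if rmm_flag not in VALID_RMM_FLAGS:  # assumed validation rule
        raise ValueError(f"invalid RMM flag: {rmm_flag!r}")
    # Both images use the ':main' tag, as in the script.
    build_image_uri = f"{registry_url}/{image_repo}:main"
    manylinux_image_uri = f"{registry_url}/xgb-ci.{WHEEL_TAG}:main"
    return build_image_uri, manylinux_image_uri


build_uri, manylinux_uri = cuda_build_image_uris(
    ["xgb-ci.gpu_build_rockylinux8", "enable-rmm"],
    "492475357299.dkr.ecr.us-west-2.amazonaws.com",
)
print(build_uri)
print(manylinux_uri)
```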
4 changes: 2 additions & 2 deletions ops/pipeline/build-gpu-rpkg.sh
@@ -11,12 +11,12 @@ fi
source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_r_rockylinux8:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_r_rockylinux8:main

echo "--- Build XGBoost R package with CUDA"
set -x
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- ops/pipeline/build-gpu-rpkg-impl.sh \
${GITHUB_SHA}

4 changes: 2 additions & 2 deletions ops/pipeline/build-jvm-doc.sh
@@ -19,10 +19,10 @@ fi

source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main

echo "--- Build JVM packages doc"
set -x
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- ops/pipeline/build-jvm-doc-impl.sh ${BRANCH_NAME}
4 changes: 2 additions & 2 deletions ops/pipeline/build-jvm-gpu.sh
@@ -6,7 +6,7 @@ set -euo pipefail
source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main

echo "--- Build libxgboost4j.so with CUDA"

@@ -32,5 +32,5 @@ mkdir -p build-gpu/
# TODO(hcho3): Remove this once new CUDA version ships with CCCL 2.6.0+
git clone https://github.com/NVIDIA/cccl.git -b v2.6.1 --quiet --depth 1
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- bash -c "${COMMAND}"