ci: de-duplicate deps and get ci images to build again
1 parent 76f43c0, commit 9ec4a39
Showing 53 changed files with 980 additions and 513 deletions.
@@ -0,0 +1,137 @@
# About these Dockerfiles

As dask-gateway can be used to start different kinds of dask clusters, we need
to be able to test against those dask cluster backends. To do that, we maintain
Docker images set up to run the various dask cluster backends so we can test
against them.

The images don't have `dask-gateway-server` installed, as we would otherwise
need to rebuild them every time we want to test a specific version of
`dask-gateway-server`. Instead, the idea is to mount the local code into a
container and install dependencies there before running the tests. For example,
the `start.sh` script starts a container, and `install.sh`/`script.sh` are
wrappers that run the `_install.sh`/`_script.py` scripts in the started
container.

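The mount-then-install idea described above could be sketched roughly as follows. This is not the content of the real `start.sh` or `install.sh`: the helper names `run`, `start_container` and `exec_in_container`, the `DRY_RUN` switch, and the `/working` mount point are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed mount point for the local checkout inside the container
# (hypothetical, not taken from the real scripts).
MOUNT_POINT="/working"

# Print each command instead of executing it unless DRY_RUN=0 is set, so the
# sketch can be tried without touching Docker.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Start a detached container with the current directory mounted into it.
# Arguments: container name, image, hostname.
start_container() {
    local name="$1" image="$2" hostname="$3"
    run docker run --detach --rm --name "$name" --hostname "$hostname" \
        --volume "$PWD:$MOUNT_POINT" "$image"
}

# Run a command (for example an install or test script from the mounted
# checkout) inside the already started container.
exec_in_container() {
    local name="$1"
    shift
    run docker exec "$name" "$@"
}
```

With `DRY_RUN=0`, calling `start_container hadoop ghcr.io/dask/dask-gateway-ci-hadoop master.example.com` and then `exec_in_container hadoop bash` with the path of a mounted script would start the container and run that script in it. The code under test stays outside the image, so the image does not need a rebuild for each change.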
## Manual build and update of images

For now these images are built and updated manually. Below are instructions for
a maintainer of the dask/dask-gateway repo on how to do it.

1. Create a personal access token (PAT) for your account with `write:packages`
   permissions at https://github.com/settings/tokens/new.

1. Log in to the ghcr.io container registry, using the PAT as the password:

   ```shell
   docker login ghcr.io -u your-username
   ```

1. Build the images:

   ```shell
   docker build --no-cache -t ghcr.io/dask/dask-gateway-ci-base ./base
   docker build --no-cache -t ghcr.io/dask/dask-gateway-ci-hadoop ./hadoop
   docker build --no-cache -t ghcr.io/dask/dask-gateway-ci-pbs ./pbs
   docker build --no-cache -t ghcr.io/dask/dask-gateway-ci-slurm ./slurm
   ```

1. Verify that the images seem to work:

   ```shell
   # hadoop: verify that the supervisord programs start successfully
   docker run --hostname=master.example.com --rm ghcr.io/dask/dask-gateway-ci-hadoop

   # pbs: verify that the logs don't include errors
   docker run --hostname=pbs --rm ghcr.io/dask/dask-gateway-ci-pbs

   # slurm: verify that the supervisord programs start successfully
   docker run --hostname=slurm --rm ghcr.io/dask/dask-gateway-ci-slurm
   ```

1. Push the images:

   ```shell
   docker push ghcr.io/dask/dask-gateway-ci-base
   docker push ghcr.io/dask/dask-gateway-ci-hadoop
   docker push ghcr.io/dask/dask-gateway-ci-pbs
   docker push ghcr.io/dask/dask-gateway-ci-slurm
   ```

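Since the build and push steps repeat the same command for four images, they could be wrapped in a small loop. The following is only a sketch: the function names and the `DRY_RUN` switch are assumptions, and the image names and build directories are the ones listed in the steps above.

```shell
#!/usr/bin/env bash
set -euo pipefail

REGISTRY="ghcr.io/dask"
# base is listed first, as in the manual steps.
BACKENDS=(base hadoop pbs slurm)

# Map a backend directory name to its image name.
image_name() {
    printf '%s/dask-gateway-ci-%s' "$REGISTRY" "$1"
}

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build every image without using the layer cache.
build_all() {
    local backend
    for backend in "${BACKENDS[@]}"; do
        run docker build --no-cache -t "$(image_name "$backend")" "./$backend"
    done
}

# Push every image to the registry.
push_all() {
    local backend
    for backend in "${BACKENDS[@]}"; do
        run docker push "$(image_name "$backend")"
    done
}
```

Running `build_all` with the default settings only prints the commands, which makes it easy to check them first. `DRY_RUN=0 build_all` and `DRY_RUN=0 push_all` execute them. The verification step is deliberately left out of the loop, because it requires a person to read the container logs.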
## Debugging

### General advice

1. If you get a `docker build` error, you can run `docker run -it --rm <hash>`
   against a saved layer from before the failing step, and then manually run
   the next `RUN` step or inspect the file system in its current state. Note
   that intermediate layers are not saved if you have set
   `export DOCKER_BUILDKIT=1`, so this trick can only be used without BuildKit.
1. A Dockerfile's `COPY` command can update permissions of folders if you let it
   copy nested folders. For example, `COPY ./files /` would update the
   permissions of `/etc` based on the permissions set on the folders and files
   in this git repo locally.
1. File permissions you have set in this git repo locally are not version
   controlled, apart from the execute bit. Because of that, you must avoid
   relying on local file permissions when building images.

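One possible way to avoid depending on local file permissions is to normalize them in the build context before running `docker build`. The sketch below is an assumption about how that could be done, not something this repo does: `normalize_permissions` is a hypothetical helper, and the chosen modes (755 for directories and executables, 644 for other files) are only a common default.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Give every directory and file below a build-context directory predictable
# permissions. The owner's execute bit, which git does track, is preserved.
normalize_permissions() {
    local context_dir="$1"
    # Directories: rwxr-xr-x
    find "$context_dir" -type d -exec chmod 755 {} +
    # Files that are executable by the owner: rwxr-xr-x
    find "$context_dir" -type f -perm -u+x -exec chmod 755 {} +
    # All other files: rw-r--r--
    find "$context_dir" -type f ! -perm -u+x -exec chmod 644 {} +
}
```

For example, `normalize_permissions ./hadoop/files` (a hypothetical path) could be run before building, so that a `COPY ./files /` step copies the same modes regardless of the local umask. Setting the permissions explicitly inside the Dockerfile is an alternative that does not touch the working tree.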
### The hadoop image

Setting up the YARN backend, part of Hadoop, was very tricky. Here are some
commands that are useful when debugging the container.

```shell
# Build the base image
docker build --tag ghcr.io/dask/dask-gateway-ci-base ./base

# Build the hadoop image
docker build --tag ghcr.io/dask/dask-gateway-ci-hadoop ./hadoop

# Start a container and watch logs from supervisord that starts the various
# programs we need to configure and run successfully.
docker run --hostname master.example.com --rm ghcr.io/dask/dask-gateway-ci-hadoop

# Start a container and inspect the container from a shell if something doesn't
# start correctly.
docker stop hadoop --timeout=0
docker run --name hadoop --hostname master.example.com --detach --rm ghcr.io/dask/dask-gateway-ci-hadoop
docker exec -it hadoop bash

# Useful commands to run INSIDE the built and started container
supervisorctl status
cat /var/log/supervisor/hdfs-namenode.log
cat /var/log/supervisor/hdfs-datanode.log
cat /var/log/supervisor/yarn-nodemanager.log
cat /var/log/supervisor/yarn-resourcemanager.log
cat /var/log/supervisor/krb5kdc.log
cat /var/log/supervisor/kadmind.log
```

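When several programs fail at once, reading each log with `cat` can be slow. As a convenience, a small helper could print the end of every log in one go. `tail_logs` is a hypothetical function, not something shipped in the image, and its default directory is simply the one used by the `cat` commands above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the last N lines of every *.log file in a directory, each preceded by
# a header with the file name.
# Arguments: log directory (default /var/log/supervisor), line count (default 20).
tail_logs() {
    local log_dir="${1:-/var/log/supervisor}"
    local lines="${2:-20}"
    local log
    for log in "$log_dir"/*.log; do
        # Skip the unexpanded pattern when no log file exists.
        [ -e "$log" ] || continue
        printf '==> %s <==\n' "$log"
        tail -n "$lines" "$log"
    done
}
```

The function can be pasted into the shell opened with `docker exec -it hadoop bash` and then called as `tail_logs`, or as `tail_logs /var/log/supervisor 50` for more context.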
### The slurm image

If you upgrade `slurm` to a new version, you may very well run into breaking
changes in your `slurm.conf`.

```shell
# Build the base image
docker build --tag ghcr.io/dask/dask-gateway-ci-base ./base

# Build the slurm image
docker build --tag ghcr.io/dask/dask-gateway-ci-slurm ./slurm

# Start a container and watch logs from supervisord that starts the various
# programs we need to configure and run successfully.
docker run --hostname slurm --rm ghcr.io/dask/dask-gateway-ci-slurm

# Start a container and inspect the container from a shell if something doesn't
# start correctly.
docker stop slurm --timeout=0
docker run --name slurm --hostname slurm --detach --rm ghcr.io/dask/dask-gateway-ci-slurm
docker exec -it slurm bash

# Useful commands to run INSIDE the built and started container
supervisorctl status
cat /var/log/supervisord.log
cat /var/log/supervisor/slurmdbd.log
cat /var/log/supervisor/slurmctld.log
```