The images used to be built and pushed to our organization on DockerHub through GitHub Actions, but they are now published as packages on the GitHub Container Registry (GHCR) within this repo instead. We also use GitHub Actions for testing and for pushing our stable images to production. See scripts.md for a more in-depth look at the Python code underlying these actions.
We have four actions that we use to develop, test, and deploy our Docker Stack.
- main.yml: build, push, and test images.
- tag.yml: give the "stable" tag to production images.
- tag_global_stable.yml: give an extra "global-stable" tag to production images.
- test_gpu.yml: test GPU code on our scipy-ml-notebook images.
We use doit, a Python task-automation tool, to write and execute more complicated tasks during Actions runs. See dodo.py for those task functions.
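For readers new to doit: it collects module-level functions in dodo.py whose names start with task_ and runs the actions listed in the dict each one returns. The sketch below is a hypothetical task and does not come from our dodo.py. The task name and the helper are illustrative, as is the assumption that artifacts/IMAGES_CHANGED holds whitespace-separated image names.

```python
# dodo.py (hypothetical sketch, not the repo's actual file)
# doit collects every module-level function named task_* and runs the
# "actions" in the dict it returns; no doit import is needed for this.
from pathlib import Path


def show_changed_images(path: str = "artifacts/IMAGES_CHANGED") -> bool:
    """Print each image name listed in the changed-images file.

    Assumes the file holds whitespace-separated image names.
    """
    changed_file = Path(path)
    if not changed_file.exists():
        print(f"{path} not found; nothing to report")
        return True  # a Python action returning True or None counts as success
    for name in changed_file.read_text().split():
        print(name)
    return True


def task_show_changed():
    """Hypothetical task: list images whose source changed in this commit."""
    return {
        # Python actions are written as (callable, positional_args, keyword_args).
        "actions": [(show_changed_images, [], {"path": "artifacts/IMAGES_CHANGED"})],
        # verbosity 2 lets the action's printed output reach the console.
        "verbosity": 2,
    }
```

With this file in place, doit list would show the show_changed task and doit show_changed would run it.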
Updating Docker images is very similar to updating an open-source library. Here, build, test, and deploy mean building the Docker images, testing that the images have the right contents and features, and finally publishing them on GHCR. We also add steps that generate image "manifests" listing package information and publish them to the project wiki. Other steps archive the logs and artifacts produced during the Actions run into zip files and upload them for reference.
The action defined by main.yml triggers our entire pipeline. It runs on all PRs to the main branch and on all commits to every branch. (Tip: to skip the action on a push, add "skip ci" to your commit message.)
A general introduction to the jobs::docker_pipeline steps in main.yml:
Before anything is triggered, GitHub checks for "skip ci" in the PR title and the commit message. If the string is found, the entire workflow is skipped.
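The check is a substring test on two strings. Here is a minimal Python sketch that assumes a case-insensitive match. The real condition is expressed in main.yml, and its exact matching rules may differ. The function name is hypothetical.

```python
from typing import Optional

SKIP_MARKER = "skip ci"


def should_skip_workflow(commit_message: str, pr_title: Optional[str] = None) -> bool:
    """Return True if the skip marker appears in the commit message or PR title.

    The match is case-insensitive here, which is an assumption about how
    the real workflow condition behaves.
    """
    candidates = [commit_message]
    if pr_title is not None:  # push events have no PR title
        candidates.append(pr_title)
    return any(SKIP_MARKER in text.lower() for text in candidates)


if __name__ == "__main__":
    print(should_skip_workflow("Fix typo [skip ci]"))  # True
    print(should_skip_workflow("Bump pandas", pr_title="Add rstudio image"))  # False
```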
The GitHub runner is the top-level caller of the following steps:
- Set up the environment by doing general clean-up and installing dependencies.
- Setup artifacts: create subfolders in the project root for storing files (artifacts/, manifests/, logs/). See the file system doc for more.
- The pipeline then looks for changes under images/ in the latest git commit to determine which images' source was changed. This information is used to decide which images need to be updated. The list of changed images is kept in artifacts/IMAGES_CHANGED.
- Clone Wiki: clone wiki/, a hidden GitHub backend folder containing Home.md and the manifest pages of all successful image builds. Its primary purpose is to let us add the current build's image manifest pages when we are on main.
- Build stack: perform all core tasks of this pipeline (see scripts.md for a more in-depth look at this step), which break down into the following steps:
  - Use the git API to check which files have changed.
  - Load information from spec.yml.
    - This is where all images get their year-quarter prefix (e.g. 2023.2), under tag.prefix.
  - Using these two pieces of information, build an n-ary tree that encodes all the details needed by the following tasks.
  - Log in to GHCR.
  - Do a BFS on the tree. For each tree node (corresponding to an image), a list of operations is carried out. See scripts.py.
  - Store logs in .yml format in build_artifacts.
- Push Wiki to GitHub: (runs ONLY IF Build stack succeeds AND git.ref, the current branch, is main) make the new image manifest pages permanent and public.
- Archive artifacts and logs: zip artifacts/, manifests/, and logs/ and make them available for download on the Actions summary page.
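The core of Build stack goes from changed files to changed images, then to a tree, then to a BFS over that tree. This can be sketched in a few lines of Python, but the sketch is not the code described in scripts.md. Several parts are assumptions:
- the image names and parent relationships in PARENTS;
- the spec dict standing in for a parsed spec.yml;
- the rule that a changed base image forces its descendants to rebuild;
- the <prefix>-<branch> tag format, inferred from tags like 202x.x-main.

The sketch only computes the order and tags. The real step also logs in to GHCR and carries out a list of operations per image.

```python
from collections import deque
from typing import Dict, List, Optional

REGISTRY = "ghcr.io/ucsd-ets"

# Hypothetical parent (base image) of each image; None marks the root.
PARENTS: Dict[str, Optional[str]] = {
    "datahub-base-notebook": None,
    "datascience-notebook": "datahub-base-notebook",
    "scipy-ml-notebook": "datascience-notebook",
}


class Node:
    """One image in the n-ary build tree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []


def changed_images(changed_files: List[str]) -> List[str]:
    """Map paths like images/<name>/Dockerfile to unique image names."""
    names: List[str] = []
    for path in changed_files:
        parts = path.split("/")
        # Only files inside an images/<name>/ subfolder count.
        if len(parts) >= 3 and parts[0] == "images" and parts[1] not in names:
            names.append(parts[1])
    return names


def build_tree(parents: Dict[str, Optional[str]]) -> List[Node]:
    """Link each image under its parent and return the root nodes."""
    nodes = {name: Node(name) for name in parents}
    roots: List[Node] = []
    for name, parent in parents.items():
        if parent is None:
            roots.append(nodes[name])
        else:
            nodes[parent].children.append(nodes[name])
    return roots


def build_order(roots: List[Node], changed: List[str]) -> List[str]:
    """BFS from the roots; an image is rebuilt if it or any ancestor changed.

    BFS visits a parent before its children, so base images come first.
    """
    order: List[str] = []
    queue = deque((root, False) for root in roots)
    while queue:
        node, ancestor_changed = queue.popleft()
        rebuild = ancestor_changed or node.name in changed
        if rebuild:
            order.append(node.name)
        for child in node.children:
            queue.append((child, rebuild))
    return order


def image_tag(name: str, spec: dict, branch: str) -> str:
    """Tag as <prefix>-<branch>, with the prefix read from spec["tag"]["prefix"]."""
    return f"{REGISTRY}/{name}:{spec['tag']['prefix']}-{branch}"


if __name__ == "__main__":
    spec = {"tag": {"prefix": "2023.2"}}  # stand-in for a parsed spec.yml
    changed = changed_images(["images/datascience-notebook/Dockerfile", "README.md"])
    for image in build_order(build_tree(PARENTS), changed):
        print(image_tag(image, spec, "main"))
```

Run as a script, this prints ghcr.io/ucsd-ets/datascience-notebook:2023.2-main followed by ghcr.io/ucsd-ets/scipy-ml-notebook:2023.2-main. The unchanged base image is skipped, and its changed child is rebuilt along with that child's descendant.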
The action defined by tag.yml is run manually and requires an existing tag (most likely 202x.x-main). All 3 images must already have been pushed to GHCR, AND their manifests (.md files) must exist in the wiki. An optional dry-run setting lets you verify the output of the action without actually pushing new stable images.
When executed, the action pulls each image in the stack from GHCR using the doit tag task defined in dodo.py. It then pushes each image back up to GHCR using the format "ghcr.io/ucsd-ets/<image_name>:<year>.<quarter>-stable". For example: ghcr.io/ucsd-ets/datascience-notebook:2023.2-stable.
The action pulls the images whose tag matches the value the user passes in, regardless of configuration elsewhere. For example, if "2021.2-dev" is supplied to the action, it will always look for <image_name>:2021.2-dev and tag those images as stable, even if the most recent year-quarter prefix is 2023.2.
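The naming rule can be expressed as a small pure function. This is a hypothetical Python sketch and not the doit tag task itself. It assumes the input has the form <year>.<quarter>-<branch_name> and that the stable tag reuses the input's year-quarter, as the 2021.2-dev example suggests. The docker pull/tag/push sequence is one plausible way to retag. It is only printed here, in the spirit of the dry-run option.

```python
from typing import List, Tuple

REGISTRY = "ghcr.io/ucsd-ets"


def stable_retag_plan(images: List[str], source_tag: str) -> List[Tuple[str, str]]:
    """Pair each <image>:<source_tag> with <image>:<year>.<quarter>-stable.

    The year-quarter comes from the user's input, not from spec.yml.
    """
    if "-" not in source_tag:
        raise ValueError("expected <year>.<quarter>-<branch_name>, e.g. 2023.2-main")
    year_quarter = source_tag.split("-", 1)[0]  # "2021.2-dev" -> "2021.2"
    return [
        (f"{REGISTRY}/{name}:{source_tag}", f"{REGISTRY}/{name}:{year_quarter}-stable")
        for name in images
    ]


def docker_commands(plan: List[Tuple[str, str]]) -> List[List[str]]:
    """Docker CLI calls that would perform the retag (built, not executed)."""
    commands: List[List[str]] = []
    for source, stable in plan:
        commands.append(["docker", "pull", source])
        commands.append(["docker", "tag", source, stable])
        commands.append(["docker", "push", stable])
    return commands


if __name__ == "__main__":
    plan = stable_retag_plan(["datascience-notebook"], "2021.2-dev")
    for cmd in docker_commands(plan):  # a dry run would stop at printing
        print(" ".join(cmd))
```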
The action then updates Home.md by appending the manifest links of these stable images to its table.
The tag.yml action will not run until test_gpu.yml has been run and has passed.
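The Home.md update amounts to appending a row to a markdown table. This minimal sketch assumes a two-column table (tag, manifest links) at the end of Home.md and wiki pages named <image>-<tag>. The real layout of Home.md and the real manifest page names may differ, and the function names are hypothetical.

```python
from typing import List


def manifest_link(image: str, tag: str) -> str:
    """Markdown link to a wiki manifest page (page naming is hypothetical)."""
    return f"[{image}]({image}-{tag})"


def stable_row(year_quarter: str, images: List[str]) -> str:
    """One table row listing the manifest pages of a quarter's stable images."""
    tag = f"{year_quarter}-stable"
    links = ", ".join(manifest_link(image, tag) for image in images)
    return f"| {tag} | {links} |"


def append_row(home_md: str, row: str) -> str:
    """Append a row, assuming the table is the last block in Home.md."""
    return home_md.rstrip("\n") + "\n" + row + "\n"
```

Appending keeps the history of earlier quarters intact, so each tagging run adds a row without touching existing ones.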
The action defined by tag_global_stable.yml is also manual and very similar to tag.yml. Here are their differences:
- Its purpose is to tag the set of "year-quarter-stable" images (produced by tag.yml) as "global-stable" ones. For example, ghcr.io/ucsd-ets/datascience-notebook:2023.2-stable is additionally tagged ghcr.io/ucsd-ets/datascience-notebook:stable.
- A "year-quarter-stable" tag answers "what were the production images in that quarter?", while a "global-stable" tag answers "what are the production images being used now, this quarter?"
- tag.yml expects an input like "202x.x-main", but the branch name is not necessarily "main". That is, in rare cases you may enable the Push Wiki to GitHub step for a dev-branch build (so that its markdown manifests exist) and tag those images as "year-quarter-stable". However, tag_global_stable.yml enforces that only "year-quarter-stable" images can be tagged as "global-stable".
- This enforcement is achieved through the user input format. tag.yml receives "<year>.<quarter>-<branch_name>" as input, while tag_global_stable.yml only accepts "<year>.<quarter>".
- tag.yml updates Home.md by appending another cell for the "year-quarter-stable" images tagged in the current tagging action. tag_global_stable.yml instead rewrites Stable_Tag.md, which only holds a single cell for the current global-stable images.
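The input-format rule can be checked with two regular expressions. In this hypothetical Python sketch, the patterns assume a four-digit year and a quarter from 1 to 4. They also limit branch names to the characters Docker allows in a tag (letters, digits, underscores, periods, hyphens). The actual workflows may validate their inputs differently.

```python
import re
from typing import Tuple

REGISTRY = "ghcr.io/ucsd-ets"

# tag.yml input: "<year>.<quarter>-<branch_name>", e.g. "2023.2-main".
TAG_INPUT = re.compile(r"\d{4}\.[1-4]-[A-Za-z0-9_.-]+")
# tag_global_stable.yml input: "<year>.<quarter>" only, e.g. "2023.2".
GLOBAL_STABLE_INPUT = re.compile(r"\d{4}\.[1-4]")


def is_tag_input(value: str) -> bool:
    """True if value is acceptable input for tag.yml."""
    return TAG_INPUT.fullmatch(value) is not None


def is_global_stable_input(value: str) -> bool:
    """True if value is acceptable input for tag_global_stable.yml."""
    return GLOBAL_STABLE_INPUT.fullmatch(value) is not None


def global_stable_plan(image: str, value: str) -> Tuple[str, str]:
    """Map <image>:<value>-stable to <image>:stable.

    Because the input cannot carry a branch name, the source is always a
    year-quarter-stable tag; this is how the format enforces the rule.
    """
    if not is_global_stable_input(value):
        raise ValueError(f"expected <year>.<quarter>, got {value!r}")
    return f"{REGISTRY}/{image}:{value}-stable", f"{REGISTRY}/{image}:stable"
```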
The action defined by test_gpu.yml executes code on the scipy-ml-notebook image that actually requires a GPU. That is, it trains a simple ML model rather than just calling is_gpu_available() or something similar. It can be run manually, but it also runs every time tag.yml is called, and it takes the same tag argument as tag.yml.
When executed, the action logs in to dsmlp-login.ucsd.edu as grader-test-01, whose password is stored in the GitHub Actions secrets. If the account's password changes, the secret should be updated too. The action then launches the scipy-ml-notebook with the specified tag and runs pytest to verify that TensorFlow and PyTorch work. This test must pass before tag.yml can run.