Garvan-Data-Science-Platform/hpci

hpci

You can't spell reproducibility without CI

hpci is a tool to integrate Continuous Integration (CI) with High Performance Compute (HPC).

Install hpci on your CI runner, and hpci can schedule and monitor jobs on HPC. When a job has finished, the job's exit status code is recorded.

hpci copies log files from HPC to the CI runner and prints them to the CI logs. hpci then exits with the same exit status code as the job on HPC, so the CI run fails if the job on HPC fails.
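Because hpci exits with the HPC job's status, a CI step only needs to pass that status through. The following is a minimal shell sketch, not part of hpci itself: run_and_report is a hypothetical helper name. It runs whatever command it is given (for example an hpci invocation), prints a message on failure, and returns the same status.

```shell
# Hypothetical helper (not part of hpci): run a command and propagate its
# exit status, so a failing HPC job also fails the CI step.
run_and_report() {
  status=0
  "$@" || status=$?   # capture the exit status without tripping `set -e`
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status" >&2
  fi
  return "$status"
}
```

In a CI script this could be called as run_and_report followed by the full hpci command line. Most CI systems mark a step as failed when its script exits with a non-zero status.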

Generated with template-haskell

How to use hpci

Documentation is still being developed. Contact the maintainer (email details in the hpci.cabal file) if you are interested in using hpci.

Development environment

If you use nix:

This project uses nix and nix flakes to provide a consistent software environment for development and compilation. To get started with nix, I recommend the Determinate Systems nix installer and the Zero to Nix guide to learning Nix and flakes. There is also a .envrc file (requires installing direnv and nix-direnv) that automatically starts a nix flake devShell when you cd into the directory containing the code from this repo. The flake uses cabal2nix to generate nix build instructions from the Cabal file, so new Haskell dependencies can be added via cabal with no need to adjust flake.nix.

The flake.nix file is configured to work for both x86_64-linux and aarch64-darwin (architecture + operating system); however, the setup differs slightly between them:

  • the x86_64-linux development environment uses a version of GHC that has been compiled against musl rather than glibc.

    • This allows generation of a statically linked Haskell executable (see this blog post for more details).
    • I am not currently using linux for my day-to-day development on this project, rather I am using this exclusively for generating portable executables.
  • the aarch64-darwin development environment does not use musl-linked packages.

If you do not set up direnv (see links above), activate a nix development environment using nix develop .#devShells.x86_64-linux or nix develop .#devShells.aarch64-darwin depending on your architecture + operating system.

To build a statically linked executable on x86_64-linux use nix build .#packages.x86_64-linux.hpci.

To build a dynamically linked executable on aarch64-darwin use cabal build inside the flake devShell.
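The choice between the two flake attributes can be automated. The sketch below is an illustration only: hpci_nix_system is a hypothetical function name, and it assumes uname reports Linux/x86_64 on x86_64-linux machines and Darwin/arm64 on aarch64-darwin (Apple Silicon) machines.

```shell
# Hypothetical helper: map the current platform to the system name used in
# flake.nix. Arguments are optional and default to the output of uname.
hpci_nix_system() {
  os=${1:-$(uname -s)}
  machine=${2:-$(uname -m)}
  case "$os/$machine" in
    Linux/x86_64) echo "x86_64-linux" ;;
    Darwin/arm64) echo "aarch64-darwin" ;;
    *)
      echo "unsupported platform: $os/$machine" >&2
      return 1
      ;;
  esac
}

# Example use (not executed here):
#   nix develop ".#devShells.$(hpci_nix_system)"
```

Any other platform makes the function return a non-zero status, since flake.nix is described as supporting only these two systems.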

If you do not use nix:

You do not need nix to run the code; all you need is the Glasgow Haskell Compiler (GHC) and the Cabal package manager. These can be installed using GHCup.

To compile and run the program, use cabal run exes -- followed by these required arguments:

  • --user username on remote system
  • --host IP address or domain name of remote host
  • --port port declaration - typically 22
  • --publickey local filepath for ssh public key
  • --privatekey local filepath for ssh private key

followed by schedule, to schedule a job on HPC:

  • --script local filepath of .pbs script to accompany the qsub command
  • --logFile remote filepath of logfile produced by .pbs script to copy back to local system
  • -c Optional Configuration in the form of KEY1=VALUE1,KEY2=VALUE2 that is passed to qsub when submitting the job

OR

  • --user username on remote system
  • --host IP address or domain name of remote host
  • --port port declaration - typically 22
  • --publickey local filepath for ssh public key
  • --privatekey local filepath for ssh private key

followed by exec "command", to run a command on HPC
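The two invocation forms can be wrapped in small shell functions. This is a sketch under assumptions, not documented usage. The HPCI_* environment variable names and the function names are hypothetical, and the argument order (connection options first, then schedule or exec with its own options) is inferred from the order of the lists above.

```shell
# Hypothetical wrappers around `cabal run exes --`. Connection details are
# read from HPCI_* environment variables (names invented for this sketch).

# Usage: hpci_schedule LOCAL_PBS_SCRIPT REMOTE_LOGFILE [KEY1=VALUE1,KEY2=VALUE2]
hpci_schedule() {
  script=$1
  logfile=$2
  config=${3:-}
  set -- --user "$HPCI_USER" --host "$HPCI_HOST" --port "${HPCI_PORT:-22}" \
    --publickey "$HPCI_PUBLIC_KEY" --privatekey "$HPCI_PRIVATE_KEY" \
    schedule --script "$script" --logFile "$logfile"
  # -c is optional, so it is only appended when a config string was given.
  if [ -n "$config" ]; then
    set -- "$@" -c "$config"
  fi
  cabal run exes -- "$@"
}

# Usage: hpci_exec "COMMAND"
hpci_exec() {
  cabal run exes -- --user "$HPCI_USER" --host "$HPCI_HOST" \
    --port "${HPCI_PORT:-22}" \
    --publickey "$HPCI_PUBLIC_KEY" --privatekey "$HPCI_PRIVATE_KEY" \
    exec "$1"
}
```

With the variables exported, a job could then be submitted with hpci_schedule job.pbs /path/on/hpc/job.log (both paths are placeholders), and a one-off command run with hpci_exec "qstat -a".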

Testing in CI

Integration tests are run against a dockerised version of OpenPBS bundled with an openssh server. The code for building the docker image (and associated scripts) is in the ci directory. Building this image on aarch64-darwin is tricky because it involves compiling OpenPBS from source.

The .github/workflows/build-pbs.yml workflow builds the image in ci (only if there has been a change to code in the ci directory or to the build-pbs.yml workflow file) and pushes the image to a GCP artifact registry. The .github/workflows/build-test.yml workflow builds a fully statically linked version of hpci, pulls the test docker image, and runs the new hpci binary to connect to the OpenPBS docker container and run a basic job submission.

Testing locally

Create an ssh key called test_key at the root of the hpci repository with ssh-keygen -t ed25519 -C "hpci_test_key" -f ./test_key.

The makefile has convenience commands for local testing. A summary of commands can be accessed using make help.

For typical development and testing on an aarch64-darwin machine run:

  • gcloud auth login --project [GCP_PROJECT_NAME]
  • gcloud auth configure-docker [GCP_REGION]-docker.pkg.dev/[GCP_PROJECT_NAME]/docker
  • Make sure you have a container runtime like docker desktop or colima
  • make PROJECT=[GCP_PROJECT_NAME] pull to pull docker image
  • make PROJECT=[GCP_PROJECT_NAME] run to run OpenPBS docker container.
  • use docker logs -f pbs to watch the logs and wait until the sshd server has been restarted.
  • In a separate terminal run ls app/*.hs | entr make test to recompile and run tests each time hpci Haskell files are saved (requires installing entr)
  • If you are on an x86_64-linux machine, then you can run make test-bin to test with the compiled binary
  • make stop to stop (and automatically remove) docker container when finished