Setup a SLURM cluster in the GitHub CI for integration tests [MT-34] (#84)

* Test out GH Action to setup a fake SLURM cluster
* Change the scope to also run on PRs
* Change the command to use `srun (...) hostname`
* Test out running tests that call srun over ssh
* Use `poetry run pytest` instead of `pytest`
* Try to test the `ensure_allocation` method
* Simplify to avoid hanging on test setup
* Skip making a Connection (hopefully fixes hang)
* Try using a custom version of setup-slurm action
* Rename custom action file
* Try to fix the path to the custom action file
* Fix role number in custom action file
* Only mark one partition with Default: YES
* Only have `localhost` as a node
* Re-simplify test to check that slurm works
* Put the slurm playbook in a file
* Add main and unkillable partitions
* Trying to add tests using the local SLURM cluster
* Add `in_stream=False` to `run` and `simple_run`
* Simplify tests: greatly reduce need for -s flag
* `SlurmRemote.ensure_allocation` test works on Mila
* Try to make tests timeout instead of hang in CI
* Make slurm tests the integration tests in build
* Skip some tests for now to debug the CI issues
* Only run integration tests with slurm on linux :(
* Debugging hanging integration test
* Test if hanging test is due to nested sallocs
* Skip tests that use salloc/sbatch in GitHub CI :(
* Minor typing/docstring improvements to Remote class
* Add some tests for SlurmRemote.run and such
* Don't actually extract jobid from salloc for now
* Add sleeps so sacct can update to show recent jobs
* Mark tests that cause a hang in GitHub CI
* Add timeout of 3 minutes to integration tests step
* Remove check that fails in GitHub CI
* Update tests/cli/test_slurm_remote.py

---------

Signed-off-by: Fabrice Normandin <[email protected]>
Signed-off-by: Fabrice Normandin <[email protected]>
Co-authored-by: satyaog <[email protected]>
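One of the squashed changes skips tests that use salloc/sbatch when they run in GitHub CI. The following is a minimal sketch of that pattern with pytest. It assumes CI is detected through the `GITHUB_ACTIONS` environment variable, which GitHub Actions sets to `true`. The names `IN_GITHUB_CI`, `skip_in_github_ci` and `test_srun_hostname` are hypothetical and are not taken from this repository.

```python
import os
import subprocess

import pytest

# Assumption: detect GitHub CI through GITHUB_ACTIONS, which GitHub Actions sets to "true".
IN_GITHUB_CI = os.environ.get("GITHUB_ACTIONS") == "true"

# Hypothetical marker: skip tests that call salloc/sbatch/srun when running in GitHub CI.
skip_in_github_ci = pytest.mark.skipif(
    IN_GITHUB_CI,
    reason="salloc/sbatch-based tests hang in GitHub CI",
)


@skip_in_github_ci
def test_srun_hostname():
    # Hypothetical test: requires a reachable SLURM cluster (e.g. a local test cluster).
    result = subprocess.run(
        ["srun", "--ntasks=1", "hostname"],
        capture_output=True,
        text=True,
        timeout=60,  # raise TimeoutExpired instead of hanging indefinitely
        check=True,  # raise CalledProcessError if srun exits with a non-zero status
    )
    # srun should print the name of the node the task ran on.
    assert result.stdout.strip()
```

Locally, `poetry run pytest` runs the test against whatever SLURM cluster is reachable. On a GitHub-hosted runner, pytest reports it as skipped with the given reason.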