A Python implementation of an adjustable resource manager for MPI applications on a computer cluster. It consists of:
- A scheduling module supporting two algorithms:
  - FCFS: Favours old jobs.
  - WFP3: Favours short/old jobs while taking their respective size into account (a scoring sketch follows this list).
- An optional backfilling module.
- A resource allocation module implementing three policies:
  - Compact: Use all the cores of a node for one app.
  - Spare: Use half the cores of a node for one app (unfavoured).
  - Strip (co-scheduling): Split the cores of a node between two apps. Can be improved with the inclusion of a heatmap, used to identify apps that match well together.
- The main module that ties all of the above together.
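As a rough illustration of the two scheduling algorithms, the sketch below scores queued jobs; the `Job` fields and the exact WFP3 expression are assumptions based on the commonly cited formulation (wait time over requested runtime, cubed, scaled by job size), not necessarily this project's code.

```python
from dataclasses import dataclass

@dataclass
class Job:
    # Hypothetical job record; field names are illustrative assumptions.
    submit_time: float        # when the job entered the queue
    requested_runtime: float  # user-provided runtime estimate (seconds)
    requested_cores: int      # job size

def fcfs_score(job: Job, now: float) -> float:
    """FCFS: older jobs (longer wait) get a higher score."""
    return now - job.submit_time

def wfp3_score(job: Job, now: float) -> float:
    """WFP3 (common formulation): favours jobs that have waited long
    relative to their requested runtime, weighted by their size."""
    wait = now - job.submit_time
    return (wait / job.requested_runtime) ** 3 * job.requested_cores

# The scheduler would then sort the queue by the chosen score, highest first:
# queue.sort(key=lambda j: wfp3_score(j, now), reverse=True)
```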
The main module of this program has been designed as a command line interface (CLI). It takes two arguments:
- <-c, -config>: A YAML configuration file that contains the user's preferences; an example is shown below the argument list. This parameter is required. To run the program, simply type:

  python3 main.py -c path/to/config

- <-i, -info>: An optional argument that can take one of two values:
  - queue: Display information about the cluster's current jobs.
  - state: Display information about the cluster's nodes.
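A minimal configuration file might look like the following; the key names and structure here are illustrative assumptions, not the project's exact schema.

```yaml
# Illustrative example only -- key names are assumptions, not the exact schema.
cluster:
  nodes: 4                # number of bound nodes
  cores_per_node: 16
scheduler:
  algorithm: WFP3         # FCFS or WFP3
  backfilling: true       # enable the optional backfilling module
allocation:
  policy: compact         # compact, spare, or strip (co-scheduling)
jobs:
  suite: NPB              # queue built from NAS Parallel Benchmarks apps
```

With a configuration file in place, the info flag can be combined with it, e.g. python3 main.py -c path/to/config -i queue.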
After being submitted as a batch job, the main program assigns MPI tasks across a number of bound nodes. These tasks are part of a queue consisting of applications from the NAS Parallel Benchmarks (NPB) suite.
In order to run the code, Python's PyYAML package must be installed. This can be done with the pip package installer:

  pip install pyyaml

Full code dependencies can be found in the environment.yml file, which has been exported from the miniconda package manager and can be recreated locally with:

  conda env create -f environment.yml