Commit: Update installation
jan-janssen committed Nov 19, 2024
1 parent 294db24 commit 8fa5d7b
Showing 3 changed files with 112 additions and 35 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -7,12 +7,12 @@ Up-scale python functions for high performance computing (HPC) with executorlib.

## Key Features
* **Up-scale your Python functions beyond a single computer.** - executorlib extends the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#executor-objects)
-from the Python standard library and combines it with job schedulers for high performance computing (HPC) like [SLURM](https://slurm.schedmd.com)
-and [flux](http://flux-framework.org). With this combination executorlib allows users to distribute their Python
-functions over multiple compute nodes.
+from the Python standard library and combines it with job schedulers for high performance computing (HPC) including
+the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) and [flux](http://flux-framework.org).
+With this combination executorlib allows users to distribute their Python functions over multiple compute nodes.
* **Parallelize your Python program one function at a time** - executorlib allows users to assign dedicated computing
-resources like CPU cores, threads or GPUs to one Python function at a time. So you can accelerate your Python code
-function by function.
+resources like CPU cores, threads or GPUs to one Python function call at a time. So you can accelerate your Python
+code function by function.
* **Permanent caching of intermediate results to accelerate rapid prototyping** - To accelerate the development of
machine learning pipelines and simulation workflows, executorlib provides optional caching of intermediate results for
iterative development in interactive environments like Jupyter notebooks.
@@ -37,7 +37,7 @@ with Executor(backend="local") as exe:
print([f.result() for f in future_lst])
```
In the same way executorlib can also execute Python functions which use additional computing resources, like multiple
-CPU cores, CPU threads or GPUs. For example if the Python function internally uses tthe Message Passing Interface (MPI)
+CPU cores, CPU threads or GPUs. For example if the Python function internally uses the Message Passing Interface (MPI)
via the [mpi4py](https://mpi4py.readthedocs.io) Python library:
```python
from executorlib import Executor
@@ -56,11 +56,11 @@ with Executor(backend="local") as exe:
print(fs.result())
```
The additional `resource_dict` parameter defines the computing resources allocated to the execution of the submitted
-Python function. In addition to the compute cores `cores` the resource dictionary can also define the threads per core
+Python function. In addition to the compute cores `cores`, the resource dictionary can also define the threads per core
as `threads_per_core`, the GPUs per core as `gpus_per_core`, the working directory with `cwd`, the option to use the
-OpenMPI oversubscribe feature with `openmpi_oversubscribe` and finally for the Simple Linux Utility for Resource
-Management (SLURM) queuing system the option to provide additional command line arguments with the `slurm_cmd_args`
-parameter - [resource dictionary]().
+OpenMPI oversubscribe feature with `openmpi_oversubscribe` and finally for the [Simple Linux Utility for Resource
+Management (SLURM)](https://slurm.schedmd.com) queuing system the option to provide additional command line arguments
+with the `slurm_cmd_args` parameter - [resource dictionary]().
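
For illustration, a minimal sketch of such a call, reusing the `Executor` and `submit` pattern shown above; the
specific resource values here are arbitrary examples, not recommended settings:
```python
from executorlib import Executor

def sum_funct(x, y):
    return x + y

with Executor(backend="local") as exe:
    # assign dedicated computing resources to this one function call;
    # cores > 1 would additionally require the mpi4py package
    fs = exe.submit(
        sum_funct, 1, 2,
        resource_dict={"cores": 1, "threads_per_core": 1, "cwd": "/tmp/example"},
    )
    print(fs.result())  # prints 3
```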

This flexibility to assign computing resources on a per-function-call basis simplifies the up-scaling of Python programs.
Only the parts of the Python functions which benefit from parallel execution are implemented as MPI parallel Python
109 changes: 87 additions & 22 deletions docs/installation.md
@@ -1,16 +1,78 @@
# Installation
-## Compatible Job Schedulers
-For optimal performance the [flux framework](https://flux-framework.org) is recommended as job scheduler. Even when the
-[Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) or any other job scheduler is already
-installed on the HPC cluster [flux framework](https://flux-framework.org) can be installed as a secondary job scheduler
-to leverage [flux framework](https://flux-framework.org) for the distribution of resources within a given allocation of
-the primary scheduler.
## Minimal
Executorlib internally uses the [zero message queue (zmq)](https://zeromq.org) for communication between the Python
processes and [cloudpickle](https://github.com/cloudpipe/cloudpickle) for serializing Python functions so they can be
communicated from one process to another. A minimal installation of executorlib therefore requires only these two dependencies:
```
pip install executorlib
```
As an alternative to the [Python package manager](https://pypi.org/project/executorlib/), executorlib can also be installed
via the [conda package manager](https://anaconda.org/conda-forge/executorlib):
```
conda install -c conda-forge executorlib
```
A number of features are not available in this minimal installation of executorlib. These include the execution of
MPI parallel Python functions, which requires the [mpi4py](https://mpi4py.readthedocs.io) package; the caching based on
the hierarchical data format (HDF5), which requires the [h5py](https://www.h5py.org) package; the submission to job
schedulers, which requires the [Python simple queuing system adapter (pysqa)](https://pysqa.readthedocs.io); and the
visualisation of dependencies, which requires a number of visualisation packages.
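
To illustrate why cloudpickle is part of even the minimal installation, a small standalone sketch (not executorlib
code) of serializing a locally defined function, which the standard `pickle` module cannot handle:
```python
import cloudpickle

# unlike the standard pickle module, cloudpickle can serialize
# interactively or locally defined functions such as lambdas
payload = cloudpickle.dumps(lambda x: x + 1)
restored = cloudpickle.loads(payload)
print(restored(41))  # prints 42
```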

-Alternatively, `executorlib` can directly create job steps in a SLURM allocation using the `srun `command. Still this
-always queries the central database of the SLURM job scheduler which can decrease the performance of the job scheduler
-and is not recommended.
## MPI support
The submission of MPI parallel Python functions requires the installation of the [mpi4py](https://mpi4py.readthedocs.io)
package. This can be installed in combination with executorlib using either the [Python package manager](https://pypi.org/project/mpi4py/):
```
pip install executorlib[mpi]
```
Or alternatively using the [conda package manager](https://anaconda.org/conda-forge/mpi4py):
```
conda install -c conda-forge executorlib mpi4py
```
Given the compiled bindings to the MPI library included in the [mpi4py](https://mpi4py.readthedocs.io) package, it is
recommended to use a binary distribution of [mpi4py](https://mpi4py.readthedocs.io) and only compile it manually when a
specific version of MPI is used. The mpi4py documentation covers the [installation of mpi4py](https://mpi4py.readthedocs.io/en/stable/install.html)
in more detail.
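
As a sketch of what such a submission can look like, following the `submit` pattern from the README; it is assumed
here that with `cores=2` executorlib starts the function on two MPI ranks and returns one result per rank:
```python
from executorlib import Executor

def mpi_funct(i):
    # imported inside the function so mpi4py is only needed on the worker
    from mpi4py import MPI
    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank

with Executor(backend="local") as exe:
    # with cores=2 the function is expected to run on two MPI ranks
    fs = exe.submit(mpi_funct, 42, resource_dict={"cores": 2})
    print(fs.result())  # one (i, size, rank) tuple per rank
```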

## Caching
While the caching is an optional feature for [Local Mode] and for the distribution of Python functions in a given
allocation of an HPC job scheduler [HPC Allocation Mode], it is required for the submission of individual functions to
an HPC job scheduler [HPC Submission Mode], because in [HPC Submission Mode] the Python function is stored on the file
system until the requested computing resources become available. The caching is implemented based on the
hierarchical data format (HDF5). The corresponding [h5py](https://www.h5py.org) package can be installed using either
the [Python package manager](https://pypi.org/project/h5py/):
```
pip install executorlib[cache]
```
Or alternatively using the [conda package manager](https://anaconda.org/conda-forge/h5py):
```
conda install -c conda-forge executorlib h5py
```
Again, given the compiled bindings of the [h5py](https://www.h5py.org) package to the HDF5 library, a binary distribution is
recommended. The h5py documentation covers the [installation of h5py](https://docs.h5py.org/en/latest/build.html) in
more detail.
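
A minimal sketch of how the caching can be used; the `cache_directory` parameter is an assumption for illustration and
may differ from the actual executorlib interface:
```python
from executorlib import Executor

def expensive(x):
    return x ** 2

# assumed illustration: results are written to cache_directory as HDF5 files,
# so a second run reloads them instead of recomputing
with Executor(backend="local", cache_directory="./cache") as exe:
    fs = exe.submit(expensive, 3)
    print(fs.result())  # prints 9
```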

## HPC Submission Mode
[HPC Submission Mode] requires the [Python simple queuing system adapter (pysqa)](https://pysqa.readthedocs.io) to
interface with the job schedulers and the [h5py](https://www.h5py.org) package to enable caching, as explained above. Both
can be installed via the [Python package manager](https://pypi.org/project/pysqa/):
```
pip install executorlib[submission]
```
Or alternatively using the [conda package manager](https://anaconda.org/conda-forge/pysqa):
```
conda install -c conda-forge executorlib h5py pysqa
```
Depending on the choice of job scheduler, the [pysqa](https://pysqa.readthedocs.io) package might require additional
dependencies; at least for [SLURM](https://slurm.schedmd.com), however, no additional requirements are needed. The pysqa
documentation covers the [installation of pysqa](https://pysqa.readthedocs.io/en/latest/installation.html) in more
detail.
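
A purely illustrative sketch of submitting a function through a job scheduler in this mode; the backend name
`"slurm_submission"` is an assumption for illustration and is not confirmed by this page:
```python
from executorlib import Executor

def add(x, y):
    return x + y

# hypothetical sketch: the backend name is an assumption; the function is
# stored on the file system until the scheduler grants the resources
with Executor(backend="slurm_submission") as exe:
    fs = exe.submit(add, 1, 2, resource_dict={"cores": 1})
    print(fs.result())  # prints 3 once the job has run
```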

## HPC Allocation Mode
For optimal performance in [HPC Allocation Mode] the [flux framework](https://flux-framework.org) is recommended as job
scheduler. Even when the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) or any other
job scheduler is already installed on the HPC cluster, the [flux framework](https://flux-framework.org) can be installed
as a secondary job scheduler to leverage it for the distribution of resources within a given allocation of the primary
scheduler.

## executorlib with Flux Framework
The [flux framework](https://flux-framework.org) uses `libhwloc` and `pmi` to understand the hardware it is running on and to bootstrap MPI.
`libhwloc` not only assigns CPU cores but also GPUs. This requires `libhwloc` to be compiled with support for GPUs from
your vendor. In the same way the version of `pmi` for your queuing system has to be compatible with the version
@@ -57,9 +119,10 @@ For the version 5 of openmpi the backend changed to `pmix`, this requires the ad
```
conda install -c conda-forge flux-core flux-sched flux-pmix openmpi>=5 executorlib
```
-In addition, the `pmi="pmix"` parameter has to be set for the `executorlib.Executor` to switch to `pmix` as backend.
+In addition, the `flux_executor_pmi_mode="pmix"` parameter has to be set for the `executorlib.Executor` to switch to
+`pmix` as backend.
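
A short sketch of how this parameter might be passed; only `flux_executor_pmi_mode` is taken from the text above,
while the backend name `"flux_allocation"` is an assumption for illustration:
```python
from executorlib import Executor

# assumed illustration: the backend name may differ; flux_executor_pmi_mode="pmix"
# selects pmix for bootstrapping MPI with openmpi >= 5
with Executor(backend="flux_allocation", flux_executor_pmi_mode="pmix") as exe:
    fs = exe.submit(sum, [1, 2, 3])
    print(fs.result())  # prints 6
```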

-## Test Flux Framework
+### Test Flux Framework
To validate the installation of flux and confirm the GPUs are correctly recognized, you can start a flux session on the
login node using:
```
@@ -98,15 +161,17 @@ startup process of flux using:
srun --mpi=pmi2 flux start python <script.py>
```

-## Without Flux Framework
-It is possible to install `executorlib` without flux, for example for using it on a local workstation or in combination
-with the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com). While this is not recommended
-in the high performance computing (HPC) context as `executorlib` with `block_allocation=False` is going to create a SLURM
-job step for each submitted python function.
-
-In this case `executorlib` can be installed using:
## Visualisation
The visualisation of the dependency graph with the `plot_dependency_graph` parameter requires [pygraphviz](https://pygraphviz.github.io/documentation/stable/).
This can be installed via the [Python package manager](https://pypi.org/project/pygraphviz/):
```
-conda install -c conda-forge executorlib
+pip install executorlib[graph]
```

-This also includes workstation installations on Windows and MacOS.
Or alternatively using the [conda package manager](https://anaconda.org/conda-forge/pygraphviz):
```
conda install -c conda-forge executorlib pygraphviz matplotlib networkx ipython
```
Again, given the compiled bindings of [pygraphviz](https://pygraphviz.github.io/documentation/stable/) to the graphviz library,
it is recommended to install a binary distribution. The pygraphviz documentation covers the [installation of pygraphviz](https://pygraphviz.github.io/documentation/stable/install.html)
in more detail. Furthermore, [matplotlib](https://matplotlib.org), [networkx](https://networkx.org) and [ipython](https://ipython.readthedocs.io)
are installed as additional requirements for the visualisation.
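
A minimal sketch of how the dependency graph visualisation might be used; passing `plot_dependency_graph` to the
`Executor` constructor and submitting a future as an input are assumptions for illustration:
```python
from executorlib import Executor

def add(x, y):
    return x + y

# assumed illustration: with plot_dependency_graph=True the dependency graph
# of the submitted calls is rendered via pygraphviz, matplotlib and networkx
with Executor(backend="local", plot_dependency_graph=True) as exe:
    fs_1 = exe.submit(add, 1, 2)
    fs_2 = exe.submit(add, fs_1, 3)  # fs_2 depends on the future fs_1
```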
18 changes: 15 additions & 3 deletions pyproject.toml
@@ -36,15 +36,27 @@ Documentation = "https://executorlib.readthedocs.io"
Repository = "https://github.com/pyiron/executorlib"

[project.optional-dependencies]
-mpi = ["mpi4py==4.0.1"]
-hdf = ["h5py==3.12.1"]
+cache = ["h5py==3.12.1"]
graph = [
    "pygraphviz==1.14",
    "matplotlib==3.9.2",
    "networkx==3.4.2",
    "ipython==8.29.0",
]
-queue = ["pysqa==0.2.2"]
+mpi = ["mpi4py==4.0.1"]
+submission = [
+    "pysqa==0.2.2",
+    "h5py==3.12.1",
+]
+all = [
+    "mpi4py==4.0.1",
+    "pysqa==0.2.2",
+    "h5py==3.12.1",
+    "pygraphviz==1.14",
+    "matplotlib==3.9.2",
+    "networkx==3.4.2",
+    "ipython==8.29.0",
+]

[tool.setuptools.packages.find]
include = ["executorlib*"]
