slurm-map

Very simple parallel for-loops based on slurm!

Usage

Usage is very simple and similar to the multiprocessing library and Python's built-in map function.

This package provides a map function that maps a function over an iterable. However, unlike the built-in map or multiprocessing's map, every element of the iterable is handled by a separate slurm job. This allows a for-loop to be parallelized across an entire compute cluster!

Example:

import slurm_map

def do_something(x):
    return x**2

if __name__ == '__main__':
    # Serial version for comparison:
    # results = [do_something(x) for x in [1, 2, 3, 4, 5, 6, 7]]
    # print(results)

    results = slurm_map.map(do_something, [1, 2, 3, 4, 5, 6, 7])
    print(results)

Simply executing this script (e.g. on a login node) will submit all necessary jobs and collect the results!

Slurm arguments and pre-run commands can be passed as parameters:

results = slurm_map.map(do_something, data, 
                        slurm_args="--mem=8G --time=3600", 
                        extra_commands=["hostname", "module load python3"])
    

Features:

  • No more batch scripts!
  • Forwarding of stdout and stderr; in other words, print statements work
  • Interruptible! After submission, the above python script can be interrupted and the jobs will continue running. Restarting the same script will not submit new jobs; it will continue waiting for the old ones to finish, e.g.:
    > python submit.py
    Submitted batch job 67196
    Submitted batch job 67197
    Submitted batch job 67198
    Waiting for results from jobs [67196, 67197, 67198]
    ^C Keyboard interrupt
    ...
    
    > python submit.py
    Jobs were already started previously. Reusing those results.
    Waiting for results from jobs [67196, 67197, 67198]
    
    This module is therefore compatible with compute clusters that don't allow executing long tasks on the login nodes.
  • Keyword-argument-based caching:
    slurm_map.map(f, data, cleanup=False, kwargs={'a': 1})
    The above call will store its results, so a second invocation does not launch additional jobs. New jobs are only submitted if data or kwargs change, and if data changes, only the new elements are computed: if a previous run computed map(f, data) and a new call map(f, data + new_data) is made, only the elements of new_data are actually evaluated; the rest is taken from the cache. A sketch follows after this list.
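
The following is a minimal sketch of the caching behaviour described above. It assumes that kwargs are forwarded to the mapped function as keyword arguments; the offset parameter and the concrete values are purely illustrative:

import slurm_map

def do_something(x, offset=0):
    return x**2 + offset

if __name__ == '__main__':
    data = [1, 2, 3]

    # First call: one slurm job per element; with cleanup=False the results stay on disk.
    results = slurm_map.map(do_something, data, cleanup=False, kwargs={'offset': 1})

    # Same data and kwargs: no new jobs, the results come straight from the cache.
    results = slurm_map.map(do_something, data, cleanup=False, kwargs={'offset': 1})

    # Extended data: only the new elements 4 and 5 are submitted as jobs.
    results = slurm_map.map(do_something, data + [4, 5], cleanup=False, kwargs={'offset': 1})
    print(results)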

Job management

Two utilities are provided to manage jobs that were started using slurm_map:

  • Cancel all running jobs that belong to a slurm_map call on a specific function

    python -m slurm_map cancel <function_name>
  • Delete all files created by slurm_map for a specific slurm_map call:

    python -m slurm_map cleanup <function_name>

    slurm_map stores its files in ./.slurm_map/<function_name>.
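
For example, to cancel the jobs from the do_something example above and then remove its cache directory:

    python -m slurm_map cancel do_something
    python -m slurm_map cleanup do_something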

Installation

Clone the repository, cd into it, and run pip install .
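
For example (assuming the repository lives at github.com/Sola85/slurm-map, matching the name above):

git clone https://github.com/Sola85/slurm-map.git
cd slurm-map
pip install .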


Disclaimer

This tool only aims to manage slurm jobs to the extent needed to provide the base functionality of the map function described above. If you are looking for a general-purpose python interface to slurm, you might want to look at pyslurm, simple_slurm, slurm-pipeline, or slurm tools.
