Python Job Submit plugin for Slurm

Fixes based on original repo

  • Added support for Slurm-22 (and possibly onward)
  • Added support for Python-3.9 (and possibly onward)
  • Fixed some CPython ref counting
  • (Experimental) Detects the Slurm and Python versions and checks out the Slurm source
  • Only job_submit is implemented; job_modify is not passed to Python

Usage

The plugin reads the job script from $SLURM_CONF_DIR/job_submit.py

To activate the plugin, edit slurm.conf as follows and restart slurmctld:

JobSubmitPlugins=python

Python plugin examples

Here is an example that logs the user's job submission description as YAML and sets the job partition to debug.

import yaml

SLURM_SUCCESS = 0
SLURM_ERROR = -1

def job_submit(job_desc, submit_uid):
  # All values shown can be overwritten if slurm allows it
  with open(f"/tmp/{job_desc['job_id']}.yaml", "w") as f:
    yaml.dump(job_desc, f)

  # To edit job_desc, overwrite the fields in original object
  job_desc['partition'] = "debug"

  # Return SLURM_SUCCESS to accept the job
  return SLURM_SUCCESS

  # To reject the job, return
  # - SLURM_ERROR, or
  # - any value != 0
  #return SLURM_ERROR
  #return 12345
  #return "FAILED"
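
The same accept / modify / reject pattern can be used to enforce a simple policy. The following is a minimal sketch, not part of the plugin: ALLOWED_PARTITIONS, DEFAULT_PARTITION and the root exemption are hypothetical, and it assumes job_desc behaves like a dict, as in the example above.

```python
SLURM_SUCCESS = 0
SLURM_ERROR = -1

# Hypothetical site policy: partitions that ordinary users may request.
ALLOWED_PARTITIONS = {"debug", "batch"}
DEFAULT_PARTITION = "debug"


def job_submit(job_desc, submit_uid):
    # Assumption: root (uid 0) is exempt from the policy.
    if submit_uid == 0:
        return SLURM_SUCCESS

    # Assumption: job_desc behaves like a dict, as in the example above.
    partition = job_desc.get("partition")

    # No partition requested: fill in the default by overwriting the field.
    if not partition:
        job_desc["partition"] = DEFAULT_PARTITION
        return SLURM_SUCCESS

    # Any non-zero return value rejects the job.
    if partition not in ALLOWED_PARTITIONS:
        return SLURM_ERROR

    return SLURM_SUCCESS
```

The function only touches the partition field, which is the one field the example above already modifies; other field names would need to be checked against what the plugin actually passes in.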

Interacting with slurm

Currently, import slurm provides only the following functions:

def user_msg(msg: str) -> None:
  """log to user's stdout"""
  pass

def info(msg: str) -> None:
  """log to slurmctld as info"""
  pass

def error(msg: str) -> None:
  """log to slurmctld as error"""
  pass

Example

import slurm
import yaml

def job_submit(job_desc, submit_uid):
  # dump the yaml to user's pty
  slurm.user_msg(yaml.dump(job_desc))
  return 0
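
The three functions can be combined to tell the user why a job was rejected while leaving a trace in the slurmctld log. This is a minimal sketch under stated assumptions: check_job and its rule are hypothetical, the fallback for import slurm is only a stand-in for trying the script outside slurmctld, and job_desc is assumed to behave like a dict. How the plugin reacts to an uncaught Python exception is not documented here, so the sketch catches exceptions itself and rejects the job.

```python
import types

try:
    import slurm  # provided by the plugin when running inside slurmctld
except ImportError:
    # Hypothetical stand-in so the script can be exercised outside slurmctld.
    slurm = types.SimpleNamespace(
        user_msg=lambda msg: print(f"[user] {msg}"),
        info=lambda msg: print(f"[info] {msg}"),
        error=lambda msg: print(f"[error] {msg}"),
    )

SLURM_SUCCESS = 0
SLURM_ERROR = -1


def check_job(job_desc, submit_uid):
    """Hypothetical policy: return a reason string to reject, or None to accept."""
    if not job_desc.get("partition"):
        return "no partition requested"
    return None


def job_submit(job_desc, submit_uid):
    try:
        reason = check_job(job_desc, submit_uid)
    except Exception as exc:
        # Log the failure for the administrator and reject the job.
        slurm.error(f"job_submit.py failed for uid {submit_uid}: {exc!r}")
        return SLURM_ERROR

    if reason is not None:
        slurm.user_msg(f"Job rejected: {reason}")
        slurm.info(f"rejected job from uid {submit_uid}: {reason}")
        return SLURM_ERROR

    return SLURM_SUCCESS
```

Keeping the policy in a separate function makes it possible to test the rules without slurmctld.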

Remark

As with the Lua plugin, the Slurm global lock is held for the whole duration of the plugin call.

  • Do not spawn subprocesses that perform Slurm-related actions requiring the Slurm global lock
  • The plugin should return as quickly as possible.
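
Because the lock is held while job_submit runs, it can help to measure how long the hook takes. A minimal sketch, not part of the plugin: the timed decorator and SLOW_THRESHOLD_S are hypothetical, and log is any callable taking a string (inside slurmctld this could be slurm.info; print is used here so the snippet runs anywhere).

```python
import functools
import time

# Hypothetical threshold in seconds; tune to your site's tolerance.
SLOW_THRESHOLD_S = 0.05


def timed(log):
    """Wrap a function and report calls slower than SLOW_THRESHOLD_S via log()."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed > SLOW_THRESHOLD_S:
                    log(f"{func.__name__} took {elapsed:.3f}s")
        return wrapper
    return decorator


@timed(print)
def job_submit(job_desc, submit_uid):
    # Keep the body short: no blocking I/O, no calls back into Slurm.
    return 0
```

The measurement itself adds only two clock reads per submission, so it should not noticeably lengthen the time the lock is held.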

Installing

Dependencies

  • python3-devel
  • devel packages of the slurm build
  • slurm source
  • slurm config.h produced by ./configure

The plugin depends on the config.h header that ./configure generates during the Slurm build.

To obtain a config.h consistent with your Slurm installation, compile Slurm from source together with this plugin. Please refer to the Slurm installation guide.

If you prefer Slurm from an OS repository, e.g. EPEL, inspect the slurm.src source RPM for the dependencies and configure flags used during the build. See the section below.

Prepare config.h

The Slurm package in EPEL does not retain config.h. You can download the source RPM and inspect its build spec with commands similar to the following:

sudo dnf --repo epel-source --destdir ./slurm-repo-src --source download slurm.src
cd ./slurm-repo-src
rpm2cpio ./slurm-22.05.9-1.el9.src.rpm | cpio -idmv
less slurm.spec

For reference, as of the time of this commit, slurm-devel-22.05.9-1.el9.x86_64 @elrepo shows the following.

Rocky 9.3

Build-time dependencies

# deps
autoconf automake dbus-devel desktop-file-utils gcc make man2html perl-devel perl-ExtUtils-MakeMaker perl-interpreter perl-generators perl-podlators pkgconf check-devel lua-devel python3 systemd freeipmi-devel gtk2-devel hdf5-devel hwloc-devel libcurl-devel libssh2-devel lz4-devel mariadb-devel munge-devel numactl-devel pam-devel pmix-devel rdma-core-devel readline-devel rrdtool-devel zlib-devel http-parser-devel json-c-devel libjwt-devel libyaml-devel

Configure flags

# fyi: in slurm.spec, ucx is defined 
# - only for these arch, and
# - only for Fedora
#
# follow arch-inclusions for ucx
# %ifarch aarch64 ppc64le x86_64
# %bcond_without ucx
# %else
# %bcond_with ucx
# %endif

configure \
  --prefix=%{_prefix} \
  --sysconfdir=%{_sysconfdir}/%{name} \
  --with-pam_dir=%{_libdir}/security \
%if 0%{?fedora} && %{with ucx}
  --with-ucx=%{_prefix} \
%endif
  --enable-pam \
  --enable-really-no-cray \
  --enable-shared \
  --enable-x11 \
  --disable-static \
  --disable-debug \
  --disable-salloc-background \
  --disable-partial_attach \
  --with-oneapi=no \
  --with-shared-libslurm \
  --without-rpath

Fedora >= 34

# Deps: additional to above
ucx-devel
# Configure: add `--with-ucx`

Compile and install

# You may skip this if the defaults work for you
nano Makefile 

# The first make checks out the slurm source
make

make && sudo make install

For more details, please read the Makefile and the source.