Merge pull request #31 from kpedro88/pipe_condor
Usage of condor within containers
kpedro88 authored May 31, 2024
2 parents c3b113c + a4b648f commit 683d8e4
Showing 3 changed files with 239 additions and 0 deletions.
109 changes: 109 additions & 0 deletions README.md
@@ -1,6 +1,115 @@
# lpc-scripts
scripts of use on the cmslpc cluster

## `pipe_condor.sh`

HTCondor commands are installed on cmslpc interactive nodes, but by default they are not accessible inside containers.

The script [pipe_condor.sh](./pipe_condor.sh) enables calling HTCondor commands *on the host node* from inside a container.

### Usage

In your `.bashrc`:
```bash
source pipe_condor.sh
```

Whenever you edit your `.bashrc`, you should log out and log back in for the changes to take effect.

To check that this line in your `.bashrc` is executed when you log in, make sure the following command prints a non-empty value (the path to the host `apptainer` executable):
```bash
echo $APPTAINERENV_APPTAINER_ORIG
```

Starting a container (the arguments are necessary for your `.bashrc` to be loaded inside the container):
```bash
cmssw-el7 -- /bin/bash
```

### Details

What happens:
* The `apptainer` command is replaced with a function that will create a set of pipes on the host node before running `apptainer`.
* Inside the container, all executables starting with `condor_` will automatically run on the host node.
* To run other commands on the host node, use `call_host cmd`, where `cmd` is the command you want to run (with any arguments).
* Nested containers are supported. The enable/disable status (see "Options" just below) is inherited from the top-level container and cannot be changed.
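
The pipe mechanism described above can be sketched in isolation. The following is a minimal, self-contained illustration, not the actual `pipe_condor.sh` code: the names `demo_dir`, `demo_listen`, and `demo_call` are hypothetical, both sides run on the same machine, and the listener handles only a single command. One process plays the "host" (reads a command from a named pipe, runs it, and sends back the output and exit code) while another plays the "container" (sends the command and collects the results).

```bash
#!/bin/bash
# Minimal sketch of the named-pipe round trip (all names are hypothetical).
demo_dir=$(mktemp -d)
mkfifo "$demo_dir/host" "$demo_dir/cont" "$demo_dir/exit"

# "Host" side: read one command from the host pipe, run it,
# write its output to the cont pipe and its exit code to the exit pipe.
demo_listen() {
    local code=0
    eval "$(cat "$demo_dir/host")" > "$demo_dir/cont" 2>&1 || code=$?
    echo "$code" > "$demo_dir/exit"
}
demo_listen &

# "Container" side: send the command, print its output, return its exit code.
demo_call() {
    echo "$*" > "$demo_dir/host"
    cat < "$demo_dir/cont"
    return "$(cat < "$demo_dir/exit")"
}

demo_status=0
demo_output=$(demo_call "echo hello; false") || demo_status=$?
wait
rm -rf "$demo_dir"
echo "output=$demo_output status=$demo_status"
# prints: output=hello status=1
```

Inside a real container session, the equivalent calls would be along the lines of `call_host hostname` or simply `condor_q`, which the script forwards to the host in the same way.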

Options:
* Before sourcing the script in your `.bashrc`, you can add this line to change the directory where the pipes will be created (the default is `~/nobackup/pipes`):
```bash
export PIPE_CONDOR_DIR=your_dir
```
* If you want to disable this by default and only enable it on the fly, put this line in your `.bashrc`:
```bash
export PIPE_CONDOR_STATUS=${PIPE_CONDOR_STATUS:=disable}
```
Then to enable it temporarily:
```bash
PIPE_CONDOR_STATUS=enable cmssw-el7 ...
```
* Conversely, if it is enabled by default and you want to disable it temporarily for a specific container invocation:
```bash
PIPE_CONDOR_STATUS=disable cmssw-el7 ...
```
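
The one-off overrides above work because the `:=` expansion only assigns the default when the variable is unset or empty, so a value supplied for a single invocation is kept. A small sketch of that behavior, using a hypothetical helper function `resolve_status` as a stand-in for the `.bashrc` line:

```bash
# Hypothetical stand-in for the .bashrc line above; prints the resulting status.
resolve_status() {
    export PIPE_CONDOR_STATUS=${PIPE_CONDOR_STATUS:=disable}
    echo "$PIPE_CONDOR_STATUS"
}

# No value set beforehand: the default applies.
status_default=$(unset PIPE_CONDOR_STATUS; resolve_status)
# Value set for a single invocation: it is kept.
status_override=$(PIPE_CONDOR_STATUS=enable resolve_status)

echo "default=$status_default override=$status_override"
# prints: default=disable override=enable
```
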
Caveats:
* cmslpc autodetection of the correct operating system for jobs is currently based on the host OS. Therefore, if you are submitting jobs from a container with a different OS, you will have to specify the OS manually in your JDL file (the `X` in `condor_submit X`):
```
+DesiredOS = SL7
```
(other possible values are EL8 or EL9)
* If you are running in a non-RHEL container, you should instead set the container image manually in your JDL file:
```
+ApptainerImage = "/path/to/your/container"
```
* CMS connect support is planned, but has not been tested yet.
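
As a rough illustration of the first caveat, the sketch below writes a minimal JDL file that requests an SL7 environment. The file name `example.jdl`, the executable `job.sh`, and the output/log file names are placeholders and not part of this repository; only the `+DesiredOS` line is taken from the text above.

```bash
# Write a minimal, hypothetical JDL file (all file names are placeholders).
jdl_file=example.jdl
cat > "$jdl_file" << 'EOF'
universe = vanilla
executable = job.sh
output = job_$(Cluster)_$(Process).stdout
error = job_$(Cluster)_$(Process).stderr
log = job_$(Cluster).log
+DesiredOS = SL7
queue
EOF
echo "wrote $jdl_file"
```

From inside the container, the file would then be submitted with `condor_submit example.jdl`, which `pipe_condor.sh` forwards to the host node.
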
## `bind_condor.sh`
It is also possible to use the HTCondor Python bindings inside a container.
This requires correctly specifying the HTCondor configuration.
A simple approach is provided in [bind_condor.sh](./bind_condor.sh).
### Usage
In your `.bashrc`:
```bash
source bind_condor.sh
```
That's it!
### Setting up bindings
You will also need to have the HTCondor Python bindings installed in your working environment.
For newer CMSSW versions, the installation procedure is simple:
```bash
cmsrel CMSSW_X_Y_Z
cd CMSSW_X_Y_Z/src
cmsenv
scram-venv
cmsenv
pip3 install htcondor
```
(Click [here](http://cms-sw.github.io/venv.html) to learn more about `scram-venv`)
For `CMSSW_10_6_X`, the Run 2 ultra-legacy analysis release that is only available for EL7 operating systems, there are some extra steps:
```bash
cmsrel CMSSW_10_6_30
cd CMSSW_10_6_30/src
cmsenv
scram-venv
cmsenv
pip3 install --upgrade pip
cmsenv
pip3 install --upgrade htcondor==10.3.0
```
In this particular case, it is necessary to upgrade `pip` and install a specific version of the bindings
because the Python version in `CMSSW_10_6_X` is old (Python 3.6.4).
**NOTE**: These recipes only install the bindings for Python3. (Python2 was still the default in `CMSSW_10_6_X`.)
You will need to make sure any scripts using the bindings are compatible with Python3.
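
After following either recipe, a quick way to confirm that the bindings are visible to Python 3 is to try importing them. The helper name `check_bindings` below is hypothetical; this is only a sketch of such a check, and it does not verify that the HTCondor configuration itself is correct.

```bash
# Hypothetical helper: report whether python3 can import the htcondor module.
check_bindings() {
    if python3 -c 'import htcondor' > /dev/null 2>&1; then
        echo "htcondor bindings found"
    else
        echo "htcondor bindings not found (or python3 is missing)"
    fi
}

bindings_report=$(check_bindings)
echo "$bindings_report"
```
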
## Unit and Integration testing
### Automated
7 changes: 7 additions & 0 deletions bind_condor.sh
@@ -0,0 +1,7 @@
#!/bin/bash

BIND_CONDOR_CONFIG=/etc/condor/config.d/01_cmslpc_interactive
BIND_CONDOR_PY=/usr/local/bin/cmslpc-local-conf.py
export APPTAINER_BIND=${APPTAINER_BIND}${APPTAINER_BIND:+,}${BIND_CONDOR_CONFIG},${BIND_CONDOR_PY}

export APPTAINERENV_CONDOR_CONFIG=/etc/condor/config.d/01_cmslpc_interactive
123 changes: 123 additions & 0 deletions pipe_condor.sh
@@ -0,0 +1,123 @@
#!/bin/bash
# shellcheck disable=SC2155

# default values
# shellcheck disable=SC2076
if [ -z "$PIPE_CONDOR_STATUS" ]; then
    export PIPE_CONDOR_STATUS=enable
elif [[ ! " enable disable " =~ " $PIPE_CONDOR_STATUS " ]]; then
    echo "Warning: unsupported value $PIPE_CONDOR_STATUS for PIPE_CONDOR_STATUS; disabling"
    export PIPE_CONDOR_STATUS=disable
fi
if [ -z "$PIPE_CONDOR_DIR" ]; then
    export PIPE_CONDOR_DIR=~/nobackup/pipes
fi
export PIPE_CONDOR_DIR=$(readlink -f "$PIPE_CONDOR_DIR")
mkdir -p "$PIPE_CONDOR_DIR"
# ensure the pipe dir is bound
export APPTAINER_BIND=${APPTAINER_BIND}${APPTAINER_BIND:+,}${PIPE_CONDOR_DIR}

# concept based on https://stackoverflow.com/questions/32163955/how-to-run-shell-script-on-host-from-docker-container

# execute command sent to host pipe; send output to container pipe; store exit code
listenhost(){
    # stop when host pipe is removed
    while [ -e "$1" ]; do
        # "|| true" is necessary to stop "Interrupted system call"
        # must be *inside* eval to ensure EOF once command finishes
        # now replaced with assignment of exit code to local variable (which also returns true)
        tmpexit=0
        eval "$(cat "$1") || tmpexit="'$?' >& "$2"
        echo "$tmpexit" > "$3"
    done
}
export -f listenhost

# creates randomly named pipe and prints the name
makepipe(){
    PREFIX="$1"
    PIPETMP=${PIPE_CONDOR_DIR}/${PREFIX}_$(uuidgen)
    mkfifo "$PIPETMP"
    echo "$PIPETMP"
}
export -f makepipe

# to be run on host before launching each apptainer session
startpipe(){
    HOSTPIPE=$(makepipe HOST)
    CONTPIPE=$(makepipe CONT)
    EXITPIPE=$(makepipe EXIT)
    # export pipes to apptainer
    echo "export APPTAINERENV_HOSTPIPE=$HOSTPIPE; export APPTAINERENV_CONTPIPE=$CONTPIPE; export APPTAINERENV_EXITPIPE=$EXITPIPE"
}
export -f startpipe

# sends function to host, then listens for output, and provides exit code from function
call_host(){
    if [ "${FUNCNAME[0]}" = "call_host" ]; then
        FUNCTMP=
    else
        FUNCTMP="${FUNCNAME[0]}"
    fi
    echo "cd $PWD; $FUNCTMP $*" > "$HOSTPIPE"
    cat < "$CONTPIPE"
    return "$(cat < "$EXITPIPE")"
}
export -f call_host

# from https://stackoverflow.com/questions/1203583/how-do-i-rename-a-bash-function
copy_function() {
    test -n "$(declare -f "$1")" || return
    eval "${_/$1/$2}"
    eval "export -f $2"
}
export -f copy_function

if [ -z "$APPTAINER_ORIG" ]; then
    export APPTAINER_ORIG=$(which apptainer)
fi
# always set this (in case of nested containers)
export APPTAINERENV_APPTAINER_ORIG=$APPTAINER_ORIG

apptainer(){
    if [ "$PIPE_CONDOR_STATUS" = "disable" ]; then
        (
        # shellcheck disable=SC2030
        export APPTAINERENV_PIPE_CONDOR_STATUS=disable
        $APPTAINER_ORIG "$@"
        )
    else
        # in subshell to contain exports
        (
        # shellcheck disable=SC2031
        export APPTAINERENV_PIPE_CONDOR_STATUS=enable
        # only start pipes on host
        # i.e. don't create more pipes/listeners for nested containers
        if [ -z "$APPTAINER_CONTAINER" ]; then
            eval "$(startpipe)"
            listenhost "$APPTAINERENV_HOSTPIPE" "$APPTAINERENV_CONTPIPE" "$APPTAINERENV_EXITPIPE" &
            LISTENER=$!
        fi
        # actually run apptainer
        $APPTAINER_ORIG "$@"
        # avoid dangling cat process after exiting container
        # (again, only on host)
        if [ -z "$APPTAINER_CONTAINER" ]; then
            pkill -P "$LISTENER"
            rm -f "$APPTAINERENV_HOSTPIPE" "$APPTAINERENV_CONTPIPE" "$APPTAINERENV_EXITPIPE"
        fi
        )
    fi
}
export -f apptainer

# on host: get list of condor executables
if [ -z "$APPTAINER_CONTAINER" ]; then
    export APPTAINERENV_HOSTFNS=$(compgen -c | grep ^condor_)
# in container: replace with call_host versions
elif [ "$PIPE_CONDOR_STATUS" = "enable" ]; then
    # shellcheck disable=SC2153
    for HOSTFN in $HOSTFNS; do
        copy_function call_host "$HOSTFN"
    done
fi
