Using cvmfsexec with cvmfs_shrinkwrap #100

Open
ocaisa opened this issue Oct 2, 2024 · 4 comments · May be fixed by #103

Comments

@ocaisa

ocaisa commented Oct 2, 2024

In an HPC context we are faced with two problems: not being able to install CernVM-FS and not having internet access. In EESSI, we've had good experiences using cvmfsexec to showcase to sites that CernVM-FS can already be used at their site even if they don't officially support it. The problem though is that, at many sites, external internet access is usually not available on compute nodes (and sometimes not available at all).

To work around the lack of internet access, EESSI would like to use cvmfs_shrinkwrap to package EESSI for individual sites so that we can at least put together a complete proof of concept. From what I understand, we can already mount the resulting squashfs image inside apptainer (which most sites support), but running our applications (in particular MPI applications) inside a container requires an additional set of workarounds.
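For illustration, mounting such an image inside apptainer looks roughly like the following (the squashfs and container file names are just placeholders):

# Sketch only: bind a shrinkwrapped squashfs image onto /cvmfs inside apptainer
# (eessi.sqfs and runtime.sif are placeholder names)
apptainer exec \
    --bind eessi.sqfs:/cvmfs/software.eessi.io:image-src=/ \
    runtime.sif bash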

To come to my question: does (or can) cvmfsexec support mounting something that has been cvmfs_shrinkwrapped? If not, can you point us to the right place to look, so we can see whether we can add support for this ourselves?

@DrDaveD
Collaborator

DrDaveD commented Oct 3, 2024

First, let me say that we have often found on HPCs that do not have any networking on the compute nodes that we can still use cvmfs by running squid on a login node, or some other node that has connectivity to both the compute nodes and the internet. I would highly encourage you to pursue that route first.
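For reference, the client side of that setup mostly comes down to pointing the proxy setting at the node running squid, e.g. in the cvmfs client configuration (hostname and port here are placeholders):

# e.g. in the default.local client configuration used on the compute nodes
CVMFS_HTTP_PROXY="http://login01:3128"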

Second, if you must use shrinkwrap, I'm not sure it makes sense for cvmfsexec to be the interface to that; apptainer is the most flexible way to do it. On the other hand, I guess what you like about cvmfsexec is the fact that it adds a mountpoint (/cvmfs) to the mount namespace while keeping most of the files on the host unchanged. That could be done with something simpler than cvmfsexec, although it's true that managing the unshares is a bit tricky and cvmfsexec handles that. Would a tool that handles something just like the apptainer --bind options, without starting a container, be what you want? It would just start a subshell in that environment.
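A rough sketch of that idea with plain unshare (the source path is illustrative, and /cvmfs must already exist on the host):

# Bind a shrinkwrapped tree onto /cvmfs in a private mount namespace and
# drop into a subshell, without starting a container
unshare --mount --map-root-user /bin/sh -c \
    'mount --bind /path/to/shrinkwrapped/cvmfs /cvmfs && exec bash'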

@ocaisa
Author

ocaisa commented Oct 9, 2024

> First, let me say that we have often found on HPCs that do not have any networking on the compute nodes that we can still use cvmfs by running squid on a login node, or some other node that has connectivity to both the compute nodes and the internet. I would highly encourage you to pursue that route first.

This can indeed work on some sites, but there are a number of EuroHPC sites where even the login nodes are very restricted in terms of connectivity. We figured that using shrinkwrap could give us a one-size-fits-all approach. We are hoping that the benefit of shrinkwrap is we get something that allows us to show off what the experience of using CernVM-FS/EESSI could be like to sites that are skeptical.

> Second, if you must use shrinkwrap, I'm not sure it makes sense for cvmfsexec to be the interface to that; apptainer is the most flexible way to do it. On the other hand, I guess what you like about cvmfsexec is the fact that it adds a mountpoint (/cvmfs) to the mount namespace while keeping most of the files on the host unchanged. That could be done with something simpler than cvmfsexec, although it's true that managing the unshares is a bit tricky and cvmfsexec handles that. Would a tool that handles something just like the apptainer --bind options, without starting a container, be what you want? It would just start a subshell in that environment.

Indeed, it is adding the mountpoint /cvmfs that is most interesting. This allows us to script around starting MPI processes, which would be more difficult to get right with a container. As described in a recent blog post about this, we currently need two scripts to run MPI processes. The first is a general wrapper to execute commands in an environment where the repository is mounted, ~/bin/cvmfsexec_eessi.sh:

#!/bin/bash
if [ -d /cvmfs/software.eessi.io ]; then
    # run command directly, EESSI CernVM-FS repository is already mounted
    "$@"
else
    # run command in a subshell where the EESSI CernVM-FS repository is mounted,
    # via cvmfsexec which is set up in a unique temporary directory
    orig_workdir=$(pwd)
    mkdir -p "/tmp/$USER"
    tmpdir=$(mktemp -p "/tmp/$USER" -d)
    cd "$tmpdir"
    git clone https://github.com/cvmfs/cvmfsexec.git > "$tmpdir/git_clone.out" 2>&1
    cd cvmfsexec
    ./makedist default > "$tmpdir/cvmfsexec_makedist.out" 2>&1
    cd "$orig_workdir"
    "$tmpdir/cvmfsexec/cvmfsexec" software.eessi.io -- "$@"
    # cleanup
    rm -rf "$tmpdir"
fi
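With this in place, any command can be run with the repository mounted, e.g. ~/bin/cvmfsexec_eessi.sh ls /cvmfs/software.eessi.io to check that the repository contents are visible.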

The second makes it possible to start the MPI processes using the MPI implementation shipped in EESSI. Since we use OpenMPI, this means orted, so we create a wrapper script in ~/bin/orted to make sure the command is passed through the same environment:

#!/bin/bash

# first remove the path to this orted wrapper from $PATH, to avoid an infinite loop
orted_wrapper_dir=$(dirname "$0")
export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v "$orted_wrapper_dir" | tr '\n' ':')

~/bin/cvmfsexec_eessi.sh orted "$@"

Putting these two together, we can then write reasonably normal job scripts, e.g.:

#!/bin/bash
#SBATCH --ntasks=96
#SBATCH --ntasks-per-node=48
#SBATCH --cpus-per-task=1
#SBATCH --time=5:0:0
#SBATCH --partition normal-arm
#SBATCH --export=None
#SBATCH --mem=30000M
~/bin/cvmfsexec_eessi.sh << EOF
export EESSI_SOFTWARE_SUBDIR_OVERRIDE=aarch64/a64fx
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load ESPResSo/4.2.2-foss-2023a
export SLURM_EXPORT_ENV=HOME,PATH,LD_LIBRARY_PATH,PYTHONPATH
mpirun -np 96 python3 lj.py
EOF

Having said all of that, it doesn't mean that what we are doing is the best approach; you may well have a better suggestion. We're working in a space where we are not experts; we're just trying to get to something that lets us show off what we have and that works anywhere.

@DrDaveD
Collaborator

DrDaveD commented Oct 10, 2024

I'm thinking I'll make a new tool called bindexec that borrows heavily from cvmfsexec and accepts pairs of paths to bind from and to.
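Purely as an illustration of the interface I have in mind (the name and syntax are hypothetical at this point, nothing exists yet):

# hypothetical invocation: bind the shrinkwrapped tree onto /cvmfs and start a subshell
bindexec /path/to/shrinkwrapped/cvmfs /cvmfs -- bash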

@boegel

boegel commented Oct 10, 2024

Small detail w.r.t. the orted hack: that only works for OpenMPI 4.x (and earlier, I guess); for OpenMPI 5.x we'd need a different (though very similar) hack.
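Presumably that would be a wrapper along the same lines for the PRRTE daemon, e.g. ~/bin/prted (a sketch, assuming prted is what OpenMPI 5.x launches on the remote nodes):

#!/bin/bash

# sketch for OpenMPI 5.x, which uses PRRTE: wrap prted instead of orted
# first remove the path to this prted wrapper from $PATH, to avoid an infinite loop
prted_wrapper_dir=$(dirname "$0")
export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v "$prted_wrapper_dir" | tr '\n' ':')

~/bin/cvmfsexec_eessi.sh prted "$@"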
