cli: slurm: pulling Singularity image doesn’t work if SLURM node has no internet #919
Comments
Thanks for opening this issue @wtraylor. This helps us to understand different Slurm environments better. We can change to building images on the login node by default and give an option to use the compute node (if Singularity is not present on the login node). What do you think @wtraylor and @ivotron?
we could also make use of the
My hunch is that it is more common to prepare everything for an experiment on the login node, while the actual execution happens on the computing nodes. For example, I would compile my application on the login node. So from my perspective it would be more intuitive to also prepare the container image on the login node. But I am not a very seasoned HPC user.
another pattern I've heard from people working in HPC scenarios is that they don't have network connectivity at all to the outside world from the slurm cluster, not even on the login node, so they need to scp images to the login node and then run from there. In these scenarios, doing

So for this issue, do we agree that we can do this:
given what @wtraylor mentioned above, this would address this issue, right?
sorry, I didn't explain well. What I had in mind was running the entire workflow first on the frontend node once, so it builds containers, but in single-node mode (i.e. just doing

Also, since the folder one uses on the login node is shared with all the nodes, building multiple times on each node is redundant.
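To illustrate the point about the shared folder, here is a minimal Python sketch that pulls an image only when the `.sif` file is not already there. The name `ensure_image` and the injected `pull` callable are hypothetical and not part of Popper; the sketch assumes the image directory is visible from every node.

```python
import os
from typing import Callable


def ensure_image(sif_path: str, pull: Callable[[str], None]) -> bool:
    """Pull the image only when it is missing from the shared folder.

    `pull` is a hypothetical callable that writes the image to `sif_path`.
    Returns True if a pull was triggered, False if the existing file was reused.
    """
    if os.path.isfile(sif_path):
        # Image already built or copied (e.g. via scp): nothing to do.
        return False
    pull(sif_path)
    return True
```

With a check like this, a first run on the login node would populate the folder, and later runs on other nodes would reuse the file instead of building again.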
Yeah, building redundantly on multiple nodes is wrong. We can make changes to have 2 modes controlled through the config: i) build on login node (like in local without
yeah, that sounds good. I'd go further and say to not implement ii) until users request it |
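To make the two modes concrete, here is a minimal sketch of how such a setting could change the pull command. This is not the code in src/popper/runner_slurm.py; the option name `build_on` and its values `login` and `compute` are assumptions invented for illustration. Only the command shapes `singularity pull <output> <uri>` and `srun <command>` are taken from the real CLIs.

```python
from typing import List


def pull_command(image_uri: str, sif_path: str, build_on: str = "login") -> List[str]:
    """Return the argv for pulling an image, depending on where it should run.

    `build_on` is a hypothetical config value: "login" runs the pull directly
    on the current node, "compute" wraps it in srun so Slurm dispatches it.
    """
    cmd = ["singularity", "pull", sif_path, image_uri]
    if build_on == "login":
        return cmd
    if build_on == "compute":
        return ["srun"] + cmd
    raise ValueError(f"unknown build_on value: {build_on!r}")
```

Defaulting to `login` matches the suggestion above to build on the login node unless a user asks otherwise.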
On the computer cluster I am using (Goethe HLR), only the login node has internet access, not the computing nodes. Therefore, building or pulling a Singularity image on the computing node does not work.
I run the workflow like this:
The details of the workflow are not relevant, but what’s important is that Popper (version 2020.09.01) now tries to execute `singularity pull` through SLURM, using `srun`. That happens in src/popper/runner_slurm.py. Apparently, this behavior was introduced in pull request #912.

I don’t have a good suggestion. It seems like some people want their Singularity images built on the computing node, and others (like me) on the login node.