Support NXF_SINGULARITY_LIBRARYDIR in nf-core download #3019

Open
muffato opened this issue Jun 7, 2024 · 4 comments · May be fixed by #3163
Assignees
Labels
download nf-core download enhancement

Comments

@muffato
Member

muffato commented Jun 7, 2024

Description of feature

Hello,

We've found that nf-core download doesn't know how to use $NXF_SINGULARITY_LIBRARYDIR (it only knows $NXF_SINGULARITY_CACHEDIR). The following should do the trick (I haven't tested it):

env NXF_SINGULARITY_CACHEDIR=$NXF_SINGULARITY_LIBRARYDIR nf-core download ...

However, it could lead to problems, for instance with the option --container-cache-utilisation amend.

Ideally, nf-core download should know about both, and that the library directory is read-only (ro) while the cache directory is read-write (rw).

@muffato
Member Author

muffato commented Jun 7, 2024

I propose to modify nf_core/download.py to:

  1. First check $NXF_SINGULARITY_LIBRARYDIR and use it, while explicitly forbidding --container-cache-utilisation amend. No automatic creation of $NXF_SINGULARITY_LIBRARYDIR would be considered.
  2. Otherwise use $NXF_SINGULARITY_CACHEDIR as it does now.

--container-cache-index would be expected to inform whichever is used.

Does that sound acceptable?
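The two-step proposal above could be sketched roughly as follows. This is only an illustration, not the code in nf_core/download.py: the function name pick_singularity_dir and its return shape are hypothetical, and the environment is passed in as a plain mapping so the logic can be tried without touching the real environment.

```python
import os
from typing import Mapping, Optional, Tuple


def pick_singularity_dir(
    env: Mapping[str, str], cache_utilisation: str
) -> Tuple[Optional[str], bool]:
    """Return (directory, read_only) following the two-step proposal.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch, not part of nf-core/tools.
    """
    library_dir = env.get("NXF_SINGULARITY_LIBRARYDIR")
    if library_dir:
        # Step 1: the library is read-only, so "amend" is refused outright.
        if cache_utilisation == "amend":
            raise ValueError(
                "--container-cache-utilisation amend cannot be used with "
                "$NXF_SINGULARITY_LIBRARYDIR (read-only)"
            )
        # No automatic creation: a missing library directory is an error.
        if not os.path.isdir(library_dir):
            raise FileNotFoundError(f"Library directory not found: {library_dir}")
        return library_dir, True
    # Step 2: fall back to the cache directory, as is done today.
    return env.get("NXF_SINGULARITY_CACHEDIR"), False
```

Calling pick_singularity_dir(os.environ, "copy") would then give the directory to read images from, plus a flag saying whether it may be written to.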

@edmundmiller edmundmiller added the download nf-core download label Jun 13, 2024
@MatthiasZepper
Member

MatthiasZepper commented Jul 18, 2024

I see no fundamental reason not to support it, but I have not understood fully what would be needed to make that happen.

First check $NXF_SINGULARITY_LIBRARYDIR and use it

  • What does "using it" mean? Instead of downloading a Singularity image via http://, would it be pulled from a cache already present on the system where nf-core download runs? If so, would the command look like singularity pull $NXF_SINGULARITY_CACHEDIR/someimage.img library://$NXF_SINGULARITY_LIBRARYDIR/someimage:tag ?

  • How is such a Singularity image encoded in a module? Does the Singularity Library Path always 1:1 correspond to the Docker URI? As of now, all modules hardcode the Singularity image to the Galaxy Depot:

    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/bwa:0.7.18--he4a0461_0' :
        'biocontainers/bwa:0.7.18--he4a0461_0' }"
  • What would the priority look like?
    - Depot Galaxy, then Singularity Library, then Docker->Singularity conversion?
    - Singularity Library, then Depot Galaxy, then Docker->Singularity conversion?

--container-cache-index would be expected to inform whichever is used.

  • Why would that be relevant? Is there a performance benefit to having a working Library on the execution system?

@muffato
Member Author

muffato commented Jul 18, 2024

NXF_SINGULARITY_LIBRARYDIR is the read-only version of NXF_SINGULARITY_CACHEDIR. Nothing needs to change in the way modules, Singularity URLs, etc, are written. Internally, files can be copied from the library the same way they are from the cache: with a regular file copy.
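To make the "regular file copy" point concrete, here is a minimal sketch that looks for an image in the library directory first and in the cache directory second, then copies whichever is found. The helper name copy_cached_image and the lookup order are assumptions for illustration; nothing is ever written to the library.

```python
import os
import shutil
from typing import Optional


def copy_cached_image(
    image_name: str,
    out_path: str,
    library_dir: Optional[str],
    cache_dir: Optional[str],
) -> Optional[str]:
    """Copy image_name to out_path from the first directory that has it.

    Hypothetical helper for this sketch. Returns the source path used,
    or None when neither directory holds the image.
    """
    for directory in (library_dir, cache_dir):
        if not directory:
            continue
        candidate = os.path.join(directory, image_name)
        if os.path.exists(candidate):
            # Plain file copy: the source is only read, which is all a
            # read-only library directory allows.
            shutil.copyfile(candidate, out_path)
            return candidate
    return None
```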

I don't remember what I meant with "--container-cache-index would be expected to inform whichever is used." but to be honest, I don't fully understand what that option is for.

@MatthiasZepper
Member

MatthiasZepper commented Jul 19, 2024

In that case, I am happy if you open a PR (I will be on holiday from July 25 to August 14 though, so someone else will likely have to review). The respective logic is here:

tools/nf_core/download.py

Lines 1085 to 1143 in 930ece5

containers_exist: List[str] = []
containers_cache: List[Tuple[str, str, Optional[str]]] = []
containers_download: List[Tuple[str, str, Optional[str]]] = []
containers_pull: List[Tuple[str, str, Optional[str]]] = []
for container in self.containers:
    # Fetch the output and cached filenames for this container
    out_path, cache_path = self.singularity_image_filenames(container)
    # Check that the directories exist
    out_path_dir = os.path.dirname(out_path)
    if not os.path.isdir(out_path_dir):
        log.debug(f"Output directory not found, creating: {out_path_dir}")
        os.makedirs(out_path_dir)
    if cache_path:
        cache_path_dir = os.path.dirname(cache_path)
        if not os.path.isdir(cache_path_dir):
            log.debug(f"Cache directory not found, creating: {cache_path_dir}")
            os.makedirs(cache_path_dir)
    # We already have the target file in place or in remote cache, return
    if os.path.exists(out_path) or os.path.basename(out_path) in self.containers_remote:
        containers_exist.append(container)
        continue
    # We have a copy of this in the NXF_SINGULARITY_CACHE dir
    if cache_path and os.path.exists(cache_path):
        containers_cache.append((container, out_path, cache_path))
        continue
    # Direct download within Python
    if container.startswith("http"):
        containers_download.append((container, out_path, cache_path))
        continue
    # Pull using singularity
    containers_pull.append((container, out_path, cache_path))
# Exit if we need to pull images and Singularity is not installed
if len(containers_pull) > 0:
    if not (shutil.which("singularity") or shutil.which("apptainer")):
        raise OSError(
            "Singularity/Apptainer is needed to pull images, but it is not installed or not in $PATH"
        )
if containers_exist:
    if self.container_cache_index is not None:
        log.info(
            f"{len(containers_exist)} containers are already cached remotely and won't be retrieved."
        )
    # Go through each method of fetching containers in order
    for container in containers_exist:
        progress.update(task, description="Image file exists at destination")
        progress.update(task, advance=1)
if containers_cache:
    for container in containers_cache:
        progress.update(task, description="Copying singularity images from cache")
        self.singularity_copy_cache_image(*container)
        progress.update(task, advance=1)

This is where containers are categorized. self.singularity_copy_cache_image(*container) can then be modified to support both directories, either with an explicit check or with try/except.
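As a rough sketch of the "explicit" variant, the categorization step could gain one extra branch for the library directory. The function below is a standalone illustration, not a patch to download.py: categorise_container, the library_path argument and the string labels are all hypothetical, and the order of checks simply mirrors the snippet above with the library consulted before the cache.

```python
import os
from typing import Collection, Optional


def categorise_container(
    container: str,
    out_path: str,
    library_path: Optional[str],
    cache_path: Optional[str],
    containers_remote: Collection[str] = (),
) -> str:
    """Decide how a container image should be obtained.

    Hypothetical standalone version of the categorization loop body.
    """
    # Target already present locally, or listed as cached remotely
    if os.path.exists(out_path) or os.path.basename(out_path) in containers_remote:
        return "exist"
    # Assumed new branch: a copy exists in the read-only library
    if library_path and os.path.exists(library_path):
        return "library"
    # A copy exists in the read-write cache
    if cache_path and os.path.exists(cache_path):
        return "cache"
    # Direct download within Python
    if container.startswith("http"):
        return "download"
    # Otherwise pull using Singularity/Apptainer
    return "pull"
```

Images labelled "library" and "cache" could then both be handed to the same copy routine, with only the cache ever being written to.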

I don't remember what I meant with "--container-cache-index would be expected to inform whichever is used." but to be honest, I don't fully understand what that option is for.

We use nf-core download to download several pipelines with multiple revisions each for an HPC that has no internet connection. So I need a way to tell nf-core download which images are already cached on that other computer.
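The idea behind the index can be illustrated with a small sketch. It assumes, purely for illustration, an index given as one image path per line; the actual file format accepted by --container-cache-index may differ, and both function names are hypothetical. Matching is done on the file's base name, as in the os.path.basename(out_path) in self.containers_remote check quoted above.

```python
import os
from typing import Iterable, List, Set


def read_remote_index(lines: Iterable[str]) -> Set[str]:
    """Collect base names of images already cached on the other machine.

    Assumes one path per line; blank lines and '#' comments are skipped.
    """
    names: Set[str] = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            names.add(os.path.basename(entry))
    return names


def images_to_fetch(out_paths: Iterable[str], remote: Set[str]) -> List[str]:
    """Keep only the images that the remote cache does not already hold."""
    return [path for path in out_paths if os.path.basename(path) not in remote]
```

With an index listing /hpc/cache/bwa.img, a request for out/bwa.img and out/samtools.img would leave only out/samtools.img to be retrieved.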

@muffato muffato self-assigned this Sep 7, 2024
@muffato muffato linked a pull request Sep 7, 2024 that will close this issue
4 tasks
3 participants