Reconfigure how containers are used in nextflow pipelines #442

Merged: 29 commits, merged on Oct 2, 2024.

@ens-LCampbell (Member) commented on Sep 17, 2024

This PR integrates the newly developed genomio container (Docker) into the following pipelines:

  • genome_prepare
  • dumper_pipeline
  • additional_seq_prepare

List of changes:

  • Removed container labels from inside modules; instead, the config explicitly maps process names to containers. This makes it possible to run a pipeline entirely within a personal venv, OR with a combination of local execution and container(s).
  • Originally I added a new boolean param, 'genomio-container', so users could opt to run the processes related to the Ensembl Python library entirely within the genomio container. This is no longer the case: instead, containers for the genomio Python library are activated via a new profile, 'genomio_prod', which imports a config that defines the containers through withName process directives.
  • Removed all container-related labels from modules, except for modules that always require a container.
  • Removed all container-related process directives (i.e. withLabel, withName) from the main nextflow.config file.
  • Removed Singularity process flags from the main nextflow.config file; they are now defined in each pipeline's workflow-level config.
  • These changes allow a mixture of containers and local env. This was already partly the case, since some processes (GFF3_VALIDATION, NCBI_ASM_STATS) require containers to run at all.
  • Reordered subworkflow include statements alphabetically (first by module name, then by 'as' MODULE alias) to mirror their declaration order in the withName configs.
  • Param verification at pipeline initialisation now shows which processes use containers.
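The profile-based setup described above can be sketched as a Nextflow config fragment. This is a minimal, hypothetical illustration, not the PR's actual config: the file path conf/genomio_containers.config, the process names GENOMIO_STEP_A and GENOMIO_STEP_B, the image example.org/genomio:latest and the Singularity values are all placeholders. Only the profile name genomio_prod comes from this PR.

```groovy
// ---- Workflow-level nextflow.config (hypothetical sketch) ----

// Singularity settings declared in the pipeline's own config rather than
// in the shared main nextflow.config (illustrative values only)
singularity {
    enabled    = true
    autoMounts = true
}

profiles {
    // Loaded only when the run uses `-profile genomio_prod`
    genomio_prod {
        includeConfig 'conf/genomio_containers.config'  // hypothetical path
    }
}

// ---- conf/genomio_containers.config (hypothetical) ----
process {
    // withName matches the process name, i.e. the 'as' alias when a module is
    // included under another name; '|' is regex alternation over several names.
    // Process names and image are placeholders, not the pipeline's real ones.
    withName: 'GENOMIO_STEP_A|GENOMIO_STEP_B' {
        container = 'example.org/genomio:latest'
    }
}
```

With a layout like this, running with -profile genomio_prod would apply the withName container assignments, while omitting the profile would leave those processes to run in the launching environment (e.g. a personal venv). Modules that always need a container keep their own labels regardless, as listed above.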

Additions:

  • Dockerfile for generation of genomio container.
  • CI/CD configuration for container generation, triggered when new changes are merged into main via a PR.

@ens-LCampbell marked this pull request as ready for review on September 26, 2024 09:38

@JAlvarezJarreta (Contributor) left a comment

Quite a neat update, lots to learn 🤓

Review threads (all resolved):
  • cicd/gitlab/dot.gitlab-ci.yml
  • cicd/gitlab/dot.gitlab-ci.yml (outdated)
  • cicd/gitlab/parts/dockerbuild.genomio.gitlab-ci.yml (outdated)
  • containers/docker/genomio/Dockerfile (outdated)
  • containers/docker/genomio/Dockerfile (outdated)
  • pipelines/nextflow/modules/database/db_factory.nf (outdated)
  • pipelines/nextflow/modules/gff3/gff3_validation.nf (outdated)

@Dishalodha (Contributor) left a comment

I was also wondering if we should log the usage of -profile somewhere. Maybe we can add it to the nf-schema file so it shows up as an optional parameter when running the pipeline?

ens-LCampbell and others added 6 commits on October 1, 2024 10:59, including:
Expand versioning regex for 10+ versions

Co-authored-by: J. Alvarez-Jarreta
Co-authored-by: Disha Lodha
@ens-LCampbell (Member, Author) commented:

> I was also wondering if we should log the usage of -profile somewhere. Maybe we can add it to the nf-schema file so it shows up as an optional parameter when running the pipeline?

Nextflow has a command that shows the config, including process directives and profiles. From each pipeline's workflow folder, running nextflow config -a shows the available profiles:

cd pipelines/nextflow/workflows/genome_prepare
nextflow config -a

@JAlvarezJarreta (Contributor) left a comment

One tiny bug still needs to be addressed. Since no Python code is involved, I'm happy to merge with pylint failing (I believe this has already been fixed in main). However, the new sub-pipeline is also failing, and that may need to be fixed.

Review thread (resolved): cicd/gitlab/dot.gitlab-ci.yml (outdated)
@JAlvarezJarreta merged commit 6d2ceea into main on Oct 2, 2024 (1 check failed)
@JAlvarezJarreta deleted the lcampbell/docker_redo branch on October 2, 2024 15:38
3 participants