Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/ensemblvep download #61

Merged
merged 1 commit into from
Jan 29, 2025
Merged

Conversation

georgette-femerling
Copy link
Contributor

@georgette-femerling georgette-femerling commented Jan 27, 2025

Description

Added the nf-core module to download the ensemblvep cache.

Added parameters:

  • download_cache
  • outdir_cache

Tests

  • Tested in Juno.
    Correctly downloads cache data when download_cache is set to true. If it is downloaded, vep_cache takes the value of the output. Does not interfere with any other process.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • [x ] Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • [x ] Reference Data Documentation in docs/reference_data.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

dmorais
dmorais previously approved these changes Jan 28, 2025
Copy link
Contributor

@dmorais dmorais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functionality allows for the controlled caching of VEP files.
The vep cache can be dowloaded as a process and versioning can be more tightly controlled

@@ -71,6 +71,17 @@ process {
ext.args = { "--output-filename=${meta.id}.exomiser" }
}

withName: ENSEMBLVEP_DOWNLOAD {
container = 'quay.io/biocontainers/ensembl-vep:110.0--pl5321h2a3209d_0' // Trying with the conda container
Copy link
Contributor

@LysianeBouchard LysianeBouchard Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This container seems to be based on vep version 110. Shouldn't we be using version 111?

If it were, let's say, version 113 downloading the data for version 111, I wouldn't be too concerned, as newer versions typically support older data. However, the reverse might not be true and could potentially cause issues. Ideally, we should use the same version that we are currently using to avoid any compatibility problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woups. Definitely I agree with you. This was a mistake on my side when copying the container link. I'll put the PR back into draft, change this and verify it does not change anything.

@@ -76,7 +76,7 @@ See [docs/output.md](docs/output.md) for more details about pipeline outputs.

## Credits

Ferlab-Ste-Justine/Post-processing-Pipeline was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard.
Ferlab-Ste-Justine/Post-processing-Pipeline was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard, Georgette Femerling.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

| `vep_cache_version` | _Optional_ | Version of the vep cache. e.g. `111` |
| `vep_genome` | _Optional_ | Genome assembly version of the vep cache |
| `download_cache` | _Optional_ | Download vep cache (default: false) |
| `outdir_cache` | _Optional_ | Path to write the cache to. If not declared, cache will be written to `<outputdir>/cache/` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to prefix the download_cache and outdir_cache parameters with "vep" to clarify that these are vep parameters. I leave the decision to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the missing vep parameters!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed. Will do that

### `Changed`
- [#59](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/59) Add ".snv" to VEP output filename prefix

## v2.4.1-dev - 16/01/2025
Copy link
Contributor

@LysianeBouchard LysianeBouchard Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great initiative! I wanted to mention this changelog issue before but didn't want to overwhelm you. I hope that you did not suffered too much 😄 .

Previously, the version in the changelog matched the last tag created + dev (e.g., 2.4.1-dev followed tag 2.4.1). To see changes from 2.4.1 compared to 2.4.0, you had to look at 2.4.0-dev, which was confusing.

I noticed this issue a while ago but I was hesitant to change it since we've been using this system for a while. I agree that using the "unreleased" convention, as you did, is clearer.

I support this change. Make sure, however, that it doesn't make the linter crash. I.e. you should verify that you can successfully run the nf-core lint command. It's possible that it will add an extra warning that we will want to ignore (you might have to edit file .nf-core.yml to do so).

If possible, I would tend to rename the section v2.4.1-dev to v2.4.1, as, if I am not mistaking, it is now accurately reflects changes from v2.4.0 to v2.4.1.

If we go for this change, it would be a good idea to mention it to Felix-Antoine as well if you did not already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! makes sense. I was a bit hesitant on how the changes were logged and assumed that 2.4.0 dev listed the changes made in the 2.4.0 release instead of the changes made starting from 2.4.0. I can adapt to any system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should switch to the convention you are introducing here. It's way more standard I think and intuitive. I actually wanted to propose this change to the team eventually. I'm glad that you introduced it right away.

I would just rename the header "v2.4.1-dev - 16/01/2025" header to "v2.4.1 - 16/01/2025". It's really just this header. I would not touch the others. We can add a note indicating that we change our way of maintaining the changelog from 2.4.1.

nextflow.config Outdated
@@ -92,7 +96,7 @@ params {
includeConfig 'conf/base.config'

// Load nf-core custom profiles from different Institutions
try {
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but you might want to remove the uncessary space after the { character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going through the errors, but realized better to have another PR for that. So I may have by mistake inserted a space there.

take:
ch_samplesheet

main:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with moving the constant definitions in the main. I can't remember the reason why they were placed before the main, however. You might want to double check with @FelixAntoineLeSieur to make sure it is ok, i.e. no potential harm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to nf, if they are outside, it does not follow the convention of declaring a workflow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is probably not standard to put them outside indeed and you had a good thinking in seeking to standardize the workflow. There was a reason, however, I think, for putting them outside ... but I can't remember what it was. @FelixAntoineLeSieur might remember.

Don't worry too much about that. If you tested on juno and it runs smooth, I guess this is not causing problems.

ch_samplesheet

main:
def HOMO_SAPIENS_SPECIES = "homo_sapiens"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind to move this here, but I recommend to use lower case instead upper case, as we typically use upper case for global variables. This one is not global.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I maintained the same naming convention, just moved the line.
Defining variables outside of workflows is not conventional in nextflow so it throws an error in format (not in running).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also only used within the workflow, so why would it be a global?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely don't mind if it is a local variable to the workflow. It is used only by the workflow so far indeed. We created a constant to encourage reuse and avoid hard coding the info at multiple places. But hey, I don't think it will ever be used outside the workflow, so your change is really fine.

My comment was just about using upper case for the variable name. I would tend to rename it homo_sapiens_species instead HOMO_SAPIENS_SPECIES. You don't have to; it's a suggestion.

@georgette-femerling georgette-femerling linked an issue Jan 29, 2025 that may be closed by this pull request
@georgette-femerling georgette-femerling marked this pull request as draft January 29, 2025 16:27
@georgette-femerling georgette-femerling marked this pull request as ready for review January 29, 2025 20:22
Copy link
Contributor

@LysianeBouchard LysianeBouchard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work :)

@georgette-femerling georgette-femerling merged commit 160cd5b into main Jan 29, 2025
3 checks passed
@georgette-femerling georgette-femerling deleted the feat/ensemblvep-download branch January 29, 2025 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nf-core module Adds a nf-core module
Development

Successfully merging this pull request may close these issues.

Check VEP version
3 participants