Skip to content

Add nextdenovo to nf-core modules. #8492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

elmedjadjirayane
Copy link

@elmedjadjirayane elmedjadjirayane commented May 16, 2025

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the number of threads should be dynamic on ${task.cpus}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nschan whatever is inside this config file is defined by the person running the pipeline, all what is inside must be specific to a run. This one is only a test file, the user must provide this config file to run assembly with nextdenovo. Do you prefer that this file be created dynamically with params?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least the threading option has to be generated dynamically for the commonly used resource scaling for failed jobs to work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a simple echo of that variable inside the config file must be sufficient right.
echo “parallel_jobs= ${task.cpus}” ?

tuple val(meta), path(reads)
path config

output:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these all outputs that are created by nextdenovo?
All outputs should be emitted in case someone would like to use one of the other outputs in their pipeline.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nextdenovo outputs, a fasta file containing the assembly and a stat file, of course there are other files but they mostly related to the process itself and not results of the assembly.

task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 20 # number of tasks used to run in parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also be set based on ${task.cpus}?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure there is a commit where I set this dynamically.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I think I left it by mistake but I still add a line to the config containing that info before giving it to nextdenovo.

deltmp = yes
parallel_jobs = 20 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be switched easily? If integrated somewhere in a pipeline, do you expect users to craft a config specific for nextdenovo, or could this be created in a separate process e.g NEXTDENOVO_CREATE_CONFIG based on pipeline params and then passed to this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it is that the config file can be very long and with a lot of parameters, for me, the user creates the config file which specific to nextDenovo, the nextdenovo documentation is complete for this input file and can be easily created. If we create the config based in pipeline params, it would be too much in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point, and it is nice if users are able to create their own file for this. However, what if they cannot because they do not understand the things that should go in there? I think having a convenience module that simply creates a config file (or some other option of creating that config file in this module) is important, non-technical users are part of the target audience. Those users would probably spend quite some time and end up submitting whatever the default for this tool is (which we could also simply provide). I am not sure if the config file generation needs to be in the module, but at least at the pipeline level a "i dont know and this is all very confusing" option should be available.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see. So the module still expects to be given a config file as input but when using that module in a pipeline, it will be created by a simple helper module that would take params and just fills the config file that we would give to nextdenovo. It means that this part won't be handled by nextdenovo's module?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that is a decision that should be discussed. I would think if such a module is created anyway, it should be easily findable for others that would like to use your nextdenovo module, so having a nextdenovo/create_config alongside nextdenovo/nextdenovo would make sense to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree.


[correct_option]
read_cutoff = 500
genome_size = 100k # estimated genome size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another important parameter that may need to be set dynamically?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this will depend whether the user gives the config as input or if we create for them. I recommend that the config file be provided as input. The purpose of that file is to prevent users from typing tens of params. Though, we will do as you see fit!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants