-
Notifications
You must be signed in to change notification settings - Fork 859
Add nextdenovo to nf-core modules. #8492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
correction_options = -p 15 | ||
|
||
[assemble_option] | ||
minimap2_options_cns = -t 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the number of threads should be dynamic on ${task.cpus}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nschan whatever is inside this config file is defined by the person running the pipeline, all what is inside must be specific to a run. This one is only a test file, the user must provide this config file to run assembly with nextdenovo. Do you prefer that this file be created dynamically with params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least the threading option has to be generated dynamically for the commonly used resource scaling for failed jobs to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a simple echo of that variable inside the config file must be sufficient right.
echo “parallel_jobs= ${task.cpus}”
?
tuple val(meta), path(reads) | ||
path config | ||
|
||
output: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these all outputs that are created by nextdenovo?
All outputs should be emitted in case someone would like to use one of the other outputs in their pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nextdenovo outputs, a fasta file containing the assembly and a stat file, of course there are other files but they mostly related to the process itself and not results of the assembly.
task = all # all, correct, assemble | ||
rewrite = yes # yes/no | ||
deltmp = yes | ||
parallel_jobs = 20 # number of tasks used to run in parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably also be set based on ${task.cpus}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sure there is a commit where I set this dynamically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I think I left it by mistake but I still add a line to the config containing that info before giving it to nextdenovo.
deltmp = yes | ||
parallel_jobs = 20 # number of tasks used to run in parallel | ||
input_type = raw # raw, corrected | ||
read_type = ont # clr, ont, hifi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this be switched easily? If integrated somewhere in a pipeline, do you expect users to craft a config specific for nextdenovo, or could this be created in a separate process e.g NEXTDENOVO_CREATE_CONFIG
based on pipeline params and then passed to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see it is that the config file can be very long and with a lot of parameters, for me, the user creates the config file which specific to nextDenovo, the nextdenovo documentation is complete for this input file and can be easily created. If we create the config based in pipeline params, it would be too much in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your point, and it is nice if users are able to create their own file for this. However, what if they cannot because they do not understand the things that should go in there? I think having a convenience module that simply creates a config file (or some other option of creating that config file in this module) is important, non-technical users are part of the target audience. Those users would probably spend quite some time and end up submitting whatever the default for this tool is (which we could also simply provide). I am not sure if the config file generation needs to be in the module, but at least at the pipeline level a "i dont know and this is all very confusing" option should be available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see. So the module still expects to be given a config file as input but when using that module in a pipeline, it will be created by a simple helper module that would take params and just fills the config file that we would give to nextdenovo. It means that this part won't be handled by nextdenovo's module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that is a decision that should be discussed. I would think if such a module is created anyway, it should be easily findable for others that would like to use your nextdenovo module, so having a nextdenovo/create_config
alongside nextdenovo/nextdenovo
would make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree.
|
||
[correct_option] | ||
read_cutoff = 500 | ||
genome_size = 100k # estimated genome size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another important parameter that may need to be set dynamically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this will depend whether the user gives the config as input or if we create for them. I recommend that the config file be provided as input. The purpose of that file is to prevent users from typing tens of params. Though, we will do as you see fit!
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda