Add nextdenovo to nf-core modules. #8492
Changes from all commits
967dddf
bce0016
f694bcd
cbf83d2
cc68e7a
0c78049
490ab68
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
dependencies: | ||
- bioconda::nextdenovo=2.5.2 | ||
- pip=23.3.1 | ||
- pip: | ||
- paralleltask==0.1.1 | ||
- python=3.8 | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
process NEXTDENOVO { | ||
tag "$meta.id" | ||
label 'process_high' | ||
|
||
conda "${moduleDir}/environment.yml" | ||
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
'https://depot.galaxyproject.org/singularity/nextdenovo:2.5.2--py310h0ceaa1d_6' : | ||
'biocontainers/nextdenovo:2.5.2--py310h0ceaa1d_6' }" | ||
|
||
input: | ||
tuple val(meta), path(reads) | ||
path config | ||
|
||
output: | ||
tuple val(meta), path("*.fasta.gz"), emit: fasta | ||
tuple val(meta), path("*.stat") , emit: stat | ||
path "versions.yml" , emit: versions | ||
|
||
when: | ||
task.ext.when == null || task.ext.when | ||
|
||
script: | ||
def args = task.ext.args ?: '' | ||
def prefix = task.ext.prefix ?: "${meta.id}" | ||
""" | ||
echo "parallel_jobs = ${task.cpus}" >> conf.cfg | ||
cat $config >> conf.cfg | ||
echo ${reads} > input.fofn | ||
nextDenovo \\ | ||
conf.cfg \\ | ||
input.fofn \\ | ||
|
||
gzip -c ./03.ctg_graph/nd.asm.fasta > ${prefix}.assembly.fasta.gz | ||
|
||
mv ./03.ctg_graph/nd.asm.fasta.stat ${prefix}.assembly_info.stat | ||
|
||
cat <<-END_VERSIONS > versions.yml | ||
"${task.process}": | ||
\$( nextDenovo --version ) | ||
END_VERSIONS | ||
""" | ||
|
||
stub: | ||
def prefix = task.ext.prefix ?: "${meta.id}" | ||
""" | ||
echo stub | gzip -c > ${prefix}.assembly.fasta.gz | ||
echo contig_1 > ${prefix}.assembly_info.stat | ||
|
||
cat <<-END_VERSIONS > versions.yml | ||
"${task.process}": | ||
\$( nextDenovo --version ) | ||
END_VERSIONS | ||
""" | ||
} |
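The staging steps at the top of the script block (writing `input.fofn` and setting `parallel_jobs`) can be sketched in plain shell as follows. This is only a sketch under stated assumptions: `reads`, `cpus`, and `user.cfg` are hypothetical stand-ins for the values Nextflow would interpolate from `${reads}`, `${task.cpus}`, and `$config`. It assumes the file of filenames should hold one path per line, and that replacing an existing `parallel_jobs` line in place is preferable to prepending a second one ahead of the `[General]` header. How NextDenovo resolves a duplicated key was not verified here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for values Nextflow would interpolate.
reads="sample_a.fastq.gz sample_b.fastq.gz"
cpus=6

# Example user config (only the lines relevant to this sketch).
cat > user.cfg <<'EOF'
[General]
job_type = local
parallel_jobs = 20 # number of tasks used to run in parallel
input_fofn = input.fofn
EOF

# Write one read path per line; $reads is intentionally unquoted
# so that it splits on whitespace into separate arguments.
printf '%s\n' $reads > input.fofn

# Replace the existing parallel_jobs line instead of adding a duplicate key.
sed -E "s/^parallel_jobs[[:space:]]*=.*/parallel_jobs = ${cpus}/" user.cfg > conf.cfg
```

With this approach the resulting `conf.cfg` still starts with `[General]` and contains a single `parallel_jobs` entry. A config that lacks the key entirely would need an extra step to add it.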
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
name: "nextdenovo" | ||
description: NextDenovo is a string graph-based de novo assembler for long reads (CLR, | ||
HiFi and ONT). It uses a “correct-then-assemble” strategy similar to canu (no correction | ||
step for PacBio HiFi reads), but requires significantly less computing resources | ||
and storage | ||
keywords: | ||
- assembly | ||
- genome | ||
- de novo | ||
- genome assembler | ||
- single molecule | ||
tools: | ||
- "nextdenovo": | ||
description: "NextDenovo is a string graph-based de novo assembler for long reads | ||
(CLR, HiFi and ONT). It uses a “correct-then-assemble” strategy similar to canu | ||
(no correction step for PacBio HiFi reads), but requires significantly less | ||
computing resources and storage" | ||
homepage: "https://github.com/Nextomics/NextDenovo" | ||
documentation: "https://nextdenovo.readthedocs.io/en/latest/" | ||
tool_dev_url: "https://github.com/Nextomics/NextDenovo" | ||
doi: "10.1186/s13059-024-03252-4" | ||
licence: ["GPL-3.0-license"] | ||
identifier: biotools:nextdenovo | ||
input: | ||
- - meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. [ id:'test' ] | ||
- reads: | ||
type: file | ||
description: Input reads from Oxford Nanopore or PacBio data in FASTA/FASTQ | ||
format. | ||
pattern: "*.{fasta,fastq,fasta.gz,fastq.gz,fa,fq,fa.gz,fq.gz}" | ||
- - config: | ||
type: file | ||
description: Input config file for nextDenovo | ||
pattern: "*" | ||
output: | ||
- fasta: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. [ id:'test' ] | ||
- "*.fasta.gz": | ||
type: file | ||
description: Assembled FASTA file | ||
pattern: "*.fasta.gz" | ||
- stat: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. [ id:'test' ] | ||
- "*.stat": | ||
type: file | ||
description: Extra information and statistics about resulting contigs | ||
pattern: "*.stat" | ||
- versions: | ||
- versions.yml: | ||
type: file | ||
description: File containing software versions | ||
pattern: "versions.yml" | ||
authors: | ||
- "@elmedjadjirayane" | ||
maintainers: | ||
- "@elmedjadjirayane" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
[General] | ||
job_type = local # local, slurm, sge, pbs, lsf | ||
job_prefix = nextDenovo | ||
task = all # all, correct, assemble | ||
rewrite = yes # yes/no | ||
deltmp = yes | ||
parallel_jobs = 20 # number of tasks used to run in parallel | ||
This should probably also be set based on
I am sure there is a commit where I set this dynamically.
Ah, I think I left it in by mistake, but I still add a line containing that information to the config before giving it to nextdenovo. |
||
input_type = raw # raw, corrected | ||
read_type = ont # clr, ont, hifi | ||
How can this be switched easily? If integrated somewhere in a pipeline, do you expect users to craft a config specific for nextdenovo, or could this be created in a separate process e.g
The way I see it, the config file can be very long, with a lot of parameters. For me, the user creates the config file, which is specific to nextDenovo; the nextDenovo documentation for this input file is complete, so the file can be created easily. If we create the config based on pipeline params, it would be too much in my opinion.
I understand your point, and it is nice if users are able to create their own file for this. However, what if they cannot because they do not understand the things that should go in there? I think having a convenience module that simply creates a config file (or some other option of creating that config file in this module) is important; non-technical users are part of the target audience. Those users would probably spend quite some time and end up submitting whatever the default for this tool is (which we could also simply provide). I am not sure if the config file generation needs to be in the module, but at least at the pipeline level an "I don't know and this is all very confusing" option should be available.
Yes, I see. So the module still expects to be given a config file as input, but when the module is used in a pipeline, the file will be created by a simple helper module that takes params and just fills in the config file that we give to nextdenovo. Does that mean this part won't be handled by nextdenovo's module?
I guess that is a decision that should be discussed. I would think if such a module is created anyway, it should be easily findable for others that would like to use your nextdenovo module, so having a
Yes, I agree. |
||
input_fofn = input.fofn | ||
workdir = . | ||
|
||
[correct_option] | ||
read_cutoff = 500 | ||
genome_size = 100k # estimated genome size | ||
This is another important parameter that may need to be set dynamically?
Again, this will depend on whether the user gives the config as input or we create it for them. I recommend that the config file be provided as input. The purpose of that file is to spare users from typing tens of params. Though, we will do as you see fit! |
||
|
||
minimap2_options_raw = -t 8 | ||
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage. | ||
correction_options = -p 15 | ||
|
||
[assemble_option] | ||
minimap2_options_cns = -t 8 | ||
Ideally the number of threads should be dynamic on
@nschan whatever is inside this config file is defined by the person running the pipeline; everything inside must be specific to a run. This one is only a test file; the user must provide this config file to run an assembly with nextdenovo. Do you prefer that this file be created dynamically with params?
I think at least the threading option has to be generated dynamically for the commonly used resource scaling for failed jobs to work.
A simple echo of that variable into the config file should be sufficient, right? |
||
nextgraph_options = -a 1 -z 1 -l 1 -q 0 -N 2 -u 2 -w 3 -B 50 -C 5 -L 1 -t 50 | ||
|
||
# see https://nextdenovo.readthedocs.io/en/latest/OPTION.html for a detailed introduction about all the parameters |
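For the convenience option raised in the review thread on `read_type` and `genome_size`, one possibility is a small helper that fills in a default config from a handful of values. The sketch below is illustrative only: the function name `write_nextdenovo_cfg` and its argument order are hypothetical, and the keys and default values are copied from the test config in this PR rather than being tuned recommendations. How CPUs should be split between `parallel_jobs` and the per-job `-t` thread count is left open, so the two are passed separately.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the PR):
#   write_nextdenovo_cfg READ_TYPE GENOME_SIZE PARALLEL_JOBS THREADS OUTFILE
write_nextdenovo_cfg() {
    local read_type="$1" genome_size="$2" parallel_jobs="$3" threads="$4" outfile="$5"

    # Only the read types listed in the test config comment are accepted.
    case "$read_type" in
        clr|ont|hifi) ;;
        *) echo "unsupported read_type: $read_type" >&2; return 1 ;;
    esac

    # Defaults below mirror the test config; only four values are substituted.
    cat > "$outfile" <<EOF
[General]
job_type = local
job_prefix = nextDenovo
task = all
rewrite = yes
deltmp = yes
parallel_jobs = ${parallel_jobs}
input_type = raw
read_type = ${read_type}
input_fofn = input.fofn
workdir = .

[correct_option]
read_cutoff = 500
genome_size = ${genome_size}
minimap2_options_raw = -t ${threads}
pa_correction = 3
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t ${threads}
nextgraph_options = -a 1 -z 1 -l 1 -q 0 -N 2 -u 2 -w 3 -B 50 -C 5 -L 1 -t 50
EOF
}

# Example call: ONT reads, ~100 kb genome, 2 parallel jobs with 4 threads each.
write_nextdenovo_cfg ont 100k 2 4 generated.cfg
```

A user who already has a hand-written config would skip the helper and pass their own file, so the module input can stay a plain config path either way.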
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
// nf-core modules test nextdenovo | ||
nextflow_process { | ||
|
||
name "Test Process NEXTDENOVO" | ||
script "../main.nf" | ||
process "NEXTDENOVO" | ||
|
||
tag "modules" | ||
tag "modules_nfcore" | ||
tag "nextdenovo" | ||
|
||
|
||
test("nextdenovo_ont") { | ||
|
||
|
||
when { | ||
process { | ||
""" | ||
|
||
|
||
input[0] = [ | ||
[ id:'test' ], // meta map | ||
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/pacbio/fastq/test_hifi.fastq.gz', checkIfExists: true), | ||
] | ||
input[1] = file("${moduleDir}/tests/config.cfg") | ||
""" | ||
} | ||
} | ||
|
||
then { | ||
assertAll( | ||
{ assert process.success }, | ||
{ assert snapshot( | ||
file(process.out.fasta.get(0).get(1)).name, | ||
path(process.out.stat.get(0).get(1)).readLines()[1].contains("N50"), | ||
process.out.versions | ||
).match() } | ||
) | ||
} | ||
|
||
} | ||
|
||
|
||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"nextdenovo_ont": { | ||
"content": [ | ||
"test.assembly.fasta.gz", | ||
false, | ||
[ | ||
"versions.yml:md5,7ec3ec49cbe9e0a06d3d0be767e4cc0c" | ||
] | ||
], | ||
"meta": { | ||
"nf-test": "0.9.2", | ||
"nextflow": "25.04.2" | ||
}, | ||
"timestamp": "2025-05-16T18:13:58.045501794" | ||
} | ||
} |
Are these all outputs that are created by nextdenovo?
All outputs should be emitted in case someone would like to use one of the other outputs in their pipeline.
NextDenovo outputs a FASTA file containing the assembly and a stat file. There are other files, of course, but they mostly relate to the process itself rather than to the results of the assembly.