Add nextdenovo to nf-core modules. #8492

Open
wants to merge 7 commits into
base: master
11 changes: 11 additions & 0 deletions modules/nf-core/nextdenovo/environment.yml
@@ -0,0 +1,11 @@
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::nextdenovo=2.5.2
  - pip=23.3.1
  - pip:
      - paralleltask==0.1.1
  - python=3.8


54 changes: 54 additions & 0 deletions modules/nf-core/nextdenovo/main.nf
@@ -0,0 +1,54 @@
process NEXTDENOVO {
    tag "$meta.id"
    label 'process_high'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/nextdenovo:2.5.2--py310h0ceaa1d_6' :
        'biocontainers/nextdenovo:2.5.2--py310h0ceaa1d_6' }"

    input:
    tuple val(meta), path(reads)
    path config

    output:
Contributor

Are these all outputs that are created by nextdenovo?
All outputs should be emitted in case someone would like to use one of the other outputs in their pipeline.

Author

nextdenovo outputs a FASTA file containing the assembly and a stat file. There are other files, of course, but they mostly relate to the process itself rather than to the assembly results.

    tuple val(meta), path("*.fasta.gz"), emit: fasta
    tuple val(meta), path("*.stat")    , emit: stat
    path "versions.yml"                , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    echo "parallel_jobs = ${task.cpus}" >> conf.cfg
    cat $config >> conf.cfg
    # One read file per line, as a file-of-filenames expects
    printf '%s\\n' ${reads} > input.fofn
    nextDenovo \\
        $args \\
        conf.cfg \\
        input.fofn

    gzip -c ./03.ctg_graph/nd.asm.fasta > ${prefix}.assembly.fasta.gz

    mv ./03.ctg_graph/nd.asm.fasta.stat ${prefix}.assembly_info.stat

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        \$( nextDenovo --version )
    END_VERSIONS
    """

    stub:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    echo stub | gzip -c > ${prefix}.assembly.fasta.gz
    echo contig_1 > ${prefix}.assembly_info.stat

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        \$( nextDenovo --version )
    END_VERSIONS
    """
}
68 changes: 68 additions & 0 deletions modules/nf-core/nextdenovo/meta.yml
@@ -0,0 +1,68 @@
name: "nextdenovo"
description: NextDenovo is a string graph-based de novo assembler for long reads (CLR,
  HiFi and ONT). It uses a “correct-then-assemble” strategy similar to canu (no correction
  step for PacBio HiFi reads), but requires significantly less computing resources
  and storage
keywords:
  - assembly
  - genome
  - de novo
  - genome assembler
  - single molecule
tools:
  - "nextdenovo":
      description: "NextDenovo is a string graph-based de novo assembler for long reads
        (CLR, HiFi and ONT). It uses a “correct-then-assemble” strategy similar to canu
        (no correction step for PacBio HiFi reads), but requires significantly less
        computing resources and storage"
      homepage: "https://github.com/Nextomics/NextDenovo"
      documentation: "https://nextdenovo.readthedocs.io/en/latest/"
      tool_dev_url: "https://github.com/Nextomics/NextDenovo"
      doi: "10.1186/s13059-024-03252-4"
      licence: ["GPL-3.0-license"]
      identifier: biotools:nextdenovo
input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. [ id:'test' ]
    - reads:
        type: file
        description: Input reads from Oxford Nanopore or PacBio data in FASTA/FASTQ
          format.
        pattern: "*.{fasta,fastq,fasta.gz,fastq.gz,fa,fq,fa.gz,fq.gz}"
  - - config:
        type: file
        description: Input config file for nextDenovo
        pattern: "*"
output:
  - fasta:
      - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. [ id:'test' ]
      - "*.fasta.gz":
          type: file
          description: Assembled FASTA file
          pattern: "*.fasta.gz"
  - stat:
      - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. [ id:'test' ]
      - "*.stat":
          type: file
          description: Extra information and statistics about resulting contigs
          pattern: "*.stat"
  - versions:
      - versions.yml:
          type: file
          description: File containing software versions
          pattern: "versions.yml"
authors:
  - "@elmedjadjirayane"
maintainers:
  - "@elmedjadjirayane"
25 changes: 25 additions & 0 deletions modules/nf-core/nextdenovo/tests/config.cfg
@@ -0,0 +1,25 @@
[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 20 # number of tasks used to run in parallel
Contributor

This should probably also be set based on ${task.cpus}?

Author

I am sure there is a commit where I set this dynamically.

Author

Ah, I think I left it in by mistake, but I still add a line with that value to the config before passing it to nextdenovo.

input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
Contributor

How can this be switched easily? If this is integrated somewhere in a pipeline, do you expect users to craft a config specific to nextdenovo, or could it be created in a separate process, e.g. NEXTDENOVO_CREATE_CONFIG, based on pipeline params and then passed to this one?

Author

The way I see it, the config file can be very long, with a lot of parameters. In my view, the user creates the config file, which is specific to nextDenovo; the nextDenovo documentation covers this input file thoroughly, so it is easy to create. Creating the config from pipeline params would be too much, in my opinion.

Contributor

I understand your point, and it is nice if users can create their own file for this. But what if they cannot, because they do not understand what should go in there? I think a convenience module that simply creates a config file (or some other way of creating that config file within this module) is important, since non-technical users are part of the target audience. Those users would probably spend quite some time and end up submitting whatever the default for this tool is (which we could also simply provide). I am not sure the config file generation needs to live in the module, but at least at the pipeline level an "I don't know and this is all very confusing" option should be available.

Author

Yes, I see. So the module still expects a config file as input, but when the module is used in a pipeline, that file would be created by a simple helper module that takes params and fills in the config file we then give to nextdenovo. Does that mean this part won't be handled by the nextdenovo module itself?

Contributor

I guess that is a decision that should be discussed. I would think if such a module is created anyway, it should be easily findable for others that would like to use your nextdenovo module, so having a nextdenovo/create_config alongside nextdenovo/nextdenovo would make sense to me.

Author

Yes, I agree.
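
A minimal sketch of what such a nextdenovo/create_config helper could boil down to, written as plain bash. The function name `write_nextdenovo_cfg` and its argument order are hypothetical; the keys and fixed values are a subset of tests/config.cfg in this PR, and it is an assumption that the omitted keys fall back to the tool's defaults. Inside a Nextflow `script:` block the shell `$` would additionally need escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): write a small NextDenovo config
# from a handful of values instead of asking the user for a full file.
# Arguments: read_type genome_size parallel_jobs threads [output_file]
write_nextdenovo_cfg() {
    local read_type="$1" genome_size="$2" jobs="$3" threads="$4" out="${5:-run.cfg}"

    # Only the read types listed in the test config comment are accepted.
    case "$read_type" in
        clr|ont|hifi) ;;
        *) echo "unsupported read_type: $read_type" >&2; return 1 ;;
    esac

    # Keys copied from tests/config.cfg; only the values above are variable.
    cat > "$out" <<EOF
[General]
job_type = local
job_prefix = nextDenovo
task = all
rewrite = yes
deltmp = yes
parallel_jobs = ${jobs}
input_type = raw
read_type = ${read_type}
input_fofn = input.fofn
workdir = .

[correct_option]
read_cutoff = 500
genome_size = ${genome_size}
minimap2_options_raw = -t ${threads}

[assemble_option]
minimap2_options_cns = -t ${threads}
EOF
}
```

For example, `write_nextdenovo_cfg ont 100k 4 2 conf.cfg` would reproduce the shape of the test config with four parallel jobs and two minimap2 threads each. How `task.cpus` is split between jobs and threads is left to the pipeline.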

input_fofn = input.fofn
workdir = .

[correct_option]
read_cutoff = 500
genome_size = 100k # estimated genome size
Contributor

This is another important parameter that may need to be set dynamically?

Author

Again, this depends on whether the user provides the config as input or we create it for them. I recommend that the config file be provided as input. The purpose of that file is to save users from typing tens of params. That said, we will do as you see fit!


minimap2_options_raw = -t 8
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
Contributor

Ideally, the number of threads should be set dynamically from ${task.cpus}.

Author

@nschan Whatever is inside this config file is defined by the person running the pipeline; everything in it must be specific to a run. This one is only a test file; the user must provide this config file to run an assembly with nextdenovo. Would you prefer that this file be created dynamically from params?

Contributor

I think at least the threading option has to be generated dynamically for the commonly used resource scaling for failed jobs to work.

Author

A simple echo of that variable into the config file should be sufficient, right? Something like:
echo "parallel_jobs = ${task.cpus}"
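
One caveat with a plain echo is that a user-supplied config may already contain the key (the test config sets `parallel_jobs = 20`), leaving two assignments in the file. A hedged sketch of an alternative in plain bash: drop any existing assignment and re-add the key directly under its section header. The name `set_cfg_key` is hypothetical, and it is an assumption that the key belongs under the section shown in tests/config.cfg; how NextDenovo itself resolves duplicate keys is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): set "key = value" inside an
# INI-style config, replacing any existing assignment of that key.
# Arguments: config_file section key value
set_cfg_key() {
    local cfg="$1" section="$2" key="$3" value="$4"

    # Refuse to touch the file if the section header is missing; otherwise
    # the old assignment would be dropped without being re-added.
    grep -qxF -- "[$section]" "$cfg" || {
        echo "section [$section] not found in $cfg" >&2
        return 1
    }

    awk -v sec="[$section]" -v key="$key" -v val="$value" '
        # Skip any existing assignment of the key (trailing comments included).
        $0 ~ "^[ \t]*" key "[ \t]*=" { next }
        { print }
        # Re-add the key immediately after its section header.
        $0 == sec { print key " = " val }
    ' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}
```

In the module this could be called as `set_cfg_key conf.cfg General parallel_jobs 4` with the value taken from `task.cpus`, and the same call would cover `minimap2_options_cns` or `genome_size`. Key names are used as regular expressions here, which is fine for the plain identifiers in this config.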

nextgraph_options = -a 1 -z 1 -l 1 -q 0 -N 2 -u 2 -w 3 -B 50 -C 5 -L 1 -t 50

# see https://nextdenovo.readthedocs.io/en/latest/OPTION.html for a detailed introduction about all the parameters
45 changes: 45 additions & 0 deletions modules/nf-core/nextdenovo/tests/main.nf.test
@@ -0,0 +1,45 @@
// nf-core modules test nextdenovo
nextflow_process {

    name "Test Process NEXTDENOVO"
    script "../main.nf"
    process "NEXTDENOVO"

    tag "modules"
    tag "modules_nfcore"
    tag "nextdenovo"

    test("nextdenovo_ont") {

        when {
            process {
                """
                input[0] = [
                    [ id:'test' ], // meta map
                    file(params.modules_testdata_base_path + 'genomics/homo_sapiens/pacbio/fastq/test_hifi.fastq.gz', checkIfExists: true),
                ]
                input[1] = file("${moduleDir}/tests/config.cfg")
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(
                    file(process.out.fasta.get(0).get(1)).name,
                    path(process.out.stat.get(0).get(1)).readLines()[1].contains("N50"),
                    process.out.versions
                ).match() }
            )
        }

    }

}
16 changes: 16 additions & 0 deletions modules/nf-core/nextdenovo/tests/main.nf.test.snap
@@ -0,0 +1,16 @@
{
    "nextdenovo_ont": {
        "content": [
            "test.assembly.fasta.gz",
            false,
            [
                "versions.yml:md5,7ec3ec49cbe9e0a06d3d0be767e4cc0c"
            ]
        ],
        "meta": {
            "nf-test": "0.9.2",
            "nextflow": "25.04.2"
        },
        "timestamp": "2025-05-16T18:13:58.045501794"
    }
}