Add nextdenovo to nf-core modules. #8492

Open
wants to merge 11 commits into
base: master
11 changes: 11 additions & 0 deletions modules/nf-core/nextdenovo/environment.yml
@@ -0,0 +1,11 @@
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::nextdenovo=2.5.2
  - pip=23.3.1
  - pip:
      - paralleltask==0.1.1
  - python=3.8


54 changes: 54 additions & 0 deletions modules/nf-core/nextdenovo/main.nf
@@ -0,0 +1,54 @@
process NEXTDENOVO {
    tag "$meta.id"
    label 'process_high'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/nextdenovo:2.5.2--py310h0ceaa1d_6' :
        'biocontainers/nextdenovo:2.5.2--py310h0ceaa1d_6' }"

    input:
    tuple val(meta), path(reads)
    path config
Contributor

Suggested change
-    path config
+    val nextdenovo_parameters


    output:
Contributor

Are these all outputs that are created by nextdenovo?
All outputs should be emitted in case someone would like to use one of the other outputs in their pipeline.

Author

nextDenovo outputs a FASTA file containing the assembly and a stat file. There are other files, of course, but they mostly relate to the run itself rather than to the assembly results.

Contributor

Could any of those files still be useful for someone?
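
One way to answer this is to list everything a run leaves in its work directory before deciding what to emit. The following is a minimal shell sketch using a stand-in directory that holds only the two result files named in this module; the directory name `demo_nxtd_work` is hypothetical, and a real run will contain more files than these.

```shell
# Illustrative only: build a stand-in work directory holding the two result
# files named in this module, then list every file so nothing is overlooked.
workdir="demo_nxtd_work"
mkdir -p "$workdir/03.ctg_graph"
: > "$workdir/03.ctg_graph/nd.asm.fasta"
: > "$workdir/03.ctg_graph/nd.asm.fasta.stat"

# List all regular files relative to the work directory, sorted for stable output.
outputs="$(cd "$workdir" && find . -type f | sort)"
echo "$outputs"
```

Running the same `find` against the work directory of a real test run would give the candidate list for additional `emit:` channels.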

    tuple val(meta), path("*.fasta.gz"), emit: fasta
    tuple val(meta), path("*.stat")    , emit: stat
    path "versions.yml"                , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
Contributor
@nschan, Jun 25, 2025

Suggested change
-    def prefix = task.ext.prefix ?: "${meta.id}"
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def nextdenovo_config = [
+        job_type: "local",
+        job_prefix: "nextDenovo",
+        task: "all",
+        rewrite: "yes",
+        deltmp: "yes",
+        parallel_jobs: "${task.cpus}",
+        input_type: "raw",
+        read_type: null,
+        input_fofn: "input.fofn",
+        workdir: "${meta.id}_nxtd_work",
+        read_cutoff: "1k",
+        genome_size: "1g",
+        sort_options: "-m ${task.memory.toGiga()}g -t ${task.cpus}",
+        minimap2_options_raw: "-t ${task.cpus}",
+        pa_correction: "3",
+        correction_options: "-p 15",
+        minimap2_options_cns: "-t ${task.cpus}",
+        nextgraph_options: "-a 1"
+    ] + nextdenovo_parameters
+    if ( !nextdenovo_config.read_type ) error('Please provide the read type for nextDenovo')
+    def yamlBuilder = new groovy.yaml.YamlBuilder()
+    yamlBuilder(nextdenovo_config)
+    def yaml_content = yamlBuilder.toString().tokenize('\n').join("\n ").replace(":", " =").replace('"','').replace("---\n ", "")

"""
    echo "parallel_jobs = ${task.cpus}" >> conf.cfg
    cat $config >> conf.cfg
Comment on lines +26 to +27
Contributor

Suggested change
-    echo "parallel_jobs = ${task.cpus}" >> conf.cfg
-    cat $config >> conf.cfg
+    cat <<- END_YAML_PARAMS > ${meta.id}_nextdenovo.cfg
+        ${yaml_content}
+    END_YAML_PARAMS

    echo ${reads} > input.fofn
    nextDenovo \\
        conf.cfg \\
Contributor

Suggested change
-        conf.cfg \\
+        ${meta.id}_nextdenovo.cfg \\

        input.fofn \\

Contributor

Suggested change
$args

    gzip -c ./03.ctg_graph/nd.asm.fasta > ${prefix}.assembly.fasta.gz
Contributor

Is this path affected by workdir in the config? If so this probably should be something like:

Suggested change
-    gzip -c ./03.ctg_graph/nd.asm.fasta > ${prefix}.assembly.fasta.gz
+    gzip -c ${nextdenovo_config.workdir}/03.ctg_graph/nd.asm.fasta > ${prefix}.assembly.fasta.gz

Author

Yes. If workdir in the config is set to anything other than the current directory (.), this path will not work. Thanks for noticing this.
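
The dependency on workdir can be sketched in shell: read the workdir value from the config and build the assembly path from it. This is a minimal sketch, assuming the config uses `key = value` lines with optional trailing `#` comments, as in tests/config.cfg. The helper name `cfg_get` and the file `demo.cfg` are hypothetical and not part of nextDenovo.

```shell
# Hypothetical helper: print the value of KEY from a nextDenovo-style cfg file.
# Assumes "key = value # optional comment" lines; not part of nextDenovo itself.
cfg_get() {
    key="$1"
    file="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
        | sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//' \
        | head -n 1
}

# Small demonstration config (values are illustrative only).
cat > demo.cfg <<'EOF'
[General]
read_type = ont # clr, ont, hifi
workdir = sample1_nxtd_work
EOF

workdir="$(cfg_get workdir demo.cfg)"
# Fall back to the current directory when workdir is not set in the file.
asm_path="${workdir:-.}/03.ctg_graph/nd.asm.fasta"
echo "$asm_path"
```

If the module builds the config itself, the workdir value is already known in the process script and no parsing is needed; the sketch only illustrates why a hard-coded `./03.ctg_graph` path breaks when workdir changes.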


    mv ./03.ctg_graph/nd.asm.fasta.stat ${prefix}.assembly_info.stat
Contributor

Similar to the above:

Suggested change
-    mv ./03.ctg_graph/nd.asm.fasta.stat ${prefix}.assembly_info.stat
+    mv ${nextdenovo_config.workdir}/03.ctg_graph/nd.asm.fasta.stat ${prefix}.assembly_info.stat


    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        \$( nextDenovo --version )
    END_VERSIONS
    """

    stub:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    echo stub | gzip -c > ${prefix}.assembly.fasta.gz
    echo contig_1 > ${prefix}.assembly_info.stat

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        \$( nextDenovo --version )
    END_VERSIONS
    """
}
68 changes: 68 additions & 0 deletions modules/nf-core/nextdenovo/meta.yml
@@ -0,0 +1,68 @@
name: "nextdenovo"
description: NextDenovo is a string graph-based de novo assembler for long reads (CLR,
  HiFi and ONT). It uses a “correct-then-assemble” strategy similar to canu (no correction
  step for PacBio HiFi reads), but requires significantly less computing resources
  and storage
keywords:
  - assembly
  - genome
  - de novo
  - genome assembler
  - single molecule
tools:
  - "nextdenovo":
      description: "NextDenovo is a string graph-based de novo assembler for long reads
        (CLR, HiFi and ONT). It uses a “correct-then-assemble” strategy similar to canu
        (no correction step for PacBio HiFi reads), but requires significantly less
        computing resources and storage"
      homepage: "https://github.com/Nextomics/NextDenovo"
      documentation: "https://nextdenovo.readthedocs.io/en/latest/"
      tool_dev_url: "https://github.com/Nextomics/NextDenovo"
      doi: "10.1186/s13059-024-03252-4"
      licence: ["GPL-3.0-license"]
      identifier: biotools:nextdenovo
input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. [ id:'test' ]
    - reads:
        type: file
        description: Input reads from Oxford Nanopore or PacBio data in FASTA/FASTQ
          format.
        pattern: "*.{fasta,fastq,fasta.gz,fastq.gz,fa,fq,fa.gz,fq.gz}"
  - - config:
        type: file
        description: Input config file for nextDenovo
        pattern: "*"
Comment on lines +35 to +38
Contributor

This should be of type map, and the description should list the required and optional configuration parameters.

Author

Do I list those in the description?

Contributor

At minimum provide a link to the docs, but I think listing them would be better instead of making people click-through.

output:
  - fasta:
      - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. [ id:'test' ]
      - "*.fasta.gz":
          type: file
          description: Assembled FASTA file
          pattern: "*.fasta.gz"
  - stat:
      - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. [ id:'test' ]
      - "*.stat":
          type: file
          description: Extra information and statistics about resulting contigs
          pattern: "*.stat"
  - versions:
      - versions.yml:
          type: file
          description: File containing software versions
          pattern: "versions.yml"
authors:
  - "@elmedjadjirayane"
maintainers:
  - "@elmedjadjirayane"
25 changes: 25 additions & 0 deletions modules/nf-core/nextdenovo/tests/config.cfg
@@ -0,0 +1,25 @@
[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 1 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = .

[correct_option]
read_cutoff = 500
genome_size = 100k # estimated genome size

minimap2_options_raw = -t 8
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1 -z 1 -l 1 -q 0 -N 2 -u 2 -w 3 -B 50 -C 5 -L 1 -t 50

# see https://nextdenovo.readthedocs.io/en/latest/OPTION.html for a detailed introduction about all the parameters
45 changes: 45 additions & 0 deletions modules/nf-core/nextdenovo/tests/main.nf.test
@@ -0,0 +1,45 @@
// nf-core modules test nextdenovo
nextflow_process {

    name "Test Process NEXTDENOVO"
    script "../main.nf"
    process "NEXTDENOVO"

    tag "modules"
    tag "modules_nfcore"
    tag "nextdenovo"


    test("nextdenovo_ont") {


        when {
            process {
                """


                input[0] = [
                    [ id:'test' ], // meta map
                    file(params.modules_testdata_base_path + 'genomics/homo_sapiens/pacbio/fastq/test_hifi.fastq.gz', checkIfExists: true),
                ]
                input[1] = file("${moduleDir}/tests/config.cfg")
Contributor

I think this should point to the config in test-data?

Author

Yes, it should point to test-data. I also need to edit that config file in the test-data repo to set parallel_jobs to 1.

Contributor

Or you could use the proposed nextdenovo_parameters input map in your test to set those parameters.
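
For comparison with the map-based proposal: the script in this PR writes a `parallel_jobs` line with echo and then appends the user config, so the key ends up defined twice whenever the config also sets it, which is why the test file needed editing. A minimal shell sketch of replacing the key in place instead; the file contents and the `cpus` variable are placeholders, and the sketch makes no assumption about how nextDenovo treats duplicate keys.

```shell
# Illustrative config; only parallel_jobs matters for this sketch.
cat > config.cfg <<'EOF'
[General]
job_type = local
parallel_jobs = 8 # number of tasks used to run in parallel
read_type = ont
EOF

# Hypothetical stand-in for the CPU count the pipeline would provide.
cpus=1

# Rewrite the existing parallel_jobs line in place instead of adding a
# second one, so the key appears exactly once in the generated file.
sed "s/^\([[:space:]]*parallel_jobs[[:space:]]*=\).*/\1 ${cpus}/" config.cfg > conf.cfg

grep '^parallel_jobs' conf.cfg
```

Building the whole config from a parameters map, as suggested above, avoids this kind of text rewriting altogether because the task's CPU count is merged in before the file is written.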

"""
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(
                    file(process.out.fasta.get(0).get(1)).name,
                    path(process.out.stat.get(0).get(1)).readLines()[1].contains("N50"),
                    process.out.versions
                    ).match() }
            )
        }

    }



}
16 changes: 16 additions & 0 deletions modules/nf-core/nextdenovo/tests/main.nf.test.snap
@@ -0,0 +1,16 @@
{
    "nextdenovo_ont": {
        "content": [
            "test.assembly.fasta.gz",
            false,
            [
                "versions.yml:md5,7ec3ec49cbe9e0a06d3d0be767e4cc0c"
            ]
        ],
        "meta": {
            "nf-test": "0.9.2",
            "nextflow": "25.04.2"
        },
        "timestamp": "2025-05-16T18:13:58.045501794"
    }
}