Commit

Merge pull request #6 from BCCDC-PHL/unicycler-circularization-tag
Add support for unicycler-style circularization tags
dfornika authored Sep 12, 2023
2 parents 33eea76 + 46e3b8b commit b4e3c56
Showing 3 changed files with 34 additions and 1 deletion.
27 changes: 27 additions & 0 deletions README.md
@@ -56,6 +56,33 @@ sample-02,/path/to/sample-02_R1.fastq.gz,/path/to/sample-02_R2.fastq.gz,/path/to
sample-03,/path/to/sample-03_R1.fastq.gz,/path/to/sample-03_R2.fastq.gz,/path/to/sample-03_RL.fastq.gz
```

By default, dragonflye tags circularized contigs with `circular=Y` in the fasta header and linear contigs with `circular=N`. For example:

```
>contig00001 len=5202987 cov=191.0 origname=contig_1_polypolish polish=racon:1 round(s);polypolish:short_reads,1 round(s); sw=dragonflye-flye/1.1.0 date=20230912 circular=Y
>contig00002 len=3964 cov=155.0 origname=contig_4_polypolish polish=racon:1 round(s);polypolish:short_reads,1 round(s); sw=dragonflye-flye/1.1.0 date=20230912 circular=N
...
```

In contrast, [unicycler](https://github.com/rrwick/Unicycler) adds a `circular=true` tag to circularized contigs and no circularization tag to linear contigs. For example:

```
>1 length=5202987 depth=191.0x circular=true
>2 length=3964 depth=155.0x
```
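
Downstream tools that read these headers may need to handle both conventions. A minimal Python sketch, assuming that the tag is always written as `circular=<value>` and separated from the preceding field by whitespace; the helper name `is_circular` is hypothetical and not part of either tool:

```python
import re

# Captures the value of a circular=<value> tag anywhere in a header line.
CIRCULAR_TAG = re.compile(r"(?:^|\s)circular=(\S+)")


def is_circular(header: str) -> bool:
    """Return True for dragonflye-style (circular=Y) or
    unicycler-style (circular=true) circular contigs.

    Hypothetical helper: a header with circular=N or with no tag at all
    is treated as linear.
    """
    match = CIRCULAR_TAG.search(header)
    return match is not None and match.group(1) in {"Y", "true"}


dragonflye_header = (
    ">contig00001 len=5202987 cov=191.0 origname=contig_1_polypolish "
    "polish=racon:1 round(s);polypolish:short_reads,1 round(s); "
    "sw=dragonflye-flye/1.1.0 date=20230912 circular=Y"
)
unicycler_header = ">1 length=5202987 depth=191.0x circular=true"

print(is_circular(dragonflye_header))
print(is_circular(unicycler_header))
print(is_circular(">2 length=3964"))
```

Treating a missing tag as linear matches the unicycler convention described above, where linear contigs carry no circularization tag.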

Both this pipeline and our [BCCDC-PHL/routine-assembly](https://github.com/BCCDC-PHL/routine-assembly) pipeline edit the fasta header to prepend the sample ID. In addition, this
pipeline accepts a `--use_unicycler_circularization_tag` flag that converts `circular=Y` to `circular=true` and removes `circular=N`.

```
nextflow run BCCDC-PHL/dragonflye-nf \
--hybrid \
--use_unicycler_circularization_tag \
--fastq_input <short-read fastq input directory> \
--fastq_input_long <long-read fastq input directory> \
--outdir <output directory>
```
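
The effect of the flag on a single header line can be sketched in Python. This is an illustration only: the pipeline itself performs the rewrite with `sed`, and the function name `to_unicycler_tag` and the shortened example headers are hypothetical.

```python
def to_unicycler_tag(header: str) -> str:
    """Rewrite a dragonflye-style circularization tag in unicycler style.

    Hypothetical helper that mirrors the two substitutions described above,
    each applied at most once per header line.
    """
    # circular=Y becomes circular=true
    header = header.replace("circular=Y", "circular=true", 1)
    # circular=N is dropped together with its leading space
    header = header.replace(" circular=N", "", 1)
    return header


# Shortened example headers with the sample ID already prepended.
circular = ">sample-01_contig00001 len=5202987 cov=191.0 circular=Y"
linear = ">sample-01_contig00002 len=3964 cov=155.0 circular=N"

print(to_unicycler_tag(circular))
print(to_unicycler_tag(linear))
```

Removing the leading space along with `circular=N` avoids leaving trailing whitespace at the end of the header.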

## Output
An output directory will be created for each sample under the directory provided with the `--outdir` flag. The directory will be named by sample ID, inferred from
the fastq files (all characters before the first underscore in the fastq filenames), or the `ID` field of the samplesheet, if one is used.
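
The filename rule above (all characters before the first underscore) can be sketched in Python. The helper `sample_id_from_fastq` is hypothetical and is not how the pipeline itself is implemented; its behaviour for filenames without an underscore is an assumption.

```python
from pathlib import Path


def sample_id_from_fastq(path: str) -> str:
    """Infer a sample ID from a fastq path.

    Hypothetical helper: keeps everything before the first underscore in
    the file name. A name with no underscore is returned unchanged, which
    is an assumption rather than documented pipeline behaviour.
    """
    return Path(path).name.split("_", 1)[0]


print(sample_id_from_fastq("/path/to/sample-01_R1.fastq.gz"))
print(sample_id_from_fastq("/path/to/sample-03_RL.fastq.gz"))
```

One consequence of this rule is that sample IDs must not themselves contain underscores, because anything after the first underscore is discarded.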
5 changes: 5 additions & 0 deletions modules/dragonflye.nf
@@ -32,6 +32,11 @@ process dragonflye {
--outdir ${sample_id}_assembly
sed 's/^>/>${sample_id}_/' ${sample_id}_assembly/contigs.fa > ${sample_id}_dragonflye_${assembly_mode}.fa
if [ "${params.use_unicycler_circularization_tag}" == "true" ]; then
echo 'Switching circularization tags to unicycler-style...' >&2
sed -i 's/circular\\=Y/circular\\=true/' ${sample_id}_dragonflye_${assembly_mode}.fa
sed -i 's/ circular\\=N//' ${sample_id}_dragonflye_${assembly_mode}.fa
fi
cp ${sample_id}_assembly/flye-unpolished.gfa ${sample_id}_dragonflye_${assembly_mode}_unpolished.gfa
cp ${sample_id}_assembly/dragonflye.log ${sample_id}_dragonflye_${assembly_mode}.log
"""
3 changes: 2 additions & 1 deletion nextflow.config
@@ -1,7 +1,7 @@
manifest {
author = 'Dan Fornika'
name = 'BCCDC-PHL/dragonflye-nf'
- version = '0.1.0'
+ version = '0.1.1'
description = 'Nextflow wrapper for Dragonflye Assembler'
mainScript = 'main.nf'
nextflowVersion = '>=20.01.0'
@@ -12,6 +12,7 @@ params {
bakta = false
hybrid = false
long_only = false
use_unicycler_circularization_tag = false
bakta_db = '/data/ref_databases/bakta/latest'
illumina_suffixes = ['*_R{1,2}_001', '*_R{1,2}', '*_{1,2}' ]
long_read_suffixes = ['*_RL', '*_L']