Nextflow pipeline for polishing an assembly with long reads and racon
Polishing a genome assembly with racon is pretty simple, except that it can take a while on large genomes and so you might want to distribute the tasks into one job per contig/scaffold/chromosome on a cluster. This pipeline does that distribution for you.
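To make "one job per contig" concrete: every FASTA header line in the assembly names one sequence, and each of those sequences becomes a separate polishing task. The sketch below is only an illustration, not code from this pipeline. It lists the sequence IDs in a small made-up assembly, and the file contents, contig names and the `list_contigs` helper are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical demo assembly with three sequences.
demo_fa=$(mktemp)
cat > "$demo_fa" <<'EOF'
>contig_1 length=12
ACGTACGTACGT
>contig_2
GGGGCCCCAAAA
>scaffold_3
TTTTAAAA
EOF

# Print the ID of every sequence: the first whitespace-delimited word
# of each header line, without the leading '>'.
list_contigs() {
    awk '/^>/ { print substr($1, 2) }' "$1"
}

# Each ID would correspond to one polishing job on the cluster.
list_contigs "$demo_fa" | while read -r contig; do
    echo "job: $contig"
done
```

For this demo file it prints one `job:` line each for `contig_1`, `contig_2` and `scaffold_3`. A real chromosome-scale assembly would yield one line per chromosome or scaffold.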
You will need:
- Long reads
- An assembly created from those long reads
- nextflow, which can be installed with the command:
curl -s https://get.nextflow.io | bash
This pipeline is set up to use mamba to create an environment with racon, minimap2, and samtools in it, but you can always install them yourself or use modules. See the configuration section below.
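If you plan to install the programs yourself or load them as modules, you may want to check which ones are already on your PATH before starting. Here is a minimal sketch that uses only bash builtins. The function name `check_tools` is made up for this example.

```shell
#!/usr/bin/env bash
# Report which of the given programs can be found on PATH.
# Returns 0 if all of them are found, 1 if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The three programs the pipeline's environment provides.
check_tools racon minimap2 samtools \
    || echo "Install the missing programs, load their modules, or let nextflow build a conda environment."
```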
Nextflow configuration is handled by the file nextflow.config. The configuration file in this repository is for running the pipeline on the Lewis cluster at Mizzou using SLURM and mamba, but you can adjust it to use any batch or cloud system you want. A nextflow.config in the directory where you run the nextflow command can override its settings. Check out the nextflow docs for more information.
If you don't already have a conda environment with racon, minimap2, and samtools installed in it, you can have nextflow make one for you by changing the conda line of the config to:
conda = "racon minimap2 samtools"
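Here is one way to adapt the setup to another cluster without touching the repository's own config. This shell snippet writes a separate config file. `executor`, `queue` and `conda` are standard Nextflow process settings. The file name `my_cluster.config` and the partition name `my_partition` are hypothetical, and `conda.useMamba` is only honored by Nextflow versions that support it.

```shell
#!/usr/bin/env bash
# Write a separate, hypothetical config; the repository's nextflow.config is left alone.
cat > my_cluster.config <<'EOF'
process {
    executor = 'slurm'                      // or another executor, e.g. 'local', 'sge', 'pbs'
    queue    = 'my_partition'               // hypothetical SLURM partition name
    conda    = 'racon minimap2 samtools'    // let nextflow build the environment
}
conda.useMamba = true                       // build the environment with mamba instead of conda
EOF
```

You could then layer it on top of the defaults with Nextflow's `-c` option, e.g. `nextflow run WarrenLab/longread-polish-nf -c my_cluster.config --reference unpolished_assembly.fa --reads 'long_reads/*.fastq.gz'`.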
Nextflow needs a filesystem where locking is allowed for keeping track of which jobs are running, but not for storing the data or temporary files you're creating. On Lewis, HTC allows locking but is slow, and HPC does not allow locking but is fast. To get the best of both worlds, run this pipeline from within a project directory on HTC, but set the environment variable $NXF_WORK to point to an empty directory on HPC.
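In practice, the setup might look like the following. Both directory variables are placeholders with hypothetical defaults. Point `HPC_WORK_DIR` at your own empty directory on HPC and `HTC_PROJECT_DIR` at your HTC project directory.

```shell
#!/usr/bin/env bash
# Placeholders: set these to a real HTC project directory and an empty HPC directory.
HTC_PROJECT_DIR=${HTC_PROJECT_DIR:-$PWD}
HPC_WORK_DIR=${HPC_WORK_DIR:-${TMPDIR:-/tmp}/nextflow-work}

# The work directory holds every task's intermediate files, so it goes on the
# fast filesystem; Nextflow's own bookkeeping stays in the launch directory.
mkdir -p "$HPC_WORK_DIR"
export NXF_WORK=$HPC_WORK_DIR

# Launch nextflow from the project directory so its locking happens on HTC.
cd "$HTC_PROJECT_DIR" || exit 1
echo "launching from $PWD with NXF_WORK=$NXF_WORK"
```

Setting the variable in your shell applies it to every run. Nextflow's `-w` option on `nextflow run` sets the work directory for a single run instead.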
To download the pipeline and run it on your assembly, just run the command:
nextflow run WarrenLab/longread-polish-nf \
--reference unpolished_assembly.fa \
--reads 'long_reads/*.fastq.gz'
This will align all the reads to your reference and then use the alignments to correct the assembly.
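For a single contig, that align-then-correct work corresponds roughly to the commands printed by this sketch. It only prints them and runs nothing, and it is not taken from the pipeline's source. It assumes PacBio reads (minimap2's `map-pb` preset; `map-ont` would be the Nanopore choice), hypothetical file names and a hypothetical `plan_one_contig` helper. It also assumes one contig's alignments are pulled out of a sorted, indexed BAM. The first two commands would run once for the whole assembly and the last three once per contig.

```shell
#!/usr/bin/env bash
# Print (do not run) the commands for polishing ONE contig by hand.
# Assumptions: PacBio reads, hypothetical file names.
plan_one_contig() {
    local ref=$1 reads=$2 contig=$3
    cat <<EOF
minimap2 -ax map-pb $ref $reads | samtools sort -o aln.bam -
samtools index aln.bam
samtools faidx $ref $contig > $contig.fa
samtools view -h aln.bam $contig > $contig.sam
racon $reads $contig.sam $contig.fa > $contig.polished.fa
EOF
}

plan_one_contig unpolished_assembly.fa reads.fastq.gz contig_1
```

racon takes its inputs in the order reads, overlaps (here SAM alignments), then the target sequences to polish. That is why the contig's own FASTA comes last.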
N.B. The single quotes around the --reads parameter are necessary to keep bash from expanding the asterisk; we want nextflow to do that expansion.
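You can see the effect of the quotes with a small stand-in for nextflow. The function `count_args` is hypothetical and simply reports the arguments it receives. The demo runs in a throwaway directory containing two fake read files.

```shell
#!/usr/bin/env bash
# Stand-in for nextflow: report how many arguments arrive, and what they are.
count_args() {
    echo "$# argument(s):" "$@"
}

# Throwaway directory with two fake read files (delete it when done).
demo=$(mktemp -d)
mkdir -p "$demo/long_reads"
touch "$demo/long_reads/a.fastq.gz" "$demo/long_reads/b.fastq.gz"
cd "$demo" || exit 1

# Unquoted: bash expands the glob first, so each file becomes its own argument.
count_args --reads long_reads/*.fastq.gz
# Quoted: the pattern itself is passed through, for nextflow to expand.
count_args --reads 'long_reads/*.fastq.gz'
```

With default shell options, the unquoted call reports 3 arguments: `--reads` followed by the two file names. The quoted call reports 2: `--reads` and the literal pattern. Without quotes, nextflow would see only the first file as the value of `--reads`, and the rest would arrive as separate, unrelated arguments.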
To do:
- Write help message
- Convert to DSL2
- Add option for ONT reads