nextflow-nanopore is a Nextflow pipeline for the analysis of Nanopore whole-genome sequencing data.
- Basecalling (Guppy), with GPU run option
- Basecalling QC (PycoQC)
- Alignment (Guppy with minimap2)
- Merge all aligned bam files into a single file (samtools)
- Haplotyping and phased variant calling (PEPPER-Margin-DeepVariant)
- Depth calculation (mosdepth)
- MultiQC (MultiQC) for Basecalling (PycoQC) and Depth (mosdepth)
- fast5 raw reads, given as the full path to the directory containing all fast5 files, either in a configuration file or with the (--input path/to/fast5) command line parameter.
- Path to the reference genome fasta file, either in a configuration file or with the (--genome_fasta path/to/genome.fasta) command line parameter.
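As a minimal sketch of the configuration-file route, the two inputs can be placed in a small config file and passed with -c. This assumes the pipeline reads them as params.input and params.genome_fasta, which is how Nextflow maps the --input and --genome_fasta options. The file name my_inputs.config is hypothetical and the paths are placeholders.

```shell
# Write a small Nextflow config that sets the two required inputs.
# "my_inputs.config" is a hypothetical name; the paths are placeholders.
cat > my_inputs.config <<'EOF'
params {
    input        = 'path/to/fast5'
    genome_fasta = 'path/to/genome.fasta'
}
EOF

# Print the command that would use this file (not executed here).
echo "nextflow run dhslab/nextflow-nanopore -r main -profile ris -c my_inputs.config"
```

The printed command is only an illustration. On the cluster it would be wrapped in a bsub submission like the ones in this README.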
- The test run takes the following input:
  - a few ".fast5" files (6 files, ~3.5 GB)
  - chr22.fasta as the reference genome
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(mdivr/centos:v0.1)" "NXF_HOME=${PWD}/.nextflow ; nextflow run dhslab/nextflow-nanopore -r main -profile ris,dhslab_test"
git clone https://github.com/dhslab/nextflow-nanopore.git
cd nextflow-nanopore/
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(mdivr/centos:v0.1)" "NXF_HOME=${PWD}/.nextflow ; nextflow run main.nf -profile ris,dhslab_test"
If the pipeline is run from local code (after being cloned), then instead of running:
nextflow run main.nf -profile ris,dhslab_test
you can run:
nextflow run main.nf -profile ris -c conf/dhslab_test.config
The two commands above are interchangeable, because the dhslab_test profile (defined in the nextflow.config file) simply imports (or "includes", in Nextflow terms) the conf/dhslab_test.config file into the pipeline scope and appends it to the configuration.
However, "-profile ris" is still required in both cases, because it defines the LSF runtime commands.
Output:
- "results/" holds the output of the test run
- "work/" is the working directory for all tasks; it can be removed once the pipeline has run successfully
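The cleanup described above can be sketched as a small shell helper. This is only an illustration: the function name cleanup_work is hypothetical, and it treats the presence of "results/" as a rough sign that the run finished, which is an assumption rather than a real success check.

```shell
# Hypothetical helper: delete work/ only when results/ exists in the same run directory.
# The existence of results/ is used as a rough proxy for a successful run (an assumption).
cleanup_work() {
    run_dir="${1:-.}"
    if [ -d "$run_dir/results" ] && [ -d "$run_dir/work" ]; then
        rm -rf "$run_dir/work"
        echo "removed $run_dir/work"
    else
        echo "kept $run_dir/work (no results/ found or no work/ present)"
    fi
}
```

Calling cleanup_work with no argument acts on the current directory. Checking the pipeline report before deleting anything is still advisable.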
Example results output for sample "aml476081" in the test workflow:
results/
├── aligned_bams
│   ├── aml476081.bam
│   └── aml476081.bam.bai
├── basecall
│   └── fastq
│       └── aml476081.fastq.gz
├── multiqc
│   ├── multiqc_data
│   │   ├── mosdepth_cov_dist.txt
│   │   ├── mosdepth_cumcov_dist.txt
│   │   ├── mosdepth_perchrom.txt
│   │   ├── multiqc.log
│   │   ├── multiqc_citations.txt
│   │   ├── multiqc_data.json
│   │   ├── multiqc_general_stats.txt
│   │   ├── multiqc_sources.txt
│   │   └── pycoqc.txt
│   └── multiqc_report.html
├── pepper
│   ├── haplotagged_bam
│   │   ├── aml476081.haplotagged.bam
│   │   └── aml476081.haplotagged.bam.bai
│   └── vcf
│       ├── aml476081.phased.vcf.gz
│       ├── aml476081.phased.vcf.gz.tbi
│       ├── aml476081.vcf.gz
│       └── aml476081.vcf.gz.tbi
└── pipeline_info
    ├── pipeline_report.html
    └── pipeline_timeline.html
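A quick way to confirm that a run produced the main files in the listing above is a small shell check. This is a sketch, not part of the pipeline: the function name check_results is hypothetical, and the paths assume the directory layout shown in the example for a given sample name.

```shell
# Hypothetical helper: report which of the main per-sample outputs are missing.
# Paths follow the example listing above and are assumed, not read from the pipeline.
check_results() {
    results_dir="$1"
    sample="$2"
    missing=0
    for f in \
        "aligned_bams/$sample.bam" \
        "aligned_bams/$sample.bam.bai" \
        "basecall/fastq/$sample.fastq.gz" \
        "multiqc/multiqc_report.html" \
        "pepper/haplotagged_bam/$sample.haplotagged.bam" \
        "pepper/vcf/$sample.phased.vcf.gz" \
        "pepper/vcf/$sample.vcf.gz"
    do
        if [ ! -e "$results_dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing"
    # Return success only when every listed file is present.
    [ "$missing" -eq 0 ]
}
```

For the test run, this could be called as: check_results results aml476081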