nf-EXPLOR | Nextflow pipeline for EXPloring variation in LOng REad Sequencing

RenscoHogers/nf-EXPLOR

 
 

Currently deployed for cattle SV and SNP discovery in the Bovine Long-Read Consortium (BovLRC)

Tuan Nguyen

This pipeline was developed for use in the Bovine Long-Read Consortium (BovLRC). It runs multiple bioinformatics tools to detect Single Nucleotide Polymorphisms (SNPs) and Structural Variants (SVs). The currently deployed version is 0.0.2. It is designed to handle data from both Oxford Nanopore (ONT) and PacBio, although so far it has only been tested with ONT data. It is written in Nextflow DSL2.

Initial setup:

  1. Clone this GitHub repository
git clone https://github.com/tuannguyen8390/AgVic_CLRC.git
  2. Obtain & install Docker/Shifter/Singularity
  • Installation guide for Docker can be found here
  • Installation guide for Shifter can be found here
  • Installation guide for Singularity can be found here
  3. Pull assets (genome) and perform some initial setup
  • Run the following command, choosing only one of shifter, docker, or singularity as the profile
nextflow run setup.nf -profile shifter/docker/singularity
  4. Test run the pipeline (choose only one of shifter, docker, or singularity)

Edit the nextflow.config files to suit your local environment, then run:

nextflow run setup.nf -profile shifter/docker/singularity,test
  5. Run the pipeline
nextflow run main.nf -profile shifter/docker/singularity

5*. If you run on AWS, use the following command to run the pipeline instead

nextflow run main.nf -profile shifter/docker/singularity,awsbatch
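
The profile strings above list all three container engines as alternatives; in practice exactly one is passed. A minimal shell sketch of the setup, test, and run commands follows, assuming Docker is the installed engine. The `PROFILE` variable and the `build_cmd` helper are illustrative names, not part of the pipeline, and the commands are only printed so they can be checked before anything runs.

```shell
#!/bin/sh
# Assumption: Docker is installed. Use shifter or singularity instead
# if that is the engine available on your system.
PROFILE=docker

# Hypothetical helper: print the nextflow command for a script and an
# optional extra profile (for example "test" or "awsbatch").
build_cmd() {
    script="$1"
    extra="${2:-}"
    if [ -n "$extra" ]; then
        echo "nextflow run ${script} -profile ${PROFILE},${extra}"
    else
        echo "nextflow run ${script} -profile ${PROFILE}"
    fi
}

build_cmd setup.nf          # pull assets and perform initial setup
build_cmd setup.nf test     # test run, as written in the steps above
build_cmd main.nf           # full run
build_cmd main.nf awsbatch  # full run on AWS
```

Once the printed commands look right, they can be copied into the terminal or piped to `sh`.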

The pipeline reads two metadata spreadsheets from the meta folder:

metadata_SR.csv : metadata for short-read data

metadata_LR.csv : metadata for long-read data

Please use these files as templates when editing your own. To run with your own files, pass --LR_MetaDir and/or --SR_MetaDir.
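
As an illustration of these overrides, the sketch below adds each option only when the corresponding file exists, then prints the resulting run command. The file names under `meta/` are hypothetical, Docker is assumed as the engine, and it is also an assumption that each parameter takes the path to the CSV file itself (the names end in `Dir`, so check the defaults in nextflow.config before relying on this).

```shell
#!/bin/sh
# Hypothetical paths to user-edited copies of the template spreadsheets.
LR_META="meta/my_metadata_LR.csv"
SR_META="meta/my_metadata_SR.csv"

# Build the parameter string from whichever metadata files are present.
# $1 = long-read metadata path, $2 = short-read metadata path.
meta_args() {
    args=""
    if [ -f "$1" ]; then
        args="--LR_MetaDir $1"
    fi
    if [ -f "$2" ]; then
        # Add a separating space only when a first option is already set.
        args="${args:+$args }--SR_MetaDir $2"
    fi
    echo "$args"
}

# Print the command for review instead of running it.
echo "nextflow run main.nf -profile docker $(meta_args "$LR_META" "$SR_META")"
```

If neither file exists, no override is added and the pipeline would fall back to the spreadsheets shipped in the meta folder.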


Pipeline overview

  1. QC:
  • FiltLong: QC for both long reads and short reads (DEFAULT)

  • NanoFilt + FMLRC2: NanoFilt for QC of long-read samples, and FMLRC2 + NanoFilt for QC of short-read samples.

  2. Mapping:
  3. SNP Caller: all callers run in parallel and are deployed per chromosome (1 to 29 & X, as the pipeline is currently deployed in cattle)
  • Clair3 (RECOMMENDED FOR DOWNSTREAM ANALYSIS)

  • PEPPER: by default, flowcells < 10.4 are analyzed with PEPPER

  • DEEPVARIANT: by default, flowcells >= 10.4 and HiFi data are analyzed with DEEPVARIANT (RECOMMENDED FOR DOWNSTREAM ANALYSIS)

  • Longshot

  4. SV Caller: all callers run in parallel
  • Sniffles2 (RECOMMENDED FOR DOWNSTREAM ANALYSIS)

  • DYSGU (RECOMMENDED FOR DOWNSTREAM ANALYSIS)

  • CuteSV2 (RECOMMENDED FOR DOWNSTREAM ANALYSIS)

  5. Reporting
  • Pre/post QC: NanoPlot
  • Alignment depth: Mosdepth
  6. Extra process for Nanopore
  • PorechopABI
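
The SNP-calling step in the overview fans out per chromosome and, by default, routes samples to PEPPER or DEEPVARIANT by flowcell version. A rough shell sketch of both rules follows. The function names are illustrative, and it is an assumption that the flowcell version is available as a plain number such as 9.4 or 10.4; how the pipeline itself reads it from the metadata is not shown here.

```shell
#!/bin/sh
# List the cattle chromosomes the SNP callers are deployed on:
# 1 to 29, then X.
chromosomes() {
    i=1
    while [ "$i" -le 29 ]; do
        echo "$i"
        i=$((i + 1))
    done
    echo X
}

# Default routing rule from the overview:
# flowcell < 10.4 -> PEPPER, flowcell >= 10.4 -> DEEPVARIANT.
# awk does the decimal comparison, which the POSIX test command cannot.
default_caller() {
    awk -v v="$1" 'BEGIN { if (v + 0 < 10.4) print "PEPPER"; else print "DEEPVARIANT" }'
}

chromosomes | wc -l   # number of per-chromosome jobs for each caller
default_caller 9.4    # prints PEPPER
default_caller 10.4   # prints DEEPVARIANT
```

HiFi data, which the overview also assigns to DEEPVARIANT, is left out of this sketch because it is not identified by a flowcell number.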

I have no doubt that there will be some problems :). It runs on my end, but perhaps not on yours. If that is the case, please email Tuan Nguyen.
