A pipeline for RNA-Seq analysis of paired-end reads, implemented in Nextflow DSL2.
- Fastqc - Quality Check
- Trim_galore - Adapter trimming, followed by fastqc on the trimmed reads; the trimmed reads are used for the rest of the workflow
- Salmon - Index building and quantification
- Hisat2 - Index building and Alignment
- Samtools - SAM to BAM conversion, and a stats report generated with flagstat
- FeatureCounts - Count genes, mRNAs, and genes with multi-mapping reads
- Multiqc - Generate a multiqc report
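For orientation, the stages above correspond roughly to the per-sample commands below. This is a dry-run sketch, not the pipeline's actual process definitions: the file and directory names (`sample_R1.fastq.gz`, `cdna.fa`, `genome.fa`, `annotation.gff`, `qc/`, `trimmed/`) and the featureCounts `-t gene -g ID` choice are assumptions made for illustration. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the per-sample stages. All file and directory names
# (sample, cdna.fa, genome.fa, annotation.gff, qc/, trimmed/) are hypothetical.
DRY_RUN="${DRY_RUN:-1}"
THREADS=16
S="sample"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

stages() {
  run mkdir -p qc trimmed
  run fastqc -o qc "${S}_R1.fastq.gz" "${S}_R2.fastq.gz"
  # Trim Galore names its paired outputs *_val_1.fq.gz and *_val_2.fq.gz.
  run trim_galore --paired --fastqc -o trimmed "${S}_R1.fastq.gz" "${S}_R2.fastq.gz"
  local t1="trimmed/${S}_R1_val_1.fq.gz" t2="trimmed/${S}_R2_val_2.fq.gz"
  run salmon index -t cdna.fa -i salmon_index
  run salmon quant -i salmon_index --libType=A --validateMappings \
    -1 "$t1" -2 "$t2" -p "$THREADS" -o "salmon_${S}"
  run hisat2-build genome.fa hisat_index
  run hisat2 -p "$THREADS" -x hisat_index -1 "$t1" -2 "$t2" -S "${S}.sam"
  run samtools view -b -o "${S}.bam" "${S}.sam"
  run samtools flagstat "${S}.bam"
  # -t/-g select the feature type and ID attribute; "gene"/"ID" are guesses
  # that depend on the GFF in use.
  run featureCounts -p -T "$THREADS" -t gene -g ID -a annotation.gff \
    -o gene_counts.txt "${S}.bam"
  run multiqc .
}

stages
```

Setting `DRY_RUN=0` would execute the commands instead of printing them, provided the tools and input files exist. Depending on the Subread version, featureCounts may also need `--countReadPairs` to count fragments rather than reads.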
- Nextflow
- Either Singularity or Docker, to run the tools in containers. Without containers, the following software/modules must be installed: fastqc, trimgalore, salmon, hisat2, samtools, subread, and multiqc.
- Git
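A quick way to confirm these requirements is to check that each executable is on the `PATH`. The sketch below assumes the usual executable names (`trim_galore` for Trim Galore, `featureCounts` for Subread); on a cluster that uses environment modules, the tools may only become visible after loading the corresponding module.

```shell
#!/usr/bin/env bash
# Return success if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Report each missing tool from the arguments; return non-zero if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! have "$tool"; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools nextflow git || echo "Install the tools reported above before running."

if have singularity || have docker; then
  echo "container engine found"
else
  # Without containers, every analysis tool must be installed locally.
  check_tools fastqc trim_galore salmon hisat2 samtools featureCounts multiqc \
    || echo "Install the missing tools or use a container profile."
fi
```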
Clone the repo using this code:
git clone [email protected]:SharuPaul/RNASeq.git
Then run this command to print the help statement:
nextflow run main.nf --help
Usage:
nextflow run main.nf --indir <input data directory> -profile <nextflow profile(s)>
Mandatory Arguments:
--indir Path to directory containing input data
Input data: [Will look for data in directory specified in --indir by default, one or more of following
need to be specified if in a different directory, a subdirectory, or in case of error in
finding the data (glob pattern mismatch)]
--reads Paired-end reads (glob pattern, e.g. "rawReads/*_{R1,R2}.fastq.gz")
--cdna Reference cDNA file
--fasta Reference genome fasta file
--gff Reference genome GFF file
Optional Arguments: [default value]
--threads Number of threads [16]
--outdir Output directory name [RNAseq_Results]
--trim_args Additional arguments for trim_galore ["--fastqc"]
--salmonindex Path to salmon index. Provide directory containing prebuilt salmon index files
[If not provided, index is built by default]
--sal_quant_args Additional arguments for salmon quant ["--libType=A --validateMappings"]
--hisatindex Path to hisat index. Provide directory containing prebuilt Hisat2 index files
[If not provided, Hisat will build an index by default]
Nextflow Arguments: (notice single "-" instead of double "--")
-profile Nextflow profiles available: singularity, docker, slurm
-resume Resume last run
--help Print this help statement
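The `--reads` glob in the help text pairs files by their `_R1`/`_R2` suffix, so a glob mismatch usually means a mate is missing or the files are named differently. The sketch below is one way to check a directory before launching; it assumes the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming from the example pattern, and `list_pairs` is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Print the sample ID of every complete R1/R2 pair in a directory.
# Assumes the naming scheme <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
list_pairs() {
  local dir="$1" r1 r2
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue              # the glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # expected mate file
    if [ -e "$r2" ]; then
      basename "$r1" _R1.fastq.gz
    else
      echo "unpaired: $r1" >&2
    fi
  done
}

# Demo on empty placeholder files: sample A is complete, sample B lacks R2.
demo="$(mktemp -d)"
touch "$demo/A_R1.fastq.gz" "$demo/A_R2.fastq.gz" "$demo/B_R1.fastq.gz"
list_pairs "$demo"
rm -rf "$demo"
```

In the demo, `A` is printed to standard output and the unpaired `B` file is reported on standard error.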
Run the pipeline using this command:
nextflow run main.nf --indir <input data directory> -profile <nextflow profile(s)>
Prebuilt indexes for salmon and hisat can be supplied, and additional Nextflow arguments can also be used. By default, the program looks for input data in the directory specified by --indir. If some data is in a different folder or a subfolder and cannot be located automatically, specify its location with the appropriate argument (e.g. --reads or --cdna).
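As an illustration of combining these options, the sketch below assembles a command line that passes an explicit reads pattern and a prebuilt salmon index, adding optional flags only when they are set. The paths `data` and `indexes/salmon` are hypothetical placeholders, the reads pattern is the one from the help text, and the profile combination is just one possibility. The script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Assemble a nextflow command line; optional flags are added only when set.
# All paths below are hypothetical placeholders.
INDIR="data"
READS='rawReads/*_{R1,R2}.fastq.gz'   # quoted so the shell does not expand it
SALMON_INDEX="indexes/salmon"         # leave empty to let the pipeline build one
HISAT_INDEX=""                        # empty: hisat index is built by the pipeline

cmd=(nextflow run main.nf --indir "$INDIR" --reads "$READS")
if [ -n "$SALMON_INDEX" ]; then
  cmd+=(--salmonindex "$SALMON_INDEX")
fi
if [ -n "$HISAT_INDEX" ]; then
  cmd+=(--hisatindex "$HISAT_INDEX")
fi
# Pipeline options use "--"; Nextflow's own options use a single "-".
cmd+=(-profile singularity,slurm -resume)

# Print the command with shell quoting so it can be inspected or copied.
printf '%q ' "${cmd[@]}"
echo
```

To launch the run instead of printing it, the array can be executed directly with `"${cmd[@]}"`.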