Repository for scripts used to perform the following analyses of short paired-end RNA-seq reads:
- Differential expression (DE)
- Gene ontology (GO) term enrichment
- Gene set enrichment (GSE)
- The input and output paths need to be set using the inputPaths.txt and outputPaths.txt files in the InputData directory.
- Be sure to read the usage notes at the beginning of the file for any script that you intend to run.
- To submit a BASH job script to the queue: qsub SCRIPTNAME.sh INPUT_1 ... INPUT_N
- To view the jobs you have submitted and corresponding task ID numbers: qstat -u USERNAME
- To delete a job from the queue: qdel TASKIDNUMBER
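- To run a BASH script with the interpreter: bash SCRIPTNAME.sh INPUT_1 ... INPUT_N
- To make the script executable before running (this sets the execute permission; it does not compile anything): chmod +x SCRIPTNAME.sh
- To run an executable script: ./SCRIPTNAME.sh INPUT_1 ... INPUT_N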
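
As a minimal sketch of the two ways of running a script described above, the following creates a small throwaway script and runs it both ways. The script name demo_args.sh, the temporary directory, and the inputs sampleA and sampleB are hypothetical and are not part of this repository.

```shell
# Create a small hypothetical script (demo_args.sh) in a temporary directory.
workdir="$(mktemp -d)"
cat > "$workdir/demo_args.sh" <<'EOF'
#!/bin/bash
# Usage: bash demo_args.sh INPUT_1 ... INPUT_N
# Prints each input argument on its own line.
if [ "$#" -lt 1 ]; then
  echo "Usage: $0 INPUT_1 ... INPUT_N" >&2
  exit 1
fi
for arg in "$@"; do
  echo "input: $arg"
done
EOF

# Option 1: pass the script to the bash interpreter.
# No execute permission is needed for this form.
out_bash="$(bash "$workdir/demo_args.sh" sampleA sampleB)"

# Option 2: set the execute permission, then call the script by its path.
chmod +x "$workdir/demo_args.sh"
out_exec="$("$workdir/demo_args.sh" sampleA sampleB)"

# Both invocations should produce the same lines.
echo "$out_bash"
echo "$out_exec"
```

The usage comment at the top of the demo script mirrors the usage notes that the scripts in this repository carry, and the argument check is one simple way to fail early when inputs are missing.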
- Bioinformatics Analysis of Omics Data with the Shell & R
- Downstream Bioinformatics Analysis of Omics Data with edgeR
- Gene Transcription at Real-Time Speed
- RNA-seq Library Types and Methods
- Exact or t-Tests Tutorial
- ANOVA Tutorial
- How to Save the Console in RStudio
- FastQC: A quality control tool for high-throughput raw sequence data. It generates quality reports for NGS data and gives pass/warn/fail results for the following checks: Per base sequence quality, Per sequence quality scores, Per base sequence content, Per base GC content, Per sequence GC content, Per base N content, Sequence length distribution, Sequence duplication levels, Overrepresented sequences, Kmer content. It also has a graphical user interface.
- Trimmomatic: A flexible read trimming tool for Illumina NGS data. It can trim adapter sequences and remove low-quality reads and bases.
- HISAT2: A fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) to a population of human genomes (as well as to a single reference genome). The algorithm is based on HISAT and Bowtie2 and uses a graph FM index (GFM) to index the genome before read mapping.
- TopHat2: A spliced read mapper for RNA-Seq. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
- Bowtie2: An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Bowtie2 first extracts "seed" substrings in reads, aligns seeds in an ungapped way, and then performs extension in a gapped way.
- Cufflinks: Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It can be used in a pipeline that is described in a protocol paper.
- Cuffdiff: Performs differential analysis of gene regulation at transcript resolution with RNA-seq. Its algorithm estimates expression at transcript-level resolution and controls for the variability evident across replicate libraries.
- Samtools: Utilities for the Sequence Alignment/Map (SAM) format. Samtools has multiple commands for processing SAM/BAM files. The sub-command "samtools flagstat" prints summary statistics for SAM/BAM files based on the FLAG field.
- HTSeq-count: A tool that counts mapped reads for genomic features.
- edgeR: Empirical Analysis of Digital Gene Expression Data in R. It performs differential expression analysis using raw read counts and implements a range of statistical methodology based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests.
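
To illustrate how several of the tools above might be chained for one paired-end sample, here is a dry-run sketch. It prints the commands rather than executing them. All file names, the thread count, the trimming thresholds, the adapter file, and the unstranded setting are assumptions chosen for illustration, and the trimmomatic wrapper command is assumed to be on the PATH (some installations call the tool through java -jar instead). This is not the exact workflow implemented by the scripts in this repository.

```shell
#!/bin/bash
# Dry-run sketch: prints each command instead of executing it.
# Set DRY_RUN=0 to execute (requires the tools to be installed and on PATH).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical placeholders, not taken from this repository.
sample="sample1"
genome="genome.fa"
annotation="genes.gtf"
adapters="TruSeq3-PE.fa"
threads=4

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same as run, but send the command's standard output to the file named first.
run_to() {
  local outfile="$1"
  shift
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s > %s\n' "$*" "$outfile"
  else
    "$@" > "$outfile"
  fi
}

pipeline() {
  # 1. Quality reports for the raw reads.
  run mkdir -p qc
  run fastqc -o qc "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz"

  # 2. Adapter and quality trimming in paired-end mode (assumed thresholds).
  run trimmomatic PE -threads "$threads" \
    "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz" \
    "${sample}_R1.paired.fq.gz" "${sample}_R1.unpaired.fq.gz" \
    "${sample}_R2.paired.fq.gz" "${sample}_R2.unpaired.fq.gz" \
    "ILLUMINACLIP:${adapters}:2:30:10" LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36

  # 3. Index the reference genome, then align the trimmed read pairs.
  run hisat2-build "$genome" genome_index
  run hisat2 -p "$threads" -x genome_index \
    -1 "${sample}_R1.paired.fq.gz" -2 "${sample}_R2.paired.fq.gz" \
    -S "${sample}.sam"

  # 4. Sort alignments by read name and print FLAG-based statistics.
  run samtools sort -n -o "${sample}.byname.bam" "${sample}.sam"
  run samtools flagstat "${sample}.byname.bam"

  # 5. Count reads per gene (assumes an unstranded library: -s no).
  run_to "${sample}_counts.txt" htseq-count -f bam -r name -s no \
    -t exon -i gene_id "${sample}.byname.bam" "$annotation"
}

pipeline
```

The resulting count table is the kind of raw count input that edgeR expects for differential expression analysis. Keeping the dry-run default makes it easy to review the full command sequence before submitting anything to a queue.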