RNA-seq and differential expression 2022-03

Bash scripts that generate commands to be submitted via the Slurm job scheduler for high-throughput differential gene expression analysis.

Tools Used

Cufflinks - gffread

BWA

BEDTools

SAMtools

HISAT2

STAR

Project Setup

Perform the initial setup of project directories and files so that subsequent filtering and alignment run more smoothly.

Move to your Research_Project directory.

cd /lustre/projects/Research_Project-T110796

Download the sequencing data into a new subproject folder.

# Create new folder and move into folder
mkdir Project_10762
cd Project_10762

# Run V0268 Raw Reads
curl URL.tar | tar -xv

# Run V0268 QC Files
curl URL.tar | tar -xv

# Run V0268 Trimmed Reads
curl URL.tar | tar -xv

Make a directory tree that is compatible with the job generation and job scripts. First, you will need to rename the main project directory after your project. The next command creates almost the entire tree. The -m 770 option gives the owner and group full (read, write, execute) permissions on the three final directories; with -p, any parent directories created along the way get the default permissions instead.

mkdir -m 770 -p nys_RNA_seq/rRNA_filtering/{filtered_fastqs,STAR_alignment,HISAT2_alignment}
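As a quick check, the sketch below builds the same tree in a throwaway directory and lists the result with its permissions. The scratch location (from `mktemp -d`) and the variable name `base` are illustrative only, and a loop stands in for brace expansion so that the block also runs under plain `sh`.

```shell
# Build the tree in a scratch directory (hypothetical location) and inspect it.
base=$(mktemp -d)

for sub in filtered_fastqs STAR_alignment HISAT2_alignment; do
    # -p creates missing parents; -m 770 applies to the final directory only
    mkdir -m 770 -p "$base/nys_RNA_seq/rRNA_filtering/$sub"
done

# List the directories that were created, with their permissions
ls -ld "$base"/nys_RNA_seq/rRNA_filtering/*
```

Running `ls -ld` on the parent directories as well shows whether their default permissions need adjusting with `chmod`.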

To keep the scripts clean, it is useful to link files that are stored elsewhere into the project analysis directory. This is called a symbolic link. To link to the fastp_trimmed directory, which contains the trimmed fastq files received from the sequencing centre, use this command. Change the directory names as required. I ran this from the Research_Project-T110796 directory.

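# A relative target is resolved from the link's own directory (nys_RNA_seq/), hence the leading ../
ln -s ../nys_project/Project_10558/V0210/11_fastp_trimmed/ ./nys_RNA_seq/fastqs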
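A relative symlink target is resolved from the directory that holds the link, not from the directory where `ln` was run, so it is worth confirming that a new link actually resolves. A minimal sketch in a scratch directory, using the directory names from this README and an empty placeholder file (`sample_R1.fastq.gz`, a hypothetical name) in place of real data:

```shell
# Recreate a miniature version of the layout in a scratch directory.
base=$(mktemp -d)
mkdir -p "$base/nys_project/Project_10558/V0210/11_fastp_trimmed" "$base/nys_RNA_seq"
touch "$base/nys_project/Project_10558/V0210/11_fastp_trimmed/sample_R1.fastq.gz"

cd "$base" || exit 1

# The target is written relative to nys_RNA_seq/, where the link will live.
ln -s ../nys_project/Project_10558/V0210/11_fastp_trimmed ./nys_RNA_seq/fastqs

# test -e follows the link, so it fails for a dangling link.
if [ -e ./nys_RNA_seq/fastqs ]; then
    link_status="ok"
else
    link_status="broken"
fi
echo "fastqs link: $link_status"
ls ./nys_RNA_seq/fastqs/
```

An absolute target path avoids the question entirely, at the cost of breaking the link if the project directory is ever moved.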

The reference genome for Pristionchus pacificus should already be in the el_paco_ref directory. If a new version is released, upload it into this directory. You will then need to index the new version for HISAT2 or STAR using the scripts provided.

Downloading Scripts

To obtain the scripts from the GitHub repository, first move to diff_expr_scripts/, then load the git module and run the following command.

git clone https://github.com/Harry-Pollitt/RNA-Seq-Projects.git . # . is current working directory 

git pull origin  # should download updated scripts if needed

Once you have made the directory tree and linked the fastq directory, it should look something like this.

An asterisk (*) marks a symbolically linked directory.
.
└── Research_Project-T110796/
    ├── nys_project/
    │   └── Project_10558/
    │       └── V0210/
    │           ├── 01_raw_reads
    │           ├── 09_QC_reports
    │           └── 11_fastp_trimmed*
    ├── nys_RNA_seq/
    │   ├── fastqs*
    │   └── rRNA_filtering/
    │       ├── filtered_fastqs
    │       ├── STAR_alignment
    │       └── HISAT2_alignment
    ├── el_paco_ref/
    │   ├── El_Paco_V3_gene_annotations.gff3
    │   ├── El_Paco_genome.fa 
    │   ├── El_Paco_V3_gene_annotations.gtf 
    │   ├── STAR_Index
    │   └── HISAT_Index
    └── diff_expr_scripts/
        └── here be scripts...

Script Usage

Check each script and change the #SBATCH parameters and other lines as necessary.
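For reference, a job script header typically looks something like the sketch below. It is not taken from the repository's scripts: every value is a placeholder, and many sites also require `--partition` and `--account` with site-specific values that are not shown here.

```shell
#!/bin/bash
# Name shown in the queue (placeholder)
#SBATCH --job-name=hisat2_align
# Wall-clock limit, HH:MM:SS (placeholder)
#SBATCH --time=04:00:00
#SBATCH --nodes=1
# Cores and memory available to the job (placeholders)
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
# Log file named after the job name (%x) and job ID (%j)
#SBATCH --output=%x_%j.out

# Slurm exports the core count; fall back to 1 when run outside a job.
threads=${SLURM_CPUS_PER_TASK:-1}
echo "Using $threads thread(s)"
```

Reading the thread count from `SLURM_CPUS_PER_TASK` keeps the tool's thread option in step with whatever `--cpus-per-task` is set to.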

--- Creating Index files for alignments ---

sbatch hisat-indexing-job.sh
sbatch star-indexing-job.sh

This only needs to be done once for each reference genome; reuse the index for each alignment.
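For orientation, the indexing jobs presumably wrap commands along these lines. This is a sketch based on the standard `hisat2-build` and `STAR --runMode genomeGenerate` invocations, not a copy of the repository's scripts; the index base name `El_Paco`, the thread count, and the `--sjdbOverhang` value (read length minus one, here assuming 100 bp reads) are assumptions. The commands are only printed, so the block is safe to run anywhere.

```shell
# Paths follow the directory tree above; adjust to your own layout.
ref_dir="el_paco_ref"
genome="$ref_dir/El_Paco_genome.fa"
gtf="$ref_dir/El_Paco_V3_gene_annotations.gtf"
threads=8   # assumed value

# Assemble each command as a string and print it instead of executing it.
hisat_cmd="hisat2-build -p $threads $genome $ref_dir/HISAT_Index/El_Paco"
star_cmd="STAR --runMode genomeGenerate --runThreadN $threads --genomeDir $ref_dir/STAR_Index --genomeFastaFiles $genome --sjdbGTFfile $gtf --sjdbOverhang 99"

echo "$hisat_cmd"
echo "$star_cmd"
```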

--- Aligning rRNA reads ---

sh generate-bwa-rRNA-commands.sh
sbatch bwa-rRNA-job-script.sh
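Each step follows the same two-part pattern: a generator script writes one command per sample, and a Slurm job script then runs them. The sketch below illustrates the generator half only and is not the repository's script; the `_R1`/`_R2` file naming, the `rRNA.fa` reference, the output names, and the `bwa-commands.txt` file are all hypothetical. It creates two empty placeholder fastq pairs in a scratch directory so that it can run without real data.

```shell
# Scratch directory with placeholder fastq files (empty; names are hypothetical).
work=$(mktemp -d)
mkdir -p "$work/fastqs"
for sample in sampleA sampleB; do
    touch "$work/fastqs/${sample}_R1.fastq.gz" "$work/fastqs/${sample}_R2.fastq.gz"
done

commands_file="$work/bwa-commands.txt"
: > "$commands_file"   # start with an empty file

# Write one alignment command per read pair; nothing is executed here.
for r1 in "$work"/fastqs/*_R1.fastq.gz; do
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    sample=$(basename "$r1" _R1.fastq.gz)
    echo "bwa mem -t 4 rRNA.fa $r1 $r2 | samtools sort -o ${sample}.rRNA.bam -" >> "$commands_file"
done

n_commands=$(wc -l < "$commands_file" | tr -d ' ')
echo "Wrote $n_commands commands to $commands_file"
cat "$commands_file"
```

Inspecting the generated file before calling `sbatch` is a cheap way to catch a wrong path or file suffix before any compute time is spent.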

--- Converting non-rRNA bams to fastqs ---

sh generate-bamtofastq-commands.sh
sbatch bamtofastq-job-script.sh

--- Align filtered reads to genome with HISAT2 ---

sh generate-hisat2-commands.sh
sbatch hisat2-job-script.sh

--- Align filtered reads to genome with STAR ---

sh generate-star-commands.sh
sbatch star-job-script.sh

--- featureCounts and DESeq2 in RStudio ---

deseq2-edgeR-protocol.R

To use this R script, download your HISAT2/STAR aligned bam files and the gtf annotation file and place them in your RStudio working directory. Then load the script and work through each step, modifying it to suit your needs.

Contributing

Harry Pollitt

Email: [email protected]

Rebekah White

Email: [email protected]

Cameron Weadick

Email: [email protected]

Acknowledgement

The authors would like to acknowledge the use of the University of Exeter High-Performance Computing (HPC) facility in carrying out this work.