This pipeline facilitates the analysis of bulk RNA-seq data in UCloud HPC using Ubuntu-Terminal with Slurm workload manager:
- pe_fastq_qc.sh: it is designed to perform quality control (QC) on paired-end FASTQ files.
- pe_align_rnaseq_v2_multigenome.sh: it is designed to align paired-end FASTQ files using STAR and optionally Salmon.
- pe_postalign_RNA-seq_multigenome.sh: it is designed to process BAM files. It performs mapping QC, and data preprocessing steps to prepare the data for downstream analysis (read counting, bigWig coverage files,etc)
- reorganize_files_rnaseq.sh: it is designed to organize files by category.
Supported genomes: hg38, mm10, mm39.
-
Quality Control of fastq files Runs FASTQC, Fastq_Screen and AdapterRemoval2.
-
Alignment of RNA-seq data using STAR (and optionally Salmon)
-
Postalignment of BAM files - Read counting Read counting is done by FeatureCounts.
- Clone Repository and copy the script to your Scripts folder
git clone <repository-url>
cd <repository-directory>
-
Modify SLURM Parameters (Optional): Open a script (script.sh) and modify SLURM parameters at the beginning of the file, such as account, output file, email notifications, nodes, memory, CPU cores, and runtime. Alternatively, you can modify these parameters on-the-fly when executing the script.
-
On UCloud, start a Terminal Ubuntu run:
- Enable Slurm cluster
- To process several samples consider requesting nodes > 1
- Set the modules path to FGM > Utilities > App > easybuild
- Include the References folder FGM > References > References
-
Include your Scripts folder and the folder with the fastq.gz/bam files.
-
Notes:
- Match the job CPUs to the amounts requested in the script.
- If you modify the memory parameter in the script, specify 5-10% less than the memory available in the terminal run.
- Although it is not necessary to enable tmux, it is a good practise to always do it.
- The configuration file of Fastq_Sreen is also located in the /References folder.
-
Run the Script: Submit the script to the SLURM cluster:
sbatch -J <job_name> path_to/Scripts_folder/script.sh <input--file>
Replace input-file with the full path to your file.
For several samples you can use a for loop:
for i in *<file-pattern>; do sbatch -J <job_name> path_to/Scripts_folder/script.sh $i; sleep 1; done
-
Monitor Job: You can monitor the job using the SLURM commands, such as squeue, scontrol show job , and check the log files generated.
Notes: Find test data in UCloud at the FGM project (Utilities/Example_data/bulkRNA/Fastq)