This pipeline facilitates the analysis of bulk RNA-seq data in UCloud HPC using Ubuntu-Terminal with Slurm workload manager


kristinekdl/Pipeline-bulk-RNA-seq_ucloud

 
 

Pipeline for bulk RNA-seq in UCloud using Slurm

Overview of the Pipeline

This pipeline facilitates the analysis of bulk RNA-seq data on UCloud HPC, using a Terminal Ubuntu run with the Slurm workload manager. It consists of four scripts:

  1. pe_fastq_qc.sh: performs quality control (QC) on paired-end FASTQ files.
  2. pe_align_rnaseq_v2_multigenome.sh: aligns paired-end FASTQ files using STAR and, optionally, Salmon.
  3. pe_postalign_RNA-seq_multigenome.sh: processes BAM files. It performs mapping QC and data preprocessing steps that prepare the data for downstream analysis (read counting, bigWig coverage files, etc.).
  4. reorganize_files_rnaseq.sh: organizes the output files by category.

Supported genomes: hg38, mm10, mm39.

Access to the guides

  1. Quality control of FASTQ files: runs FastQC, FastQ Screen, and AdapterRemoval2.

  2. Alignment of RNA-seq data using STAR (and optionally Salmon).

  3. Post-alignment of BAM files and read counting: read counting is done by featureCounts.

  4. Organize files by category.

General Usage

  1. Clone the repository and copy the scripts to your Scripts folder:

    git clone <repository-url>
    cd <repository-directory>

  2. Modify Slurm parameters (optional): Open a script (script.sh) and modify the Slurm parameters at the beginning of the file, such as account, output file, email notifications, nodes, memory, CPU cores, and runtime. Alternatively, you can override these parameters on the command line when submitting the script.

  3. On UCloud, start a Terminal Ubuntu run:

    • Enable the Slurm cluster.
    • To process several samples, consider requesting nodes > 1.
    • Set the modules path to FGM > Utilities > App > easybuild.
    • Include the References folder FGM > References > References.
    • Include your Scripts folder and the folder with the fastq.gz/bam files.
    • Notes:

      • Match the number of CPUs in the job to the number requested in the script.
      • If you modify the memory parameter in the script, specify 5-10% less than the memory available in the terminal run.
      • Although it is not necessary to enable tmux, it is good practice to always do so.
      • The configuration file of FastQ Screen is also located in the /References folder.
  4. Run the script: Submit the script to the Slurm cluster:

    sbatch -J <job_name> path_to/Scripts_folder/script.sh <input-file>
    

    Replace <input-file> with the full path to your file.

    To process several samples, you can use a for loop:

    for i in *<file-pattern>; do sbatch -J <job_name> path_to/Scripts_folder/script.sh "$i"; sleep 1; done
    
  5. Monitor the job: You can monitor the job using Slurm commands such as squeue and scontrol show job <job-id>, and by checking the log files generated.
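The submission loop above can be wrapped in a small helper that prints each command before anything is sent to the queue. This is a minimal sketch, not part of the pipeline: the function name `submit_all`, the `DRY_RUN` variable, the use of the file name as job name, and the `_R1.fastq.gz` pattern in the usage comment are all assumptions made for illustration. It also converts each input to a full path, as the scripts expect.

```shell
#!/usr/bin/env bash
# Sketch only: submit_all and DRY_RUN are hypothetical names, not part of the pipeline.
# Usage: submit_all <path/to/script.sh> <input-file>...
submit_all() {
    local script=$1; shift
    local f name abs
    for f in "$@"; do
        [ -e "$f" ] || continue                 # skip patterns that matched no file
        # Build the full path to the input file
        abs="$(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
        name=$(basename "$f")
        name=${name%%.*}                        # sample_R1.fastq.gz -> sample_R1
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (default): only print the command that would be submitted
            printf 'sbatch -J %s %s %s\n' "$name" "$script" "$abs"
        else
            sbatch -J "$name" "$script" "$abs"
            sleep 1                             # same pause as in the one-line loop
        fi
    done
}

# Example (the file pattern is an assumption; adapt it to your file names):
#   submit_all path_to/Scripts_folder/script.sh *_R1.fastq.gz              # dry run
#   DRY_RUN=0 submit_all path_to/Scripts_folder/script.sh *_R1.fastq.gz    # submit
```

Running the dry run first makes it easy to check that the pattern selects the intended files and that each job receives a distinct name in the squeue output.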

Note: Test data is available on UCloud in the FGM project (Utilities/Example_data/bulkRNA/Fastq).
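The memory note under General Usage (request 5-10% less than the terminal run provides) can be worked out with shell arithmetic. The following is a minimal sketch: the helper name `suggest_mem_gb` is hypothetical, and the 192 GB figure is only an illustrative value, not a UCloud machine specification.

```shell
#!/usr/bin/env bash
# Sketch only: suggest_mem_gb is a hypothetical helper, not part of the pipeline.
# Usage: suggest_mem_gb <available_gb> [margin_percent]   (margin defaults to 10)
suggest_mem_gb() {
    local available_gb=$1
    local margin_pct=${2:-10}
    # Integer arithmetic: the result is rounded down to whole GB
    echo $(( available_gb * (100 - margin_pct) / 100 ))
}

suggest_mem_gb 192      # prints 172 (10% margin)
suggest_mem_gb 192 5    # prints 182 (5% margin)
```

The result can then be passed at submission time, for example `sbatch --mem=172G ...`, or written into the memory parameter at the top of the script.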
