Snakemake wrapper scripts (located in the workflow folder) enable automatic RNA-seq data analysis in terms of quality control, assembly, quantification, gene ontology, differential gene expression and alternative splicing (AS), including its effects at the protein level. An additional RMarkdown script provides the final visualization of AS events.
Another RMarkdown script allows for the analysis of Illumina microarrays. The pipeline accepts .fastq, .fastq.gz or .fastq.dsrc files.
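Before launching anything, it can help to check that every input file has one of the three accepted extensions. The helper below is a hypothetical sketch, not part of the pipeline; the file names in the demo are made up.

```python
from pathlib import Path
from typing import Iterable

# Input extensions accepted by the pipeline, as listed in this README.
ACCEPTED_SUFFIXES = (".fastq", ".fastq.gz", ".fastq.dsrc")


def input_format(path: str) -> str:
    """Return the accepted suffix of a read file, or raise ValueError."""
    name = Path(path).name
    for suffix in ACCEPTED_SUFFIXES:
        if name.endswith(suffix):
            return suffix
    raise ValueError(f"unsupported input file: {path}")


def needs_dsrc(paths: Iterable[str]) -> bool:
    """True if any input is dsrc-compressed, i.e. dsrc must be installed."""
    return any(input_format(p) == ".fastq.dsrc" for p in paths)


if __name__ == "__main__":
    reads = ["data/A_R1.fastq.gz", "data/B_R1.fastq.dsrc"]  # hypothetical file names
    for read_file in reads:
        print(read_file, "->", input_format(read_file))
    print("dsrc required:", needs_dsrc(reads))
```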
Requirements:
- conda for building the environment,
- interproscan and pcregrep for finding protein domains,
- dsrc if your files are dsrc compressed.
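A quick way to confirm these tools are reachable is to look them up on PATH. This is a hypothetical sketch; in particular, the executable name interproscan.sh is an assumption about how InterProScan is installed on your system.

```python
import shutil
from typing import Iterable, List

# Executable names are assumptions; adjust them to your installation.
REQUIRED_TOOLS = ["conda", "interproscan.sh", "pcregrep"]
OPTIONAL_TOOLS = ["dsrc"]  # only needed for .fastq.dsrc inputs


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing required tools:", ", ".join(missing))
    missing_optional = missing_tools(OPTIONAL_TOOLS)
    if missing_optional:
        print("Missing optional tools (dsrc inputs only):", ", ".join(missing_optional))
```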
Installation:
git clone https://github.com/aagatam/Pipeline.git
cd Pipeline
conda env create -f environment.yml
conda activate Pipeline
For RNA-seq data analysis, adjust two files, configs/config.yaml and configs/Description.csv, to match your data. For microarray analysis, adjust configs/config_Illumina.yaml and configs/Description_Illumina.csv.
All fields in the YAML configs are explained within the files themselves. Currently, only designs with an equal number of replicates in every sample group are supported.
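Because only equal replicate numbers are supported, it may be worth checking the sample sheet before a run. This is a minimal sketch assuming a comma-separated Description.csv with a header row; the column name "condition" is hypothetical, so use whichever column holds your group labels.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def replicate_counts(handle: TextIO, column: str = "condition") -> Counter:
    """Count samples per group in a comma-separated sample sheet.

    "condition" is a hypothetical column name: use whichever column of your
    Description.csv holds the group label.
    """
    return Counter(row[column] for row in csv.DictReader(handle))


def has_equal_replicates(counts: Counter) -> bool:
    """True when every group has the same number of samples."""
    return len(set(counts.values())) <= 1


if __name__ == "__main__":
    # Made-up sample sheet: two groups with two replicates each.
    sheet = io.StringIO("sample,condition\nS1,ctrl\nS2,ctrl\nS3,treat\nS4,treat\n")
    counts = replicate_counts(sheet)
    print(dict(counts), has_equal_replicates(counts))
```

For a real file, open it with `open("configs/Description.csv", newline="")` and pass the handle instead of the StringIO.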
Trimming:
snakemake --cores all --use-conda --conda-frontend conda -p -j 1 -s workflow/trim.snakefile
- In FINALOUTPUT/PROJECT/trim:
  - trimmed fq.gz files,
  - quality report in the fastqc_after_trimming folder.
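To pick up the trimmed reads for later checks, the output path can be assembled from the two config values. A minimal sketch; it assumes FINALOUTPUT and PROJECT are the values you set in configs/config.yaml, and the demo works on a throwaway directory.

```python
import tempfile
from pathlib import Path
from typing import List


def trimmed_reads(final_output: str, project: str) -> List[Path]:
    """List trimmed *.fq.gz files in FINALOUTPUT/PROJECT/trim.

    final_output and project stand for the FINALOUTPUT and PROJECT values you
    set in configs/config.yaml (key names assumed from the paths above).
    """
    trim_dir = Path(final_output) / project / "trim"
    return sorted(trim_dir.glob("*.fq.gz"))


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics the expected layout.
    with tempfile.TemporaryDirectory() as root:
        trim_dir = Path(root) / "MyProject" / "trim"
        trim_dir.mkdir(parents=True)
        for name in ("S2.fq.gz", "S1.fq.gz", "notes.txt"):
            (trim_dir / name).touch()
        print([p.name for p in trimmed_reads(root, "MyProject")])
```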
Quality control:
snakemake --cores all -p -s workflow/quality_control.snakefile
- In FINALOUTPUT/PROJECT/fastqc:
  - summary report report_quality_control.html,
  - report for each sample.
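Long runs can be previewed first: -n (--dry-run) is a standard Snakemake flag that only lists the jobs that would be executed. The hypothetical helper below builds such a command for any snakefile in this README; it does not reproduce every flag used above (for example the --use-conda options of the trimming step).

```python
import shlex
from typing import List


def snakemake_cmd(snakefile: str, cores: str = "all", dry_run: bool = False) -> List[str]:
    """Build a Snakemake command in the style used in this README.

    With dry_run=True, -n (--dry-run) is appended so Snakemake only prints
    the jobs it would run.
    """
    cmd = ["snakemake", "--cores", cores, "-p", "-s", snakefile]
    if dry_run:
        cmd.append("-n")
    return cmd


if __name__ == "__main__":
    cmd = snakemake_cmd("workflow/quality_control.snakefile", dry_run=True)
    print(shlex.join(cmd))
    # To execute it from Python: subprocess.run(cmd, check=True)
```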
Alignment with HiSat2:
snakemake --cores all -p -s workflow/align_HiSat2.snakefile
- In FINALOUTPUT/PROJECT/genome:
  - MultiQC summary report report_align_count.html,
  - results prepared for analysis with R in the Hisat_results folder,
  - StringTie results in the countFile folder,
  - index in the indexes folder,
  - sorted BAM files in the bamFIleSort folder,
  - Qualimap alignment QC in the alignmentQC folder.
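After the HiSat2 step finishes, a quick sanity check is whether all listed output folders were created. A minimal sketch; the folder names are taken from the list above, and genome_dir stands for FINALOUTPUT/PROJECT/genome.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

# Folder names listed in this README for FINALOUTPUT/PROJECT/genome.
HISAT_FOLDERS = ("Hisat_results", "countFile", "indexes", "bamFIleSort", "alignmentQC")


def missing_folders(genome_dir: str, expected: Sequence[str] = HISAT_FOLDERS) -> List[str]:
    """Return the expected output folders that do not exist under genome_dir."""
    base = Path(genome_dir)
    return [name for name in expected if not (base / name).is_dir()]


if __name__ == "__main__":
    # Demo: only "indexes" exists, so the other four are reported.
    with tempfile.TemporaryDirectory() as genome:
        (Path(genome) / "indexes").mkdir()
        print(missing_folders(genome))
```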
Pseudoalignment and quantification with Kallisto:
snakemake --cores all -p -s workflow/align_kallisto.snakefile
- In FINALOUTPUT/PROJECT/trans:
  - MultiQC summary report report_align_count.html,
  - benchmarks folder,
  - Kallisto index in the indexes folder,
  - log files in the kallisto folder,
  - folders with Kallisto results for all samples in the kallisto folder.
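Each per-sample kallisto folder normally contains an abundance.tsv with the columns target_id, length, eff_length, est_counts and tpm. The sketch below merges one of these columns across samples; it assumes one results folder per sample, named after the sample, and is only an illustration (the Expression_Kallisto.Rmd script does its own import).

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def read_abundance(path: Path, column: str = "est_counts") -> Dict[str, float]:
    """Read one kallisto abundance.tsv into {target_id: value}."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target_id"]: float(row[column]) for row in reader}


def merge_samples(kallisto_dir: str, column: str = "est_counts") -> Dict[str, Dict[str, float]]:
    """Collect {sample: {target_id: value}} from every <sample>/abundance.tsv.

    Assumes one results folder per sample, named after the sample.
    """
    return {
        tsv.parent.name: read_abundance(tsv, column)
        for tsv in sorted(Path(kallisto_dir).glob("*/abundance.tsv"))
    }


if __name__ == "__main__":
    header = "target_id\tlength\teff_length\test_counts\ttpm\n"
    with tempfile.TemporaryDirectory() as kallisto_dir:
        for sample, count in (("S1", "10"), ("S2", "4.5")):  # made-up values
            folder = Path(kallisto_dir) / sample
            folder.mkdir()
            (folder / "abundance.tsv").write_text(header + f"tx1\t1000\t850\t{count}\t3.2\n")
        print(merge_samples(kallisto_dir))
```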
This step indexes all BAM files and runs SplAdder:
snakemake --cores all -p -s workflow/spladder_run.snakefile
- .bai index files for all BAM files,
- SplAdder output files in FINALOUTPUT/PROJECT/genome/spladder.
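If the SplAdder step fails, a missing index is one thing to rule out. samtools index writes X.bam.bai by default, while some tools use X.bai, so the sketch below accepts either; pointing it at the bamFIleSort folder is an assumption about where your BAMs live.

```python
import tempfile
from pathlib import Path
from typing import List


def unindexed_bams(bam_dir: str) -> List[Path]:
    """Return BAM files with neither X.bam.bai nor X.bai next to them."""
    missing = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        candidates = (bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai"))
        if not any(index.exists() for index in candidates):
            missing.append(bam)
    return missing


if __name__ == "__main__":
    # Demo: A and C are indexed (two naming styles), B is not.
    with tempfile.TemporaryDirectory() as bam_dir:
        for name in ("A.bam", "A.bam.bai", "B.bam", "C.bam", "C.bai"):
            (Path(bam_dir) / name).touch()
        print([bam.name for bam in unindexed_bams(bam_dir)])
```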
This step runs Bisbee and analyses its output with an RMarkdown script, producing a report together with a set of CSV files containing results and the files needed for further InterProScan analysis and visualization. Depending on dataset size, this step might take a few hours, especially on the first run, when the necessary libraries are downloaded.
First, install the desired species release, for example:
pyensembl install --release 104 --species mouse
Then run:
snakemake --cores all -p -s workflow/bisbee_run.snakefile
- In FINALOUTPUT/PROJECT/genome/bisbee:
  - CSV files with Bisbee results,
  - FASTA files with transcripts including novel events.
- In FINALOUTPUT/PROJECT/genome/spladder:
  - To_plots.csv: file with all common events from the new+new and new+old groups,
  - To_plots.RData: for further visualizations with Plots.Rmd.
- In FINALOUTPUT/PROJECT/genome/spladder/Script_output:
  - .txt files with common genes for the three groups and all events,
  - .csv files with GO terms detected for each group,
  - .txt files with GO terms common to the events within the three groups,
  - .pdf files showing how the first 10 GO terms of the old+old group change as events are added, first from the new+old group and then from the new+new group, as well as the top 10 terms of each group and how significant they are in the other groups.
- In FINALOUTPUT/PROJECT/genome/bisbee/Filetered:
  - Bisbee results filtered with respect to valid events.
- In FINALOUTPUT/PROJECT/genome/bisbee:
  - _to_grep.txt files used for filtering FASTA files for further InterProScan analysis.
- Spladder.pdf report in the scripts folder.
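The pipeline filters the FASTA files with pcregrep using the _to_grep.txt lists. As a pure-Python illustration of the same idea (not the pipeline's actual code), the sketch below keeps FASTA records whose header contains any listed ID; one ID per line and plain substring matching are assumptions that may differ from the real patterns.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(fasta_lines: Iterable[str], id_lines: Iterable[str]) -> Dict[str, str]:
    """Keep records whose header contains any ID from a _to_grep.txt-style list."""
    wanted = [entry.strip() for entry in id_lines if entry.strip()]
    return {header: seq for header, seq in read_fasta(fasta_lines)
            if any(entry in header for entry in wanted)}


if __name__ == "__main__":
    fasta = [">tx1 event_A", "MKT", "LLV", ">tx2 event_B", "MSS"]  # made-up records
    print(filter_fasta(fasta, ["event_A\n"]))
```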
Protein domain search with InterProScan:
snakemake --cores all -p -s workflow/interproscan_run.snakefile
- In FINALOUTPUT/PROJECT/genome/bisbee:
  - .grepped.filtered.fasta: new FASTA files with only the interesting events, prepared for InterProScan analysis.
- In FINALOUTPUT/PROJECT/genome/InterProScan:
  - .tsv files with InterProScan results.
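InterProScan's TSV output has no header row; its first columns are protein accession, sequence MD5, sequence length, analysis, signature accession, signature description, start, stop, score, status and date, optionally followed by InterPro annotations. The sketch below groups hits per protein in positional order, which is a convenient shape for comparing domains between transcript isoforms; it uses only the columns listed here.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, int, int]  # (analysis, signature accession, start, stop)


def domains_per_protein(lines: Iterable[str]) -> Dict[str, List[Hit]]:
    """Group InterProScan TSV rows by protein, sorted by start position."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 8:
            continue  # skip blank or truncated lines
        protein, analysis, signature = row[0], row[3], row[4]
        hits[protein].append((analysis, signature, int(row[6]), int(row[7])))
    return {protein: sorted(found, key=lambda hit: hit[2]) for protein, found in hits.items()}


if __name__ == "__main__":
    # Made-up example rows in the InterProScan TSV layout.
    rows = [
        "tx1\tmd5\t300\tPfam\tSIG_B\tDomain B\t120\t250\t1e-20\tT\t01-01-2024",
        "tx1\tmd5\t300\tPfam\tSIG_A\tDomain A\t10\t80\t1e-30\tT\t01-01-2024",
    ]
    print(domains_per_protein(rows))
```

For a real results file, pass an open handle, e.g. `open(path, newline="")`, instead of the list.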
Visualization of a single AS event:
R -e "rmarkdown::render('scripts/Plots.Rmd',params=list(event_type='event_type', event='event_no'),output_file='Out_name.pdf')"
Here event_no is the event you want to visualize (for example mutex_exons_168) and event_type is one of: alt_3_prime, alt_5_prime, exon_skip, mult_exon_skip, mutex_exons (for the example above, mutex_exons). A list of the most interesting events is provided in the To_plots.csv file from the Bisbee step.
- Plots.pdf file with two plots for the given event: one of the whole transcript and a second, close-up view.
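Since event_type must match the event ID, the render call can be built from the ID alone. This hypothetical helper assumes IDs follow the "<event_type>_<number>" pattern of the mutex_exons_168 example; it only builds the command, which you can pass to subprocess.run.

```python
import shlex
from typing import List

# Event types accepted by scripts/Plots.Rmd, as listed in this README.
EVENT_TYPES = ("alt_3_prime", "alt_5_prime", "exon_skip", "mult_exon_skip", "mutex_exons")


def event_type_of(event: str) -> str:
    """Infer the event type from an ID such as 'mutex_exons_168'."""
    prefix, _, number = event.rpartition("_")
    if prefix not in EVENT_TYPES or not number.isdigit():
        raise ValueError(f"unexpected event ID: {event!r}")
    return prefix


def plots_cmd(event: str, output_file: str) -> List[str]:
    """Build the R -e call that renders Plots.Rmd for one event."""
    event_type = event_type_of(event)
    r_code = (
        "rmarkdown::render('scripts/Plots.Rmd', "
        f"params=list(event_type='{event_type}', event='{event}'), "
        f"output_file='{output_file}')"
    )
    return ["R", "-e", r_code]


if __name__ == "__main__":
    print(shlex.join(plots_cmd("mutex_exons_168", "mutex_exons_168.pdf")))
```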
Differential expression analysis, for HiSat2 results:
R -e "rmarkdown::render('scripts/Expression_HiSat.Rmd')"
or, for Kallisto results:
R -e "rmarkdown::render('scripts/Expression_Kallisto.Rmd')"
- PDF report in the scripts folder,
- limma, edgeR and DESeq2 DEG results (all genes and those below the given p-value) in FINALOUTPUT/PROJECT/genome/Hisat_results or FINALOUTPUT/PROJECT/trans/kallisto,
- GO term analysis results in the same folders.
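The three DEG tools name their adjusted p-value column differently by default: DESeq2 results() uses "padj", edgeR topTags() uses "FDR" and limma topTable() uses "adj.P.Val". The sketch below re-filters a results table at a different cutoff; that the pipeline's CSV files keep these default column names, and that the threshold applies to adjusted rather than raw p-values, are assumptions.

```python
import csv
import io
from typing import Dict, Iterable, List

# Default adjusted p-value columns: DESeq2 results() -> "padj",
# edgeR topTags() -> "FDR", limma topTable() -> "adj.P.Val".
# That the pipeline's CSV files keep these names is an assumption.
PADJ_COLUMN = {"deseq2": "padj", "edger": "FDR", "limma": "adj.P.Val"}


def significant(rows: Iterable[Dict[str, str]], tool: str, alpha: float = 0.05) -> List[Dict[str, str]]:
    """Keep rows whose adjusted p-value is below alpha; NA/empty values are skipped."""
    column = PADJ_COLUMN[tool.lower()]
    kept = []
    for row in rows:
        value = row.get(column, "NA")
        if value in ("", "NA"):
            continue  # e.g. DESeq2 sets padj to NA for independently filtered genes
        if float(value) < alpha:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up DESeq2-style table.
    table = io.StringIO("gene,log2FoldChange,padj\ngeneA,2.1,0.001\ngeneB,0.3,0.4\ngeneC,1.0,NA\n")
    print([row["gene"] for row in significant(csv.DictReader(table), "DESeq2")])
```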
Illumina microarray analysis: adjust configs/config_Illumina.yaml and configs/Description_Illumina.csv, then run:
R -e "rmarkdown::render('scripts/Expression_Illumina_ microarrays.Rmd')"
- Expression_Illumina_ microarrays.pdf report in the scripts folder.