Lab 08: DGE using Salmon
NOTE: The R code for this section is also saved in the "R_materials" directory of the RNA_workshop GitHub repo.
Copy over the salmon counts results to your laptop so we can look at them in R. You'll also need the gff3 file:
scp -r <your_netid>@sphinx.ag.utk.edu:/pickett_shared/teaching/RNASeq_workshop/final_outputs/07_pseudocount/salmon_results .
scp -r <your_netid>@sphinx.ag.utk.edu:/pickett_shared/teaching/RNASeq_workshop/raw_data/reference/Athaliana_447_Araport11.gene_exons.gff3 .
You'll also want a csv file of the sample names and their associated salmon results directories.
wget https://raw.githubusercontent.com/statonlab/RNA_workshop/main/R_materials/samples_file.csv
Set the directory where you just downloaded those files as the working directory in R. Then make a variable called "dir" that stores the path to the salmon_results directory, which is where the quantification files will be read from.
setwd("/where/you/just/copied/the/files")
dir <- "/where/you/just/copied/the/files/salmon_results"
Go ahead and load in the libraries we installed in Lab 7.
library(tidyverse)
library(tximport)
library(GenomicFeatures)
library(DESeq2)
library(pheatmap)
Load the gff3 file, then build a transcript database and a transcript-to-gene data frame for use with tximport and DESeq2
txdb <- makeTxDbFromGFF("Athaliana_447_Araport11.gene_exons.gff3")
keytypes(txdb)
k <- keys(txdb, keytype = "TXNAME")
txdf <- AnnotationDbi::select(txdb, k, "GENEID", "TXNAME")
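To make the role of the transcript-to-gene table concrete, here is a minimal base R sketch of the idea behind gene-level summarization: counts for transcripts that map to the same gene are added together. The transcript IDs, gene IDs, and counts below are invented for illustration, and tximport does more than this simple sum (for example, it also keeps track of transcript lengths), so treat this as a toy model rather than a description of tximport's internals.

```r
# Toy transcript-to-gene table with the same column order as txdf
# (transcript ID first, gene ID second). All IDs here are made up.
toy_tx2gene <- data.frame(
  TXNAME = c("geneA.1", "geneA.2", "geneB.1"),
  GENEID = c("geneA", "geneA", "geneB")
)

# Toy transcript-level counts for two hypothetical samples
# (the matrix is filled column by column)
toy_counts <- matrix(
  c(10, 5, 7,
    20, 0, 3),
  nrow = 3,
  dimnames = list(as.character(toy_tx2gene$TXNAME), c("sample1", "sample2"))
)

# Add up the rows that belong to the same gene
toy_gene_counts <- rowsum(toy_counts, group = as.character(toy_tx2gene$GENEID))
print(toy_gene_counts)
```

The two geneA transcripts collapse into a single geneA row, which is the shape of table DESeq2 expects downstream.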
Load in the metadata
samples <- read_csv("samples_file.csv")
Qfiles <- file.path(dir, samples$quant_file)
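Before importing, it can help to confirm that every path in Qfiles points to a real file, since a typo in the working directory or in samples_file.csv would otherwise only show up as an error during import. The helper below, check_quant_files, is a hypothetical name introduced for this sketch and is not part of tximport. The demonstration uses temporary files with made-up names so that it runs on its own.

```r
# Hypothetical helper: return the paths that do not exist on disk
check_quant_files <- function(paths) {
  paths[!file.exists(paths)]
}

# Self-contained demonstration using a temporary directory
demo_dir <- file.path(tempdir(), "salmon_demo")
dir.create(demo_dir, showWarnings = FALSE)
present <- file.path(demo_dir, "sampleA_quant.sf")  # will be created
absent  <- file.path(demo_dir, "sampleB_quant.sf")  # never created
file.create(present)

missing_files <- check_quant_files(c(present, absent))
print(basename(missing_files))
```

In the lab itself you would call check_quant_files(Qfiles) and hope for an empty result.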
This step imports the count data from salmon
txi <- tximport(files = Qfiles, type = "salmon", tx2gene = txdf)
colnames(txi$counts) <- samples$sample_id
names(txi)
head(txi$counts)
summary(txi)
Now we convert the txi object into a DESeqDataSet object and run the DESeq2 analysis
dds_data <- DESeqDataSetFromTximport(txi = txi, colData = samples, design = ~condition)
dds <- DESeq(dds_data)
Plot dispersion
plotDispEsts(dds)
Summarize results
res <- results(dds)
head(res)
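Create a contrast with an alpha cutoff. The first element of the contrast is the name of the column in the samples object ("condition"), the second is the group in the numerator of the fold change ("max2"), and the third is the group in the denominator ("control").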
res_sig <- results(dds, alpha = 0.05, contrast = c("condition", "max2", "control"))
summary(res_sig)
plotMA(res_sig, ylim=c(-12,12))
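The alpha value is a cutoff on adjusted p-values, not raw ones. DESeq2 adjusts p-values with the Benjamini-Hochberg method by default (it also applies independent filtering, which this sketch ignores). The base R example below uses invented p-values for six hypothetical genes to show how a gene can have a raw p-value under 0.05 and still fail the cutoff after adjustment.

```r
# Invented raw p-values for six hypothetical genes
raw_p <- c(geneA = 0.001, geneB = 0.010, geneC = 0.020,
           geneD = 0.045, geneE = 0.300, geneF = 0.600)

# Benjamini-Hochberg adjustment
adj_p <- p.adjust(raw_p, method = "BH")
print(round(adj_p, 4))

# Genes that pass a 0.05 cutoff on the adjusted values
sig_genes <- names(adj_p)[adj_p < 0.05]
print(sig_genes)
```

Here geneD has a raw p-value of 0.045 but an adjusted value above 0.05, so it is not reported as significant.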
These steps can be used to find individual points on the MA plot.
After running "identify", click on the plot, then hit the "Finish" button in the top right of the plot. Use res_sig here so the points match the object that was plotted.
idx <- identify(res_sig$baseMean, res_sig$log2FoldChange)
rownames(res_sig)[idx]
You can also check out the gene's functional annotation by going to the TAIR website and pasting the gene name into the search box.
Create a plot for a single gene
plotCounts(dds, gene="AT1G53480", intgroup="condition")
Create a PCA plot
rld <- rlog(dds_data, blind = FALSE)
plotPCA(rld, intgroup = c("condition"))
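Roughly speaking, plotPCA keeps the most variable genes and runs a principal component analysis with samples as rows. The base R sketch below mimics that idea on an invented matrix of transformed expression values. The sample names, the number of genes kept, and the simulated treatment effect are all assumptions made for illustration.

```r
set.seed(1)

# Invented matrix of transformed expression values: 20 genes x 4 samples
toy_expr <- matrix(
  rnorm(80), nrow = 20,
  dimnames = list(paste0("gene", 1:20),
                  c("ctrl_1", "ctrl_2", "trt_1", "trt_2"))
)

# Shift the treated samples for the first five genes to mimic a treatment effect
toy_expr[1:5, c("trt_1", "trt_2")] <- toy_expr[1:5, c("trt_1", "trt_2")] + 5

# Keep the 10 most variable genes
row_var <- apply(toy_expr, 1, var)
top <- order(row_var, decreasing = TRUE)[1:10]

# PCA with samples as rows and genes as columns
pca <- prcomp(t(toy_expr[top, ]))

# Percent of variance explained by each component
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
print(pca$x[, 1:2])
print(pct_var)
```

If the condition drives most of the variation, the control and treated samples fall on opposite sides of the first component, which is the pattern to look for in the real PCA plot.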
Keep only the genes with an absolute log2 fold change greater than 1
res_lfc <- subset(res_sig, abs(log2FoldChange) > 1)
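Create a heatmap of the 50 genes with the highest log2 fold change. Select the genes by name, because row positions in res_lfc do not match row positions in the full vsd matrix.
vsd <- vst(dds)
genes <- head(rownames(res_lfc)[order(res_lfc$log2FoldChange, decreasing=TRUE)], 50)
pheatmap(assay(vsd)[genes, ], cluster_rows=TRUE, show_rownames=TRUE,
cluster_cols=TRUE)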
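When rows are chosen from a filtered results table and then used to pull rows out of the full expression matrix, it is safer to pass gene names than row positions, because positions in the filtered table do not correspond to positions in the full matrix. The toy example below uses an invented matrix and results table to show the difference.

```r
# Invented full expression matrix: 6 genes x 2 samples
full_mat <- matrix(
  1:12, nrow = 6,
  dimnames = list(paste0("gene", 1:6), c("s1", "s2"))
)

# Invented results table with one fold change per gene
toy_res <- data.frame(
  log2FoldChange = c(0.2, 3.0, -2.5, 0.1, 1.5, -0.3),
  row.names = paste0("gene", 1:6)
)

# Filtering keeps gene2, gene3, and gene5
toy_lfc <- subset(toy_res, abs(log2FoldChange) > 1)

# Positions of the two highest fold changes WITHIN the filtered table
pos <- order(toy_lfc$log2FoldChange, decreasing = TRUE)[1:2]
top_names <- rownames(toy_lfc)[pos]

# Indexing the full matrix by position picks the wrong genes
by_position <- rownames(full_mat[pos, ])
# Indexing by name picks the intended genes
by_name <- rownames(full_mat[top_names, ])

print(by_position)
print(by_name)
```

Indexing by position returns gene1 and gene3, while indexing by name returns gene2 and gene5, the two genes that actually have the highest fold changes.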