Lab 08: DGE using Salmon

Ryan edited this page Aug 17, 2023 · 4 revisions

DESeq2 with Salmon pseudocount results

NOTE: The R code for this section is also saved under the main page of the RNA_workshop github repo under the "R_materials" directory.

Get the data for analysis:

Copy over the salmon counts results to your laptop so we can look at them in R. You'll also need the gff3 file:

scp -r <your_netid>@sphinx.ag.utk.edu:/pickett_shared/teaching/RNASeq_workshop/final_outputs/07_pseudocount/salmon_results .
scp -r <your_netid>@sphinx.ag.utk.edu:/pickett_shared/teaching/RNASeq_workshop/raw_data/reference/Athaliana_447_Araport11.gene_exons.gff3 .

You'll also want a csv file of the sample names and their associated salmon results directories.

wget https://raw.githubusercontent.com/statonlab/RNA_workshop/main/R_materials/samples_file.csv

Performing DESeq2 DGE analysis

Set the working directory in R to wherever you just downloaded those files. Then make a variable called "dir" that stores the path to the salmon_results directory; the paths to the individual quant files are built from it later.

setwd("/where/you/just/copied/the/files")
dir <- "/where/you/just/copied/the/files/salmon_results"

Go ahead and load in the libraries we installed in Lab 7.

library(tidyverse)
library(tximport)
library(GenomicFeatures)
library(DESeq2)
library(pheatmap)

Load the gff3 file, then build a transcript-to-gene data frame (txdf) that tximport uses to summarize transcript-level estimates to genes.

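# Build a transcript database from the annotation
txdb <- makeTxDbFromGFF("Athaliana_447_Araport11.gene_exons.gff3")
# List the key types available for lookups
keytypes(txdb)
k <- keys(txdb, keytype = "TXNAME")
# Two-column mapping (TXNAME, GENEID), the layout tximport expects for tx2gene
txdf <- AnnotationDbi::select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")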

Load in the metadata and build the full paths to the salmon quant files

samples <- read_csv("samples_file.csv")
Qfiles <- file.path(dir, samples$quant_file)
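Before importing, it can help to confirm that every path in Qfiles actually exists, since a typo in "dir" or in the csv otherwise only surfaces later as an import error. A minimal sketch in base R; `missing_quant_files` is a hypothetical helper (not part of tximport), and the demo paths are made up in a temporary directory:

```r
# Hypothetical helper: return the quant file paths that do not exist on disk.
missing_quant_files <- function(base_dir, quant_files) {
  paths <- file.path(base_dir, quant_files)
  paths[!file.exists(paths)]
}

# Toy demonstration in a temporary directory (made-up sample names).
demo_dir <- file.path(tempdir(), "salmon_demo")
dir.create(file.path(demo_dir, "sampleA"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "sampleA", "quant.sf"))

# sampleA exists, sampleB was never created, so only sampleB is reported.
demo_missing <- missing_quant_files(demo_dir, c("sampleA/quant.sf", "sampleB/quant.sf"))
print(demo_missing)
```

With the objects from this lab, the call would be `missing_quant_files(dir, samples$quant_file)`; an empty result means every file was found.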

This step imports the count data from salmon

txi <- tximport(files = Qfiles, type = "salmon", tx2gene = txdf)
colnames(txi$counts) <- samples$sample_id
names(txi)
head(txi$counts)
summary(txi)
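To see what the tx2gene mapping does during import, here is a minimal sketch in base R that collapses transcript-level counts to gene-level counts by summing. The transcript names, gene names, and numbers are made up, and tximport also tracks abundance and transcript length, which this sketch leaves out:

```r
# Toy transcript-level counts (made-up names and numbers).
# matrix() fills by column: s1 = 10, 5, 0 and s2 = 20, 8, 3.
tx_counts <- matrix(
  c(10, 5, 0,
    20, 8, 3),
  ncol = 2,
  dimnames = list(c("geneA.1", "geneA.2", "geneB.1"), c("s1", "s2"))
)

# Toy transcript-to-gene table with the same column layout as txdf.
toy_tx2gene <- data.frame(
  TXNAME = c("geneA.1", "geneA.2", "geneB.1"),
  GENEID = c("geneA", "geneA", "geneB")
)

# Look up the gene for each transcript row, then sum rows within each gene.
gene_of_tx <- toy_tx2gene$GENEID[match(rownames(tx_counts), toy_tx2gene$TXNAME)]
gene_counts <- rowsum(tx_counts, group = gene_of_tx)
print(gene_counts)
```

The two geneA transcripts end up as a single geneA row, which is why head(txi$counts) shows gene IDs rather than transcript IDs.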

Now we convert the txi object into a DESeqDataSet and run the DESeq2 analysis

dds_data <- DESeqDataSetFromTximport(txi = txi, colData = samples, design = ~condition)
dds <- DESeq(dds_data)

Plot dispersion

plotDispEsts(dds)

Summarize results

res <- results(dds)
head(res)

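Extract results for a specific contrast with an alpha cutoff. The contrast vector lists the column name from the samples object ("condition"), then the level used as the numerator ("max2"), then the level used as the denominator ("control").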

res_sig <- results(dds, alpha = 0.05, contrast = c("condition", "max2", "control"))
summary(res_sig)
plotMA(res_sig, ylim=c(-12,12))
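The order of the two levels in the contrast decides the sign of the log2 fold change. A minimal arithmetic sketch with made-up mean counts for one gene; DESeq2 estimates fold changes from a fitted model rather than a plain ratio of means, so this only illustrates the direction:

```r
# Made-up mean normalized counts for one gene in each condition.
mean_max2 <- 200
mean_control <- 50

# contrast = c("condition", "max2", "control") puts max2 in the numerator.
lfc <- log2(mean_max2 / mean_control)
print(lfc)  # 2: the gene is 4-fold higher in max2 than in control

# Swapping numerator and denominator only flips the sign.
lfc_swapped <- log2(mean_control / mean_max2)
print(lfc_swapped)  # -2
```

So a positive value on the MA plot means higher expression in max2, and a negative value means higher expression in control.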

These steps can be used to identify individual points on the MA plot. After running "identify", click the points of interest on the plot, then hit the "Finish" button in the top right of the plot.

# Use res_sig here because it is the object drawn by plotMA above
idx <- identify(res_sig$baseMean, res_sig$log2FoldChange)
rownames(res_sig)[idx]

You can also check a gene's functional annotation by going to the TAIR website and pasting the gene name into the search box.

Create a plot for a single gene

plotCounts(dds, gene="AT1G53480", intgroup="condition")

premade plot

Create a PCA plot

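# dds already holds size factors and dispersions, so rlog can reuse them with blind = FALSE
rld <- rlog(dds, blind = FALSE)
plotPCA(rld, intgroup = c("condition"))
# Genes with |log2 fold change| > 1, used for the heatmap below
res_lfc <- subset(res_sig, abs(log2FoldChange) > 1)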

premade pca
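Note that alpha = 0.05 in results() sets the cutoff used by summary() and by the independent filtering step; it does not drop rows, so res_lfc can still contain genes with large adjusted p-values. If you want only significant genes, one option is to filter on padj as well. A minimal sketch on a made-up data frame (the column names mirror a DESeq2 results table; the values are invented):

```r
# Toy results table (invented values).
toy_res <- data.frame(
  log2FoldChange = c(2.5, -1.8, 0.4, 3.0, -2.2),
  padj = c(0.001, 0.03, 0.0005, 0.20, NA),
  row.names = c("g1", "g2", "g3", "g4", "g5")
)

# Keep rows with padj < 0.05 and |log2FoldChange| > 1.
# !is.na(padj) makes the handling of genes without an adjusted p-value explicit.
keep <- !is.na(toy_res$padj) &
  toy_res$padj < 0.05 &
  abs(toy_res$log2FoldChange) > 1
toy_sig <- toy_res[keep, ]
print(rownames(toy_sig))  # g1 and g2
```

Applied to this lab's objects, the analogous call would be `subset(res_sig, padj < 0.05 & abs(log2FoldChange) > 1)`; treat that as a suggestion rather than the workshop's chosen filter.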

Create a heatmap

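vsd <- vst(dds)
# Select genes by name: row positions in res_lfc do not match row positions in vsd
top <- head(order(res_lfc$log2FoldChange, decreasing = TRUE), 50)
genes <- rownames(res_lfc)[top]
pheatmap(assay(vsd)[genes, ], cluster_rows = TRUE, show_rownames = TRUE,
         cluster_cols = TRUE)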

premade heatmap
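A general caution for heatmaps built from a filtered results table: row positions computed on the subset do not line up with row positions in the full expression matrix, so selecting rows by name is safer. A minimal sketch on made-up data:

```r
# Toy expression matrix with five genes (made-up values).
expr <- matrix(1:10, nrow = 5,
               dimnames = list(paste0("g", 1:5), c("s1", "s2")))

# Toy results table holding only a subset of those genes.
sub_res <- data.frame(log2FoldChange = c(1.5, 3.0, 2.0),
                      row.names = c("g2", "g4", "g5"))

# The ordering is computed on the subset, so positions refer to sub_res only.
pos <- order(sub_res$log2FoldChange, decreasing = TRUE)

by_position <- rownames(expr)[pos]    # g2, g3, g1: not the intended genes
by_name <- rownames(sub_res)[pos]     # g4, g5, g2: the intended genes
print(by_position)
print(by_name)

# Indexing the matrix by name pulls the intended rows.
print(expr[by_name, ])
```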
