Lab 08: DGE using Salmon

DESeq2 with Salmon pseudocount results

NOTE: The R code for this section is also saved in the "R_materials" directory of the RNA_workshop GitHub repo.

Get the data for analysis:

Copy the Salmon count results to your laptop so we can look at them in R. You'll also need the gff3 file:

scp -r <your_netid>@sphinx.ag.utk.edu:/pickett_shared/teaching/RNASeq_workshop/final_outputs/07_pseudocount/salmon_results .
scp -r <your_netid>@sphinx.ag.utk.edu:/pickett_shared/teaching/RNASeq_workshop/raw_data/reference/Athaliana_447_Araport11.gene_exons.gff3 .

You'll also want a csv file of the sample names and their associated salmon results directories.

wget https://raw.githubusercontent.com/statonlab/RNA_workshop/main/R_materials/samples_file.csv

Performing DESeq DGE analysis

Set the directory where you just downloaded those files as the working directory in R. Then make a variable called "dir" that stores the path to the salmon_results directory; it is used below to locate each sample's Salmon quant file.

setwd("/where/you/just/copied/the/files")
dir <- "/where/you/just/copied/the/files/salmon_results"

Go ahead and load in the libraries we installed in Lab 7.

library(tidyverse)
library(tximport)
library(GenomicFeatures)
library(DESeq2)
library(pheatmap)

Load the gff3 file, then create a transcript-to-gene data frame (TXNAME in the first column, GENEID in the second) that tximport will use to summarize transcripts to genes.

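txdb <- makeTxDbFromGFF("Athaliana_447_Araport11.gene_exons.gff3")
keytypes(txdb)
# All transcript names in the annotation
k <- keys(txdb, keytype = "TXNAME")
# Look up the gene ID for each transcript name
txdf <- AnnotationDbi::select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")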

Load in the metadata

samples <- read_csv("samples_file.csv")
Qfiles <- file.path(dir, samples$quant_file)
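
Before importing, it can be worth confirming that every path built from the metadata actually exists, since a wrong "dir" or a typo in the csv only shows up later as an import error. The sketch below uses a made-up samples table and a temporary directory; the column names (sample_id, quant_file, condition) come from the code in this lab, but the sample names, the quant.sf paths, and the helper missing_quant_files are hypothetical.

```r
# Hypothetical stand-in for samples_file.csv; the real file may differ.
samples_demo <- data.frame(
  sample_id  = c("ctrl_1", "ctrl_2", "max2_1"),
  quant_file = c("ctrl_1/quant.sf", "ctrl_2/quant.sf", "max2_1/quant.sf"),
  condition  = c("control", "control", "max2")
)

# Build a throwaway directory that contains only two of the three files.
demo_dir <- file.path(tempdir(), "salmon_results_demo")
for (f in samples_demo$quant_file[1:2]) {
  dir.create(file.path(demo_dir, dirname(f)), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo_dir, f))
}

# Return the paths that do not exist on disk (hypothetical helper).
missing_quant_files <- function(base_dir, quant_files) {
  paths <- file.path(base_dir, quant_files)
  paths[!file.exists(paths)]
}

missing_demo <- missing_quant_files(demo_dir, samples_demo$quant_file)
print(missing_demo)
```

With the real data, missing_quant_files(dir, samples$quant_file) should return an empty character vector before you move on.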

This step imports the count data from Salmon and summarizes it from the transcript level to the gene level using the txdf table

txi <- tximport(files = Qfiles, type = "salmon", tx2gene = txdf)
colnames(txi$counts) <- samples$sample_id
names(txi)
head(txi$counts)
summary(txi)
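
To get a feel for what the tx2gene table is used for, here is a simplified toy illustration of collapsing transcript counts into gene counts. It is only a sketch: the counts and names are made up, and tximport itself does more than this (it also tracks abundances and average transcript lengths).

```r
# Toy transcript-level counts: 4 transcripts x 2 samples (made-up numbers).
tx_counts <- matrix(
  c(10, 5, 0, 7,    # sample s1
    20, 1, 3, 9),   # sample s2
  nrow = 4,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("s1", "s2"))
)

# Toy transcript-to-gene table with the same column layout as txdf.
tx2gene_demo <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3", "tx4"),
  GENEID = c("geneA", "geneA", "geneB", "geneB")
)

# Find the gene for each transcript row, then sum rows that share a gene.
gene_of_tx  <- tx2gene_demo$GENEID[match(rownames(tx_counts), tx2gene_demo$TXNAME)]
gene_counts <- rowsum(tx_counts, group = gene_of_tx)
print(gene_counts)
```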

Now we convert the txi object into a DESeq2 dataset and run the differential expression analysis

dds_data <- DESeqDataSetFromTximport(txi = txi, colData = samples, design = ~condition)
dds <- DESeq(dds_data)
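
DESeq2 treats the first level of the condition factor as the reference, and R orders factor levels alphabetically unless told otherwise. A call to results(dds) with no contrast compares the last level against that reference, so it can help to set the reference explicitly before building dds_data. The base R sketch below shows the idea; "alt_line" is a hypothetical third condition added only so that the alphabetical default is not already "control".

```r
# Made-up condition labels; "alt_line" is hypothetical.
condition_demo <- c("max2", "control", "alt_line", "control")

# Default: levels are sorted alphabetically, so "alt_line" comes first.
f_default <- factor(condition_demo)
print(levels(f_default))

# Make "control" the reference (first) level.
f_ref <- relevel(f_default, ref = "control")
print(levels(f_ref))
```

For the real data this would look something like samples$condition <- relevel(factor(samples$condition), ref = "control"), run before DESeqDataSetFromTximport. An explicit contrast, as used later in this lab, gives the same comparison regardless of the reference level.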

Plot dispersion

plotDispEsts(dds)

Summarize results

res <- results(dds)
head(res)

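Create a contrast with an alpha cutoff. The first item in the contrast vector is the column name from the samples object ("condition"), the second is the group of interest (numerator), and the third is the baseline group (denominator), so positive log2 fold changes mean higher expression in max2 than in control.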

res_sig <- results(dds, alpha = 0.05, contrast = c("condition", "max2", "control"))
summary(res_sig)
plotMA(res_sig, ylim=c(-12,12))
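
Note that alpha sets the significance threshold used by summary() and by DESeq2's independent filtering; it does not remove rows from the results table. If you want a table of only the significant genes, you need to filter it yourself. Below is a minimal sketch on a made-up data frame with the same column names as a DESeq2 results table; filter_de is a hypothetical helper, and the cutoffs are just examples.

```r
# Made-up results table; the real one has more columns (lfcSE, stat, pvalue).
res_demo <- data.frame(
  baseMean       = c(120, 15, 900, 40),
  log2FoldChange = c(2.3, -0.4, -1.8, 3.1),
  padj           = c(0.001, 0.60, 0.03, NA),
  row.names      = c("gene1", "gene2", "gene3", "gene4")
)

# Keep genes that pass both the adjusted p-value and fold change cutoffs.
# Rows with NA padj (e.g. genes removed by filtering) are dropped.
filter_de <- function(df, alpha = 0.05, min_lfc = 1) {
  keep <- !is.na(df$padj) & df$padj < alpha & abs(df$log2FoldChange) > min_lfc
  out  <- df[keep, , drop = FALSE]
  out[order(out$padj), , drop = FALSE]   # most significant first
}

de_demo <- filter_de(res_demo)
print(de_demo)
```

On the real results this could be applied as filter_de(as.data.frame(res_sig)), and the output saved with write.csv().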

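These steps can be used to find individual points on the MA plot. After running "identify", click on points in the plot, then hit the "Finish" button in the top right of the plot pane. Use the same results object that you plotted (res_sig) so the points line up.

idx <- identify(res_sig$baseMean, res_sig$log2FoldChange)
rownames(res_sig)[idx]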

You can also check the functional annotation of a gene by going to the TAIR website and pasting the gene name into the search box.

Create a plot for a single gene

plotCounts(dds, gene="AT1G53480", intgroup="condition")

[Figure: premade plot]

Create a PCA plot

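# rlog-transform the fitted object; blind = FALSE lets the transformation use the design
rld <- rlog(dds, blind = FALSE)
plotPCA(rld, intgroup = c("condition"))
# Keep genes with at least a 2-fold change (|log2FC| > 1) for the heatmap below.
# Note: this does not filter on padj.
res_lfc <- subset(res_sig, abs(log2FoldChange) > 1)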

[Figure: premade PCA plot]
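
To see roughly what happens behind the PCA plot, here is a base R sketch on simulated data. As far as we know, plotPCA picks the most variable genes (500 by default) and runs prcomp on them; the toy version below does the same thing with 20 genes. All of the data and sample names are made up.

```r
set.seed(1)

# Toy "transformed expression" matrix: 100 genes x 4 samples of random noise.
expr_demo <- matrix(
  rnorm(400), nrow = 100,
  dimnames = list(paste0("gene", 1:100),
                  c("ctrl_1", "ctrl_2", "max2_1", "max2_2"))
)
# Shift the first 10 genes upward in the two max2 samples.
expr_demo[1:10, 3:4] <- expr_demo[1:10, 3:4] + 5

# Keep the 20 most variable genes, then run PCA with samples as rows.
gene_var <- apply(expr_demo, 1, var)
top_var  <- order(gene_var, decreasing = TRUE)[seq_len(20)]
pca      <- prcomp(t(expr_demo[top_var, ]))

# Percent of variance explained by each principal component.
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
print(pct_var)
print(pca$x[, 1:2])
```

In this toy data the control and max2 samples fall on opposite sides of PC1, which is the kind of separation to look for in the real plot.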

Create a heatmap

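vsd <- vst(dds)
# res_lfc holds only a subset of the rows in vsd, so select by gene ID, not by row position
top <- head(order(res_lfc$log2FoldChange, decreasing = TRUE), 50)
genes <- rownames(res_lfc)[top]
pheatmap(assay(vsd)[genes, ], cluster_rows=TRUE, show_rownames=TRUE,
         cluster_cols=TRUE)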

[Figure: premade heatmap]
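
Because highly expressed genes can dominate the color scale, a common option is to center each gene on its own mean before plotting, so the heatmap shows differences between samples instead of absolute expression. A minimal sketch of the centering step on a made-up matrix:

```r
# Toy transformed expression values: 2 genes x 4 samples (made-up numbers).
mat_demo <- matrix(
  c(10, 11, 14, 15,
     2,  3,  1,  2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"),
                  c("ctrl_1", "ctrl_2", "max2_1", "max2_2"))
)

# Subtract each gene's mean from its row; every row now averages to zero.
centered_demo <- mat_demo - rowMeans(mat_demo)
print(centered_demo)
```

For the real heatmap the same subtraction could be applied to assay(vsd)[genes, ] before calling pheatmap. pheatmap also has a scale = "row" argument, which centers each row and additionally divides by its standard deviation.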
