-
Notifications
You must be signed in to change notification settings - Fork 3
/
analyze.Rmd
179 lines (141 loc) · 5.7 KB
/
analyze.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
---
title: "Analyze RNA-seq experiment with limma+voom"
output:
html_document:
toc: true
---
## Introduction
This example limma+voom code was adapted from the Bioconductor workflow
[RNAseq123](https://bioconductor.org/packages/release/workflows/html/RNAseq123.html)
to demonstrate how to use [OmicNavigator](https://github.com/abbvie-external/OmicNavigator). More information about this code and the data required to execute it can be found at [OmicNavigatorExample](https://github.com/abbvie-external/OmicNavigatorExample).
Below is the description of the experiment from the [RNAseq123
vignette][vignette]:
[vignette]: https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html
> The experiment analysed in this workflow is from Sheridan et al. (2015)
> (Sheridan et al. 2015) and consists of three cell populations (basal, luminal
> progenitor (LP) and mature luminal (ML)) sorted from the mammary glands of
> female virgin mice, each profiled in triplicate. RNA samples were sequenced
> across three batches on an Illumina HiSeq 2000 to obtain 100 base-pair
> single-end reads. The analysis outlined in this article assumes that reads
> obtained from an RNA-seq experiment have been aligned to an appropriate
> reference genome and summarised into counts associated with gene-specific
> regions. In this instance, reads were aligned to the mouse reference genome
> (mm10) using the R based pipeline available in the Rsubread package
> (specifically the align function (Liao, Smyth, and Shi 2013) followed by
> featureCounts (Liao, Smyth, and Shi 2014) for gene-level summarisation based on
> the in-built mm10 RefSeq-based annotation).
The results of this analysis were subsequently converted into an OmicNavigator study with [build.R](https://github.com/abbvie-external/OmicNavigatorExample/blob/main/build.R).
## Prepare data
Load required packages
```{r packages, message=FALSE}
library("limma")
library("edgeR")
library("Mus.musculus")
```
Import RNA-seq counts
```{r data}
files <- Sys.glob("data/*txt")
x <- readDGE(files, columns = c(1, 3))
```
Organize sample data
```{r samples}
samplenames <- make.names(basename(colnames(x)))
colnames(x) <- samplenames
x$samples <- cbind(samplenames, x$samples)
group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP",
"Basal", "ML", "LP"))
x$samples$group <- group
lane <- as.factor(rep(c("L004", "L006", "L008"), c(3, 4, 2)))
x$samples$lane <- lane
```
Organize feature data
```{r features}
geneid <- rownames(x)
genes <- OrganismDbi::select(
Mus.musculus,
keys = geneid,
columns = c("SYMBOL", "TXCHROM"),
keytype = "ENTREZID"
)
genes <- genes[!duplicated(genes$ENTREZID), ]
x$genes <- genes
colnames(x$genes) <- c("entrez", "symbol", "chrom")
```
Organize metaFeature data
```{r metaFeatures}
metaFeatures <- OrganismDbi::select(
Mus.musculus,
keys = x$genes$entrez,
columns = c("ENSEMBL", "ENSEMBLTRANS", "ENSEMBLPROT"),
keytype = "ENTREZID"
)
colnames(metaFeatures)[1] <- "entrez"
```
Filter and normalize counts
```{r normalize}
cpm <- cpm(x)
keep.exprs <- rowSums(cpm > 1) >= 3
x <- x[keep.exprs, , keep.lib.sizes = FALSE]
x <- calcNormFactors(x, method = "TMM")
```
## Differential expression analysis
Specify linear model and contrasts to test
```{r model}
design <- model.matrix(~0 + group + lane, data = x$samples)
colnames(design) <- sub("(group|lane)", "", colnames(design))
contrastsMatrix <- makeContrasts(BasalvsLP = Basal - LP,
BasalvsML = Basal - ML,
LPvsML = LP - ML,
levels = colnames(design))
```
Fit the model
```{r fit}
v <- voom(x, design)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts = contrastsMatrix)
efit <- eBayes(vfit)
```
## Enrichment analysis
Load Mm.h, which contains the MSigDB Hallmark gene sets for the enrichment
analysis
```{r msigdb}
load("data/mouse_H_v5p2.rdata")
idx <- ids2indices(Mm.H, id = rownames(v))
```
Calculate enrichment for each of the 3 tests
```{r enrichment}
enrichedBasalvsLP <- camera(v, idx, design, contrast = contrastsMatrix[, 1])
enrichedBasalvsML <- camera(v, idx, design, contrast = contrastsMatrix[, 2])
enrichedLPvsML <- camera(v, idx, design, contrast = contrastsMatrix[, 3])
```
## Export results
```{r export}
dir.create("results", showWarnings = FALSE)
export <- function(x, file, row.names = FALSE, ...) {
write.table(x, file = file, quote = FALSE, sep = "\t", row.names = row.names, ...)
}
export(x$samples, "results/samples.txt")
export(x$genes, "results/features.txt")
export(metaFeatures, "results/metaFeatures.txt")
# Convert counts to log-counts-per-million
assays <- as.data.frame(cpm(x, log = TRUE))
export(assays, "results/assays.txt", row.names = TRUE)
# Test results
create_results_table <- function(fit, coef) {
topTable(fit, coef = coef, number = Inf, sort.by = "p")[, -2:-3]
}
export(create_results_table(efit, "BasalvsLP"), "results/BasalvsLP.txt")
export(create_results_table(efit, "BasalvsML"), "results/BasalvsML.txt")
export(create_results_table(efit, "LPvsML"), "results/LPvsML.txt")
# Enrichments
export(enrichedBasalvsLP, "results/enrichedBasalvsLP.txt", row.names = TRUE)
export(enrichedBasalvsML, "results/enrichedBasalvsML.txt", row.names = TRUE)
export(enrichedLPvsML, "results/enrichedLPvsML.txt", row.names = TRUE)
```
## Citations
> Law CW, Alhamdoosh M, Su S et al. RNA-seq analysis is easy as 1-2-3
> with limma, Glimma and edgeR [version 2; referees: 3 approved].
> F1000Research 2016, 5:1408 (doi: 10.12688/f1000research.9005.2)
> Sheridan JM, Ritchie ME, Best SA, et al.: A pooled shRNA screen for
> regulators of primary mammary stem and progenitor cells identifies
> roles for Asap1 and Prox1. BMC Cancer. 2015; 15(1): 221.