analyze.Rmd

---
title: "Analyze RNA-seq experiment with limma+voom"
output:
  html_document:
    toc: true
---

## Introduction

This example limma+voom code was adapted from the Bioconductor workflow
[RNAseq123](https://bioconductor.org/packages/release/workflows/html/RNAseq123.html)
to demonstrate how to use [OmicNavigator](https://github.com/abbvie-external/OmicNavigator). More information about this code and the data required to execute it can be found at [OmicNavigatorExample](https://github.com/abbvie-external/OmicNavigatorExample). 

Below is the description of the experiment from the [RNAseq123
vignette][vignette]:

[vignette]: https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html

> The experiment analysed in this workflow is from Sheridan et al. (2015)
> (Sheridan et al. 2015) and consists of three cell populations (basal, luminal
> progenitor (LP) and mature luminal (ML)) sorted from the mammary glands of
> female virgin mice, each profiled in triplicate. RNA samples were sequenced
> across three batches on an Illumina HiSeq 2000 to obtain 100 base-pair
> single-end reads. The analysis outlined in this article assumes that reads
> obtained from an RNA-seq experiment have been aligned to an appropriate
> reference genome and summarised into counts associated with gene-specific
> regions. In this instance, reads were aligned to the mouse reference genome
> (mm10) using the R based pipeline available in the Rsubread package
> (specifically the align function (Liao, Smyth, and Shi 2013) followed by
> featureCounts (Liao, Smyth, and Shi 2014) for gene-level summarisation based on
> the in-built mm10 RefSeq-based annotation).

The results of this analysis were subsequently converted into an OmicNavigator study with [build.R](https://github.com/abbvie-external/OmicNavigatorExample/blob/main/build.R). 

## Prepare data

Load required packages

```{r packages, message=FALSE}
library("limma")
library("edgeR")
library("Mus.musculus")
```

Import RNA-seq counts

```{r data}
files <- Sys.glob("data/*txt")
x <- readDGE(files, columns = c(1, 3))
```

Organize sample data

```{r samples}
samplenames <- make.names(basename(colnames(x)))
colnames(x) <- samplenames
x$samples <- cbind(samplenames, x$samples)
group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP",
                     "Basal", "ML", "LP"))
x$samples$group <- group
lane <- as.factor(rep(c("L004", "L006", "L008"), c(3, 4, 2)))
x$samples$lane <- lane
```

Organize feature data

```{r features}
geneid <- rownames(x)
genes <- OrganismDbi::select(
  Mus.musculus,
  keys = geneid,
  columns = c("SYMBOL", "TXCHROM"),
  keytype = "ENTREZID"
)
genes <- genes[!duplicated(genes$ENTREZID), ]
x$genes <- genes
colnames(x$genes) <- c("entrez", "symbol", "chrom")
```

Organize metaFeature data

```{r metaFeatures}
metaFeatures <- OrganismDbi::select(
  Mus.musculus,
  keys = x$genes$entrez,
  columns = c("ENSEMBL", "ENSEMBLTRANS", "ENSEMBLPROT"),
  keytype = "ENTREZID"
)
colnames(metaFeatures)[1] <- "entrez"
```

Filter and normalize counts

```{r normalize}
cpm <- cpm(x)
keep.exprs <- rowSums(cpm > 1) >= 3
x <- x[keep.exprs, , keep.lib.sizes = FALSE]
x <- calcNormFactors(x, method = "TMM")
```

## Differential expression analysis

Specify linear model and contrasts to test

```{r model}
design <- model.matrix(~0 + group + lane, data = x$samples)
colnames(design) <- sub("(group|lane)", "", colnames(design))
contrastsMatrix <- makeContrasts(BasalvsLP = Basal - LP,
                                 BasalvsML = Basal - ML,
                                 LPvsML = LP - ML,
                                 levels = colnames(design))
```

Fit the model

```{r fit}
v <- voom(x, design)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts = contrastsMatrix)
efit <- eBayes(vfit)
```

## Enrichment analysis

Load Mm.h, which contains the MSigDB Hallmark gene sets for the enrichment
analysis

```{r msigdb}
load("data/mouse_H_v5p2.rdata")
idx <- ids2indices(Mm.H, id = rownames(v))
```

Calculate enrichment for each of the 3 tests
 
```{r enrichment}
enrichedBasalvsLP <- camera(v, idx, design, contrast = contrastsMatrix[, 1])
enrichedBasalvsML <- camera(v, idx, design, contrast = contrastsMatrix[, 2])
enrichedLPvsML <- camera(v, idx, design, contrast = contrastsMatrix[, 3])
```

## Export results

```{r export}
dir.create("results", showWarnings = FALSE)
export <- function(x, file, row.names = FALSE, ...) {
  write.table(x, file = file, quote = FALSE, sep = "\t", row.names = row.names, ...)
}

export(x$samples, "results/samples.txt")
export(x$genes, "results/features.txt")
export(metaFeatures, "results/metaFeatures.txt")

# Convert counts to log-counts-per-million
assays <- as.data.frame(cpm(x, log = TRUE))
export(assays, "results/assays.txt", row.names = TRUE)

# Test results
create_results_table <- function(fit, coef) {
  topTable(fit, coef = coef, number = Inf, sort.by = "p")[, -2:-3]
}
export(create_results_table(efit, "BasalvsLP"), "results/BasalvsLP.txt")
export(create_results_table(efit, "BasalvsML"), "results/BasalvsML.txt")
export(create_results_table(efit, "LPvsML"), "results/LPvsML.txt")

# Enrichments
export(enrichedBasalvsLP, "results/enrichedBasalvsLP.txt", row.names = TRUE)
export(enrichedBasalvsML, "results/enrichedBasalvsML.txt", row.names = TRUE)
export(enrichedLPvsML, "results/enrichedLPvsML.txt", row.names = TRUE)
```

## Citations

> Law CW, Alhamdoosh M, Su S et al. RNA-seq analysis is easy as 1-2-3
> with limma, Glimma and edgeR [version 2; referees: 3 approved].
> F1000Research 2016, 5:1408 (doi: 10.12688/f1000research.9005.2)

> Sheridan JM, Ritchie ME, Best SA, et al.: A pooled shRNA screen for
> regulators of primary mammary stem and progenitor cells identifies
> roles for Asap1 and Prox1. BMC Cancer. 2015; 15(1): 221.