-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathrepro_example.Rmd
111 lines (81 loc) · 6.25 KB
/
repro_example.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
---
title: "Knitr / Rmarkdown example using _GenomicInteractions_"
author: "Liz Ing-Simmons"
output:
word_document: default
pdf_document:
toc: yes
html_document:
toc: yes
---
```{r global_options, echo=FALSE}
short=TRUE #if short==TRUE, do not echo code chunks
debug=FALSE
knitr::opts_chunk$set(echo=!short, warning=debug, message=debug, error=FALSE,
cache.path = "repro_example_cache/", fig.path = "repro_example_figures/")
```
## Introduction
This vignette shows you how GenomicInteractions can be used to investigate significant interactions in HiC data that has been analysed using [HOMER](http://homer.salk.edu/homer/) software [1]. GenomicInteractions can take a HOMER [interaction file](http://homer.salk.edu/homer/interactions/HiCinteractions.html) as input.
HiC data comes from [chromosome conformation capture](http://en.wikipedia.org/wiki/Chromosome_conformation_capture) followed by high-throughput sequencing. Unlike 3C, 4C or 5C, which target specific regions, it can provide genome-wide information about the spatial proximity of regions of the genome. The raw data takes the form of paired-end reads connecting restriction fragments. The resolution of a HiC experiment is limited by the number of paired-end sequencing reads produced and by the sizes of restriction fragments. To increase the power to distinguish real interactions from random noise, HiC data is commonly analysed in bins from 20kb - 1Mb. There are a variety of tools available for binning the data, controlling for noise (e.g. self-ligations of restriction fragments), and finding significant interactions.
The data we are using comes from [this paper](http://genome.cshlp.org/content/23/12/2066.full) [2] and can be downloaded from [GEO](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48763). It is HiC data from wild type mouse double positive thymocytes. The experiment was carried out using the HindIII restriction enzyme. The paired-end reads were aligned to the mm9 mouse genome assembly and HOMER software was used to filter reads and detect significant interactions at a resolution of 100kb. For the purposes of this vignette, we will consider only data from chromosomes 14 and 15.
## Making a GenomicInteractions object
Load the data by specifying the file location and experiment type. You can also include an experiment name and description.
```{r load_packages, echo=FALSE, cache=FALSE}
library(GenomicInteractions)
library(GenomicRanges)
```
```{r functions}
```
```{r load_hic, cache=TRUE}
hic_file <- system.file("extdata", "Seitan2013_WT_100kb_interactions.txt",
package="GenomicInteractions")
hic_data <- makeGenomicInteractionsFromFile(hic_file,
type="homer",
experiment_name = "HiC 100kb",
description = "HiC 100kb resolution")
seqlengths(hic_data) <- c(chr15 = 103494974, chr14 = 125194864)
```
The `GenomicInteractions` object consists of two linked `GenomicRanges` objects containing the anchors of each interaction, and the number of reads supporting each interaction. Metadata for each interaction (in this case, p-values and FDRs) is stored as a `DataFrame` accessed by `mcols()` or `elementMetadata()`, similar to the metadata of a simple `GRanges`. You can also access single metadata columns using `$`.
```{r subset_hic, fig.height=8, fig.width=8, cache=TRUE}
hic_data$p.value <- exp(hic_data$LogP)
hic_data$fdr <- hic_data$FDR.Benjamini..based.on.3.68e.08.total.tests.
hic_data_subset <- hic_data[hic_data$fdr < 0.1]
hic_data_subset
plotSummaryStats(hic_data_subset)
```
## Annotation
One of the most powerful features of GenomicInteractions is that it allows you to annotate interactions by whether the anchors overlap genomic features of interest, such as promoters or enhancers. However this process can be slow for large datasets, so here we annotate with just promoters.
Genome annotation data can be obtained from, for example, UCSC databases using the GenomicFeatures package. We will use promoters of Refseq genes extended to a width of 5kb. Downloading all the data can be a slow process, so the data for promoters for chromosomes 14 and 15 is provided with this package.
```{r get_annotations, eval=FALSE}
## Not run
library(GenomicFeatures)
mm9.refseq.db <- makeTxDbFromUCSC(genome="mm9", table="refGene")
refseq.genes = genes(mm9.refseq.db)
refseq.transcripts = transcriptsBy(mm9.refseq.db, by="gene")
refseq.transcripts = refseq.transcripts[ names(refseq.transcripts) %in% unlist(refseq.genes$gene_id) ]
mm9_refseq_promoters <- promoters(refseq.transcripts, 2500,2500)
mm9_refseq_promoters <- unlist(mm9_refseq_promoters[seqnames(mm9_refseq_promoters) %in% c("chr14", "chr15")])
```
`annotateInteractions` takes a list of features in GRanges or GRangesList format and annotates the interaction anchors based on overlap with these features. The list of annotation features should have descriptive names, as these names are stored in the annotated GenomicInteractions object and used to assign anchor (node) classes.
```{r annotate, cache=TRUE}
data("mm9_refseq_promoters")
annotation.features <- list(promoter = mm9_refseq_promoters)
annotateInteractions(hic_data_subset, annotation.features)
```
### Interaction types
Interaction types are determined by the classes of the interacting nodes. As we only have two node classes, we have three possible interaction classes, summarised in the plot below. Most of the interactions are between promoters. We can subset the data to look at interaction types that are of particular interest.
```{r plot_ints, cache=TRUE}
plotInteractionAnnotations(hic_data_subset)
```
You can also summarise interaction types in a table.
```{r table_ints, cache=TRUE}
knitr::kable(categoriseInteractions(hic_data_subset))
```
## References
1. Heinz S, Benner C, Spann N, Bertolino E et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Mol Cell 2010 May 28;38(4):576-589.
2. Seitan, V. C. et al. Cohesin-based chromatin interactions enable regulated gene expression within pre-existing architectural compartments. Genome Res. 23, 2066-77 (2013).
## Session details
This report was generated on `r format(Sys.time(), "%a %b %d %Y at %X")`.
```{r session_info, include=TRUE}
sessionInfo()
```