-
Notifications
You must be signed in to change notification settings - Fork 17
/
Copy pathindex.Rmd
355 lines (223 loc) · 12.8 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
---
title: "Interactive Q-Q and Manhattan Plots Using Plotly.js"
author: "Sahir Bhatnagar"
date: "`r Sys.Date()`"
output:
html_document:
number_sections: yes
self_contained: yes
toc: true
---
```{r setup, include=FALSE, echo=TRUE, message=FALSE}
library(plotly)
#library(manhattanly)
library(knitr)
knitr::opts_chunk$set(echo = TRUE,
tidy = FALSE,
cache = FALSE,
warning = FALSE,
message = FALSE)#,
# dpi = 60)
#comment = "#>")
#knitr::opts_knit$set(eval.after = 'fig.cap')
```
**Author**: [Sahir Bhatnagar](http://sahirbhatnagar.com/) ([email protected])
**Notes**:
* This vignette was built with `R markdown` and `knitr`. The source code for this vignette can be found [here](https://raw.githubusercontent.com/sahirbhatnagar/manhattanly/master/index.Rmd).
**************
# Introduction
[Manhattan](https://en.wikipedia.org/wiki/Manhattan_plot), [Q-Q](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot) and [volcano](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC154570/) plots are popular graphical methods for visualizing results from high-dimensional data analysis such as a (epi)genome wide asssociation study (GWAS or EWAS), in which p-values, Z-scores, test statistics are plotted on a scatter plot against their genomic position. Manhattan plots are used for visualizing
potential regions of interest in the genome that are associated with a phenotype. Q-Q plots tell us about the distributional assumptions of the observed test statistics. Volcano plots are the negative log10 p-values plotted against their effect size, odds ratio or log fold-change. They are used to identify clinically meaningful markers in genomic experiments, i.e., markers that are statistically significant and have an effect size greater than some threshold.
Interactive manhattan, Q-Q and volcano plots allow the inspection of a specific value (e.g. rs number or gene name) by hovering the mouse over a point, as well as zooming into a region of the
genome (e.g. a chromosome) by dragging a rectangle around the relevant area.
This pacakge creates interactive Q-Q, manhattan and volcano plots that are usable from the R console, the [`RStudio`](https://www.rstudio.com/) viewer pane, [`R Markdown`](http://rmarkdown.rstudio.com/) documents, in [`Shiny`](http://shiny.rstudio.com/) apps, embeddable in websites and can be exported as `.png` files. By hovering the mouse over a point, you can see annotation information such as the SNP identifier and GENE name. You can also drag a rectangle to
zoom in on a region of interest and then export the image as a `.png` file.
This work is based on the [`qqman`](https://github.com/stephenturner/qqman) by [Stephen Turner](http://stephenturner.us/) and the [`plotly.js`](https://plot.ly/)
engine. It produces similar manhattan and Q-Q plots as the `qqman::manhattan` and `qqman::qq` functions; the main difference here is being able to interact with the plot,
including extra annotation information and seamless integration with HTML.
***************
***************
# Installation
You can install `manhattanly` from CRAN:
```R
install.packages("manhattanly")
```
Alternatively, you can install the development version of `manhattanly` from [GitHub](https://github.com/sahirbhatnagar/manhattanly) with:
```R
install.packages("devtools")
devtools::install_github("sahirbhatnagar/manhattanly", build_vignettes = TRUE)
```
***************
***************
# Quick Start
The `manhattanly` package ships with an example dataset called `HapMap`. See `help(HapMap)` for more details about how this dataset was created. Here is what the `HapMap` dataset looks like:
```{r, message=TRUE}
# load the manhattanly library
library(manhattanly)
head(HapMap)
dim(HapMap)
```
The required columns to create a manhattan plot are the chromosome, base-pair position and p-value. By default, the `manhattanly` function assumes these columns are named `CHR`, `BP` and `P` (but these can be specified by the user if they are different)
Create an interactive manhattan plot using one command:
```{r}
manhattanly(HapMap, snp = "SNP", gene = "GENE")
```
***************
The arguments `snp = "SNP"` and `gene = "GENE"` specify that we want to add snp and gene information to each point. This information is found in the columns names `"SNP"` and `"GENE"` in the `HapMap` dataset. See `help(manhattanly)` for a full list of options.
Similarly, we can create an interactive Q-Q plot using one command (See `help(qqly)` for a full list of options):
```{r}
qqly(HapMap, snp = "SNP", gene = "GENE")
```
***************
You can then save the plot as a `.png` file by clicking on the camera icon in the toolbar (which appears when you hover your mouse over it).
***************
You can also make a volcano plot which by default, highlights the points greater than the default `genomewideline` and `effect_size_line` arguments (See `help(volcanoly)` for a full list of options):
```{r}
volcanoly(HapMap, snp = "SNP", gene = "GENE")
```
***************
***************
## Highlighting SNPs of Interest
We can also highlight SNPs of interest using the `highlight` argument. This package comes with a list of SNPs of interest called `significantSNP` (see `help(significantSNP)` for more details). To highlight these SNPs we simply pass this vector to the `highlight` argument (note that these SNPs need to be present in the `"SNP"` column of your data):
```{r, eval = TRUE}
manhattanly(HapMap, snp = "SNP", gene = "GENE", highlight = significantSNP)
```
***************
***************
## More annotations
You can add up to 4 annotations. In the following plot we add the snp, gene, the distance to the nearest gene and the effect size:
```{r, eval = TRUE}
manhattanly(HapMap, snp = "SNP", gene = "GENE",
annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE",
highlight = significantSNP)
```
***************
***************
## Adding Text Annotations to the Plot
The annotations in the previous plots only appear when we hover the mouse over the point. Once we have identified a SNP, or a few SNPs of interest we want to explicitly show the annotation information and save the plot. The output of the `manhattanly` function is an object which can be further manipulated using the `%>%` operator from the `magrittr` package:
```{r, eval = TRUE}
library(magrittr)
p <- manhattanly(HapMap, snp = "SNP", gene = "GENE",
annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE",
highlight = significantSNP)
# get the x and y coordinates from the pre-processed data
plotData <- manhattanr(HapMap, snp = "SNP", gene = "GENE")[["data"]]
# annotate the smallest p-value
annotate <- plotData[which.min(plotData$P),]
# x and y coordinates of SNP with smallest p-value
xc <- annotate$pos
yc <- annotate$logp
p %>% plotly::layout(annotations = list(
list(x = xc, y = yc,
text = paste0(annotate$SNP,"<br>","GENE: ",annotate$GENE),
font = list(family = "serif", size = 10))))
```
***************
You can then save the plot as a `.png` file by clicking on the camera icon in the toolbar (which appears when you hover your mouse over it).
***************
***************
## Sharing your Plots
By default, plotly for R runs locally in your web browser or in the R Studio viewer. It would be useful to be able to easily share these plot with for example, your collaborators or supervisor, especially in the during the exploratory data analysis stage of your project.
You can publish your charts to the web with [plotly's web service](https://plot.ly/) in three steps.
***************
### Step 1: Signup for a free plotly account
Create a free plotly account [here](https://plot.ly/). A plotly account is required to publish charts online. It's free to get started, and you control the privacy of your charts.
***************
### Step 2: Save your authentication credentials
Find your authentication API keys in your online settings. Set them in your R session with:
```{r, eval=FALSE}
Sys.setenv("plotly_username"="your_plotly_username")
Sys.setenv("plotly_api_key"="your_api_key")
```
***************
### Step 3: Publish your graphs to plotly with `plotly_POST`
```{r, eval=FALSE}
library(plotly)
# p is the interactive manhattan plot we saved earlier
plotly_POST(p, filename = "r-docs/manhattan", world_readable=TRUE)
```
* `filename` sets the name of the file inside your online plotly account.
* `world_readable` sets the privacy of your chart. If `TRUE`, the graph is publically viewable, if `FALSE`, only you can view it.
***************
***************
# Dynamic Documents with `knitr` and `R Markdown`
[`R Markdown`](http://rmarkdown.rstudio.com/) is a an `R` software package that allows the creation of dynamic documents, i.e., embed `R` code with text to create fully reproducible reports. Furthermore it allows easy creation of `HTML` reports without knowing how to code in `HTML` (such as this vignette). This means you can embed interactive manhattan and qq plots in `HTML` reports using the `manhattanly` package. For example, to embed the above manhattan plot I included the following code chunk in the [`.Rmd`](https://raw.githubusercontent.com/sahirbhatnagar/manhattanly/master/vignettes/manhattanly.Rmd) document:
<pre><code>```{r}
# plotly library needs to be loaded
library(plotly)
manhattanly(HapMap, snp = "SNP", gene = "GENE")
```</code></pre>
*****************
*****************
# Data Pre-Processing
The `manhattanly` package splits up the data pre-processing from the rendering of the plot object (inspired by the [`heatmaply`](https://github.com/talgalili/heatmaply) package by [Tal Galili](http://www.r-statistics.com/)). These are done by the `manhattanr` and `qqr` and functions:
```{r}
# create an object of class `manhattanr`
manhattanrObject <- manhattanr(HapMap)
# whats in there
str(manhattanrObject)
# the data used for plotting is a data.frame
# this data.frame can be used with any graphics function such as in base R
# you do not need to use plotly
head(manhattanrObject[["data"]])
is.data.frame(manhattanrObject[["data"]])
```
This `manhattanrObject` which is of class `manhattanr` can also be passed to the `manhattanly` function (we omit the plot here for the sake of size of the rendered vignette):
```{r, eval=FALSE}
manhattanly(manhattanrObject)
```
************
We can specify more annotations in the data using the `snp`, `gene`, `annotation1` and `annotation2` arguments:
```{r}
# create an object of class `manhattanr`
manhattanrObject <- manhattanr(HapMap, snp = "SNP", gene = "GENE",
annotation1 = "DISTANCE", annotation2 = "EFFECTSIZE")
# the annotation columns are now part of the data.frame
head(manhattanrObject[["data"]])
is.data.frame(manhattanrObject[["data"]])
```
***************
Similarly the data used for the Q-Q plot can be created using the `qqr` function:
```{r}
qqrObject <- qqr(HapMap)
str(qqrObject)
head(qqrObject[["data"]])
```
This `qqrObject` which is of class `qqr` can also be passed to the `qqly` function:
```{r, eval=TRUE}
qqly(qqrObject)
```
# Volcano Plots
By default, the points greater than the default `genomewideline` and `effect_size_line` arguments are highlighted. The defaults are `genomewideline = -log10(1e-5)` and `effect_size_line = c(-1,1)`. The `effect_size_line` argument must be a numeric vector of length 2 and the first argument must be smaller than the second. To highlight more points, you simply need to change those thresholds:
```{r}
volcanoly(HapMap, snp = "SNP", genomewideline = -log10(1e-2), effect_size_line = c(-0.5, 0.5))
```
Note that in order to highlight points, the `volcanoly` function requires an identifier for each point given by the `snp` argument. If the `snp` argument is left blank, then no highlighting will be done.
***************
Set either of the `genomewideline` and `effect_size_line` arguments to `FALSE` to remove that threshold:
```{r}
volcanoly(HapMap, snp = "SNP", genomewideline = -log10(1e-2), effect_size_line = FALSE)
```
***************
Instead of automatic highlighting, you can specify the points you want to highlight using the `highlight` argument:
```{r}
volcanoly(HapMap, snp = "SNP", highlight = significantSNP)
```
***************
Finally, to remove highlighting altogether while keeping the threshold lines set `highlight = FALSE` :
```{r}
volcanoly(HapMap, snp = "SNP", highlight = FALSE)
```
The `snp` argument is stil necessary however.
***************
Similar to the `manhattanr` and `qqr` functions, there is a `volcanor` function which is the pre-pocessing step and creates an object of class `volcanor`:
```{r}
volcanorObject <- volcanor(HapMap, snp = "SNP")
str(volcanorObject)
head(volcanorObject[["data"]])
```
***************
This `volcanorObject` which is of class `volcanor` can also be passed to the `volcanoly` function:
```{r, eval=TRUE}
volcanoly(volcanorObject)
```