---
title: "LncPipeReporter"
output: github_document
always_allow_html: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, base.dir = '.', eval = FALSE)
```

[![Build Status](https://travis-ci.org/bioinformatist/LncPipeReporter.svg?branch=master)](https://travis-ci.org/bioinformatist/LncPipeReporter)
[![codecov](https://codecov.io/gh/bioinformatist/LncPipeReporter/branch/master/graph/badge.svg)](https://codecov.io/gh/bioinformatist/LncPipeReporter)

> This repository was archived by the owner on Apr 12, 2023. It is now read-only.

An R package for automatically aggregating and summarizing lncRNA analysis results.
## Overview
Most bioinformatics tools, such as the aligners [STAR](https://github.com/alexdobin/STAR),
[TopHat](http://ccb.jhu.edu/software/tophat/index.shtml)
and [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml), generate log files by default. [LncPipe](https://github.com/likelet/LncPipe), a recent Nextflow-based pipeline for lncRNA sequencing data analysis, produces a file containing basic lncRNA features.

This project is part of LncPipe (but can also be used standalone) and is responsible for automatically generating reports in `HTML` format with interactive plots based on the pipeline output. It contains several plotting functions, as well as analysis scripts that perform comparison analysis and, when experimental design information is available, differential expression analysis. We expect this tool to help users understand the underlying mechanisms of the known and novel lncRNAs in their experiments.
## Gallery
GIF animations were recorded with [phw/peek](https://github.com/phw/peek).

Interactive plots generated by LncPipeReporter support **arbitrary scaling** and **filtering**, with hover tags showing **real values**, implemented via [plotly](https://github.com/ropensci/plotly).
![](imgs/f1.gif)
There are also interactive tables showing **the first 80 rows** of each `data.frame`/`data.table`. They can be exported in **many formats** and support **searching**, **filtering** and **ordering**.
![](imgs/f2.gif)
**User-adjusted** plots can always be saved as **static figures** and temporarily placed in your manuscript for peer review. When it is time for publication, you can use the [publication-quality versions](#results) instead.
![](imgs/f3.gif)
## Features
- **Common result files from lncRNA sequencing data analysis pipelines are well supported.** The package is designed to handle several types of files (click to see example file content):
    - [STAR log file](inst/extdata/demo_results/LWS2.Log.final.out)
    - [HISAT2 log file](inst/extdata/demo_results/N1037.log)
    - [TopHat log file](inst/extdata/demo_results/align_summary.txt)
    - [Experimental design information](inst/extdata/demo_results/design.file)
    - [RSEM or expression matrix from other tools](inst/extdata/demo_results/lncRNA.rsem.count.txt)
    - [Basic features of lncRNAs](inst/extdata/demo_results/basic_charac.txt)
- **Files can be found anywhere.** Users can simply put all upstream analysis result files in one folder (even alongside other files); they will be found **recursively** in the folder and its subdirectories.
- **File types can be guessed.** Users **never** need to designate file types explicitly, or to pass a file containing a list of names as a parameter, when using LncPipeReporter.
- **Flexible use.** Users can supply **any type or number** of files at a time: for instance, more than one STAR log file, both STAR and HISAT2 log files, or no alignment log files at all.
- **More themes available.** Users can apply one of a series of attractive themes provided by *ggsci*. See [Parameters](#parameters) for details.
- **Multiple differential expression analysis methods supported.** Currently, users can choose [edgeR](http://www.bioconductor.org/packages/release/bioc/html/edgeR.html), [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) or [NOISeq](http://www.bioconductor.org/packages/release/bioc/html/NOISeq.html) as the differential expression analysis tool.
- **High-resolution static figures and detailed results in *csv* are provided.** Users get publication-ready figures in *tiff* format (*300 ppi resolution*, *LZW compression*) and *pdf* format (editable in *Adobe Illustrator*, etc.). LncPipeReporter also always produces analysis result tables (comma-separated, which can be opened and edited in *MS Excel*, etc.); for details, see [Results](#results).
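
The recursive discovery and type guessing described above can be illustrated with a minimal base-R sketch. It is **not** LncPipeReporter's implementation (which, per the installation notes, relies on Perl one-liners); the helper names `find_result_files()` and `guess_file_type()` are hypothetical, and the marker strings are assumptions based on the usual STAR, HISAT2 and TopHat log formats.

```{r}
# Hypothetical helper: list every file under `dir`, including subdirectories.
find_result_files <- function(dir) {
  list.files(dir, recursive = TRUE, full.names = TRUE)
}

# Hypothetical helper: guess a file's type from marker strings in its first lines.
# The marker strings are assumptions about typical log contents, not the package's rules.
guess_file_type <- function(path, n = 50) {
  text <- paste(readLines(path, n = n, warn = FALSE), collapse = "\n")
  markers <- c(
    STAR   = "Uniquely mapped reads number",  # STAR Log.final.out
    HISAT2 = "overall alignment rate",        # HISAT2 alignment summary
    TopHat = "overall read mapping rate"      # TopHat align_summary.txt
  )
  hits <- vapply(markers, function(m) grepl(m, text, fixed = TRUE), logical(1))
  if (any(hits)) names(markers)[which(hits)[1]] else "unknown"
}

# Demo: a nested STAR log next to an unrelated file, in a temporary folder.
demo <- file.path(tempdir(), "demo_results")
dir.create(file.path(demo, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines("    Uniquely mapped reads number |\t1000", file.path(demo, "sub", "LWS2.Log.final.out"))
writeLines("just some notes", file.path(demo, "notes.txt"))

files <- find_result_files(demo)
types <- vapply(files, guess_file_type, character(1))
print(types)
```

Files that match no marker are reported as `"unknown"` rather than causing an error, which matches the idea that unrelated files may sit in the same folder.
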
## Installation
LncPipeReporter currently supports only **Unix-like operating systems**.

> This is because it contains several *Perl 5 one-liners* for parsing log files.
I plan to replace them with pure R code in the future to make the package cross-platform.

The main reporter *Rmd* file is an **R Markdown v2 document**,
so **you must install `pandoc` first**:
For Arch Linux:
```bash
$ sudo pacman -S pandoc
```
For other operating systems or Linux distributions, see [pandoc's official documentation](https://pandoc.org/installing.html).

> You can't build from source in **Microsoft-R-Open** earlier than v3.4.2, due to [its bug](https://github.com/Microsoft/microsoft-r-open/issues/26).

Some dependencies need `fortran` for compilation, so you should install a Fortran compiler first (the command below is for Debian/Ubuntu):
```bash
$ sudo apt-get install gfortran
```
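
Before installing, you may want to confirm that the system tools mentioned in this section are on your `PATH`. The following is a minimal base-R sketch using `Sys.which()`; the helper name `check_system_tools()` is hypothetical, and the tool list (`pandoc`, `perl` for the Perl one-liners, `gfortran`) is taken from this section rather than from any check the package performs itself.

```{r}
# Hypothetical helper: report which command-line tools are found on the PATH.
check_system_tools <- function(tools = c("pandoc", "perl", "gfortran")) {
  paths <- Sys.which(tools)                      # "" for tools that are not found
  found <- stats::setNames(nzchar(paths), tools)
  if (!all(found)) {
    message("Not found on PATH: ", paste(tools[!found], collapse = ", "))
  }
  found
}

check_system_tools()
```

A `FALSE` entry means the corresponding tool should be installed (as shown above) before running the R installation commands.
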
Then run in an R session:
```{r}
install.packages("devtools")
devtools::install_github("bioinformatist/LncPipeReporter")
```
If there are any problems during installation, please refer to the [FAQ](#faq).
## How to use
> Caution: Although users never need to specify file types, the sample name must be the **first part** of the file name's prefix (both `.` and `_` are treated as delimiters). For example, the sample names of *LWS2.Log.final.out* and *N1037.log* are obtained as *LWS2* and *N1037*.

> If you use DESeq2 or NOISeq as the differential expression analysis tool, the order of sample names in the experimental design information file should be consistent with the order of the expression matrix columns.

> We highly recommend using the **Chrome** web browser to view the reports produced by LncPipeReporter.
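
The two naming rules above can be illustrated with a minimal base-R sketch. `sample_name_from_file()` mirrors the documented rule (the sample name is the text before the first `.` or `_`), and `order_by_design()` shows one way to make expression-matrix columns follow the design order. Both helper names are hypothetical, and the design is assumed here to be a plain character vector of sample names rather than the real design file format.

```{r}
# Hypothetical helper: the sample name is the first token of the file name,
# using both "." and "_" as delimiters (directory parts are stripped first).
sample_name_from_file <- function(paths) {
  sub("[._].*$", "", basename(paths))
}

# Hypothetical helper: reorder expression-matrix columns to follow the design order,
# stopping if any design sample is missing from the matrix.
order_by_design <- function(expr, design_samples) {
  missing <- setdiff(design_samples, colnames(expr))
  if (length(missing) > 0) {
    stop("Samples missing from expression matrix: ", paste(missing, collapse = ", "))
  }
  expr[, design_samples, drop = FALSE]
}

sample_name_from_file(c("results/LWS2.Log.final.out", "N1037.log"))  # returns c("LWS2", "N1037")

expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("lnc1", "lnc2"), c("B", "A", "C")))
order_by_design(expr, c("A", "B", "C"))  # columns now in design order A, B, C
```

Failing loudly on a missing sample is a deliberate choice in this sketch: silently dropping or misaligning a column would mislabel samples in the differential expression results.
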
### Try the simplest run with default parameters
```{r}
library(LncPipeReporter)
run_reporter()
```
### Specify parameter values with a user interface
```{r}
library(LncPipeReporter)
# Do NOT use T as shorthand for TRUE
run_reporter(ask = TRUE)
```
### Call with user-defined parameter values
```{r}
library(LncPipeReporter)
run_reporter(input = system.file(file.path("extdata", "demo_results"),package = "LncPipeReporter"),
output = 'reporter.html',
theme = 'npg',
cdf.percent = 10,
max.lncrna.len = 10000,
min.expressed.sample = 50,
ask = FALSE)
```
### Call in shell scripts or command line (Nextflow, etc.)
Pass the parameters and their values as named arguments to `run_reporter()`:
```bash
$ Rscript -e "library(LncPipeReporter); run_reporter(input = '.', ...)"
```
> `...` stands for other arguments. Use **single quotes** for string values inside the R expression, because the whole expression is already wrapped in double quotes for the shell.

Parameter names and default values are listed below:
### Parameters
| Name | Default value | Description |
|-----------|--------------|-------------|
| input | `extdata/demo_results` | Absolute path of the input directory (results of upstream analysis) |
| output | `~/reporter.html` | Index file name (HTML format) |
| output_dir | `~/LncPipeReports` | Output directory (holds all results and dependencies) |
| de.method | `'edger'` | Differential expression analysis method: `'edger'` (default), `'noiseq'` or `'deseq2'` |
| theme | `npg` | Journal palette applied to all plots, supplied by [ggsci](https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html#discrete-color-palettes) |
| cdf.percent | `10%` | Percentage of values to display when calculating coding potential |
| max.lncrna.len | `10000` | Maximum length of lncRNAs to display when calculating the length distribution |
| min.expressed.sample | `50%` | Minimum percentage of expressed samples |
| ask | `FALSE` | Whether to set parameters via a graphical user interface in the browser |

For details and examples, type `help(run_reporter)` or `?run_reporter` in an R session to view the documentation.
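
To make the filtering-style parameters more concrete, here is a minimal base-R sketch of what a filter like `min.expressed.sample = 50` and a cap like `max.lncrna.len = 10000` could look like. This is an assumption about their meaning inferred from the table above, not the package's code: "expressed" is taken to mean a count greater than zero, and the helper names `filter_expressed()` and `cap_lengths()` are hypothetical.

```{r}
# Hypothetical: keep rows (lncRNAs) expressed (count > 0) in at least
# `min_pct` percent of samples, mirroring the idea behind min.expressed.sample.
filter_expressed <- function(counts, min_pct = 50) {
  pct_expressed <- rowMeans(counts > 0) * 100
  counts[pct_expressed >= min_pct, , drop = FALSE]
}

# Hypothetical: drop lengths above `max_len` before plotting a length distribution,
# mirroring the idea behind max.lncrna.len.
cap_lengths <- function(lengths, max_len = 10000) {
  lengths[lengths <= max_len]
}

counts <- matrix(c(0, 5, 3, 0,
                   0, 0, 0, 1,
                   2, 0, 4, 9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("lnc1", "lnc2", "lnc3"), paste0("S", 1:4)))

filter_expressed(counts, min_pct = 50)     # keeps lnc1 (2 of 4) and lnc3 (3 of 4); drops lnc2 (1 of 4)
cap_lengths(c(500, 12000, 9999, 10000))    # returns c(500, 9999, 10000)
```
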
## Results
By default, LncPipeReporter generates a directory named `LncPipeReports` in your `$HOME` (**you can [set another place](#parameters) yourself**) that holds all results and dependencies, so you should always move or copy the **whole** folder. The contents of the output directory look like this:
```pre
LncPipeReports/
├── figures
│ ├── CDF.pdf
│ ├── CDF.tiff
│ ├── compare_density.pdf
│ ├── compare_density.tiff
│ ├── compare_violin.pdf
│ ├── compare_violin.tiff
│ ├── HISAT2.pdf
│ ├── HISAT2.tiff
│ ├── lncRNA_length_distribution.pdf
│ ├── lncRNA_length_distribution.tiff
│ ├── lncRNA_length_distribution_with_type.pdf
│ ├── lncRNA_length_distribution_with_type.tiff
│ ├── pca.pdf
│ ├── pca.tiff
│ ├── STAR.pdf
│ ├── STAR.tiff
│ ├── TopHat2.pdf
│ ├── TopHat2.tiff
│ ├── vocano.pdf
│ └── vocano.tiff
├── libs
│ ├── bootstrap-3.3.5
│ ├── crosstalk-1.0.0
│ ├── datatables-binding-0.2
│ ├── dt-core-1.10.12
│ ├── dt-ext-buttons-1.10.12
│ ├── dt-plugin-searchhighlight-1.10.12
│ ├── htmlwidgets-0.9
│ ├── ionicons-2.0.1
│ ├── jquery-1.12.4
│ ├── jszip-1.10.12
│ ├── pdfmake-1.10.12
│ ├── plotly-binding-4.7.1.9000
│ ├── plotlyjs-1.31.2.9000
│ ├── stickytableheaders-0.1.19
│ └── typedarray-0.1
├── reporter.html
└── tables
├── DE.csv
├── HISAT2.csv
├── STAR.csv
└── TopHat2.csv
18 directories, 25 files
```
> This tree shows the output when differential expression analysis is performed with edgeR. Output from the other tools may differ slightly.
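
Since `reporter.html` depends on the files in `libs/` (and links to `figures/` and `tables/`), the whole folder should travel together. Below is a minimal base-R sketch of copying it with `file.copy(recursive = TRUE)`; `copy_report()` is a hypothetical helper, and its default source path assumes the default `output_dir`.

```{r}
# Hypothetical helper: copy the entire report folder (HTML, libs/, figures/, tables/)
# into `dest_parent`, keeping its internal layout intact.
copy_report <- function(report_dir = path.expand("~/LncPipeReports"), dest_parent) {
  stopifnot(dir.exists(report_dir))
  dir.create(dest_parent, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(report_dir, dest_parent, recursive = TRUE)
  if (!ok) stop("Copy failed (does the destination already contain this folder?)")
  file.path(dest_parent, basename(report_dir))
}

# Example (hypothetical destination):
# copy_report(dest_parent = "/mnt/shared/projectX")
```

The function returns the path of the copied folder, so you can open `reporter.html` from there.
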
## FAQ
If `devtools::install_github()` raises the `Installation failed: Problem with the SSL CA cert (path? access rights?)` error, try:
```{r}
install.packages(c("curl", "httr"))
```
During installation you may encounter configuration errors caused by missing system libraries, for example:
```pre
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libcurl was not found. Try installing:
* deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
* rpm: libcurl-devel (Fedora, CentOS, RHEL)
* csw: libcurl_dev (Solaris)
If libcurl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
```
Follow the instructions to satisfy the dependencies. For instance, run `sudo apt-get install libcurl4-openssl-dev` on *Ubuntu* to fix the problem above.

> LncPipeReporter uses the Bioconductor package *edgeR* to perform differential expression analysis, so if you get `'BiocInstaller' must be installed to install Bioconductor packages.`, choose `1 (Yes)`. If you then see `Installation failed: cannot open the connection to 'https://bioconductor.org/biocLite.R'`, run `source('http://bioconductor.org/biocLite.R')` and try the installation commands above again.

> If resolving some dependencies from *GitHub* fails with `Connection timed out after 100001 milliseconds`, wait a few minutes and **try again**.
## License
This package is free and open source software, licensed under [GPL v3.0](LICENSE).