-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
321 lines (220 loc) · 10.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
---
output:
github_document:
#toc: true
---
# cloudos <img src="man/figures/logo/hexlogo.png" align="right" height=140/>
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
<!-- badges: start -->
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![R build status](https://github.com/lifebit-ai/cloudos/workflows/R-CMD-check/badge.svg)](https://github.com/lifebit-ai/cloudos/actions)
<!-- badges: end -->
**cloudos** R package makes it easy to interact with Lifebit's CloudOS platform in an R environment.
## Installation
You can install the latest release of **cloudos** from:
+ [CRAN](https://cran.r-project.org/package=cloudos):
```{r, eval=FALSE}
install.packages("cloudos")
```
+ [conda-forge](https://anaconda.org/conda-forge/r-cloudos):
```{shell, eval=FALSE}
conda install -c conda-forge r-cloudos
```
+ [GitHub](https://github.com/lifebit-ai/cloudos/):
```{r, eval=FALSE}
if (!require(remotes)) { install.packages("remotes") }
remotes::install_github("lifebit-ai/cloudos")
```
Alternatively, you can install the latest development version of **cloudos**:
```{shell, eval=FALSE}
git clone https://github.com/lifebit-ai/cloudos
cd cloudos
git checkout origin/devel
Rscript -e 'devtools::install(".")'
```
## Usage
Below is a demonstration of how the **cloudos** package can be used.
### Load the library
```{r}
library(cloudos)
library(knitr) # For better visualization of wide dataframes in this README examples
library(magrittr) # For pipe
```
### Configure CloudOS
This package is primarily a means of communicating with a CloudOS instance using it's API. Before it can communicate with the CloudOS instance, the package must be configured with some key information:
- The CloudOS base URL. This is the URL in your browser when you navigate to the Cohort Browser in CloudOS. Often of the form `https://my_instance.lifebit.ai/app/cohort-browser`.
- The CloudOS token. Navigate to settings page in CloudOS to generate an API key you can use as your token (see image below).
- The CloudOS team ID. Also found in the settings page in CloudOS labelled as the "Workspace ID" (see image below).
![CloudOS settings page](man/figures/settings_page.png)
The package will look for this information in the following locations in this order:
1. From environment variables `CLOUDOS_BASEURL`, `CLOUDOS_TOKEN`, and `CLOUDOS_TEAMID`.
2. From a cloudos configuration file.
There are three ways to configure the package:
1. Add them to `~/.Renviron` in the following way, which will load the environment variables on beginning of the R-session
```{r, eval=FALSE}
CLOUDOS_BASEURL="xxx"
CLOUDOS_TOKEN="xxx"
CLOUDOS_TEAMID="xxx"
```
2. Add them during an R session using `Sys.setenv(ENV_VAR = "env_var_value")`
```{r, eval=FALSE}
Sys.setenv(CLOUDOS_BASEURL = "xxx")
Sys.setenv(CLOUDOS_TOKEN = "xxx")
Sys.setenv(CLOUDOS_TEAMID = "xxx")
```
3. Use the function `cloudos_configure()`, which will create a `~/.cloudos/config` that will persist between R sessions and be read from each time (Recommended way if you are using multiple cloudos clients).
```{r, eval=FALSE}
cloudos_configure(base_url = "xxx",
token = "xxx",
team_id = "xxx")
```
## Application - Cohort Browser
**Below information is out of date, please refer to the [latest function docs](https://lifebit-ai.github.io/cloudos/reference/index.html)**.
Cohort Browser is part of Lifebit's CloudOS offering. Let's explore how to interact with this in R environment.
### List Cohorts
To check list of available cohorts in a workspace.
```{r}
cohorts <- cb_list_cohorts()
cohorts %>% head(n=5) %>% kable()
```
### Create a cohort
To create a new cohort.
```{r}
my_cohort <- cb_create_cohort(cohort_name = "Cohort-R",
cohort_desc = "This cohort is for testing purpose, created from R.")
my_cohort
```
### Get a cohort
Get a available cohort in to a cohort R object. This cohort object can be used in many different other functions.
```{r}
other_cohort <- cb_load_cohort(cohort_id = "610ac00edb7c7a1d9d0c309f")
other_cohort
```
### Explore available phenotypes
#### Search phenotypes
Search for phenotypes based on a term. Searching with `term = ""` will return all the available phenotypes.
```{r}
disease_phenotypes <- cb_search_phenotypes(term = "disease")
disease_phenotypes %>% head(n=5) %>% kable()
```
Let's choose a phenotype from the above table. The "id" is the most important part as it will allow us
to use this phenotype for cohort queries and other functions.
```{r}
# get the first row/phenotype in the table
my_phenotype <- disease_phenotypes[5,]
my_phenotype %>% kable()
```
#### Get distribution of cohort participants for a phenotype
Let's check the numbers of participants across the categories of this phenotype.
```{r}
# phenotype
my_pheno_data <- cb_get_phenotype_statistics(cohort = my_cohort,
pheno_id = my_phenotype$id)
my_pheno_data %>% head(n=10) %>% kable()
```
### Update a cohort with a new query
A query defines what particpants are included in a cohort based on phenotypes.
Phenotypes can be continuous - in which case a selected range needs to be specified, or they can be categorical - in which case selected categories need to be specified.
#### Continuous phenotype
For phenotype "Year of birth" (with id = 8)
```{r}
# cb_get_phenotype_metadata(8)$name
# "Year of birth"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 8) %>% head(n=10) %>% kable()
```
#### Categorical phenotype
For phenotype "Total full brothers" (with id = 48).
```{r}
# cb_get_phenotype_metadata(48)$name
# "Total full brothers"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 48) %>% kable()
```
#### Simple query
Now let's restrict our cohort to a set of participants based on the phenotypes we explored above.
Simple queries allow us to create a cohort query that combines a list of phenotype criteria according to a logical AND. Let's define a simple query using the following named list-based format (note the phenotype id is quoted):
```{r}
# year of birth: 1965 - 1995 ; AND total full brothers: 1 or 2
simple_query = list("8" = list("from" = 1965, "to" = 1995),
"48" = c(1, 2))
```
Let's check how many participants would be in the cohort if we applied this query, but without actually applying it.
```{r}
cb_participant_count(cohort = my_cohort, simple_query = simple_query, keep_query = F)
```
If we're happy that this is a sensible query to apply, we can apply the query to the cohort, making sure to override the previous query by setting `keep_query` to `FALSE`. If we wanted to keep the criteria from the pre-exisitng query and add our new phenotype-based criteria to them we would leave `keep_query` set to the defualt value of `TRUE`.
```{r}
# apply the query
cb_apply_query(cohort = my_cohort, simple_query = simple_query, keep_query = F)
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# double check that the cohort has th number of participants we expected
cb_participant_count(cohort = my_cohort)
```
We could now further restrict our cohort to include only females (phenotype "Participant phenotypic sex", id = 10) by using `keep_query = TRUE`. In other words, this argument applies a query that looks like "old query AND new query".
```{r}
# apply the query
cb_apply_query(cohort = my_cohort, simple_query = list("10" = "Female"), keep_query = T)
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# check the number of participants
cb_participant_count(my_cohort)
```
#### Advanced query
We could adjust our previous filter to restrict participants to those who are born from 1965 to 1995 OR have 1 or 2 full brothers as well as being female (`( Total full brothers = 1, 2 OR 1965 < Year of birth < 1995 ) AND Participant phenotypic sex = Female`). We could achieve this with an advanced query which uses a more complicated (but more flexible) nested list format (note the phenotype id is not quoted):
```{r}
adv_query <- list(
"operator" = "AND",
"queries" = list(
list("id" = 10, "value" = "Female"),
list(
"operator" = "OR",
"queries" = list(
list("id" = 8, "value" = list("from" = 1965, "to" = 1995)),
list("id" = 48, "value" = c(1, 2))
)
)
)
)
```
Available operators in advanced queries: `"AND"`, `"OR"`, `"NOT"`.
Lets apply this query to our cohort and inspect the distribution of our phenotype of interest in the cohort.
```{r}
# apply the query
cb_apply_query(cohort = my_cohort, adv_query = adv_query, keep_query = F)
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# view the distribution of disease groups in our cohort
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 206) %>% head(n=10) %>% kable()
```
### Retreive the participant table
Now lets get a participant phenotype table with the columns of interest for our cohort.
First we have to update the cohort on the cohort browser server to set what columns will be in the table. Currently the best way to do this is to use (counterintuitively) `cb_apply_query` to add the IDs of the phenotypes of interest as columns.
```{r}
cb_apply_query(my_cohort, column_ids = c(208, 10, 8, 48), keep_columns = T)
my_cohort <- cb_load_cohort(my_cohort@id)
```
Now we can fetch the participant phenotype table which includes these columns.
```{r}
pheno_df <- cb_get_participants_table(cohort = my_cohort,
page_size = cb_participant_count(my_cohort)$count)
pheno_df %>% head(n=10) %>% kable()
```
### Get genotypic table
Get the genotypic table for a cohort (currently only cohort browser version 1 is supported).
```{r, eval=F}
cohort_genotype <- cb_get_genotypic_table(cohort = my_cohort)
cohort_genotype %>% head(n=2) %>% kable()
```
## Additional notes
This package is under active development. If you find any issues, please reach out here - https://github.com/lifebit-ai/cloudos/issues
For documentation visit - https://lifebit-ai.github.io/cloudos/
## License
MIT © [Lifebit](https://lifebit.ai/)