-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
86 lines (45 loc) · 5.12 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
title: "Randomization-based causal inference framework to analyze 16s rRNA gut microbiome data."
output:
md_document:
variant: markdown_github
---
# Randomization-based causal inference framework to analyze 16s rRNA gut microbiome data.
## Framework
![Image of Graphical abstract](misc/Fig1_graphical_abstract.png)
## Data access
The KORA cohort study [[Holle et al. (2005)](https://pubmed.ncbi.nlm.nih.gov/16032513/)] data is only accessible after applying to a research project on the [KORA-passt](https://epi.helmholtz-muenchen.de) platform.
In the [`microbiome_ASV_data`](microbiome_ASV_data) folder are the details about the data pre-processing and the data files handed-in after successful application to a KORA project.
## Stage 2: Design
The R code for our pair matching implementation and diagnostic plots generation can be found in the [`design`](design) file. The matrix of 10,000 possible randomization of the intervention assignment is also generated directly after matching.
*Note 1*: the matching functions [Stephane_matching.R](misc/Stephane_matching.R) were written in Rcpp by Stéphane Shao.
*Note 2*: other matching strategies are valid. The researcher should take the conceptual hypothetical experiment into account when choosing its strategy.
*Note 3*: to make the matching easier we re-formated/coded the original KORA variables. See [`misc/format_KORA_variables.R`](misc/format_KORA_variables.R) file.
## Stage 3: Analysis
The ASV (or OTU) data table, matched dataset, and phylogenetic tree are combined in a [phyloseq](https://joey711.github.io/phyloseq/) object before making statistical analyses. Thus, the following code can be used for any other data combined in a phyloseq object.
### Diversity
#### Richness and alpha-diversity
R code in [`1_alpha_diversity`](1_alpha_diversity) folder.
We used Amy Willis' R packages [`breakaway`](https://github.com/adw96/breakaway) for richness estimation [[Willis and Bunge, 2015](https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.12332)] and [`DivNet`](https://github.com/adw96/DivNet) for Shannon index estimation [[Willis, 2020](https://academic.oup.com/biostatistics/advance-article-abstract/doi/10.1093/biostatistics/kxaa015/5841114)].
#### Beta-diversity
R code in [`2_beta_diversity`](2_beta_diversity) folder.
The distance calculations where done with the phyloseq package and we used Anna Plantinga's R package [`MiRKAT`](https://cran.r-project.org/web/packages/MiRKAT/index.html) for the test statistic calculations [[Zhao et al., 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570290/)].
### Compostion
#### Compositional equivalence
R code in [`3_mean_diff_test`](3_mean_diff_test) folder.
Cao, Lin, and Li's github repository: [`composition-two-sampe-test`](https://github.com/yuanpeicao/composition-two-sampe-test) [[Cao, Lin, and Li, 2018](https://academic.oup.com/biomet/article/105/1/115/4591648)].
#### Differential abundance
R code in [`4_differential_abundance`](4_differential_abundance) folder.
We use the function `dacomp.test()` of Barak Brill' R package: [`dacomp`](https://github.com/barakbri/dacomp) to calculate the test statistic for all taxa at once [[Brill, Amir, and Heller, 2020](https://arxiv.org/abs/1904.08937)].
#### Correlation structure
R code in [`5_networks`](5_networks) folder.
[Peschel et al.'s (2020)](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbaa290/6017455) R package [`NetCoMi`](https://github.com/stefpeschel/NetCoMi) enables the estimation and comparision of networks for compositional data.
#### Further analyses (metabolites)
## References
[Holle et al., 2005] Holle R, Happich M, Löwel H, Wichmann HE (2005); [MONICA/KORA Study Group. KORA--a research platform for population based health research.](https://pubmed.ncbi.nlm.nih.gov/16032513/) *Gesundheitswesen*, 67.
[Willis and Bunge, 2015] Willis A and Bunge J (2015); [Estimating diversity via frequency ratios.](https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.12332) *Biometric Methodology*, 71:1042-1049.
[Willis and Bryan, 2020] Willis A and Bryan DM (2020); [Estimating diversity in networked ecological communities](https://academic.oup.com/biostatistics/advance-article-abstract/doi/10.1093/biostatistics/kxaa015/5841114) *Biostatistics*, kxaa015.
[Zhao et al., 2015] Zhao N, Chen J, Carroll IM et al. (2015); [Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570290/) *Am J Hum Genet.*, 96(5):797-807.
[Cao, Lin, Li, 2018] Cao Y, Lin W, and Li H (2018); [Two-sample tests of high-dimensional means for compositional data.](https://academic.oup.com/biomet/article/105/1/115/4591648) *Biometrika*, 105:115-132.
[Brill, Amir, and Heller, 2020] Brill B, Amir A, and Heller R (2020) [Testing for differential abundance in compositional counts data, with application to microbiome studies.](https://arxiv.org/abs/1904.08937)] *arXiv*
[Peschel et al., 2020] Peschel et al. (2020) [NetCoMi: network construction and comparison for microbiome data in R.](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbaa290/6017455) *Briefings in Bioinformatics*, bbaa290.