---
title: "Causal Microbiome Tutorial."
output:
  md_document:
    variant: markdown_github
---
# Randomization-based causal inference framework to analyze 16S rRNA gut microbiome data

Causal inference framework for environment-microbiome data, applied to American Gut data.

## Framework

![Graphical abstract](misc/Fig1_graphical_abstract.png)

## Data access

The analyses use a subset of the American Gut data [paper in preparation, Mishra and Müller 2021].

## Stage 2: Design

The R code for our pair-matching implementation and for generating the diagnostic plots is in [`design_AG`](design_AG). A matrix of 10,000 possible randomizations of the intervention assignment is also generated directly after matching.

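In a pair-matched design, a valid re-randomization decides independently for each pair which of its two units receives the intervention, so with n pairs there are 2^n possible assignments (more than 10,000 once n ≥ 14). A minimal Python sketch of drawing such a matrix, assuming rows index randomizations and columns index pairs; this 0/1 encoding and the function name are illustrative and not taken from the R code in `design_AG`:

```python
import random

def randomization_matrix(n_pairs, n_draws=10_000, seed=1):
    """Draw n_draws random assignments for a pair-matched design.

    Entry [r][i] is 1 if the first unit of pair i is treated in
    randomization r, and 0 if the second unit is treated. Each pair is an
    independent fair coin flip, so exactly one unit per pair is treated.
    (Illustrative encoding, not the repository's.)
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_pairs)] for _ in range(n_draws)]

# Example: 10,000 randomizations for 25 matched pairs.
W = randomization_matrix(n_pairs=25)
print(len(W), len(W[0]))  # 10000 25
```

Fixing the seed makes the matrix reproducible, so the same reference distribution can be reused across all Stage 3 analyses.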
*Note 1*: the matching functions in [Stephane_matching.R](misc/Stephane_matching.R) were written in Rcpp by Stéphane Shao.

*Note 2*: other matching strategies are also valid. Researchers should take the conceptual hypothetical experiment into account when choosing a strategy.

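As one example of an alternative strategy, greedy 1:1 nearest-neighbor matching pairs each treated unit with the closest control unit that has not been used yet. The Python sketch below assumes covariates are already numeric and standardized and uses Euclidean distance; it is order-dependent and not optimal, and it is not necessarily the algorithm implemented in `Stephane_matching.R`.

```python
import math

def greedy_pair_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: lists of standardized covariate vectors.
    Returns (treated_index, control_index) pairs. Treated units are
    processed in list order, and each takes the closest control that has
    not been used yet (Euclidean distance, ties broken by lower index).
    """
    unused = set(range(len(controls)))
    pairs = []
    for t, x in enumerate(treated):
        if not unused:
            break  # more treated than control units: the rest stay unmatched
        best = min(sorted(unused), key=lambda c: math.dist(x, controls[c]))
        pairs.append((t, best))
        unused.remove(best)
    return pairs

pairs = greedy_pair_match([[0.0, 0.0], [1.0, 1.0]],
                          [[0.9, 1.1], [0.1, -0.1], [5.0, 5.0]])
print(pairs)  # [(0, 1), (1, 0)]
```

Optimal matching, which minimizes the total within-pair distance, avoids the order dependence at a higher computational cost.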
## Stage 3: Analysis

The ASV (or OTU) table and the matched dataset are combined into a [phyloseq](https://joey711.github.io/phyloseq/) object before the statistical analyses. The analysis code in the folders below can therefore be reused for any other data stored in a phyloseq object.

### Diversity

#### Richness and alpha-diversity

R code in the [`1_alpha_diversity_AG`](1_alpha_diversity_AG) folder.

We used Amy Willis' R packages [`breakaway`](https://github.com/adw96/breakaway) for richness estimation [[Willis and Bunge, 2015](https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.12332)] and [`DivNet`](https://github.com/adw96/DivNet) for Shannon index estimation [[Willis and Bryan, 2020](https://academic.oup.com/biostatistics/advance-article-abstract/doi/10.1093/biostatistics/kxaa015/5841114)].

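`breakaway` and `DivNet` are model-based estimators; breakaway, for instance, estimates total richness including taxa that were not observed in the sample. To make the two estimands concrete, here is a Python sketch of the naive plug-in quantities for a single sample. These are not what the packages compute: observed richness ignores unobserved taxa, and the plug-in Shannon index is biased in small samples.

```python
import math

def observed_richness(counts):
    """Number of taxa with a non-zero count (a lower bound on true richness)."""
    return sum(1 for c in counts if c > 0)

def shannon_plugin(counts):
    """Plug-in Shannon index H = -sum_j p_j log p_j from raw counts (natural log)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

sample = [10, 10, 10, 10, 0]
print(observed_richness(sample))         # 4
print(round(shannon_plugin(sample), 4))  # 1.3863 (= log 4)
```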
<ins>Richness result:</ins> estimate: 108.3931; p-value: 0.133

<ins>Shannon index result:</ins> estimate: -0.008072164; p-value: 0.659

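As a hedged illustration of how a randomization p-value can be obtained in a pair-matched design, assume the test statistic is the mean within-pair difference (treated minus control) of a per-sample estimate such as estimated richness. Re-randomizing which unit of a pair is treated flips the sign of that pair's difference, so the reference distribution can be built from a 0/1 randomization matrix. The statistic, the two-sided p-value, and the +1 correction are assumptions of this Python sketch, not necessarily the choices made in the repository.

```python
import random

def paired_randomization_test(diffs, W):
    """Two-sided randomization test for pair-matched data.

    diffs: observed within-pair differences (treated minus control).
    W: randomization matrix; W[r][i] == 1 keeps pair i's observed
       assignment in randomization r, 0 swaps it (flipping the sign).
    Returns (observed statistic, p-value), with statistic = mean difference.
    """
    n = len(diffs)
    t_obs = sum(diffs) / n
    extreme = 0
    for row in W:
        t_star = sum(d if w == 1 else -d for d, w in zip(diffs, row)) / n
        if abs(t_star) >= abs(t_obs):
            extreme += 1
    # Counting the observed assignment itself keeps the p-value valid.
    p_value = (1 + extreme) / (1 + len(W))
    return t_obs, p_value

# Example with 12 hypothetical pairs and 10,000 random sign patterns.
rng = random.Random(42)
W = [[rng.randint(0, 1) for _ in range(12)] for _ in range(10_000)]
diffs = [3.1, 2.4, 0.8, 4.0, 1.7, 2.2, 3.5, 0.9, 2.8, 1.1, 2.6, 3.3]
t, p = paired_randomization_test(diffs, W)
print(t, p)
```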
#### Beta-diversity

R code in the [`2_beta_diversity_AG`](2_beta_diversity_AG) folder.

The distance calculations were done with the phyloseq package, and we used Anna Plantinga's R package [`MiRKAT`](https://cran.r-project.org/web/packages/MiRKAT/index.html) to calculate the test statistics [[Zhao et al., 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570290/)].

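The distance definitions themselves are simple. As a hedged Python sketch of two of the three measures: the Aitchison distance is the Euclidean distance between centered log-ratio (clr) transformed samples, and the binary Jaccard distance compares which taxa are present. The pseudocount used to handle zeros and the choice of the binary (rather than abundance-weighted) Jaccard variant are assumptions, not necessarily the settings used with phyloseq; the Gower distance is not shown.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform; the pseudocount (an assumption) handles zeros."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison(x, y, pseudocount=1.0):
    """Aitchison distance: Euclidean distance between clr-transformed samples."""
    return math.dist(clr(x, pseudocount), clr(y, pseudocount))

def jaccard_binary(x, y):
    """Presence/absence Jaccard distance: 1 - |shared taxa| / |taxa in either sample|."""
    a = {j for j, c in enumerate(x) if c > 0}
    b = {j for j, c in enumerate(y) if c > 0}
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)

print(jaccard_binary([3, 0, 5, 1], [0, 2, 7, 4]))  # 0.5
```

Without a pseudocount, the Aitchison distance depends only on relative abundances, so two samples that differ only in sequencing depth are at distance zero.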
<ins>Results:</ins>

- Aitchison: estimate: 822866.9; p-value (adj.): 0.002
- Jaccard: estimate: 132.9856; p-value (adj.): 0.002
- Gower: estimate: 0.3761873; p-value (adj.): 0.501

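MiRKAT works with kernels rather than distances; the package provides a `D2K()` function for the conversion. A standard conversion is Gower centering, K = -1/2 J D² J with J = I - 11ᵀ/n and D² the elementwise square of the distance matrix. The kernel then enters a score-type statistic Q = rᵀ K r, where r holds residuals of the outcome under the null model. This Python sketch assumes an intercept-only null model with the 0/1 intervention indicator as y and omits the positive-semidefinite correction. In a randomization-based analysis, Q would be recomputed for each row of the randomization matrix. It is an illustration, not MiRKAT's implementation.

```python
def gower_center_kernel(D):
    """K = -1/2 * J D^2 J with J = I - 11'/n (double centering of -D^2/2)."""
    n = len(D)
    A = [[-0.5 * D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(r) / n for r in A]
    col_means = [sum(A[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[A[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]

def kernel_score_stat(K, y):
    """Q = r' K r with r = y - mean(y): residuals of an intercept-only null model."""
    m = sum(y) / len(y)
    r = [v - m for v in y]
    return sum(r[i] * K[i][j] * r[j] for i in range(len(r)) for j in range(len(r)))

D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # distances between points 0, 1, 2 on a line
K = gower_center_kernel(D)
print(round(kernel_score_stat(K, [1, 0, 0]), 6))  # 1.0
```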
### Composition

#### Compositional equivalence

R code in the [`3_mean_diff_test_AG`](3_mean_diff_test_AG) folder.

Code for the two-sample test of Cao, Lin, and Li is available in their GitHub repository [`composition-two-sampe-test`](https://github.com/yuanpeicao/composition-two-sampe-test) [[Cao, Lin, and Li, 2018](https://academic.oup.com/biomet/article/105/1/115/4591648)].

<ins>Result:</ins> estimate: 50.0806; p-value: 0.001

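Cao, Lin, and Li's test compares the mean vectors of clr-transformed compositions with a maximum-type statistic over taxa. The Python sketch below computes a simplified statistic in that spirit: the largest squared difference of clr means, standardized by per-taxon variances. It leaves out the paper's treatment of zeros and its asymptotic null distribution, and the pseudocount of 0.5 is an assumption. In a randomization-based analysis, the statistic would instead be recomputed under each row of the randomization matrix.

```python
import math
from statistics import mean, variance

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform with an (assumed) pseudocount for zeros."""
    logs = [math.log(c + pseudocount) for c in counts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def max_clr_stat(group1, group2, pseudocount=0.5):
    """Largest standardized squared difference of clr means across taxa."""
    z1 = [clr(s, pseudocount) for s in group1]
    z2 = [clr(s, pseudocount) for s in group2]
    n1, n2 = len(z1), len(z2)
    best = 0.0
    for j in range(len(z1[0])):
        a = [z[j] for z in z1]
        b = [z[j] for z in z2]
        se2 = variance(a) / n1 + variance(b) / n2
        if se2 > 0:  # skip taxa with no variability in either group
            best = max(best, (mean(a) - mean(b)) ** 2 / se2)
    return best

g1 = [[10, 1, 1], [12, 1, 2], [9, 2, 1]]
g2 = [[1, 10, 1], [2, 12, 1], [1, 9, 2]]
print(max_clr_stat(g1, g2))
```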
#### Differential abundance

R code in the [`4_differential_abundance_AG`](4_differential_abundance_AG) folder.

We used the function `dacomp.test()` from Barak Brill's R package [`dacomp`](https://github.com/barakbri/dacomp) to calculate the test statistic for all taxa at once [[Brill, Amir, and Heller, 2020](https://arxiv.org/abs/1904.08937)].

<ins>Reference set:</ins>

k_Bacteria;p_Firmicutes;c_Clostridia;o_Clostridiales;f_Lachnospiraceae;g_Dorea\
k_Bacteria;p_Firmicutes;c_Clostridia;o_Clostridiales;f_Lachnospiraceae;g_NA

<ins>Results:</ins> genera with p-value ≤ 0.02:

k_Bacteria;p_Proteobacteria;c_Gammaproteobacteria;o_Enterobacteriales;f_Enterobacteriaceae;g_Raoultella\
k_Bacteria;p_Firmicutes;c_Clostridia;o_Clostridiales;f_Lachnospiraceae;g_Anaerostipes\
k_Bacteria;p_Proteobacteria;c_Alphaproteobacteria;o_Rickettsiales;f_mitochondria;g_Sarcandra

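As we understand the DACOMP approach, each tested taxon is compared against a reference set of taxa assumed not to be differentially abundant (such as the one listed above). For each sample, the reads of the tested taxon plus the reference set are subsampled without replacement to a common depth. The tested taxon's share of that subsample is then compared between groups. The Python sketch below covers only this normalization step for a single taxon. The function name, the seed handling, and the omission of the group comparison and multiplicity adjustment are simplifications, and `dacomp.test()` has its own options for all of these.

```python
import random

def rarefied_taxon_counts(taxon, reference, seed=0):
    """Reference-based normalization in the spirit of DACOMP (simplified).

    taxon[i]: count of the tested taxon in sample i.
    reference[i]: summed counts of the reference-set taxa in sample i.
    Each sample's (taxon + reference) reads are subsampled without
    replacement down to the smallest such total across samples; the
    returned list holds how many subsampled reads belong to the taxon.
    """
    rng = random.Random(seed)
    depth = min(t + r for t, r in zip(taxon, reference))
    out = []
    for t, r in zip(taxon, reference):
        # Reads 0..t-1 belong to the tested taxon, t..t+r-1 to the reference set.
        drawn = rng.sample(range(t + r), depth)
        out.append(sum(1 for read in drawn if read < t))
    return out

print(rarefied_taxon_counts([3, 5], [7, 5]))  # [3, 5] (equal totals: nothing is dropped)
print(rarefied_taxon_counts([50, 5, 20], [150, 45, 80]))
```

The resulting counts can then be compared between intervention and control groups, for example with a rank-based test or a randomization test.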
#### Correlation structure

R code in the [`5_networks_AG`](5_networks_AG) folder.

The R package [`NetCoMi`](https://github.com/stefpeschel/NetCoMi) by [Peschel et al. (2020)](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbaa290/6017455) enables the estimation and comparison of networks for compositional data.

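NetCoMi offers several association measures, sparsification methods, and tests for differential networks. To show only the basic idea, here is a Python sketch of one simple approach. It clr-transforms each sample, computes Pearson correlations between taxa, and keeps edges with |r| ≥ τ. Two networks (e.g. intervention vs. control group) are then compared by the edges present in only one of them. The threshold τ = 0.5, the pseudocount, and the use of plain Pearson correlation are assumptions; this is not NetCoMi's pipeline.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform with an (assumed) pseudocount for zeros."""
    logs = [math.log(c + pseudocount) for c in counts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)

def threshold_network(samples, tau=0.5):
    """Edges (j, k) between taxa whose clr abundances have |Pearson r| >= tau."""
    z = [clr(s) for s in samples]
    p = len(z[0])
    cols = [[row[j] for row in z] for j in range(p)]
    return {(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(pearson(cols[j], cols[k])) >= tau}

def edge_differences(net_a, net_b):
    """Edges present in exactly one of the two networks."""
    return net_a ^ net_b

print(threshold_network([[1, 5], [4, 2], [9, 3]]))  # {(0, 1)}
```

With only two taxa the clr values are exact negatives of each other, which is one reason compositional data call for dedicated association measures such as those offered by NetCoMi.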
## References

[Holle et al., 2005] Holle R, Happich M, Löwel H, Wichmann HE; MONICA/KORA Study Group (2005); [KORA--a research platform for population based health research.](https://pubmed.ncbi.nlm.nih.gov/16032513/) *Gesundheitswesen*, 67.

[Willis and Bunge, 2015] Willis A and Bunge J (2015); [Estimating diversity via frequency ratios.](https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.12332) *Biometrics*, 71:1042-1049.

[Willis and Bryan, 2020] Willis A and Bryan DM (2020); [Estimating diversity in networked ecological communities.](https://academic.oup.com/biostatistics/advance-article-abstract/doi/10.1093/biostatistics/kxaa015/5841114) *Biostatistics*, kxaa015.

[Zhao et al., 2015] Zhao N, Chen J, Carroll IM et al. (2015); [Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570290/) *Am J Hum Genet.*, 96(5):797-807.

[Cao, Lin, and Li, 2018] Cao Y, Lin W, and Li H (2018); [Two-sample tests of high-dimensional means for compositional data.](https://academic.oup.com/biomet/article/105/1/115/4591648) *Biometrika*, 105:115-132.

[Brill, Amir, and Heller, 2020] Brill B, Amir A, and Heller R (2020); [Testing for differential abundance in compositional counts data, with application to microbiome studies.](https://arxiv.org/abs/1904.08937) *arXiv*, 1904.08937.

[Peschel et al., 2020] Peschel S et al. (2020); [NetCoMi: network construction and comparison for microbiome data in R.](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbaa290/6017455) *Briefings in Bioinformatics*, bbaa290.