AliceSommer/Pipeline_Microbiome

A randomization-based causal inference framework for analyzing 16S rRNA gut microbiome data.

Framework

Image of Graphical abstract

Data access

The data from the KORA cohort study [Holle et al., 2005] are accessible only after submitting a project application through the KORA-passt platform.

The microbiome_ASV_data folder describes the data pre-processing and the data files that are provided after a successful KORA project application.

Stage 2: Design

The R code for our pair-matching implementation and for generating the diagnostic plots can be found in the design file. The matrix of 10,000 possible randomizations of the intervention assignment is also generated directly after matching.

Note 1: the matching functions in Stephane_matching.R were written in Rcpp by Stéphane Shao.

Note 2: other matching strategies are valid. Researchers should take the conceptual hypothetical experiment into account when choosing their strategy.

Note 3: to make the matching easier, we reformatted and recoded the original KORA variables. See the misc/format_KORA_variables.R file.
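To illustrate what a matrix of possible randomizations looks like for a pair-matched design, here is a minimal base R sketch. It is not the code from the design file: the helper name `draw_pair_randomizations`, the `pair_id` vector, and the fair coin flip within each pair are assumptions made for illustration. Each column is one Monte Carlo draw of the assignment, not a full enumeration, so duplicate columns are possible.

```r
# Hypothetical helper (not part of this repository): draw n_draws assignment
# vectors for a pair-matched design.
# pair_id: vector giving the matched pair of each unit (two units per pair).
draw_pair_randomizations <- function(pair_id, n_draws = 10000, seed = 1) {
  stopifnot(all(table(pair_id) == 2))
  set.seed(seed)
  pairs <- unique(pair_id)
  n_pairs <- length(pairs)
  # Row index of the first and second unit of each pair
  first <- match(pairs, pair_id)
  second <- vapply(pairs, function(p) which(pair_id == p)[2], integer(1))
  # One fair coin flip per pair and per draw: 1 means the first unit is treated
  flips <- matrix(rbinom(n_pairs * n_draws, size = 1, prob = 0.5),
                  nrow = n_pairs)
  W <- matrix(0L, nrow = length(pair_id), ncol = n_draws)
  W[first, ] <- flips
  W[second, ] <- 1L - flips
  W
}

# Toy example: 4 matched pairs, 8 units
pair_id <- c(1, 1, 2, 2, 3, 3, 4, 4)
W <- draw_pair_randomizations(pair_id, n_draws = 10000)
dim(W)  # units in rows, randomizations in columns
```

Within every column exactly one unit per pair is assigned to the intervention, which mirrors the hypothetical paired experiment.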

Stage 3: Analysis

The ASV (or OTU) data table, the matched dataset, and the phylogenetic tree are combined into a phyloseq object before the statistical analyses are run. The analysis code in the following sections can therefore be applied to any other data stored in a phyloseq object.
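The sketch below shows, under stated assumptions, how a randomization matrix from the design stage could be combined with any scalar test statistic to obtain a randomization-based p-value. It is a generic base R illustration, not the repository's code: `randomization_p_value`, `mean_diff`, and the simulated outcome are hypothetical, and the two-sided comparison via `abs()` only suits signed statistics such as a mean difference.

```r
# Hypothetical helper: randomization-based p-value for a scalar test statistic.
# stat_fun(outcome, w) must return one number; W holds one assignment per column.
randomization_p_value <- function(outcome, w_obs, W, stat_fun) {
  t_obs <- stat_fun(outcome, w_obs)
  # Under the sharp null of no effect, outcomes stay fixed across assignments
  t_null <- apply(W, 2, function(w) stat_fun(outcome, w))
  # Share of re-randomized statistics at least as extreme as the observed one
  mean(abs(t_null) >= abs(t_obs))
}

# Toy statistic: difference in mean outcome between the two groups
mean_diff <- function(outcome, w) mean(outcome[w == 1]) - mean(outcome[w == 0])

set.seed(42)
n_pairs <- 10
n_draws <- 2000
w_obs <- rep(c(1L, 0L), times = n_pairs)     # first unit of each pair treated
outcome <- rnorm(2 * n_pairs) + 0.5 * w_obs  # simulated outcome (illustrative)

# Small pair-wise randomization matrix, built inline to keep the example
# self-contained
flips <- matrix(rbinom(n_pairs * n_draws, 1, 0.5), nrow = n_pairs)
W <- matrix(0L, nrow = 2 * n_pairs, ncol = n_draws)
W[seq(1, 2 * n_pairs, by = 2), ] <- flips
W[seq(2, 2 * n_pairs, by = 2), ] <- 1L - flips

p_value <- randomization_p_value(outcome, w_obs, W, mean_diff)
p_value
```

In practice, `stat_fun` would wrap whichever statistic the analysis uses (for example a diversity estimate or a kernel test statistic), and `outcome` could be replaced by the full phyloseq object if the statistic needs it.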

Diversity

Richness and alpha-diversity

R code in 1_alpha_diversity folder.

We used Amy Willis's R packages breakaway for richness estimation [Willis and Bunge, 2015] and DivNet for Shannon index estimation [Willis and Bryan, 2020].

Beta-diversity

R code in 2_beta_diversity folder.

The distance calculations were done with the phyloseq package, and we used Anna Plantinga's R package MiRKAT for the test statistic calculations [Zhao et al., 2015].

Composition

Compositional equivalence

R code in 3_mean_diff_test folder.

We used the code from Cao, Lin, and Li's GitHub repository composition-two-sampe-test [Cao, Lin, and Li, 2018].

Differential abundance

R code in 4_differential_abundance folder.

We use the function dacomp.test() from Barak Brill's R package dacomp to calculate the test statistic for all taxa at once [Brill, Amir, and Heller, 2020].

Correlation structure

R code in 5_networks folder.

The R package NetCoMi [Peschel et al., 2020] enables the estimation and comparison of networks for compositional data.

Further analyses (metabolites)

References

[Holle et al., 2005] Holle R, Happich M, Löwel H, Wichmann HE (2005); MONICA/KORA Study Group. KORA–a research platform for population based health research. Gesundheitswesen, 67.

[Willis and Bunge, 2015] Willis A and Bunge J (2015); Estimating diversity via frequency ratios. Biometrics, 71:1042-1049.

[Willis and Bryan, 2020] Willis A and Bryan DM (2020); Estimating diversity in networked ecological communities. Biostatistics, kxaa015.

[Zhao et al., 2015] Zhao N, Chen J, Carroll IM et al. (2015); Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test. Am J Hum Genet., 96(5):797-807.

[Cao, Lin, and Li, 2018] Cao Y, Lin W, and Li H (2018); Two-sample tests of high-dimensional means for compositional data. Biometrika, 105:115-132.

[Brill, Amir, and Heller, 2020] Brill B, Amir A, and Heller R (2020); Testing for differential abundance in compositional counts data, with application to microbiome studies. arXiv.

[Peschel et al., 2020] Peschel et al. (2020); NetCoMi: network construction and comparison for microbiome data in R. Briefings in Bioinformatics, bbaa290.
