human-nature-lab/Phenotype-paper

Authors: Shivkumar Vishnempet Shridhar†, Francesco Beghini†, Marcus Alexander, Adarsh Singh, Rigoberto Matute Juárez, Ilana L. Brito*, and Nicholas A. Christakis*

By: Human Nature Lab (Yale) and Brito Group (Cornell), United States of America

Honduras Microbiome phenotype project

This GitHub repository describes the workflow and code used in the Honduras microbiome study.

Abstract:

Despite a growing interest in the gut microbiome of non-industrialized countries, data linking deeply sequenced microbiome from such settings to diverse host phenotypes and situational factors remains uncommon. Using metagenomic data from a community-based cohort of 1,871 people from isolated villages in the Mesoamerican highlands of western Honduras, we report novel associations between bacterial species and human phenotypes and factors. Among them, socioeconomic factors account for 51.44 % of the total associations. Meta-analysis of species-level profiles across several datasets identified several species associated with Body Mass Index, consistent with previous findings. Furthermore, the inclusion of strain-phylogenetic information modifies the overall relationship between the gut microbiome and the phenotypes, especially for some factors like household wealth (e.g., wealthier individuals harbor different strains of Eubacterium rectale). Our analysis suggests new roles that gut microbiome surveillance can play in understanding broad features of individual and public health.

Contents:

  • Phenotype-microbiome associations
  • Phenotype-Pathway associations
  • Meta-analysis of BMI
  • Polymorphic sites
  • Supplementary figures
  • Supplementary data
  • Miscellaneous

Species abundance profiles

Relative abundances of species and pathways were generated with MetaPhlAn 4 and HUMAnN 3, respectively, after pre-processing.

Microbiome-phenotype association

We evaluated associations between the gut microbiome and phenotypes using a linear mixed model:

Species abundance ~ age + sex + BMI + batch effect + bristol stool scale + DNA concentration + Sampling date + 1|village + phenotype

where the relative abundance of each species was transformed using the centred log-ratio (CLR) transformation.
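The CLR step can be illustrated with a short Python sketch (the repository's own code is in R). The function name, the example values, and the pseudocount used to handle zeros are assumptions for illustration; the README does not state how zero abundances were treated.

```python
import math

def clr(abundances, pseudocount=1e-6):
    """Centred log-ratio transform of one sample's relative abundances."""
    # Zeros have no logarithm, so a small pseudocount is added first.
    # The pseudocount value is an illustrative assumption.
    logs = [math.log(x + pseudocount) for x in abundances]
    # The mean of the logs is the log of the geometric mean.
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical relative abundances of five species in one sample.
sample = [0.50, 0.30, 0.15, 0.05, 0.0]
transformed = clr(sample)
```

By construction the transformed values of a sample sum to zero, and their order matches the order of the original abundances.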

Pathway-phenotype association

We evaluated associations between metabolic pathways and phenotypes using a linear mixed model:

Pathway abundance ~ age + sex + BMI + batch effect + bristol stool scale + DNA concentration + Sampling date + 1|village + phenotype

Calculation of microbiome variance explained by phenotypes and pathways

The variance in microbiome composition explained by each phenotype was calculated by permutational multivariate analysis of variance (PERMANOVA) on distance matrices, as implemented in the adonis function of the R package vegan (v.2.6), using 1000 permutations and a Bray-Curtis distance matrix computed from the relative abundances of microbial species. The same calculation was repeated separately using the relative abundances of MetaCyc microbial biochemical pathways.
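As a rough illustration of what this calculation involves, the following Python sketch computes Bray-Curtis distances and a one-factor PERMANOVA (pseudo-F, R², and a permutation p-value). It is not vegan's adonis implementation: it handles a single categorical phenotype only, and the function names, sample profiles, and group labels are hypothetical.

```python
import random

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def pseudo_f(dist, groups):
    """Pseudo-F and R^2 from squared distances, for one grouping factor."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        idx = [i for i in range(n) if groups[i] == label]
        pair_sum = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pair_sum / len(idx)
    ss_between = ss_total - ss_within
    f_stat = (ss_between / (len(labels) - 1)) / (ss_within / (n - len(labels)))
    return f_stat, ss_between / ss_total

def permanova(dist, groups, permutations=1000, seed=0):
    """Return R^2 and a permutation p-value for the grouping factor."""
    rng = random.Random(seed)
    f_observed, r_squared = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break the link between labels and samples
        if pseudo_f(dist, shuffled)[0] >= f_observed:
            hits += 1
    return r_squared, (hits + 1) / (permutations + 1)

# Hypothetical relative abundances (3 species) for 6 samples in 2 groups.
profiles = [
    [0.70, 0.20, 0.10], [0.60, 0.30, 0.10], [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70], [0.10, 0.30, 0.60], [0.15, 0.25, 0.60],
]
phenotype = ["yes", "yes", "yes", "no", "no", "no"]
distances = [[bray_curtis(a, b) for b in profiles] for a in profiles]
r_squared, p_value = permanova(distances, phenotype)
```

Here R² is the share of the total sum of squared distances attributable to the grouping, which corresponds to the "variance explained" reported for each phenotype.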

Meta-analysis of BMI (Body Mass Index)

We screened publicly available datasets using the curatedMetagenomicData package (v3.6.2) for cohorts from similar populations that shared the largest number of metadata fields. We identified 5 non-western studies with BMI available (Kaur K et al. (2020), Lokmer A et al. (2019), Obregon-Tito AJ (2015), Pasolli E et al. (2019), Rubel MA et al. (2020)) along with 4 western cohorts (Asnicar F et al. (2021), HMP (2012), HMP (2019), Qin N et al. (2014)), amounting to 5,001 samples. Data were downloaded from NCBI SRA using the accessions available through curatedMetagenomicData and processed with the same pipeline described previously. We then performed a meta-analysis of BMI using species-level relative abundances, with age, gender, and lifestyle category (non-western or not) as controls. Age was discretized into three levels: child-adolescent (< 18), adult (18-60), and senior (> 60). A random-effects meta-analysis was performed on species-level relative abundances normalized with CLR, using the meta package (v 4.9-9). After fitting a linear model to obtain correlation coefficients, the metacor function (from the meta package) was used to fit a random-effects model with the Paule-Mandel estimator. P-values were adjusted for the false discovery rate (FDR) using the Benjamini-Hochberg procedure. In total, 21 species were significant after correction.
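The random-effects step can be sketched in Python as follows. This is a minimal illustration rather than the meta package's code: it assumes correlations are pooled on the Fisher z scale, solves for the Paule-Mandel between-study variance by bisection, and uses made-up per-cohort correlations and sample sizes.

```python
import math

def fisher_z(r, n):
    """Fisher z-transform of a correlation and its approximate variance."""
    return math.atanh(r), 1.0 / (n - 3)

def weighted_fit(effects, variances, tau2):
    """Inverse-variance pooled effect and generalized Q for a given tau^2."""
    weights = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - mu) ** 2 for w, y in zip(weights, effects))
    return mu, q

def paule_mandel_tau2(effects, variances, tol=1e-12):
    """Find tau^2 such that the generalized Q equals k - 1 (or 0 if none)."""
    target = len(effects) - 1
    if weighted_fit(effects, variances, 0.0)[1] <= target:
        return 0.0  # no excess heterogeneity to explain
    low, high = 0.0, 1.0
    while weighted_fit(effects, variances, high)[1] > target:
        high *= 2.0  # Q decreases as tau^2 grows, so expand the bracket
    while high - low > tol:
        mid = (low + high) / 2.0
        if weighted_fit(effects, variances, mid)[1] > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

# Hypothetical species-BMI correlations and sample sizes for four cohorts.
correlations = [0.10, 0.25, -0.05, 0.30]
sample_sizes = [200, 500, 150, 800]
pairs = [fisher_z(r, n) for r, n in zip(correlations, sample_sizes)]
effects = [z for z, _ in pairs]
variances = [v for _, v in pairs]

tau2 = paule_mandel_tau2(effects, variances)
pooled_z, q_at_tau2 = weighted_fit(effects, variances, tau2)
pooled_r = math.tanh(pooled_z)  # back-transform to the correlation scale
```

The pooled correlation is a weighted average of the cohort estimates, with weights that shrink toward equality as the estimated between-study variance grows.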

Strain phenotype analysis

For strain-level analysis, we used the Almer function in the “evolvability” package (v 2.0.0). Almer incorporates the phylogenetic tree of a species as a correlated random-effects structure. This structure is passed through the A argument, which takes a sparse matrix generated from the phylogenetic tree using the ape package.

Species abundance ~ age+sex+BMI+batch effect+bristol stool scale+ DNA concentration+Sampling date+phenotype+1|village+1|phyl, A=list(phyl=A1)

Here, “phyl” holds the sample names present in the phylogenetic tree, and A1 is the sparse matrix generated from the phylogenetic tree. To evaluate the strain-phylogenetic effect, results from the lmer model (previous section) were compared with those from the Almer model. Of the 78,597 species-phenotype pairs (639 species and 123 phenotypes), 52,864 were retained after filtering for phylogenetic signal. Phylogenetic signal was estimated with the “phylosig” function in the “phytools” package (v 1.9-23) using the ‘lambda’ method, for the phylogenetic tree of each species against each phenotype of interest.
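To give a sense of what a tree-derived relatedness matrix contains, the Python sketch below computes, for every pair of samples, the branch length they share on the way from the root (the covariance expected under a Brownian-motion model). This is an assumption about the general form of such a matrix; the README does not state how A1 was constructed, and the tree, node names, and function names here are hypothetical.

```python
# Hypothetical rooted tree: child -> (parent, branch length to parent).
tree = {
    "node1": ("root", 1.0),
    "sampleA": ("node1", 0.5),
    "sampleB": ("node1", 0.5),
    "sampleC": ("root", 1.5),
}

def edges_to_root(tip):
    """Map each edge on the path from tip to root to its branch length."""
    edges = {}
    node = tip
    while node in tree:  # the root has no entry, so the walk stops there
        parent, length = tree[node]
        edges[node] = length  # an edge is identified by its child node
        node = parent
    return edges

def phylo_covariance(tips):
    """Shared root-to-tip branch length for every pair of tips."""
    paths = {tip: edges_to_root(tip) for tip in tips}
    return {
        a: {
            b: sum(length for edge, length in paths[a].items() if edge in paths[b])
            for b in tips
        }
        for a in tips
    }

tips = ["sampleA", "sampleB", "sampleC"]
relatedness = phylo_covariance(tips)
```

Samples that diverged recently share most of their path and get a large off-diagonal entry, while samples that split at the root share nothing and get zero, which is why such matrices are often sparse.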

Polymorphic sites

For polymorphic sites, files with the “.polymorphic” suffix in the StrainPhlAn4 output were used after discarding rows with a value of 0 in the “percentage of polymorphic sites” column. Linear regression was then used to investigate the relationship between the percentage of polymorphic sites and individual host phenotypes.

% of polymorphic sites ~ Phenotype
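The filtering and regression steps can be sketched in Python as below. The records are invented for illustration, the phenotype is assumed to be numeric, and the function name is hypothetical; the repository's own analysis is in R.

```python
def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical (phenotype value, percentage of polymorphic sites) records.
records = [(1.0, 0.5), (2.0, 1.0), (3.0, 0.0), (3.0, 1.5), (4.0, 2.0)]

# Discard samples whose percentage of polymorphic sites is 0.
kept = [(phenotype, pct) for phenotype, pct in records if pct != 0]

phenotype_values = [phenotype for phenotype, _ in kept]
polymorphic_pct = [pct for _, pct in kept]
intercept, slope = simple_ols(phenotype_values, polymorphic_pct)
```

The slope estimates how much the percentage of polymorphic sites changes per unit of the phenotype among samples with a non-zero percentage.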