Initial draft update alpha diversity chapter #657

Merged
merged 15 commits into from
Jan 25, 2025
48 changes: 48 additions & 0 deletions inst/assets/bibliography.bib
@@ -1,3 +1,51 @@
@article{Ma2019,
title = {Diversity-disease relationships and shared species analyses for human microbiome-associated diseases},
volume = {13},
ISSN = {1751-7370},
url = {http://dx.doi.org/10.1038/s41396-019-0395-y},
DOI = {10.1038/s41396-019-0395-y},
number = {8},
journal = {{The ISME Journal}},
publisher = {Oxford University Press (OUP)},
author = {Ma, Zhanshan (Sam) and Li, Lianwei and Gotelli, Nicholas J},
year = {2019},
month = {mar},
pages = {1911--1919}
}

@article{Valles-Colomer2019GBMs,
author = {Valles-Colomer, Mireia and Falony, Gwen and Darzi, Youssef and Tigchelaar, Ettje F. and Wang, Jun and Tito, Raul Y. and Schiweck, Carmen and Kurilshikov, Alexander and Joossens, Marie and Wijmenga, Cisca and Claes, Stephan and Van Oudenhove, Lukas and Zhernakova, Alexandra and Vieira-Silva, Sara and Raes, Jeroen},
title = {The neuroactive potential of the human gut microbiota in quality of life and depression},
journal = {{Nature Microbiology}},
ISSN = {2058-5276},
DOI = {10.1038/s41564-018-0337-x},
url = {https://doi.org/10.1038/s41564-018-0337-x},
year = {2019},
type = {Journal Article}
}

@article{vandeputte2017quantitative,
title={Quantitative microbiome profiling links gut community variation to microbial load},
author={Vandeputte, Doris and Kathagen, Gunter and D’hoe, Kevin and Vieira-Silva, Sara and Valles-Colomer, Mireia and Sabino, Jo{\~a}o and Wang, Jun and Tito, Raul Y and De Commer, Lindsey and Darzi, Youssef and Vermeire, Séverine and Falony, Gwen and Raes, Jeroen},
journal={{Nature}},
volume={551},
number={7681},
pages={507--511},
year={2017},
publisher={{Nature Publishing Group UK London}}
}

@article{bastiaanssen2023bugs1,
title={Bugs as features (part 1): concepts and foundations for the compositional data analysis of the microbiome--gut--brain axis},
author={Bastiaanssen, Thomaz FS and Quinn, Thomas P and Loughman, Amy},
journal={{Nature Mental Health}},
volume={1},
number={12},
pages={930--938},
doi = {10.1038/s44220-023-00148-3},
year={2023},
publisher={{Nature Publishing Group US New York}}
}

@Article{Xu2023,
author = {Shuangbin Xu and Li Zhan and Wenli Tang and Qianwen Wang and Zehan Dai and Lang Zhou and Tingze Feng and Meijun Chen and Tianzhi Wu and Erqiang Hu and Guangchuang Yu},
211 changes: 83 additions & 128 deletions inst/pages/alpha_diversity.qmd
@@ -5,94 +5,55 @@ library(rebook)
chapterPreamble()
```

Community diversity is a central concept in microbiome research. Several
diversity indices are available in the ecological literature.

The main categories of diversity indices include species richness,
evenness, and diversity; each of these emphasizes different aspects of
community heterogeneity [@Whittaker1960; @Willis2019]. The _Hill
coefficient_ combines many standard indices into a single equation
that yields observed richness, inverse Simpson, Shannon diversity,
and generalized diversity as special cases, with varying levels of
emphasis on species abundance values. Thus, the term _alpha diversity_
is often used to refer collectively to all these variants.
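To make the special cases concrete, the base-R sketch below computes Hill numbers, ${}^{q}D = (\sum_i p_i^{q})^{1/(1-q)}$, for one sample, where $q$ plays the role of the Hill coefficient. It is a minimal illustration of the formula under that standard definition, not the `mia` implementation; `hill_number()` and the counts are hypothetical.

```r
# Hill number of order q for a single sample (a minimal sketch, not mia code).
# q = 0 -> observed richness, q -> 1 -> exp(Shannon), q = 2 -> inverse Simpson.
hill_number <- function(counts, q) {
  p <- counts[counts > 0] / sum(counts)  # relative abundances of detected taxa
  if (abs(q - 1) < 1e-8) {
    # The formula is undefined at q = 1; use its limit, exp(Shannon entropy)
    return(exp(-sum(p * log(p))))
  }
  sum(p^q)^(1 / (1 - q))
}

# Hypothetical counts for five taxa in one sample (one taxon absent)
x <- c(40, 30, 20, 10, 0)
sapply(c(0, 1, 2), function(q) hill_number(x, q))
```

Larger values of $q$ put more weight on abundant taxa, so the values decrease from 4 (richness) to about 3.6 (exponential of Shannon) and about 3.33 (inverse Simpson) for this uneven sample.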

**Diversity** summarizes the distribution of species abundances in a
given sample into a single number that depends on both species
richness and evenness (see below). Diversity indices measure overall
community heterogeneity by considering both of these aspects
simultaneously. A number of ecological diversity measures are
available. In general, diversity increases with both richness and
evenness. **Phylogenetic diversity** (PD) [@Faith1992] is a variant
that, unlike most other commonly used diversity indices, incorporates
information on the phylogenetic relationships between species. The
`addAlpha()` function uses a faster reimplementation of the widely used
function in _`picante`_ [@R_picante; @Kembel2010]. The method uses the
default rowTree of the `TreeSummarizedExperiment` object (`tse`).
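Faith's PD is the total branch length of the part of the phylogeny that connects the taxa present in a sample. The base-R sketch below illustrates that definition on a hypothetical four-tip tree stored as an edge table. It is not the `addAlpha()` or _`picante`_ code, and it computes the rooted variant (branches up to the root are included).

```r
# Hypothetical toy phylogeny stored as an edge table (parent -> child, length).
# Tips are A, B, C, D; internal nodes are n1, n2; the root is "root".
edges <- data.frame(
  parent = c("root", "root", "n1", "n1", "n2", "n2"),
  child  = c("n1",   "n2",   "A",  "B",  "C",  "D"),
  length = c(1.0,    2.0,    0.5,  0.5,  1.5,  1.5),
  stringsAsFactors = FALSE
)

# Faith's PD (rooted): sum of the lengths of all edges on the paths from the
# taxa present in a sample up to the root, counting each shared edge once.
faith_pd <- function(present, edges) {
  stopifnot(all(present %in% edges$child))
  used <- character(0)  # edges are identified by their child node
  for (tip in present) {
    node <- tip
    while (node %in% edges$child) {
      used <- union(used, node)
      node <- edges$parent[edges$child == node]
    }
  }
  sum(edges$length[edges$child %in% used])
}

faith_pd(c("A", "B"), edges)  # 0.5 + 0.5 + 1.0 = 2
faith_pd(c("A", "C"), edges)  # 0.5 + 1.0 + 1.5 + 2.0 = 5
```

Both samples have an observed richness of 2, but the pair from different clades (A and C) spans more of the tree and therefore has a higher PD than the sister pair (A and B).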

**Richness** refers to the total number of species in a community
(sample). The simplest richness index is the number of species
observed in a sample (observed richness). Because sampling from the
community is limited, however, observed richness may underestimate the
true species richness. Several estimators have been developed to
address this, for instance ACE [@Chao1992] and Chao1 [@Chao1984].
Richness estimates do not aim to characterize variation in species
abundances.
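Chao1 extrapolates the number of unseen species from the number of singletons ($F_1$) and doubletons ($F_2$). The sketch below uses the bias-corrected form $S_{obs} + F_1(F_1 - 1) / (2(F_2 + 1))$; Chao's original formula, $S_{obs} + F_1^2 / (2F_2)$, is undefined when $F_2 = 0$. The function name and counts are hypothetical, and this is not presented as the implementation used by `addAlpha()`.

```r
# Bias-corrected Chao1 richness estimate for one sample's counts
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)   # observed richness
  f1 <- sum(counts == 1)     # number of singletons
  f2 <- sum(counts == 2)     # number of doubletons
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

x <- c(10, 5, 1, 1, 2, 0, 3, 1)   # hypothetical counts
chao1(x)   # 7 observed + 3 * 2 / (2 * 2) = 8.5
```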

Nonparametric richness estimators such as Chao1 and ACE, however, must not be
used with amplicon sequence variant (ASV) data. Denoising algorithms that
generate ASVs, such as DADA2 and Deblur, typically remove singletons, which
these estimators rely on, so the resulting estimates are meaningless. Although
ASVs offer higher resolution than operational taxonomic units (OTUs) and are
increasingly used, the removal of singletons invalidates Chao1 and ACE.
Therefore, alpha diversity metrics that do not depend on singletons or
doubletons should be considered instead, or OTUs could be used specifically for
alpha diversity analysis to retain low-abundance taxa. Moreover, denoising
algorithms cannot distinguish true singleton sequences from artifacts, which
makes traditional richness estimators unsuitable for ASV datasets, which are
often standardized for sequencing depth [@Deng2024].
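Because Chao1 and ACE are driven by singleton counts, a quick diagnostic is to tabulate singletons and doubletons per sample before using these estimators. The helper `singleton_check()` below is hypothetical, and the two toy samples are made up to contrast an OTU-like profile with a denoised, singleton-free profile.

```r
# Hypothetical helper: number of singletons (F1) and doubletons (F2) per sample,
# for a count matrix with taxa in rows and samples in columns.
singleton_check <- function(counts) {
  data.frame(
    sample = colnames(counts),
    F1 = colSums(counts == 1),
    F2 = colSums(counts == 2),
    row.names = NULL
  )
}

# Toy counts: the second sample mimics denoised ASV data without singletons
counts <- matrix(
  c(12, 1, 1, 2, 0,
    15, 4, 3, 6, 8),
  ncol = 2,
  dimnames = list(paste0("taxon", 1:5), c("otu_like", "denoised"))
)
singleton_check(counts)
```

When $F_1 = 0$, the bias-corrected Chao1 estimate reduces to observed richness, so the estimator adds no information for such samples.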

**Evenness** focuses on the distribution of species abundances, and it
can thus complement the number of species. Pielou's evenness is a
commonly used index, obtained by normalizing Shannon diversity by
(the natural logarithm of) observed richness.
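Pielou's evenness follows directly from this definition, $J = H / \ln S$, where $H$ is Shannon diversity and $S$ is observed richness. The base-R sketch below uses hypothetical function names and counts; it only illustrates the formula.

```r
# Shannon diversity (natural log) and Pielou's evenness for one sample
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}
pielou <- function(counts) {
  # Undefined (NaN) when only one taxon is observed, since log(1) = 0
  shannon(counts) / log(sum(counts > 0))
}

pielou(c(25, 25, 25, 25))  # perfectly even community: 1
pielou(c(97, 1, 1, 1))     # dominated by one taxon: much lower (about 0.12)
```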

These main classes of alpha diversity are sometimes complemented with
indices of dominance or rarity:

**Dominance** indices are in general negatively correlated with other
alpha diversity indices. High dominance is obtained when one or a few
species account for a large share of the total abundance in the
community.
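The Berger-Parker index, the relative abundance of the most abundant taxon, is one simple dominance measure. The sketch below, with hypothetical counts, contrasts it with inverse Simpson diversity to show the inverse relationship described above.

```r
# Berger-Parker dominance: relative abundance of the most abundant taxon
berger_parker <- function(counts) max(counts) / sum(counts)

# Inverse Simpson, for comparison (a diversity index)
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

dominated <- c(70, 10, 10, 10)   # hypothetical counts
even      <- c(25, 25, 25, 25)
c(berger_parker(dominated), berger_parker(even))  # 0.70 and 0.25
c(inv_simpson(dominated), inv_simpson(even))      # about 1.92 and 4
```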

**Rarity** indices characterize the concentration of species at low
abundance. Prevalence and detection thresholds define which species
count as rare, and the total abundance of these rare species
determines the value of a rarity index.
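One simple way to turn this description into a number is to sum the relative abundances of taxa that fall below a detection threshold. The helper `low_abundance()` below is a hypothetical sketch of that idea, not a specific index from `mia`; it applies only a within-sample detection threshold and omits the prevalence criterion, which would require looking across samples.

```r
# Hypothetical rarity index: total relative abundance of the taxa whose
# relative abundance is positive but below 'detection'
low_abundance <- function(counts, detection = 0.01) {
  rel <- counts / sum(counts)
  sum(rel[rel > 0 & rel < detection])
}

x <- c(90, 5, 3, 2)                    # hypothetical counts, total 100
low_abundance(x, detection = 0.04)     # 0.03 + 0.02 = 0.05
```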

## Alpha diversity estimation in practice

### Calculate diversity measures {#sec-estimate-diversity}

Alpha diversity can be estimated with the `addAlpha()` wrapper function, which
interacts with other packages implementing the calculations, such as `vegan`
[@R_vegan].

This function calculates the given indices and adds them to the
`colData` slot of the `SummarizedExperiment` object under the given
`name`.

The estimated values can then be retrieved and analyzed directly from
the `colData`, for example, by plotting them with `plotColData()` from
the `scater` package [@R_scater]. Here, we use observed richness
(`observed`) as a measure of richness.
## Background

Alpha diversity, or within-sample diversity, is a central concept in microbiome
research. In the ecological literature, several distinct but related alpha
diversity indices, often referring to richness, evenness and diversity, are
commonly used [@Willis2019; @Whittaker1960]. The term _alpha diversity_ is
often used to refer collectively to all these indices.

### Applications

Alpha diversity is predominantly used to quantify the complexity of the
microbiome. In the general adult population, lower alpha diversity and lower
bacterial load have been associated with worse overall physical and mental
health [@Valles-Colomer2019GBMs; @vandeputte2017quantitative]. However, this
principle may not generalize to other populations, most notably in early life
and in patient cohorts [@Ma2019].

### Approaches

The majority of alpha diversity metrics are closely related, although this is
not evident from their names. Bastiaanssen et al. [-@bastiaanssen2023bugs1]
describe this relationship along two axes. First, alpha diversity metrics can
be defined as special cases of a unifying equation in which the _Hill
coefficient_ determines whether the equation captures, for instance, richness,
inverse Simpson or Shannon entropy. Second, some alpha diversity metrics are
weighted by phylogeny, such as Faith's PD [-@Faith1992] and PhILR
[@Silverman2017].
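A standard form of the unifying equation mentioned above is the Hill number of order $q$, where $q$ corresponds to what this chapter calls the Hill coefficient. For relative abundances $p_i$ of the $S$ observed taxa in a sample:

$$
{}^{q}D = \left( \sum_{i=1}^{S} p_i^{\,q} \right)^{1/(1-q)}
$$

Setting $q = 0$ gives observed richness, ${}^{0}D = S$; the limit $q \to 1$ gives the exponential of Shannon entropy, ${}^{1}D = \exp\left(-\sum_{i=1}^{S} p_i \ln p_i\right)$; and $q = 2$ gives the inverse Simpson index, ${}^{2}D = 1 / \sum_{i=1}^{S} p_i^{2}$.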

![Alpha diversity metrics are numerically related and can be classified along two axes. Here, we show the Hill coefficient on the x-axis and whether the index considers phylogeny on the y-axis.](figures/fig_13_1_alphadiv.png)



::: {.callout-note}
## Note: Richness estimators and denoising

Several estimators have been developed to address the confounding effect of
limited sampling size on observed richness, most notably ACE [@Chao1992] and
Chao1 [@Chao1984]. However, these approaches may yield misleading results for
modern 16S rRNA gene amplicon data, which are commonly denoised, a step that
removes singletons [@Deng2024].
:::

## Examples

### Calculate alpha diversity measures {#sec-estimate-diversity}

Alpha diversity can be estimated with the `addAlpha()` function, which interacts
with other packages implementing the calculation, such as `vegan` [@R_vegan] and
_`picante`_ [@R_picante; @Kembel2010].
The function calculates the given indices and adds them to the `colData` slot
of the `SummarizedExperiment` object under the given `name`.

```{r plot-richness, message=FALSE, cache=TRUE}
#| context: setup
@@ -102,26 +63,35 @@ library(mia)
data("GlobalPatterns", package="mia")
tse <- GlobalPatterns

# Compute one or multiple indices simultaneously through the 'index' parameter.
# 'detection' sets the detection threshold used by the observed richness index.
tse <- addAlpha(
    tse, assay.type = "counts", index = c("observed", "shannon", "faith"),
    detection = 10)

# Check some of the first values in colData
tse$observed |> head()
tse$shannon |> head()
```
Certain indices have additional options. Here, `observed` has a `detection`
parameter that controls the detection threshold: species whose abundance is
above this threshold are considered detected. See the full list of options
with `help(addAlpha)`.
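The effect of the detection threshold on observed richness can be illustrated with a base-R sketch. It assumes, following the wording above, that a taxon counts as detected when its abundance is strictly greater than `detection`; `observed_richness()` is a hypothetical helper, so check `help(addAlpha)` for the exact rule used by `mia`.

```r
# Observed richness with a detection threshold: a taxon counts as detected
# when its abundance is strictly greater than 'detection' (an assumption that
# follows the wording above; check help(addAlpha) for the exact rule)
observed_richness <- function(counts, detection = 0) sum(counts > detection)

x <- c(0, 3, 10, 11, 250)               # hypothetical counts for one sample
observed_richness(x)                    # 4: every non-zero taxon is detected
observed_richness(x, detection = 10)    # 2: only 11 and 250 exceed 10
```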

::: {.callout-tip}
## Tip

You can calculate multiple indices simultaneously by specifying multiple indices
in the `index` parameter, for example: `index = c("observed", "shannon")`.
:::

::: {.callout-note}
## Note: Phylogenetic diversity requires a tree

Because `tse` is a `TreeSummarizedExperiment` object, its phylogenetic tree is
used by default. However, the optional argument `tree` must be provided if `tse`
does not contain a rowTree.
:::

### Visualize alpha diversity measures {#sec-plot-diversity}

As alpha diversity metrics typically summarize high-dimensional samples into
single values, many visualization approaches are available. Once calculated,
these metrics can be analyzed directly from the `colData`, for example, by
plotting them with `plotColData()` from the `scater` package [@R_scater]. Here,
we use observed richness (`observed`) as a measure of richness. Let's visualize
the results against selected `colData` variables (sample type and final barcode).

```{r plot-div-obs, message=FALSE, fig.cap="Observed richness estimates grouped by sample type and coloured by barcode.", cache=TRUE}
library(scater)
@@ -134,36 +104,7 @@ plotColData(
labs(x = "Sample types", y = expression(Richness[Observed]))
```


### Faith phylogenetic diversity {#sec-faith-diversity}

The Faith index is returned by the function `addAlpha()`. It builds on the
widely used function in _`picante`_ [@R_picante; @Kembel2010].

```{r phylo-div-1}
tse <- addAlpha(tse, assay.type = "counts", index = "faith")
tse$faith |> head()
```

::: {.callout-note}
## Note

Because `tse` is a `TreeSummarizedExperiment` object, its phylogenetic tree is
used by default. However, the optional argument `tree` must be provided if
`tse` does not contain one.
:::

#### Alpha diversity measure comparisons {#sec-compare-alpha}

We can compare alpha diversity indices, for example, by calculating the
correlation between them. Below, a visual comparison between the Shannon and Faith indices is shown
@@ -211,7 +152,20 @@ wrap_plots(plots, ncol = 1) +
plot_layout(guides = "collect")
```
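As a self-contained illustration of comparing indices through correlation, the base-R sketch below computes observed richness and Shannon diversity on a small hypothetical count matrix and summarizes their agreement with a Spearman rank correlation. Faith's index is left out here because it requires a tree; the data and variable names are made up for illustration.

```r
# Hypothetical toy counts: taxa in rows, samples S1-S5 in columns
# (each line below is one sample, because matrix() fills column by column)
counts <- matrix(
  c(50, 30, 20,  0,  0,
    10, 10, 10, 10, 10,
    90,  5,  3,  1,  1,
    40, 40, 10,  5,  5,
    25, 25, 25, 25,  0),
  nrow = 5,
  dimnames = list(paste0("taxon", 1:5), paste0("S", 1:5))
)

# Per-sample observed richness and Shannon diversity (natural log)
observed <- colSums(counts > 0)
shannon <- apply(counts, 2, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

# Rank-based (Spearman) correlation between the two indices across samples
rho <- cor(observed, shannon, method = "spearman")
rho
```

Here the correlation is weak (about 0.11): sample S3 is among the richest but is dominated by one taxon, so it has the lowest Shannon diversity. This is why different alpha diversity indices can rank the same samples differently.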

### Statistical analysis of alpha diversity measures {#sec-stats-diversity}

We can then analyze statistical significance. We use the non-parametric
Wilcoxon rank-sum test (also known as the Mann-Whitney U test), which, unlike
the commonly used Student's t-test, does not assume normally distributed data.

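```{r}
#| label: test_alpha1

# Pairwise Wilcoxon rank-sum tests of observed richness between sample types,
# with p-values adjusted for multiple testing (FDR, Benjamini-Hochberg)
pairwise.wilcox.test(
    tse[["observed"]], tse[["SampleType"]], p.adjust.method = "fdr")
```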
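To show what `pairwise.wilcox.test()` returns, the base-R sketch below runs it on hypothetical richness values for three groups of four samples each. The values are made up so that the groups do not overlap and contain no ties.

```r
# Hypothetical richness values for three groups of four samples each
richness <- c(10, 12, 14, 16,  20, 22, 24, 26,  30, 32, 34, 36)
group <- factor(rep(c("A", "B", "C"), each = 4))

# Pairwise Wilcoxon rank-sum tests with FDR (Benjamini-Hochberg) adjustment
res <- pairwise.wilcox.test(richness, group, p.adjust.method = "fdr")

# Lower-triangular matrix of adjusted p-values: rows B, C; columns A, B
res$p.value
```

Because the groups are completely separated, each exact test gives p = 2/70 (about 0.029), and the adjustment leaves these identical p-values unchanged. With real data, tied values make `wilcox.test()` fall back to a normal approximation and emit a warning.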

#### Visualizing significance in group-wise comparisons

Next, let's compare the Shannon index between sample groups and visualize the
statistical significance. Using the `stat_compare_means` function from the
@@ -274,6 +228,7 @@ p <- plotColData(
p
```

## Further reading
The article on the
[`ggpubr` package](http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/)
provides further examples of computing and highlighting statistical significance.
Binary file added inst/pages/figures/fig_13_1_alphadiv.png