Add plotRDA function to beta diversity chapter (#343)
* Add link to benchmarking and minor polish

* Simplify section on supervised ordination

* Add clarifications to DAA with confounding

* Fix beta diversity bug

* Minor change

* Add exercise on DAA method comparison

* Minor fix

* Minor fix

* Streamline RDA section with new plotRDA function

* Fix rmd table in beta diversity chapter

* Fix miaTime missing error

* Fix miaTime missing error

* Add dendextend to DESCRIPTION

* Add other missing deps

* Fix dep names

* Add multiassay analyses deps

* Remove reticulate from deps

* Add deps for extra materials

* Improve PCoA example

* Fix deployment

* Implement pseudocount = TRUE and minor fixes

* Update 30_differential_abundance (#348)

* Replace pseudocount 1 with TRUE throughout book

* Add table of typical beta div combinations

* Fix pseudocount bug

* Update beta diversity table

---------

Co-authored-by: Elina297 <[email protected]>
RiboRings and Elina297 authored Sep 27, 2023
1 parent bcf0961 commit 1bcbabe
Showing 6 changed files with 99 additions and 126 deletions.
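The beta diversity table added to 20_beta_diversity.Rmd in this commit pairs the Aitchison distance with either absolute counts or clr-transformed data, but lists Bray-Curtis separately for absolute counts and relative abundances. The base-R sketch below illustrates one reason for that split, using a hypothetical toy matrix with no zeros so that no pseudocount is needed. The clr transform cancels each sample's total, so Euclidean distances on clr values come out the same from counts or from relative abundances, while Bray-Curtis depends on the sample totals. The helpers `clr` and `bray` are illustrative only and are not the mia or vegan API.

```r
# Hypothetical toy data: 3 samples (rows) x 4 features (columns) with
# different sample totals (100, 200, 150) and no zeros, so log() is safe
# without a pseudocount.
counts <- matrix(c(10,  5, 20, 65,
                   80, 20, 60, 40,
                    5, 60, 15, 70),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("S", 1:3), paste0("F", 1:4)))

# Relative abundances: divide each sample (row) by its total
relab <- counts / rowSums(counts)

# Illustrative clr helper: log of each value minus the mean log of its sample
clr <- function(m) {
  logm <- log(m)
  logm - rowMeans(logm)
}

# Aitchison distance = Euclidean distance between clr-transformed samples
aitch_counts <- dist(clr(counts), method = "euclidean")
aitch_relab  <- dist(clr(relab),  method = "euclidean")

# Illustrative Bray-Curtis helper: sum(|x - y|) / sum(x + y) for each pair
bray <- function(m) {
  n <- nrow(m)
  out <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
    }
  }
  as.dist(out)
}
bc_counts <- bray(counts)
bc_relab  <- bray(relab)

# Compare only the distance values (dist objects also carry a "call" attribute)
same_aitchison <- isTRUE(all.equal(as.numeric(aitch_counts), as.numeric(aitch_relab)))
same_bray      <- isTRUE(all.equal(as.numeric(bc_counts), as.numeric(bc_relab)))
# aitchison_equal is TRUE, bray_curtis_equal is FALSE
print(c(aitchison_equal = same_aitchison, bray_curtis_equal = same_bray))
```

With zeros in the data a pseudocount has to be added before taking logs, and the two Aitchison results are then no longer guaranteed to match exactly, because the same pseudocount has a different relative size on counts and on relative abundances.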
2 changes: 1 addition & 1 deletion 04_containers.Rmd
@@ -68,7 +68,7 @@ Let us load example data and rename it as tse.

```{r}
library(mia)
data(hitchip1006, package="miaTime")
data("hitchip1006", package = "miaTime")
tse <- hitchip1006
```

199 changes: 83 additions & 116 deletions 20_beta_diversity.Rmd
@@ -16,18 +16,29 @@ knitr::opts_chunk$set(

# Community Similarity {#community-similarity}

Whereas alpha diversity focuses on community variation within a community
(one sample), beta diversity quantifies the dissimilarity between communities
(multiple samples). In microbiome research, the most popular metrics of beta
Beta diversity quantifies the dissimilarity between communities (multiple
samples), as opposed to alpha diversity which focuses on variation within a
community (one sample). In microbiome research, commonly used metrics of beta
diversity include the Bray-Curtis index (for compositional data), Jaccard index
(for presence / absence data, ignoring abundance information), Aitchison distance
(for presence/absence data, ignoring abundance information), Aitchison distance
(Euclidean distance for clr transformed abundances, aiming to avoid the
compositionality bias), and the Unifrac distance (that takes into account the
phylogenetic tree information). Notably, only some of these measures are actual
_distances_, as this is a mathematical concept whose definition is not satisfied
by certain ecological measure, such as the Bray-Curtis index. Therefore, the terms
dissimilarity and beta diversity are preferred.

| Method description | Assay type | Beta diversity metric |
|:---------------------------:|:-------------------:|:---------------------:|
| Quantitative profiling | Absolute counts | Bray-Curtis |
| Relative profiling | Relative abundances | Bray-Curtis |
| Aitchison distance | Absolute counts | Aitchison |
| Aitchison distance | clr | Euclidean |
| Robust Aitchison distance | rclr | Euclidean |
| Presence/Absence similarity | Relative abundances | Jaccard |
| Presence/Absence similarity | Absolute counts | Jaccard |
| Phylogenetic distance | Rarefied counts | Unifrac |

In practice, beta diversity is usually represented as a `dist` object, a
triangular matrix where the distance between each pair of samples is encoded by
a specific cell. This distance matrix can then undergo ordination, which is an
@@ -47,23 +58,6 @@ Reduction (UMAP), whereas the latter is mainly represented by distance-based
Redundancy Analysis (dbRDA). We will first discuss unsupervised ordination
methods and then proceed to supervised ones.

To run the examples in this chapter, the following packages should be imported:

* mia: microbiome analysis framework
* scater: plotting reduced dimensions
* vegan: ecological distances
* ggplot2: plotting
* patchwork: combining plots
* dplyr: pipe operator

```{r betadiv-packages, include = FALSE}
library(mia)
library(scater)
library(vegan)
library(ggplot2)
library(patchwork)
library(dplyr)
```

## Unsupervised ordination {#unsupervised-ordination}

@@ -75,16 +69,22 @@ demonstration we will analyse beta diversity in GlobalPatterns, and observe the
variation between stool samples and those with a different origin.

```{r prep-tse}
# Example data
# Load mia and import sample dataset
library(mia)
data("GlobalPatterns", package = "mia")
# Data matrix (features x samples)
tse <- GlobalPatterns
# some beta diversity metrics are usually applied to relative abundances
# Beta diversity metrics like Bray-Curtis are often applied to relabundances
tse <- transformAssay(tse,
assay.type = "counts",
method = "relabundance")
# Other metrics like Aitchison to clr-transformed data
tse <- transformAssay(tse,
assay.type = "relabundance",
method = "clr",
pseudocount = TRUE)
# Add group information Feces yes/no
tse$Group <- tse$SampleType == "Feces"
```
@@ -106,12 +106,15 @@ dimensions via an ordination method, the results of which can be stored in the
and `runNMDS` functions.

```{r runMDS}
# Perform PCoA
# Load package to plot reducedDim
library(scater)
# Run PCoA on relabundance assay with Bray-Curtis distances
tse <- runMDS(tse,
FUN = vegan::vegdist,
method = "bray",
name = "PCoA_BC",
assay.type = "relabundance")
assay.type = "relabundance",
name = "MDS_bray")
```

Sample dissimilarity can be visualized on a lower-dimensional display (typically
@@ -121,11 +124,11 @@ size and other aesthetics. Can you find any difference between the groups?

```{r plot-mds-bray-curtis, fig.cap = "MDS plot based on the Bray-Curtis distances on the GlobalPattern dataset."}
# Create ggplot object
p <- plotReducedDim(tse, "PCoA_BC",
p <- plotReducedDim(tse, "MDS_bray",
colour_by = "Group")
# Calculate explained variance
e <- attr(reducedDim(tse, "PCoA_BC"), "eig")
e <- attr(reducedDim(tse, "MDS_bray"), "eig")
rel_eig <- e / sum(e[e > 0])
# Add explained variance for each axis
@@ -135,32 +138,54 @@ p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep
p
```

With additional tools from the ggplot2 package, ordination methods can be
compared to find similarities between them or select the most suitable one to
visualize beta diversity in the light of the research question.
A few combinations of beta diversity metrics and assay types are typically
used. For instance, Bray-Curtis dissimilarity and Euclidean distance are often
applied to the relative abundance and the clr assays, respectively. Besides
**beta diversity metric** and **assay type**, the **PCoA algorithm** is also a
variable that should be considered. Below, we show how the choice of these three
factors can affect the resulting lower-dimensional data.

```{r mds-nmds-comparison, results='hide'}
# Run NMDS on relabundance assay with Bray-Curtis distances
tse <- runNMDS(tse,
FUN = vegan::vegdist,
method = "bray",
assay.type = "relabundance",
name = "NMDS_bray")
```{r plot-mds-nmds-comparison, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or euclidean distances on the GlobalPattern dataset."}
# Run MDS on clr assay with Aitchison distances
tse <- runMDS(tse,
FUN = vegan::vegdist,
name = "MDS_euclidean",
method = "euclidean",
assay.type = "counts")
assay.type = "clr",
name = "MDS_aitchison")
# Run NMDS on clr assay with Euclidean distances
tse <- runNMDS(tse,
FUN = vegan::vegdist,
name = "NMDS_BC")
method = "euclidean",
assay.type = "clr",
name = "NMDS_aitchison")
```

tse <- runNMDS(tse,
FUN = vegan::vegdist,
name = "NMDS_euclidean",
method = "euclidean")
Multiple ordination plots are combined into a multi-panel plot with the
patchwork package, so that different methods can be compared to find similarities
between them or select the most suitable one to visualize beta diversity in the
light of the research question.

```{r, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or Aitchison distances on the GlobalPattern dataset."}
# Load package for multi-panel plotting
library(patchwork)
plots <- lapply(c("PCoA_BC", "MDS_euclidean", "NMDS_BC", "NMDS_euclidean"),
# Generate plots for all 4 reducedDims
plots <- lapply(c("MDS_bray", "MDS_aitchison",
"NMDS_bray", "NMDS_aitchison"),
plotReducedDim,
object = tse,
colour_by = "Group")
((plots[[1]] | plots[[2]]) / (plots[[3]] | plots[[4]])) +
# Generate multi-panel plot
wrap_plots(plots) +
plot_layout(guides = "collect")
```

@@ -169,16 +194,14 @@ relationship of features in form on a `phylo` tree. `calculateUnifrac`
performs the calculation to return a `dist` object, which can again be
used within `runMDS`.

```{r}
```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPattern dataset."}
tse <- runMDS(tse,
FUN = mia::calculateUnifrac,
name = "Unifrac",
tree = rowTree(tse),
ntop = nrow(tse),
assay.type = "counts")
```
```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPattern dataset."}
plotReducedDim(tse, "Unifrac",
colour_by = "Group")
```
@@ -240,6 +263,9 @@ would report relative stress, which varies in the unit interval and is better
if smaller. This can be calculated as shown below.

```{r relstress}
# Load vegan package
library(vegan)
# Quantify dissimilarities in the original feature space
x <- assay(tse, "relabundance") # Pick relabunance assay separately
d0 <- as.matrix(vegdist(t(x), "bray"))
@@ -282,10 +308,10 @@ them. The result shows how much each covariate affects beta diversity. The table
below illustrates the relation between supervised and unsupervised ordination
methods.

| | supervised ordination | unsupervised ordination |
|:-------------------------:|:----------------------:|:------------------------:|
| Euclidean distance | RDA | PCA |
| non-Euclidean distance | dbRDA | PCoA |
| | supervised ordination | unsupervised ordination |
|:------------------------:|:----------------------:|:------------------------:|
| Euclidean distance | RDA | PCA |
| non-Euclidean distance | dbRDA | PCoA/MDS, NMDS and UMAP |

We demonstrate the usage of dbRDA with the enterotype dataset, where samples
correspond to patients. The colData contains the clinical status of each patient
@@ -325,7 +351,7 @@ function. We see that both clinical status and age explain more than 10% of the
variance, but only age shows statistical significance.

```{r rda-permanova-res}
rda_info$permanova %>%
rda_info$permanova |>
knitr::kable()
```

@@ -334,79 +360,20 @@ information from the results of RDA. In this case, none of the p-values is lower
than the significance threshold, and thus homogeneity is observed.

```{r rda-homogeneity-res}
rda_info$homogeneity %>%
rda_info$homogeneity |>
knitr::kable()
```

Next, we proceed to visualize the weight and significance of each variable on
the similarity between samples with an RDA plot, which can be generated with
the following custom function.
the `plotRDA` function from the miaViz package.

```{r plot-rda}
# Load packages for plotting function
library(stringr)
library(ggord)
rda <- attr(reducedDim(tse2, "RDA"), "rda")
# Covariates that are being analyzed
variable_names <- c("ClinicalStatus", "Gender", "Age")
# Since na.exclude was used, if there were rows missing information, they were
# dropped off. Subset coldata so that it matches with rda.
coldata <- colData(tse2)[ rownames(rda$CCA$wa), ]
# Adjust names
# Get labels of vectors
vec_lab_old <- rownames(rda$CCA$biplot)
# Loop through vector labels
vec_lab <- sapply(vec_lab_old, FUN = function(name){
# Get the variable name
variable_name <- variable_names[ str_detect(name, variable_names) ]
# If the vector label includes also group name
if( !any(name %in% variable_names) ){
# Get the group names
group_name <- unique( coldata[[variable_name]] )[
which( paste0(variable_name, unique( coldata[[variable_name]] )) == name ) ]
# Modify vector so that group is separated from variable name
new_name <- paste0(variable_name, " \U2012 ", group_name)
} else{
new_name <- name
}
# Add percentage how much this variable explains, and p-value
new_name <- expr(paste(!!new_name, " (",
!!format(round( rda_info$permanova[variable_name, "Explained variance"]*100, 1), nsmall = 1),
"%, ",italic("P"), " = ",
!!gsub("0\\.","\\.", format(round( rda_info$permanova[variable_name, "Pr(>F)"], 3),
nsmall = 3)), ")"))
return(new_name)
})
# Add names
names(vec_lab) <- vec_lab_old
# Create labels for axis
xlab <- paste0("RDA1 (", format(round( rda$CCA$eig[[1]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)")
ylab <- paste0("RDA2 (", format(round( rda$CCA$eig[[2]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)")
# Create a plot
plot <- ggord(rda, grp_in = coldata[["ClinicalStatus"]], vec_lab = vec_lab,
alpha = 0.5,
size = 4, addsize = -4,
#ext= 0.7,
txt = 3.5, repel = TRUE,
#coord_fix = FALSE
) +
# Adjust titles and labels
guides(colour = guide_legend("ClinicalStatus"),
fill = guide_legend("ClinicalStatus"),
group = guide_legend("ClinicalStatus"),
shape = guide_legend("ClinicalStatus"),
x = guide_axis(xlab),
y = guide_axis(ylab)) +
theme( axis.title = element_text(size = 10) )
plot
library(miaViz)
# Generate RDA plot coloured by clinical status
plotRDA(tse2, "RDA", colour_by = "ClinicalStatus")
```

From the plot above, we can see that only age significantly describes
6 changes: 3 additions & 3 deletions 23_multi-assay_analyses.Rmd
@@ -102,7 +102,7 @@ bacterium X is present, is the concentration of metabolite Y lower or higher"?
# Agglomerate microbiome data at family level
mae[[1]] <- mergeFeaturesByPrevalence(mae[[1]], rank = "Family")
# Does log10 transform for microbiome data
mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = 1)
mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = TRUE)
# Give unique names so that we do not have problems when we are creating a plot
rownames(mae[[1]]) <- getTaxonomyLabels(mae[[1]])
@@ -193,8 +193,8 @@ mae[[2]] <- transformAssay(mae[[2]], assay.type = "nmr",
# Transforming biomarker data with z-transform
mae[[3]] <- transformAssay(mae[[3]], assay.type = "signals",
MARGIN = "features",
method = "z", pseudocount = 1)
MARGIN = "features",
method = "z", pseudocount = 1)
# Removing assays no longer needed
assay(mae[[1]], "counts") <- NULL
1 change: 0 additions & 1 deletion 30_differential_abundance.Rmd
@@ -422,7 +422,6 @@ zicoseq_res %>%
```{r plot-zicoseq}
## x-axis is the effect size: R2 * direction of coefficient
ZicoSeq.plot(ZicoSeq.obj = zicoseq_out,
meta.dat = as.data.frame(colData(tse)),
pvalue.type = 'p.adj.fdr')
```

5 changes: 1 addition & 4 deletions 97_extra_materials.Rmd
@@ -232,23 +232,20 @@ plot(posterior, par="Lambda", focus.cov = rownames(X)[c(2,4)])
## Interactive 3D Plots

```{r, message=FALSE, warning=FALSE}
# Installing libraryd packages
# Load libraries
library(rgl)
library(plotly)
```

```{r setup2, warning=FALSE, message=FALSE}
library(knitr)
library(rgl)
knitr::knit_hooks$set(webgl = hook_webgl)
```


In this section we make a 3D version of the earlier Visualizing the most dominant genus on PCoA (see \@ref(quality-control)), with the help of the plotly [@Sievert2020].

```{r, message=FALSE, warning=FALSE}
# Installing the package
library(curatedMetagenomicData)
# Importing necessary libraries
library(curatedMetagenomicData)
library(dplyr)