Add plotRDA function to beta diversity chapter #343

Merged
merged 29 commits into from
Sep 27, 2023
Changes from 28 commits
Commits
29 commits
1a1f6c7
Add link to benchmarking and minor polish
RiboRings Aug 22, 2023
7daf432
Simplify section on supervised ordination
RiboRings Aug 22, 2023
f8c26df
Add clarifications to DAA with confounding
RiboRings Aug 22, 2023
8baac91
Fix beta diversity bug
RiboRings Aug 23, 2023
8195dee
Minor change
RiboRings Aug 23, 2023
2951452
Add exercise on DAA method comparison
RiboRings Aug 26, 2023
6aa13ee
Fix conflict
RiboRings Aug 26, 2023
49adb3f
Minor fix
RiboRings Aug 26, 2023
4222694
Minor fix
RiboRings Aug 26, 2023
65014ef
Streamline RDA section with new plotRDA function
RiboRings Sep 11, 2023
58fcf3f
Solve conflicts
RiboRings Sep 11, 2023
0a3e503
Fix rmd table in beta diversity chapter
RiboRings Sep 11, 2023
092d9c2
Fix miaTime missing error
RiboRings Sep 11, 2023
84b0e88
Fix miaTime missing error
RiboRings Sep 12, 2023
a555513
Add dendextend to DESCRIPTION
RiboRings Sep 12, 2023
079e23c
Add other missing deps
RiboRings Sep 12, 2023
67e32e5
Fix dep names
RiboRings Sep 12, 2023
f0764e8
Add multiassay analyses deps
RiboRings Sep 12, 2023
7f8190f
Remove reticulate from deps
RiboRings Sep 12, 2023
db78f67
Add deps for extra materials
RiboRings Sep 12, 2023
dc6b188
Improve PCoA example
RiboRings Sep 14, 2023
a59b829
Fix deployment
RiboRings Sep 14, 2023
1dd02ff
Implement pseudocount = TRUE and minor fixes
RiboRings Sep 23, 2023
a7a4bf8
Update 30_differential_abundance (#348)
Elina297 Sep 25, 2023
a857606
Replace pseudocount 1 with TRUE throughout book
RiboRings Sep 25, 2023
af6f052
Add table of typical beta div combinations
RiboRings Sep 25, 2023
2ba62ec
Merge remote-tracking branch 'origin/next_stage' into next_stage
RiboRings Sep 25, 2023
a9f91f3
Fix pseudocount bug
RiboRings Sep 26, 2023
f460fc5
Update beta diversity table
RiboRings Sep 27, 2023
2 changes: 1 addition & 1 deletion 04_containers.Rmd
@@ -68,7 +68,7 @@ Let us load example data and rename it as tse.

```{r}
library(mia)
data(hitchip1006, package="miaTime")
data("hitchip1006", package = "miaTime")
tse <- hitchip1006
```

197 changes: 81 additions & 116 deletions 20_beta_diversity.Rmd
@@ -16,18 +16,27 @@ knitr::opts_chunk$set(

# Community Similarity {#community-similarity}

Whereas alpha diversity focuses on community variation within a community
(one sample), beta diversity quantifies the dissimilarity between communities
(multiple samples). In microbiome research, the most popular metrics of beta
Beta diversity quantifies the dissimilarity between communities (multiple
samples), as opposed to alpha diversity, which focuses on variation within a
community (one sample). In microbiome research, commonly used metrics of beta
diversity include the Bray-Curtis index (for compositional data), Jaccard index
(for presence / absence data, ignoring abundance information), Aitchison distance
(for presence/absence data, ignoring abundance information), Aitchison distance
(Euclidean distance for clr-transformed abundances, aiming to avoid the
compositionality bias), and the Unifrac distance (which takes into account the
phylogenetic tree information). Notably, only some of these measures are actual
_distances_, as this is a mathematical concept whose definition is not satisfied
by certain ecological measures, such as the Bray-Curtis index. Therefore, the terms
dissimilarity and beta diversity are preferred.

| Method description | Assay type | Beta diversity metric |
|:--------------------------:|:-------------------:|:---------------------:|
| Quantitative profiling | Absolute counts | Bray-Curtis |
| Relative profiling | Relative abundances | Bray-Curtis |
| Aitchison distance | clr | Euclidean |
| Robust Aitchison distance | rclr | Euclidean |
| Presence/Absence similarity | Absolute counts | Jaccard |
| Phylogenetic distance | Absolute counts | Unifrac |

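The rows of the table above can be made concrete with a small base-R sketch (not code from the book): it computes Bray-Curtis on relative abundances, Jaccard on presence/absence, and the Aitchison distance as Euclidean distance on clr values, for a hypothetical three-sample count matrix. The helper `pairwise_dist()` and the pseudocount of 1 are illustrative assumptions; in the book these quantities are obtained with `vegan::vegdist` and `transformAssay`.

```r
# Hypothetical counts: rows are samples, columns are taxa
counts <- matrix(c(10, 0,  5, 85,
                   20, 3,  0, 77,
                    0, 8, 12, 80),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("S1", "S2", "S3"),
                                 c("taxonA", "taxonB", "taxonC", "taxonD")))

# Bray-Curtis dissimilarity: sum(|x - y|) / sum(x + y)
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Jaccard distance on presence/absence: 1 - shared taxa / all observed taxa
jaccard <- function(x, y) {
  a <- x > 0
  b <- y > 0
  1 - sum(a & b) / sum(a | b)
}

# Hypothetical helper: apply a pairwise function to every pair of samples
# and return a `dist` object (only the lower triangle is kept)
pairwise_dist <- function(m, f) {
  n <- nrow(m)
  out <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- f(m[i, ], m[j, ])
    }
  }
  as.dist(out)
}

# Bray-Curtis on relative abundances
rel <- counts / rowSums(counts)
bc <- pairwise_dist(rel, bray_curtis)

# Jaccard on presence/absence of the raw counts
jac <- pairwise_dist(counts, jaccard)

# Aitchison: clr transform (pseudocount of 1 assumed), then Euclidean distance
clr_mat <- t(apply(counts + 1, 1, function(v) log(v) - mean(log(v))))
ait <- dist(clr_mat)

round(as.matrix(bc), 3)
```

Because `as.dist()` keeps only the lower triangle, each pair of samples is stored once; `as.matrix()` expands the object back to the full symmetric matrix, as in the last line.
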
In practice, beta diversity is usually represented as a `dist` object, a
triangular matrix where the distance between each pair of samples is encoded by
a specific cell. This distance matrix can then undergo ordination, which is an
@@ -47,23 +56,6 @@ Reduction (UMAP), whereas the latter is mainly represented by distance-based
Redundancy Analysis (dbRDA). We will first discuss unsupervised ordination
methods and then proceed to supervised ones.

To run the examples in this chapter, the following packages should be imported:

* mia: microbiome analysis framework
* scater: plotting reduced dimensions
* vegan: ecological distances
* ggplot2: plotting
* patchwork: combining plots
* dplyr: pipe operator

```{r betadiv-packages, include = FALSE}
library(mia)
library(scater)
library(vegan)
library(ggplot2)
library(patchwork)
library(dplyr)
```

## Unsupervised ordination {#unsupervised-ordination}

@@ -75,16 +67,22 @@ demonstration we will analyse beta diversity in GlobalPatterns, and observe the
variation between stool samples and those with a different origin.

```{r prep-tse}
# Example data
# Load mia and import sample dataset
library(mia)
data("GlobalPatterns", package = "mia")

# Data matrix (features x samples)
tse <- GlobalPatterns

# some beta diversity metrics are usually applied to relative abundances
# Beta diversity metrics like Bray-Curtis are often applied to relabundances
tse <- transformAssay(tse,
assay.type = "counts",
method = "relabundance")

# Other metrics, like Aitchison, are applied to clr-transformed data
tse <- transformAssay(tse,
assay.type = "relabundance",
method = "clr",
pseudocount = TRUE)

# Add group information Feces yes/no
tse$Group <- tse$SampleType == "Feces"
```
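
Why the pseudocount matters for the clr assay can be seen in a small base-R sketch on hypothetical relative abundances: adding 1 to values that are all below 1 compresses the log ratios, whereas a pseudocount on the scale of the data keeps them apart. Here the data-scale pseudocount is the smallest positive value, an assumed rule chosen only for illustration; how `transformAssay(..., pseudocount = TRUE)` picks its value is not assumed here (see the mia documentation).

```r
# Hypothetical relative abundances (rows = samples) that include zeros
rel <- rbind(S1 = c(0.50, 0.30, 0.20, 0.00),
             S2 = c(0.10, 0.00, 0.60, 0.30))

# Row-wise clr: log of each value minus the mean log of its sample
clr_rows <- function(m) t(apply(m, 1, function(v) log(v) - mean(log(v))))

# Pseudocount of 1: larger than every relative abundance
clr_one <- clr_rows(rel + 1)

# Pseudocount on the data scale: smallest positive value (assumed rule)
pc <- min(rel[rel > 0])
clr_min <- clr_rows(rel + pc)

# Aitchison distance (Euclidean on clr) under each choice
d_one <- as.numeric(dist(clr_one))
d_min <- as.numeric(dist(clr_min))
c(pseudocount_1 = d_one, pseudocount_min = d_min)
```

The two choices give clearly different distances between the same pair of samples, so the pseudocount is a modelling decision rather than a neutral default.
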
@@ -106,12 +104,15 @@ dimensions via an ordination method, the results of which can be stored in the
and `runNMDS` functions.

```{r runMDS}
# Perform PCoA
# Load package to plot reducedDim
library(scater)

# Run PCoA on relabundance assay with Bray-Curtis distances
tse <- runMDS(tse,
FUN = vegan::vegdist,
method = "bray",
name = "PCoA_BC",
assay.type = "relabundance")
assay.type = "relabundance",
name = "MDS_bray")
```

Sample dissimilarity can be visualized on a lower-dimensional display (typically
@@ -121,11 +122,11 @@ size and other aesthetics. Can you find any difference between the groups?

```{r plot-mds-bray-curtis, fig.cap = "MDS plot based on the Bray-Curtis distances on the GlobalPatterns dataset."}
# Create ggplot object
p <- plotReducedDim(tse, "PCoA_BC",
p <- plotReducedDim(tse, "MDS_bray",
colour_by = "Group")

# Calculate explained variance
e <- attr(reducedDim(tse, "PCoA_BC"), "eig")
e <- attr(reducedDim(tse, "MDS_bray"), "eig")
rel_eig <- e / sum(e[e > 0])

# Add explained variance for each axis
@@ -135,32 +136,54 @@ p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep
p
```
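
The `rel_eig` computation above divides each eigenvalue by the sum of the positive ones only. PCoA on a non-Euclidean dissimilarity such as Bray-Curtis can yield negative eigenvalues, which have no variance interpretation. The same calculation is sketched below with base R's `cmdscale()` on hypothetical data; it makes no assumption about how `runMDS` computes its `eig` attribute internally.

```r
set.seed(1)
# Hypothetical relative abundances: 8 samples x 6 taxa
x <- matrix(rpois(48, lambda = 20), nrow = 8)
rel <- x / rowSums(x)

# Bray-Curtis dissimilarity between all pairs of samples
n <- nrow(rel)
bc <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- sum(abs(rel[i, ] - rel[j, ])) / sum(rel[i, ] + rel[j, ])
  }
}
d <- as.dist(bc)

# Classical MDS (PCoA), keeping all n eigenvalues
pcoa <- cmdscale(d, k = 2, eig = TRUE)
e <- pcoa$eig

# Share of each axis relative to the total of the positive eigenvalues
rel_eig <- e / sum(e[e > 0])
sprintf("PCoA %d (%.1f%%)", 1:2, 100 * rel_eig[1:2])
```
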

With additional tools from the ggplot2 package, ordination methods can be
compared to find similarities between them or select the most suitable one to
visualize beta diversity in the light of the research question.
A few combinations of beta diversity metrics and assay types are typically
used. For instance, Bray-Curtis dissimilarity and Euclidean distance are often
applied to the relative abundance and the clr assays, respectively. Besides
**beta diversity metric** and **assay type**, the **ordination method** (here
MDS or NMDS) is also a variable that should be considered. Below, we show how the
choice of these three factors can affect the resulting lower-dimensional data.

```{r mds-nmds-comparison, results='hide'}
# Run NMDS on relabundance assay with Bray-Curtis distances
tse <- runNMDS(tse,
FUN = vegan::vegdist,
method = "bray",
assay.type = "relabundance",
name = "NMDS_bray")

```{r plot-mds-nmds-comparison, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or euclidean distances on the GlobalPattern dataset."}
# Run MDS on clr assay with Aitchison distances
tse <- runMDS(tse,
FUN = vegan::vegdist,
name = "MDS_euclidean",
method = "euclidean",
assay.type = "counts")
assay.type = "clr",
name = "MDS_aitchison")

# Run NMDS on clr assay with Aitchison distances
tse <- runNMDS(tse,
FUN = vegan::vegdist,
name = "NMDS_BC")
method = "euclidean",
assay.type = "clr",
name = "NMDS_aitchison")
```

tse <- runNMDS(tse,
FUN = vegan::vegdist,
name = "NMDS_euclidean",
method = "euclidean")
Multiple ordination plots are combined into a multi-panel plot with the
patchwork package, so that different methods can be compared to find similarities
between them or select the most suitable one to visualize beta diversity in the
light of the research question.

```{r, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or Aitchison distances on the GlobalPatterns dataset."}
# Load package for multi-panel plotting
library(patchwork)

plots <- lapply(c("PCoA_BC", "MDS_euclidean", "NMDS_BC", "NMDS_euclidean"),
# Generate plots for all 4 reducedDims
plots <- lapply(c("MDS_bray", "MDS_aitchison",
"NMDS_bray", "NMDS_aitchison"),
plotReducedDim,
object = tse,
colour_by = "Group")

((plots[[1]] | plots[[2]]) / (plots[[3]] | plots[[4]])) +
# Generate multi-panel plot
wrap_plots(plots) +
plot_layout(guides = "collect")
```

@@ -169,16 +192,14 @@ relationship of features in the form of a `phylo` tree. `calculateUnifrac`
performs the calculation to return a `dist` object, which can again be
used within `runMDS`.

```{r}
```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPatterns dataset."}
tse <- runMDS(tse,
FUN = mia::calculateUnifrac,
name = "Unifrac",
tree = rowTree(tse),
ntop = nrow(tse),
assay.type = "counts")
```

```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPattern dataset."}
plotReducedDim(tse, "Unifrac",
colour_by = "Group")
```
@@ -240,6 +261,9 @@ would report relative stress, which varies in the unit interval and is better
if smaller. This can be calculated as shown below.

```{r relstress}
# Load vegan package
library(vegan)

# Quantify dissimilarities in the original feature space
x <- assay(tse, "relabundance") # Pick relabundance assay separately
d0 <- as.matrix(vegdist(t(x), "bray"))
@@ -282,10 +306,10 @@ them. The result shows how much each covariate affects beta diversity. The table
below illustrates the relation between supervised and unsupervised ordination
methods.

| | supervised ordination | unsupervised ordination |
|:-------------------------:|:----------------------:|:------------------------:|
| Euclidean distance | RDA | PCA |
| non-Euclidean distance | dbRDA | PCoA |
| | supervised ordination | unsupervised ordination |
|:------------------------:|:----------------------:|:------------------------:|
| Euclidean distance | RDA | PCA |
| non-Euclidean distance | dbRDA | PCoA/MDS, NMDS and UMAP |

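The Euclidean-distance row of the table above can be checked numerically: with Euclidean distances, PCoA (here base R's `cmdscale()`) recovers the PCA scores from `prcomp()` up to the sign of each axis. A minimal sketch on hypothetical data:

```r
set.seed(42)
# Hypothetical data: 10 samples x 4 features
x <- matrix(rnorm(40), nrow = 10)

# PCA scores on centred (unscaled) data
pca_scores <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1:2]

# PCoA on Euclidean distances between samples
pcoa_scores <- cmdscale(dist(x), k = 2)

# Axis signs are arbitrary, so compare absolute values
max_diff <- max(abs(abs(pca_scores) - abs(pcoa_scores)))
max_diff
```

For a non-Euclidean dissimilarity there is no equivalent PCA on the original features, which is why PCoA, and dbRDA in the supervised case, is used instead.
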
We demonstrate the usage of dbRDA with the enterotype dataset, where samples
correspond to patients. The colData contains the clinical status of each patient
@@ -325,7 +349,7 @@ function. We see that both clinical status and age explain more than 10% of the
variance, but only age shows statistical significance.

```{r rda-permanova-res}
rda_info$permanova %>%
rda_info$permanova |>
knitr::kable()
```
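
To illustrate what an "explained variance" and a permutation p-value mean here, a minimal single-factor PERMANOVA can be written in base R from the usual partition of squared dissimilarities: R² is the between-group share of the total sum of squares, the pseudo-F compares between- and within-group mean squares, and the p-value comes from shuffling group labels. This is a hedged sketch: the function `permanova_1way` and the data are hypothetical, and it is not claimed to match how `rda_info$permanova` was computed for several covariates.

```r
# Hypothetical helper: one-factor PERMANOVA on a dissimilarity object
permanova_1way <- function(d, group, n_perm = 999) {
  d2 <- as.matrix(d)^2
  N <- nrow(d2)
  a <- length(unique(group))
  # Total sum of squares: squared dissimilarities over all pairs, divided by N
  ss_total <- sum(d2[upper.tri(d2)]) / N
  # Within-group sum of squares for a given labelling
  ss_within <- function(g) {
    sum(vapply(split(seq_len(N), g), function(idx) {
      sub <- d2[idx, idx, drop = FALSE]
      sum(sub[upper.tri(sub)]) / length(idx)
    }, numeric(1)))
  }
  # Pseudo-F: between-group over within-group mean squares
  pseudo_f <- function(g) {
    ss_w <- ss_within(g)
    ((ss_total - ss_w) / (a - 1)) / (ss_w / (N - a))
  }
  f_obs <- pseudo_f(group)
  # Permutation null distribution: shuffle the group labels
  f_perm <- replicate(n_perm, pseudo_f(sample(group)))
  list(R2 = 1 - ss_within(group) / ss_total,
       F = f_obs,
       p = (sum(f_perm >= f_obs) + 1) / (n_perm + 1))
}

set.seed(1)
# Hypothetical data: two groups of 10 samples, group B shifted by 2 in every feature
x <- rbind(matrix(rnorm(50), nrow = 10),
           matrix(rnorm(50, mean = 2), nrow = 10))
grp <- rep(c("A", "B"), each = 10)

res <- permanova_1way(dist(x), grp)
res
```
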

@@ -334,79 +358,20 @@ information from the results of RDA. In this case, none of the p-values is lower
than the significance threshold, so there is no evidence that the group dispersions differ.

```{r rda-homogeneity-res}
rda_info$homogeneity %>%
rda_info$homogeneity |>
knitr::kable()
```
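
The homogeneity check matters because PERMANOVA can respond to differences in spread between groups, not only to differences in location. For Euclidean data the idea can be sketched in base R: compute each sample's distance to its group centroid and compare those distances across groups with a one-way ANOVA. vegan's `betadisper()` generalizes this to arbitrary dissimilarities via principal coordinates; the function `dispersion_test` below is a hypothetical simplification, not the method behind `rda_info$homogeneity`.

```r
# Hypothetical helper: distances to group centroids + one-way ANOVA (Euclidean only)
dispersion_test <- function(x, group) {
  group <- factor(group)
  # Group centroids: one row per group, one column per feature
  centroids <- t(sapply(split(as.data.frame(x), group), colMeans))
  # Euclidean distance from each sample to the centroid of its own group
  dist_to_centroid <- sqrt(rowSums((x - centroids[as.character(group), ])^2))
  fit <- anova(lm(dist_to_centroid ~ group))
  list(distances = dist_to_centroid, p = fit[["Pr(>F)"]][1])
}

set.seed(2)
# Hypothetical data: same centre, but group B is three times as spread out
x <- rbind(matrix(rnorm(50, sd = 1), nrow = 10),
           matrix(rnorm(50, sd = 3), nrow = 10))
grp <- rep(c("A", "B"), each = 10)

res <- dispersion_test(x, grp)
res$p
```
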

Next, we proceed to visualize the weight and significance of each variable on
the similarity between samples with an RDA plot, which can be generated with
the following custom function.
the `plotRDA` function from the miaViz package.

```{r plot-rda}
# Load packages for plotting function
library(stringr)
library(ggord)

rda <- attr(reducedDim(tse2, "RDA"), "rda")

# Covariates that are being analyzed
variable_names <- c("ClinicalStatus", "Gender", "Age")

# Since na.exclude was used, if there were rows missing information, they were
# dropped off. Subset coldata so that it matches with rda.
coldata <- colData(tse2)[ rownames(rda$CCA$wa), ]

# Adjust names
# Get labels of vectors
vec_lab_old <- rownames(rda$CCA$biplot)

# Loop through vector labels
vec_lab <- sapply(vec_lab_old, FUN = function(name){
# Get the variable name
variable_name <- variable_names[ str_detect(name, variable_names) ]
# If the vector label includes also group name
if( !any(name %in% variable_names) ){
# Get the group names
group_name <- unique( coldata[[variable_name]] )[
which( paste0(variable_name, unique( coldata[[variable_name]] )) == name ) ]
# Modify vector so that group is separated from variable name
new_name <- paste0(variable_name, " \U2012 ", group_name)
} else{
new_name <- name
}
# Add percentage how much this variable explains, and p-value
new_name <- expr(paste(!!new_name, " (",
!!format(round( rda_info$permanova[variable_name, "Explained variance"]*100, 1), nsmall = 1),
"%, ",italic("P"), " = ",
!!gsub("0\\.","\\.", format(round( rda_info$permanova[variable_name, "Pr(>F)"], 3),
nsmall = 3)), ")"))

return(new_name)
})
# Add names
names(vec_lab) <- vec_lab_old

# Create labels for axis
xlab <- paste0("RDA1 (", format(round( rda$CCA$eig[[1]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)")
ylab <- paste0("RDA2 (", format(round( rda$CCA$eig[[2]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)")

# Create a plot
plot <- ggord(rda, grp_in = coldata[["ClinicalStatus"]], vec_lab = vec_lab,
alpha = 0.5,
size = 4, addsize = -4,
#ext= 0.7,
txt = 3.5, repel = TRUE,
#coord_fix = FALSE
) +
# Adjust titles and labels
guides(colour = guide_legend("ClinicalStatus"),
fill = guide_legend("ClinicalStatus"),
group = guide_legend("ClinicalStatus"),
shape = guide_legend("ClinicalStatus"),
x = guide_axis(xlab),
y = guide_axis(ylab)) +
theme( axis.title = element_text(size = 10) )
plot
library(miaViz)

# Generate RDA plot coloured by clinical status
plotRDA(tse2, "RDA", colour_by = "ClinicalStatus")
```

From the plot above, we can see that only age significantly describes
6 changes: 3 additions & 3 deletions 23_multi-assay_analyses.Rmd
@@ -102,7 +102,7 @@ bacterium X is present, is the concentration of metabolite Y lower or higher"?
# Agglomerate microbiome data at family level
mae[[1]] <- mergeFeaturesByPrevalence(mae[[1]], rank = "Family")
# Does log10 transform for microbiome data
mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = 1)
mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = TRUE)

# Give unique names so that we do not have problems when we are creating a plot
rownames(mae[[1]]) <- getTaxonomyLabels(mae[[1]])
@@ -193,8 +193,8 @@ mae[[2]] <- transformAssay(mae[[2]], assay.type = "nmr",

# Transforming biomarker data with z-transform
mae[[3]] <- transformAssay(mae[[3]], assay.type = "signals",
MARGIN = "features",
method = "z", pseudocount = 1)
MARGIN = "features",
method = "z", pseudocount = 1)

# Removing assays no longer needed
assay(mae[[1]], "counts") <- NULL
1 change: 0 additions & 1 deletion 30_differential_abundance.Rmd
@@ -422,7 +422,6 @@ zicoseq_res %>%
```{r plot-zicoseq}
## x-axis is the effect size: R2 * direction of coefficient
ZicoSeq.plot(ZicoSeq.obj = zicoseq_out,
meta.dat = as.data.frame(colData(tse)),
pvalue.type = 'p.adj.fdr')
```

5 changes: 1 addition & 4 deletions 97_extra_materials.Rmd
@@ -232,23 +232,20 @@ plot(posterior, par="Lambda", focus.cov = rownames(X)[c(2,4)])
## Interactive 3D Plots

```{r, message=FALSE, warning=FALSE}
# Installing libraryd packages
# Load libraries
library(rgl)
library(plotly)
```

```{r setup2, warning=FALSE, message=FALSE}
library(knitr)
library(rgl)
knitr::knit_hooks$set(webgl = hook_webgl)
```


In this section, we make a 3D version of the earlier plot from "Visualizing the most dominant genus on PCoA" (see \@ref(quality-control)) with the help of the plotly package [@Sievert2020].

```{r, message=FALSE, warning=FALSE}
# Installing the package
library(curatedMetagenomicData)
# Importing necessary libraries
library(curatedMetagenomicData)
library(dplyr)