diff --git a/04_containers.Rmd b/04_containers.Rmd
index 630069e8b..e3b460592 100644
--- a/04_containers.Rmd
+++ b/04_containers.Rmd
@@ -68,7 +68,7 @@ Let us load example data and rename it as tse.
 
 ```{r}
 library(mia)
-data(hitchip1006, package="miaTime")
+data("hitchip1006", package = "miaTime")
 tse <- hitchip1006
 ```
 
diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd
index 718b8cd4c..ac0c74de2 100644
--- a/20_beta_diversity.Rmd
+++ b/20_beta_diversity.Rmd
@@ -16,11 +16,11 @@ knitr::opts_chunk$set(
 
 # Community Similarity {#community-similarity}
 
-Whereas alpha diversity focuses on community variation within a community
-(one sample), beta diversity quantifies the dissimilarity between communities
-(multiple samples). In microbiome research, the most popular metrics of beta
+Beta diversity quantifies the dissimilarity between communities (multiple
+samples), as opposed to alpha diversity which focuses on variation within a
+community (one sample). In microbiome research, commonly used metrics of beta
 diversity include the Bray-Curtis index (for compositional data), Jaccard index
-(for presence / absence data, ignoring abundance information), Aitchison distance
+(for presence/absence data, ignoring abundance information), Aitchison distance
 (Euclidean distance for clr transformed abundances, aiming to avoid the
 compositionality bias), and the Unifrac distance (that takes into account the
 phylogenetic tree information). Notably, only some of these measures are actual
@@ -28,6 +28,17 @@ _distances_, as this is a mathematical concept whose definition is not
 satisfied by certain ecological measure, such as the Bray-Curtis index.
 Therefore, the terms dissimilarity and beta diversity are preferred.
 
+| Method description          | Assay type          | Beta diversity metric |
+|:---------------------------:|:-------------------:|:---------------------:|
+| Quantitative profiling      | Absolute counts     | Bray-Curtis           |
+| Relative profiling          | Relative abundances | Bray-Curtis           |
+| Aitchison distance          | Absolute counts     | Aitchison             |
+| Aitchison distance          | clr                 | Euclidean             |
+| Robust Aitchison distance   | rclr                | Euclidean             |
+| Presence/Absence similarity | Relative abundances | Jaccard               |
+| Presence/Absence similarity | Absolute counts     | Jaccard               |
+| Phylogenetic distance       | Rarefied counts     | Unifrac               |
+
 In practice, beta diversity is usually represented as a `dist` object, a
 triangular matrix where the distance between each pair of samples is encoded by
 a specific cell. This distance matrix can then undergo ordination, which is an
@@ -47,23 +58,6 @@ Reduction (UMAP), whereas the latter is mainly represented by distance-based
 Redundancy Analysis (dbRDA). We will first discuss unsupervised ordination
 methods and then proceed to supervised ones.
 
-To run the examples in this chapter, the following packages should be imported:
-
-* mia: microbiome analysis framework
-* scater: plotting reduced dimensions
-* vegan: ecological distances
-* ggplot2: plotting
-* patchwork: combining plots
-* dplyr: pipe operator
-
-```{r betadiv-packages, include = FALSE}
-library(mia)
-library(scater)
-library(vegan)
-library(ggplot2)
-library(patchwork)
-library(dplyr)
-```
 
 ## Unsupervised ordination {#unsupervised-ordination}
 
@@ -75,16 +69,22 @@ demonstration we will analyse beta diversity in GlobalPatterns, and observe the
 variation between stool samples and those with a different origin.
 
 ```{r prep-tse}
-# Example data
+# Load mia and import sample dataset
+library(mia)
 data("GlobalPatterns", package = "mia")
-
-# Data matrix (features x samples)
 tse <- GlobalPatterns
 
-# some beta diversity metrics are usually applied to relative abundances
+# Beta diversity metrics like Bray-Curtis are often applied to relabundances
 tse <- transformAssay(tse,
+                      assay.type = "counts",
                       method = "relabundance")
 
+# Other metrics like Aitchison to clr-transformed data
+tse <- transformAssay(tse,
+                      assay.type = "relabundance",
+                      method = "clr",
+                      pseudocount = TRUE)
+
 # Add group information Feces yes/no
 tse$Group <- tse$SampleType == "Feces"
 ```
@@ -106,12 +106,15 @@ dimensions via an ordination method, the results of which can be stored in the
 and `runNMDS` functions.
 
 ```{r runMDS}
-# Perform PCoA
+# Load package to plot reducedDim
+library(scater)
+
+# Run PCoA on relabundance assay with Bray-Curtis distances
 tse <- runMDS(tse,
               FUN = vegan::vegdist,
               method = "bray",
-              name = "PCoA_BC",
-              assay.type = "relabundance")
+              assay.type = "relabundance",
+              name = "MDS_bray")
 ```
 
 Sample dissimilarity can be visualized on a lower-dimensional display (typically
@@ -121,11 +124,11 @@ size and other aesthetics. Can you find any difference between the groups?
 
 ```{r plot-mds-bray-curtis, fig.cap = "MDS plot based on the Bray-Curtis distances on the GlobalPattern dataset."}
 # Create ggplot object
-p <- plotReducedDim(tse, "PCoA_BC",
+p <- plotReducedDim(tse, "MDS_bray",
                     colour_by = "Group")
 
 # Calculate explained variance
-e <- attr(reducedDim(tse, "PCoA_BC"), "eig")
+e <- attr(reducedDim(tse, "MDS_bray"), "eig")
 rel_eig <- e / sum(e[e > 0])
 
 # Add explained variance for each axis
@@ -135,32 +138,54 @@ p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep
 p
 ```
 
-With additional tools from the ggplot2 package, ordination methods can be
-compared to find similarities between them or select the most suitable one to
-visualize beta diversity in the light of the research question.
+A few combinations of beta diversity metrics and assay types are typically
+used. For instance, Bray-Curtis dissimilarity and Euclidean distance are often
+applied to the relative abundance and the clr assays, respectively. Besides
+**beta diversity metric** and **assay type**, the **PCoA algorithm** is also a
+variable that should be considered. Below, we show how the choice of these three
+factors can affect the resulting lower-dimensional data.
+
+```{r mds-nmds-comparison, results='hide'}
+# Run NMDS on relabundance assay with Bray-Curtis distances
+tse <- runNMDS(tse,
+               FUN = vegan::vegdist,
+               method = "bray",
+               assay.type = "relabundance",
+               name = "NMDS_bray")
 
-```{r plot-mds-nmds-comparison, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or euclidean distances on the GlobalPattern dataset."}
+# Run MDS on clr assay with Aitchison distances
 tse <- runMDS(tse,
               FUN = vegan::vegdist,
-              name = "MDS_euclidean",
               method = "euclidean",
-              assay.type = "counts")
+              assay.type = "clr",
+              name = "MDS_aitchison")
 
+# Run NMDS on clr assay with Euclidean distances
 tse <- runNMDS(tse,
                FUN = vegan::vegdist,
-               name = "NMDS_BC")
+               method = "euclidean",
+               assay.type = "clr",
+               name = "NMDS_aitchison")
+```
 
-tse <- runNMDS(tse,
-               FUN = vegan::vegdist,
-               name = "NMDS_euclidean",
-               method = "euclidean")
+Multiple ordination plots are combined into a multi-panel plot with the
+patchwork package, so that different methods can be compared to find similarities
+between them or select the most suitable one to visualize beta diversity in the
+light of the research question.
+
+```{r, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or Aitchison distances on the GlobalPattern dataset."}
+# Load package for multi-panel plotting
+library(patchwork)
 
-plots <- lapply(c("PCoA_BC", "MDS_euclidean", "NMDS_BC", "NMDS_euclidean"),
+# Generate plots for all 4 reducedDims
+plots <- lapply(c("MDS_bray", "MDS_aitchison",
+                  "NMDS_bray", "NMDS_aitchison"),
                 plotReducedDim,
                 object = tse,
                 colour_by = "Group")
 
-((plots[[1]] | plots[[2]]) / (plots[[3]] | plots[[4]])) +
+# Generate multi-panel plot
+wrap_plots(plots) +
   plot_layout(guides = "collect")
 ```
 
@@ -169,16 +194,14 @@ relationship of features in form on a `phylo` tree. `calculateUnifrac` performs
 the calculation to return a `dist` object, which can again be used within
 `runMDS`.
 
-```{r}
+```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPattern dataset."}
 tse <- runMDS(tse,
               FUN = mia::calculateUnifrac,
               name = "Unifrac",
               tree = rowTree(tse),
               ntop = nrow(tse),
               assay.type = "counts")
-```
 
-```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPattern dataset."}
 plotReducedDim(tse, "Unifrac",
                colour_by = "Group")
 ```
@@ -240,6 +263,9 @@ would report relative stress, which varies in the unit interval and is better if
 smaller. This can be calculated as shown below.
 
 ```{r relstress}
+# Load vegan package
+library(vegan)
+
 # Quantify dissimilarities in the original feature space
 x <- assay(tse, "relabundance") # Pick relabunance assay separately
 d0 <- as.matrix(vegdist(t(x), "bray"))
@@ -282,10 +308,10 @@ them. The result shows how much each covariate affects beta diversity. The table
 below illustrates the relation between supervised and unsupervised ordination
 methods.
 
-|                           | supervised ordination  | unsupervised ordination  |
-|:-------------------------:|:----------------------:|:------------------------:|
-| Euclidean distance        | RDA                    | PCA                      |
-| non-Euclidean distance    | dbRDA                  | PCoA                     |
+|                          | supervised ordination  | unsupervised ordination  |
+|:------------------------:|:----------------------:|:------------------------:|
+| Euclidean distance       | RDA                    | PCA                      |
+| non-Euclidean distance   | dbRDA                  | PCoA/MDS, NMDS and UMAP  |
 
 We demonstrate the usage of dbRDA with the enterotype dataset, where samples
 correspond to patients. The colData contains the clinical status of each patient
@@ -325,7 +351,7 @@ function. We see that both clinical status and age explain more than 10% of the
 variance, but only age shows statistical significance.
 
 ```{r rda-permanova-res}
-rda_info$permanova %>%
+rda_info$permanova |>
   knitr::kable()
 ```
 
@@ -334,79 +360,20 @@ information from the results of RDA. In this case, none of the p-values is lower
 than the significance threshold, and thus homogeneity is observed.
 
 ```{r rda-homogeneity-res}
-rda_info$homogeneity %>%
+rda_info$homogeneity |>
   knitr::kable()
 ```
 
 Next, we proceed to visualize the weight and significance of each variable on
 the similarity between samples with an RDA plot, which can be generated with
-the following custom function.
+the `plotRDA` function from the miaViz package.
 
 ```{r plot-rda}
 # Load packages for plotting function
-library(stringr)
-library(ggord)
-
-rda <- attr(reducedDim(tse2, "RDA"), "rda")
-
-# Covariates that are being analyzed
-variable_names <- c("ClinicalStatus", "Gender", "Age")
-
-# Since na.exclude was used, if there were rows missing information, they were
-# dropped off. Subset coldata so that it matches with rda.
-coldata <- colData(tse2)[ rownames(rda$CCA$wa), ]
-
-# Adjust names
-# Get labels of vectors
-vec_lab_old <- rownames(rda$CCA$biplot)
-
-# Loop through vector labels
-vec_lab <- sapply(vec_lab_old, FUN = function(name){
-    # Get the variable name
-    variable_name <- variable_names[ str_detect(name, variable_names) ]
-    # If the vector label includes also group name
-    if( !any(name %in% variable_names) ){
-        # Get the group names
-        group_name <- unique( coldata[[variable_name]] )[
-        which( paste0(variable_name, unique( coldata[[variable_name]] )) == name ) ]
-        # Modify vector so that group is separated from variable name
-        new_name <- paste0(variable_name, " \U2012 ", group_name)
-    } else{
-        new_name <- name
-    }
-    # Add percentage how much this variable explains, and p-value
-    new_name <- expr(paste(!!new_name, " (",
-                           !!format(round( rda_info$permanova[variable_name, "Explained variance"]*100, 1), nsmall = 1),
-                           "%, ",italic("P"), " = ",
-                           !!gsub("0\\.","\\.", format(round( rda_info$permanova[variable_name, "Pr(>F)"], 3),
-                                                       nsmall = 3)), ")"))
-
-    return(new_name)
-})
-# Add names
-names(vec_lab) <- vec_lab_old
-
-# Create labels for axis
-xlab <- paste0("RDA1 (", format(round( rda$CCA$eig[[1]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)")
-ylab <- paste0("RDA2 (", format(round( rda$CCA$eig[[2]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)")
-
-# Create a plot
-plot <- ggord(rda, grp_in = coldata[["ClinicalStatus"]], vec_lab = vec_lab,
-              alpha = 0.5,
-              size = 4, addsize = -4,
-              #ext= 0.7,
-              txt = 3.5, repel = TRUE,
-              #coord_fix = FALSE
-              ) +
-    # Adjust titles and labels
-    guides(colour = guide_legend("ClinicalStatus"),
-           fill = guide_legend("ClinicalStatus"),
-           group = guide_legend("ClinicalStatus"),
-           shape = guide_legend("ClinicalStatus"),
-           x = guide_axis(xlab),
-           y = guide_axis(ylab)) +
-    theme( axis.title = element_text(size = 10) )
-plot
+library(miaViz)
+
+# Generate RDA plot coloured by clinical status
+plotRDA(tse2, "RDA", colour_by = "ClinicalStatus")
 ```
 
 From the plot above, we can see that only age significantly describes
diff --git a/23_multi-assay_analyses.Rmd b/23_multi-assay_analyses.Rmd
index c88e1153e..3f04a1599 100644
--- a/23_multi-assay_analyses.Rmd
+++ b/23_multi-assay_analyses.Rmd
@@ -102,7 +102,7 @@ bacterium X is present, is the concentration of metabolite Y lower or higher"?
 # Agglomerate microbiome data at family level
 mae[[1]] <- mergeFeaturesByPrevalence(mae[[1]], rank = "Family")
 # Does log10 transform for microbiome data
-mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = 1)
+mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = TRUE)
 
 # Give unique names so that we do not have problems when we are creating a plot
 rownames(mae[[1]]) <- getTaxonomyLabels(mae[[1]])
@@ -193,8 +193,8 @@ mae[[2]] <- transformAssay(mae[[2]], assay.type = "nmr",
 
 # Transforming biomarker data with z-transform
 mae[[3]] <- transformAssay(mae[[3]], assay.type = "signals",
-                           MARGIN = "features",
-                           method = "z", pseudocount = 1)
+                           MARGIN = "features",
+                           method = "z", pseudocount = 1)
 
 # Removing assays no longer needed
 assay(mae[[1]], "counts") <- NULL
diff --git a/30_differential_abundance.Rmd b/30_differential_abundance.Rmd
index 9eaf19382..c3dc58929 100755
--- a/30_differential_abundance.Rmd
+++ b/30_differential_abundance.Rmd
@@ -422,7 +422,6 @@ zicoseq_res %>%
 ```{r plot-zicoseq}
 ## x-axis is the effect size: R2 * direction of coefficient
 ZicoSeq.plot(ZicoSeq.obj = zicoseq_out,
-             meta.dat = as.data.frame(colData(tse)),
              pvalue.type = 'p.adj.fdr')
 ```
 
diff --git a/97_extra_materials.Rmd b/97_extra_materials.Rmd
index b35fc462d..55ee38696 100644
--- a/97_extra_materials.Rmd
+++ b/97_extra_materials.Rmd
@@ -232,14 +232,13 @@ plot(posterior, par="Lambda", focus.cov = rownames(X)[c(2,4)])
 ## Interactive 3D Plots
 
 ```{r, message=FALSE, warning=FALSE}
-# Installing libraryd packages
+# Load libraries
 library(rgl)
 library(plotly)
 ```
 
 ```{r setup2, warning=FALSE, message=FALSE}
 library(knitr)
-library(rgl)
 knitr::knit_hooks$set(webgl = hook_webgl)
 ```
 
@@ -247,8 +246,6 @@ knitr::knit_hooks$set(webgl = hook_webgl)
 In this section we make a 3D version of the earlier Visualizing the most dominant genus on PCoA (see \@ref(quality-control)), with the help of the plotly [@Sievert2020].
 
 ```{r, message=FALSE, warning=FALSE}
-# Installing the package
-library(curatedMetagenomicData)
 # Importing necessary libraries
 library(curatedMetagenomicData)
 library(dplyr)
diff --git a/DESCRIPTION b/DESCRIPTION
index 401fc3e37..a4f850668 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -29,8 +29,12 @@ Suggests:
     ANCOMBC,
     benchdamic,
     BiocCheck,
+    biclust,
     bookdown,
+    cobiclust,
     curatedMetagenomicData,
+    dendextend,
+    DT,
     fido,
     ggpubr,
     HDF5Array,
@@ -40,14 +44,20 @@ Suggests:
     matrixStats,
     mia,
     miaViz,
+    miaTime,
     MicrobiotaProcess,
     MicrobiomeStat,
     microbiomeDataSets,
     mikropml,
+    MOFA2,
+    NbClust,
     patchwork,
+    pheatmap,
     philr,
-    picante,
+    picante,
+    plotly,
     rebook,
+    rgl,
     rmarkdown,
     Rtsne,
     scater,