From 1a1f6c779ba18e248b9dac588e27f8911b039061 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 22 Aug 2023 14:36:43 +0300 Subject: [PATCH 01/26] Add link to benchmarking and minor polish --- 20_beta_diversity.Rmd | 2 +- 23_multi-assay_analyses.Rmd | 2 +- 30_differential_abundance.Rmd | 3 +-- 40_machine_learning.Rmd | 2 +- 95_resources.Rmd | 32 ++++++++++++++++++++------------ 5 files changed, 24 insertions(+), 17 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 44259398b..de5fdd94d 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -272,7 +272,7 @@ ggplot(df, aes(x = d0, y = dmds)) + theme_bw() ``` -## Supervized ordination +## Supervised ordination dbRDA is a supervised counterpart of PCoA, that is, it takes into account the covariates specified by the user to maximize the variance with respect to the diff --git a/23_multi-assay_analyses.Rmd b/23_multi-assay_analyses.Rmd index 1114d874a..c88e1153e 100644 --- a/23_multi-assay_analyses.Rmd +++ b/23_multi-assay_analyses.Rmd @@ -1,4 +1,4 @@ -# Multi-assay analyses {#multi-assay-analyses} +# Multi-Assay Analyses {#multi-assay-analyses} ```{r setup, echo=FALSE, results="asis"} library(rebook) diff --git a/30_differential_abundance.Rmd b/30_differential_abundance.Rmd index b6a7fd4b6..995390011 100755 --- a/30_differential_abundance.Rmd +++ b/30_differential_abundance.Rmd @@ -93,11 +93,10 @@ as a grouping variable in the sample data. We simplify the examples by only including two of the three groups. 
```{r import-daa-data} -library(tidySummarizedExperiment) +library(mia) library(tidyverse) # Import dataset -library(mia) data("Tengeler2020", package = "mia") tse <- Tengeler2020 diff --git a/40_machine_learning.Rmd b/40_machine_learning.Rmd index d37923bd5..8d37de34a 100644 --- a/40_machine_learning.Rmd +++ b/40_machine_learning.Rmd @@ -1,4 +1,4 @@ -# Machine learning {#machine_learning} +# Machine Learning {#machine_learning} ```{r setup, echo=FALSE, results="asis"} library(rebook) diff --git a/95_resources.Rmd b/95_resources.Rmd index a2defaf89..3089c1700 100644 --- a/95_resources.Rmd +++ b/95_resources.Rmd @@ -28,7 +28,7 @@ [NGS Analysis Basics](http://girke.bioinformatics.ucr.edu/GEN242/tutorials/rsequences/rsequences/) provides a walk-through of the above-mentioned features with detailed examples. -### Alternative containers for microbiome data +### Phyloseq: an alternative container for microbiome data The `phyloseq` package and class became the first widely used data container for microbiome data science in R. Many methods for taxonomic @@ -48,15 +48,13 @@ each other. related to samples. `rowTree` : This slot is similar to `phy_tree` in `phyloseq` to store phylogenetic tree. -In this book, you will come across terms like `FeatureIDs` and +In this book, you will encounter terms such as `FeatureIDs` and `SampleIDs`. `FeatureIDs` : These are basically OTU/ASV ids which are row names in `assays` and `rowData`. `SampleIDs` : As the name suggests, these are sample ids which are column names in `assays` and -row names in `colData`. - -`FeatureIDs` and `SampleIDs` are used but the technical terms -`rownames` and `colnames` are encouraged to be used, since they relate -to actual objects we work with. +row names in `colData`. `FeatureIDs` and `SampleIDs` are used but the +technical terms `rownames` and `colnames` are encouraged to be used, since +they relate to actual objects we work with. 
-### Resources for phyloseq +#### Benchmarking TreeSE with phyloseq + +TreeSE objects can be converted into phyloseq objects and vice versa, therefore +it is possible to compare the two containers in terms of computational efficiency. +Remarkably, TreeSE and phyloseq were benchmarked against one another in mia v1.2.3 +and phyloseq v1.38.0, respectively. 5 standard microbiome analysis operations were +applied to 4 datasets of varying size with both containers. In a nutshell, TreeSE +and phyloseq showed a similar performance for datasets of small and medium size +for most of the operations. However, TreeSE performed more efficiently as the size +of the datasets increased. Further details on such results can be found in the +[benchmarking repository](https://github.com/microbiome/benchmarking). + +#### Resources on phyloseq -The (Tree)SummarizedExperiment objects can be converted into the alternative phyloseq format, for which further methods are available. +The phyloseq container provides analogous methods to TreeSE. The following +material can be used to become familiar with such alternative methods: * [List of R tools for microbiome analysis](https://microsud.github.io/Tools-Microbiome-Analysis/) * phyloseq [@McMurdie2013] @@ -75,9 +86,6 @@ The (Tree)SummarizedExperiment objects can be converted into the alternative phy * Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses [@Callahan2016]. 
- - - ## R programming resources ### Base R and RStudio From 7daf43268ec63ebe332829d5d1ce103650e3932a Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 22 Aug 2023 16:48:16 +0300 Subject: [PATCH 02/26] Simplify section on supervised ordination --- 20_beta_diversity.Rmd | 164 ++++++++++++++-------------------- 30_differential_abundance.Rmd | 8 +- 2 files changed, 72 insertions(+), 100 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index de5fdd94d..c3ba0ab6c 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -37,7 +37,7 @@ much essential information from the data as possible and turn it into a lower dimensional representation. Dimension reduction is bound to lose information but commonly used ordination techniques can preserve relevant information of sample similarities in an optimal way, which is defined in different ways by different -methods. [TODO add references and/or link to ordination chapter instead?] +methods. Based on the type of algorithm, ordination methods in microbiome research can be generally divided in two categories: unsupervised and supervised ordination. @@ -63,7 +63,7 @@ library(ggplot2) library(patchwork) ``` -## Unsupervised ordination +## Unsupervised ordination {#unsupervised-ordination} Unsupervised ordination methods variation in the data without additional information on covariates or other supervision of the model. Among the different @@ -81,7 +81,7 @@ tse <- GlobalPatterns # some beta diversity metrics are usually applied to relative abundances tse <- transformAssay(tse, - method = "relabundance") + method = "relabundance") # Add group information Feces yes/no tse$Group <- tse$SampleType == "Feces" @@ -285,100 +285,72 @@ methods. | Euclidean distance | RDA | PCA | | non-Euclidean distance | dbRDA | PCoA | -We demonstrate the usage of dbRDA with the enterotype TreeSE, where samples +We demonstrate the usage of dbRDA with the enterotype dataset, where samples correspond to patients. 
The colData contains the clinical status of each patient -as well as a few covariates such as their gender and age, which can be included -in the supervised model together with the clinical status, the main outcome variable. -dbRDA can be perfomed with the `calculateRDA` function, which lies at the center -of the following workflow. - -```{r microbiome_RDA1} -# Load packages -library(stringr) +and a few covariates such as their gender and age. +```{r import-rda-dataset} # Load data data("enterotype", package = "mia") - -# Covariates that are being analyzed -variable_names <- c("ClinicalStatus", "Gender", "Age") +tse <- enterotype # Apply relative transform -enterotype <- transformAssay(enterotype, method = "relabundance") - -# Create a formula -formula <- as.formula(paste0("assay ~ ", str_c(variable_names, collapse = " + ")) ) - -# # Perform RDA -rda <- calculateRDA(enterotype, - assay.type = "relabundance", - formula = formula, - distance = "bray", - na.action = na.exclude) - -# Get the rda object -rda <- attr(rda, "rda") -# Calculate p-value and variance for whole model -# Recommendation: use 999 permutations instead of 99 -set.seed(436) -permanova <- anova.cca(rda, permutations = 99) -# Create a data.frame for results -rda_info <- as.data.frame(permanova)["Model", ] - -# Calculate p-value and variance for each variable -# by = "margin" --> the order or variables does not matter -set.seed(4585) -permanova <- anova.cca(rda, by = "margin", permutations = 99) -# Add results to data.frame -rda_info <- rbind(rda_info, permanova) - -# Add info about total variance -rda_info[ , "Total variance"] <- rda_info["Model", 2] + - rda_info["Residual", 2] - -# Add info about explained variance -rda_info[ , "Explained variance"] <- rda_info[ , 2] / - rda_info[ , "Total variance"] - -# Loop through variables, calculate homogeneity -homogeneity <- list() -# Get colDtaa -coldata <- colData(enterotype) -# Get assay -assay <- t(assay(enterotype, "relabundance")) -for( variable_name in 
rownames(rda_info) ){ - # If data is continuous or discrete - if( variable_name %in% c("Model", "Residual") || - length(unique(coldata[[variable_name]])) / - length(coldata[[variable_name]]) > 0.2 ){ - # Do not calculate homogeneity for continuous data - temp <- NA - } else{ - # Calculate homogeneity for discrete data - # Calculate homogeneity - set.seed(413) - temp <- anova( - betadisper( - vegdist(assay, method = "bray"), - group = coldata[[variable_name]] ), - permutations = permutations )["Groups", "Pr(>F)"] - } - # Add info to the list - homogeneity[[variable_name]] <- temp -} -# Add homogeneity to information -rda_info[["Homogeneity p-value (NULL hyp: distinct/homogeneous --> permanova suitable)"]] <- - homogeneity - -knitr::kable(rda_info) +tse <- transformAssay(tse, + method = "relabundance") ``` -```{r microbiome_RDA2} -# Load ggord for plotting +dbRDA can be performed with the `runRDA` function. In addition to the arguments +previously defined for [unsupervised ordination](#unsupervised-ordination), this +function takes a formula to control for variables and an action to treat missing +values. Along with clinical status, which is the main outcome, we control for +gender and age, and exclude observations where one of these variables is missing. + +```{r run-rda} +# Perform RDA +tse <- runRDA(tse, + assay.type = "relabundance", + formula = assay ~ ClinicalStatus + Gender + Age, + distance = "bray", + na.action = na.exclude) + +# Store results of PERMANOVA test +rda_info <- attr(reducedDim(tse, "RDA"), "significance") +``` + +The importance of each variable on the similarity between samples can be +assessed from the results of PERMANOVA, automatically provided by the `runRDA` +function. We see that both clinical status and age explain more than 10% of the +variance, but only age shows statistical significance. 
+ +```{r rda-permanova-res} +rda_info$permanova +``` + +To ensure that the homogeneity assumption holds, we retrieve the corresponding +information from the results of RDA. In this case, none of the p-values is lower +than the significance threshold, and thus homogeneity is observed. + +```{r rda-homogeneity-res} +rda_info$homogeneity +``` + +Next, we proceed to visualize the weight and significance of each variable on +the similarity between samples with an RDA plot, which can be generated with +the following custom function. + +```{r plot-rda} +# Load packages for plotting function +library(stringr) library(ggord) +rda <- attr(reducedDim(tse, "RDA"), "rda") + +# Covariates that are being analyzed +variable_names <- c("ClinicalStatus", "Gender", "Age") + # Since na.exclude was used, if there were rows missing information, they were # dropped off. Subset coldata so that it matches with rda. -coldata <- coldata[ rownames(rda$CCA$wa), ] +coldata <- colData(tse)[ rownames(rda$CCA$wa), ] # Adjust names # Get labels of vectors @@ -400,9 +372,9 @@ vec_lab <- sapply(vec_lab_old, FUN = function(name){ } # Add percentage how much this variable explains, and p-value new_name <- expr(paste(!!new_name, " (", - !!format(round( rda_info[variable_name, "Explained variance"]*100, 1), nsmall = 1), + !!format(round( rda_info$permanova[variable_name, "Explained variance"]*100, 1), nsmall = 1), "%, ",italic("P"), " = ", - !!gsub("0\\.","\\.", format(round( rda_info[variable_name, "Pr(>F)"], 3), + !!gsub("0\\.","\\.", format(round( rda_info$permanova[variable_name, "Pr(>F)"], 3), nsmall = 3)), ")")) return(new_name) @@ -433,7 +405,9 @@ plot <- ggord(rda, grp_in = coldata[["ClinicalStatus"]], vec_lab = vec_lab, plot ``` -From RDA plot we can see that only age has significant affect on microbial profile. +From the plot above, we can see that only age significantly describes +differences between the microbial profiles of different samples. 
Such visual +approach complements the previous results of PERMANOVA. ## Case studies @@ -446,17 +420,17 @@ data at the Genus level and get the dominant taxa per sample. ```{r} # Agglomerate to genus level tse_genus <- mergeFeaturesByRank(tse, - rank = "Genus") + rank = "Genus") # Convert to relative abundances tse_genus <- transformAssay(tse, - method = "relabundance", - assay.type = "counts") + method = "relabundance", + assay.type = "counts") # Add info on dominant genus per sample tse_genus <- addPerSampleDominantFeatures(tse_genus, - assay.type = "relabundance", - name = "dominant_taxa") + assay.type = "relabundance", + name = "dominant_taxa") ``` Next, we perform PCoA with Bray-Curtis dissimilarity. @@ -474,8 +448,8 @@ that A 3D interactive version of the plot below can be found in \@ref(extras). ```{r} # Getting the top taxa top_taxa <- getTopFeatures(tse_genus, - top = 6, - assay.type = "relabundance") + top = 6, + assay.type = "relabundance") # Naming all the rest of non top-taxa as "Other" most_abundant <- lapply(colData(tse_genus)$dominant_taxa, diff --git a/30_differential_abundance.Rmd b/30_differential_abundance.Rmd index 995390011..12e0bbb8d 100755 --- a/30_differential_abundance.Rmd +++ b/30_differential_abundance.Rmd @@ -462,13 +462,11 @@ apparent dynamics between the response and the main independent variable. They are common in experimental studies. Generally, they can be classified into 3 groups: -- Biological confounders, such as age, sex, etc. +- Biological confounders, such as age and sex -- Technical confounders produced by data collection, storage, DNA - extraction, sequencing process, etc. +- Technical confounders produced during sample collection, processing and analysis -- Confounders resulting from experimental models, such as cage effect, - sample background, etc. 
+- Confounders resulting from experimental models, such as batch effects and sample history Controlling for confounders is an important practice to reach an unbiased conclusion. To perform causal inference, it is crucial that the method From f8c26df1324f6950bd99a334402134a6f9bfaf6d Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 22 Aug 2023 21:08:31 +0300 Subject: [PATCH 03/26] Add clarifications to DAA with confounding --- 30_differential_abundance.Rmd | 42 +++++++++++++++++++++++++++++------ 1 file changed, 35 insertions(+), 7 deletions(-) diff --git a/30_differential_abundance.Rmd b/30_differential_abundance.Rmd index 12e0bbb8d..2cc43ff86 100755 --- a/30_differential_abundance.Rmd +++ b/30_differential_abundance.Rmd @@ -351,7 +351,7 @@ significant taxa. Lastly, we cover linear models for differential abundance analysis of microbiome compositional data (@Zhou2022). This is very similar to -ANCOMBC with few differences: 1) LinDA correct for the compositional +ANCOMBC with few differences: 1) LinDA corrects for the compositional bias differently using the mode of all regression coefficients. 2) it is faster (100x-1000x than ANCOMBC and according to the authors); 3) it supports hierarchical models. The latest ANCOMBC versions are also @@ -387,6 +387,8 @@ linda_out <- linda.res$output$patient_statusControl linda_out %>% as.data.frame() %>% filter(reject) %>% + dplyr::select(stat, padj) %>% + rownames_to_column(var = "feature") %>% head() %>% knitr::kable() ``` @@ -494,7 +496,12 @@ colData(tse)$library_size <- colSums(assay(tse, "counts")) ``` -### ANCOMBC +### ANCOM-BC + +Here, confounders can be added to the formula along with patient status, the +main outcome variable. This way, the model evaluates whether differentially +abundant taxa are associated with one of the variables when the other two +are kept constant. 
```{r run-adj-ancombc, warning=FALSE} # perform the analysis @@ -507,15 +514,19 @@ ancombc_cov <- ancombc2(tse, struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, - # multi group comparison is deactivated automatically + # multi-group comparison is deactivated automatically global = TRUE) -# now the model answers the question: holding cohort constant, are bacterial -# taxa differentially abundant? Or holding patient status constant, is cohort -# associated with bacterial abundance? # Again we only show the first 6 entries. ``` +In the output, each taxon is assigned with several effect sizes (lfc, which +stands for log-fold change) and adjusted p-values (q). For categorical variables such as patient status and cohort, the statistics indicate whether the abundance +of a given taxon is significantly different between the specified group (column +name) and the reference group (the group that does not appear in the column +names), whereas for numerical variables such as library size, they indicate +whether the abundance of a given taxon varies with that variable. + ```{r adj-ancombc-res} ancombc_cov$res %>% dplyr::select(starts_with(c("taxon", "lfc", "q"))) %>% @@ -525,6 +536,9 @@ ancombc_cov$res %>% ### LinDA +As in the previous method, confounders can be included in the formula with the +main outcome variable. + ```{r run-adj-linda} linda_cov <- linda(as.data.frame(assay(tse, "counts")), as.data.frame(colData(tse)), @@ -534,19 +548,31 @@ linda_cov <- linda(as.data.frame(assay(tse, "counts")), mean.abund.filter = 0) ``` +The model returns an output for every variable included in the formula. Normally, +only the results on the main outcome variable are relevant and can be retrieved +as shown below. However, the statistics on the confounders can be similarly +obtained by accessing the corresponding items from the output object. 
+ ```{r adj-linda-res} +# Select results for the patient status linda.res <- linda_cov$output$patient_statusControl linda.res %>% filter(reject) %>% + dplyr::select(log2FoldChange, stat, padj) %>% + rownames_to_column(var = "feature") %>% head() %>% knitr::kable() ``` - +The output shows effect sizes in terms of log-fold changes and a derived +statistic (stat) as well as the corresponding adjusted p-values for differences +in abundance of each taxon between the control and treated groups. ### ZicoSeq +For this method, confounders can be added as a character vector to the `adj.name` argument. + ```{r run-adj-zicoseq} set.seed(123) zicoseq.obj <- ZicoSeq(feature.dat = as.matrix(assay(tse)), @@ -561,6 +587,8 @@ zicoseq.obj <- ZicoSeq(feature.dat = as.matrix(assay(tse)), perm.no = 999) ``` +The output shows the raw and adjusted p-values for clinical status. + ```{r adj-zicoseq-res} zicoseq_out <- cbind.data.frame(p.raw = zicoseq.obj$p.raw, p.adj.fdr = zicoseq.obj$p.adj.fdr) From 8baac91dc6942d721f418fbd8c725130fe416c99 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Wed, 23 Aug 2023 07:27:35 +0300 Subject: [PATCH 04/26] Fix beta diversity bug --- 20_beta_diversity.Rmd | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index c3ba0ab6c..1f1a00b5a 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -292,11 +292,11 @@ and a few covariates such as their gender and age. ```{r import-rda-dataset} # Load data data("enterotype", package = "mia") -tse <- enterotype +tse2 <- enterotype # Apply relative transform -tse <- transformAssay(tse, - method = "relabundance") +tse2 <- transformAssay(tse2, + method = "relabundance") ``` dbRDA can be performed with the `runRDA` function. 
In addition to the arguments @@ -307,14 +307,14 @@ gender and age, and exclude observations where one of these variables is missing ```{r run-rda} # Perform RDA -tse <- runRDA(tse, - assay.type = "relabundance", - formula = assay ~ ClinicalStatus + Gender + Age, - distance = "bray", - na.action = na.exclude) +tse2 <- runRDA(tse2, + assay.type = "relabundance", + formula = assay ~ ClinicalStatus + Gender + Age, + distance = "bray", + na.action = na.exclude) # Store results of PERMANOVA test -rda_info <- attr(reducedDim(tse, "RDA"), "significance") +rda_info <- attr(reducedDim(tse2, "RDA"), "significance") ``` The importance of each variable on the similarity between samples can be @@ -343,14 +343,14 @@ the following custom function. library(stringr) library(ggord) -rda <- attr(reducedDim(tse, "RDA"), "rda") +rda <- attr(reducedDim(tse2, "RDA"), "rda") # Covariates that are being analyzed variable_names <- c("ClinicalStatus", "Gender", "Age") # Since na.exclude was used, if there were rows missing information, they were # dropped off. Subset coldata so that it matches with rda. -coldata <- colData(tse)[ rownames(rda$CCA$wa), ] +coldata <- colData(tse2)[ rownames(rda$CCA$wa), ] # Adjust names # Get labels of vectors @@ -537,7 +537,7 @@ the variation between communities. 
```{r} # Agglomerate data to Species level tse <- mergeFeaturesByRank(tse, - rank = "Species") + rank = "Species") # Set seed for reproducibility set.seed(1576) From 8195dee4e121bc827795b9471499e98790728492 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Wed, 23 Aug 2023 07:54:54 +0300 Subject: [PATCH 05/26] Minor change --- 95_resources.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/95_resources.Rmd b/95_resources.Rmd index 3089c1700..7dd3834df 100644 --- a/95_resources.Rmd +++ b/95_resources.Rmd @@ -28,7 +28,7 @@ [NGS Analysis Basics](http://girke.bioinformatics.ucr.edu/GEN242/tutorials/rsequences/rsequences/) provides a walk-through of the above-mentioned features with detailed examples. -### Phyloseq: an alternative container for microbiome data +### phyloseq: an alternative container for microbiome data The `phyloseq` package and class became the first widely used data container for microbiome data science in R. Many methods for taxonomic From 2951452666cbfaf3dfc2b3d8f6ac05559bde39e1 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Sat, 26 Aug 2023 14:45:39 +0300 Subject: [PATCH 06/26] Add exercise on DAA method comparison --- 20_beta_diversity.Rmd | 6 +- 30_differential_abundance.Rmd | 120 +++++++++++++++------------------- 98_exercises.Rmd | 68 +++++++++++++++++-- 3 files changed, 118 insertions(+), 76 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 1f1a00b5a..09a469ac6 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -323,7 +323,8 @@ function. We see that both clinical status and age explain more than 10% of the variance, but only age shows statistical significance. ```{r rda-permanova-res} -rda_info$permanova +rda_info$permanova %>% + knitr::kable() ``` To ensure that the homogeneity assumption holds, we retrieve the corresponding @@ -331,7 +332,8 @@ information from the results of RDA. 
In this case, none of the p-values is lower than the significance threshold, and thus homogeneity is observed. ```{r rda-homogeneity-res} -rda_info$homogeneity +rda_info$homogeneity %>% + knitr::kable() ``` Next, we proceed to visualize the weight and significance of each variable on diff --git a/30_differential_abundance.Rmd b/30_differential_abundance.Rmd index 2cc43ff86..9eaf19382 100755 --- a/30_differential_abundance.Rmd +++ b/30_differential_abundance.Rmd @@ -265,25 +265,16 @@ that we specify. library(ANCOMBC) # Run ANCOM-BC at the genus level and only including the prevalent genera -out <- ancombc2(data = tse, - assay_name = "counts", - fix_formula = "patient_status", - p_adj_method = "fdr", - prv_cut = 0, - group = "patient_status", - struc_zero = TRUE, - neg_lb = TRUE, - # multi group comparison is deactivated automatically - global = TRUE) -``` - -```{r ancombc-res1} -# store the FDR adjusted results -ancombc_result <- cbind.data.frame(taxid = out$res$taxon, - ancombc = as.vector(out$res$q_patient_statusControl)) - -ancombc_result <- out$res %>% - dplyr::select(starts_with(c("taxon", "lfc", "q"))) +ancombc2_out <- ancombc2(data = tse, + assay_name = "counts", + fix_formula = "patient_status", + p_adj_method = "fdr", + prv_cut = 0, + group = "patient_status", + struc_zero = TRUE, + neg_lb = TRUE, + # multi group comparison is deactivated automatically + global = TRUE) ``` The object `out` contains all model output. Again, see the @@ -295,14 +286,17 @@ object, which contains dataframes with the coefficients, standard errors, p-values and q-values. Below we show the first entries of this dataframe. 
-```{r ancombc-res2} -ancombc_result %>% +```{r ancombc-res} +# store the FDR adjusted results +ancombc2_out$res %>% + dplyr::select(taxon, lfc_patient_statusControl, q_patient_statusControl) %>% + filter(q_patient_statusControl < 0.05) %>% + arrange(q_patient_statusControl) %>% head() %>% knitr::kable() ``` - ### MaAsLin2 Let us next illustrate MaAsLin2 [@Mallick2020]. This method is based on @@ -337,8 +331,7 @@ Which genera are identified as differentially abundant? (leave out "head" to see ```{r, maaslin2-res} maaslin2_out$results %>% - filter(qval <= 0.05) %>% - head() %>% + filter(qval < 0.05) %>% knitr::kable() ``` @@ -366,14 +359,9 @@ set sizes. # Load package library(MicrobiomeStat) -# Store independent variables into a data.frame -meta.dat <- colData(tse) %>% - as.data.frame() %>% - dplyr::select(patient_status) - # Run LinDA -linda.res <- linda(feature.dat = as.data.frame(assay(tse)), - meta.dat = meta.dat, +linda_out <- linda(feature.dat = as.data.frame(assay(tse)), + meta.dat = as.data.frame(colData(tse)), formula = "~ patient_status", alpha = 0.05, prev.filter = 0, @@ -381,15 +369,11 @@ linda.res <- linda(feature.dat = as.data.frame(assay(tse)), ``` ```{r linda-res} -linda_out <- linda.res$output$patient_statusControl - # List genera for which H0 could be rejected: -linda_out %>% - as.data.frame() %>% +linda_out$output$patient_statusControl %>% filter(reject) %>% dplyr::select(stat, padj) %>% rownames_to_column(var = "feature") %>% - head() %>% knitr::kable() ``` @@ -413,7 +397,7 @@ rate, which has the following components: library(GUniFrac) set.seed(123) -zicoseq.obj <- ZicoSeq(feature.dat = as.matrix(assay(tse)), +zicoseq_out <- ZicoSeq(feature.dat = as.matrix(assay(tse)), meta.dat = as.data.frame(colData(tse)), grp.name = "patient_status", feature.dat.type = "count", @@ -425,21 +409,21 @@ zicoseq.obj <- ZicoSeq(feature.dat = as.matrix(assay(tse)), ``` ```{r zicoseq-res} -zicoseq_out <- cbind.data.frame(p.raw = zicoseq.obj$p.raw, - p.adj.fdr = 
zicoseq.obj$p.adj.fdr) +zicoseq_res <- cbind.data.frame(p.raw = zicoseq_out$p.raw, + p.adj.fdr = zicoseq_out$p.adj.fdr) -zicoseq_out %>% +zicoseq_res %>% filter(p.adj.fdr < 0.05) %>% - head() %>% + arrange(p.adj.fdr) %>% knitr::kable() ``` ```{r plot-zicoseq} ## x-axis is the effect size: R2 * direction of coefficient -ZicoSeq.plot(ZicoSeq.obj = zicoseq.obj, - meta.dat = meta.dat, - pvalue.type ='p.adj.fdr') +ZicoSeq.plot(ZicoSeq.obj = zicoseq_out, + meta.dat = as.data.frame(colData(tse)), + pvalue.type = 'p.adj.fdr') ``` ### PhILR @@ -451,10 +435,9 @@ balances. A detailed introduction to this method is available in ### Comparison of methods Although the methods described above yield unidentical results, they are -expected to agree on a few differentially abundant taxa. As an exercise, -you can compare the outcomes between the different methods in terms of -effect sizes, significances, or other aspects that are comparable -between them. +expected to agree on a few differentially abundant taxa. To draw more informed +conclusions, it is good practice to compare the outcomes of different methods in terms of found features, their effect sizes and significances, as well as other method-specific aspects. Such comparative approach is outlined in +[this exercise](#compare-daa-methods). ## DAA with confounding @@ -505,19 +488,17 @@ are kept constant. ```{r run-adj-ancombc, warning=FALSE} # perform the analysis -ancombc_cov <- ancombc2(tse, - assay_name = "counts", - fix_formula = "patient_status + cohort + library_size", - p_adj_method = "fdr", - lib_cut = 0, - group = "patient_status", - struc_zero = TRUE, - neg_lb = TRUE, - alpha = 0.05, - # multi-group comparison is deactivated automatically - global = TRUE) - -# Again we only show the first 6 entries. 
+ancombc2_out <- ancombc2(tse, + assay_name = "counts", + fix_formula = "patient_status + cohort + library_size", + p_adj_method = "fdr", + lib_cut = 0, + group = "patient_status", + struc_zero = TRUE, + neg_lb = TRUE, + alpha = 0.05, + # multi-group comparison is deactivated automatically + global = TRUE) ``` In the output, each taxon is assigned with several effect sizes (lfc, which @@ -528,8 +509,9 @@ names), whereas for numerical variables such as library size, they indicate whether the abundance of a given taxon varies with that variable. ```{r adj-ancombc-res} -ancombc_cov$res %>% +ancombc2_out$res %>% dplyr::select(starts_with(c("taxon", "lfc", "q"))) %>% + arrange(q_patient_statusControl) %>% head() %>% knitr::kable() ``` @@ -540,7 +522,7 @@ As in the previous method, confounders can be included in the formula with the main outcome variable. ```{r run-adj-linda} -linda_cov <- linda(as.data.frame(assay(tse, "counts")), +linda_out <- linda(as.data.frame(assay(tse, "counts")), as.data.frame(colData(tse)), formula = "~ patient_status + cohort + library_size", alpha = 0.05, @@ -555,9 +537,9 @@ obtained by accessing the corresponding items from the output object. ```{r adj-linda-res} # Select results for the patient status -linda.res <- linda_cov$output$patient_statusControl +linda_res <- linda_out$output$patient_statusControl -linda.res %>% +linda_res %>% filter(reject) %>% dplyr::select(log2FoldChange, stat, padj) %>% rownames_to_column(var = "feature") %>% @@ -575,7 +557,7 @@ For this method, confounders can be added as a list to the `adj.name` argument. 
```{r run-adj-zicoseq} set.seed(123) -zicoseq.obj <- ZicoSeq(feature.dat = as.matrix(assay(tse)), +zicoseq_out <- ZicoSeq(feature.dat = as.matrix(assay(tse)), meta.dat = as.data.frame(colData(tse)), grp.name = "patient_status", adj.name = c("cohort", "library_size"), @@ -590,10 +572,10 @@ zicoseq.obj <- ZicoSeq(feature.dat = as.matrix(assay(tse)), The output shows the raw and adjusted p-values for clinical status. ```{r adj-zicoseq-res} -zicoseq_out <- cbind.data.frame(p.raw = zicoseq.obj$p.raw, - p.adj.fdr = zicoseq.obj$p.adj.fdr) +zicoseq_res <- cbind.data.frame(p.raw = zicoseq_out$p.raw, + p.adj.fdr = zicoseq_out$p.adj.fdr) -zicoseq_out %>% +zicoseq_res %>% filter(p.adj.fdr < 0.05) %>% head() %>% knitr::kable() diff --git a/98_exercises.Rmd b/98_exercises.Rmd index 76f2d4cef..5239733a4 100644 --- a/98_exercises.Rmd +++ b/98_exercises.Rmd @@ -570,11 +570,70 @@ walkthrough, which may be simplified in the future. Useful functions: runMDS, runRDA, anova.cca, transformAssay, mergeFeaturesByRank, ggplot, plotReducedDim, vegan::adonis2 - - ## Differential abundance -### Univariate analyses +### Standard analysis with ALDEx2 + +1. Import the mia and ALDEx2 packages, load peerj13075 with `data` and store it + into a variable named `tse`. +2. Agglomerate the TreeSE by genus and filter by a prevalence of 10%. You can + perform both operations in one go with `subsetByPrevalentTaxa` by specifying + the `rank` and `prevalence` arguments. +3. Model the counts assay of the TreeSE with `aldex.clr` and store it into the + variable `x`. As a second argument, provide the grouping variable `Diet`, + which is contained in a column of the colData. +4. Feed `x` to the functions `aldex.ttest` to perform a t-test and to `aldex.effect` + to estimate effect sizes. Store the output into `x_tt` and `x_effect`, + respectively. +5. Create a data.frame named `aldex_out` which includes both `x_tt` and `x_effect` + and filter for the features with `wi.eBH < 0.05`. 
Are there any significantly + differentially abundant taxa? +6. **Extra**: If these results appear boring, repeat steps 1 - 5, but use + `Gender` or `Age` as the grouping variable. Do we have any better luck with + Gender? What is the problem with Age? + +### Controlling for confounders {#control-daa-confounders} + +1. Import the mia and MicrobiomeStat packages, load peerj13075 with `data` and + store it into a variable named `tse`. +2. Agglomerate the TreeSE by genus and filter by a prevalence of 10%. You can + perform both operations in one go with `subsetByPrevalentTaxa` by specifying + the `rank` and `prevalence` arguments. +3. Model the counts assay of the TreeSE with `linda` and store the output into + a variable named `linda_out`. Provide the colData converted into a data.frame + (with `as.data.frame`) as the second argument, and a `formula` with Age, + Gender and Diet as variables. For example, `formula = "~ A + B"` represents a + formula with variables A and B. +4. Extract the `output$AgeElderly` object from `linda_out` with `$` and store it + into a variable named `linda_res`. +5. Filter `linda_res` for features with `reject == TRUE`. How many differentially + abundant taxa were found? What are their names and how significant are they + in terms of log-fold change and adjusted p-value? + +### Comparing methods {#compare-daa-methods} + +Here, we conduct DAA with identical parameters as in the +[previous exercise](#control-daa-confounders), but with a different method, +namely ZicoSeq. We aim to compare the results between these two methods and draw better informed conclusions from such comparative approach. + +1. Import the mia and MicrobiomeStat packages, load peerj13075 with `data` and + store it into a variable named `tse`. +2. Agglomerate the TreeSE by genus and filter by a prevalence of 10%. You can + perform both operations in one go with `subsetByPrevalentTaxa` by specifying + the `rank` and `prevalence` arguments. +3. 
Model the counts assay of the TreeSE with `ZicoSeq` as the `feature.dat` + argument and store the output into a variable named `zicoseq_out`. Provide + also the colData converted to a data.frame (with `as.data.frame`) as + `meta.dat`. In addition, set `grp.name` to `"Age"`, `adj.name` to + `c("Diet", "Gender")`, `feature.dat.type` to `"count"`, `return.feature.dat` + to `TRUE` and `perm.no` to `999`. +4. View the top six differentially abundant taxa and their adjusted p-values + with `head(sort(zicoseq_out$p.adj.fdr))`. Is there any significant taxon + according to ZicoSeq? Compared to the output of linda, do we see the same + taxa at the top in terms of significance? Overall, to what extent do the two + methods agree with one another? + +### Workflow 1 1. Get the abundances for an individual feature (taxonomic group / row) 2. Visualize the abundances per group with boxplot / jitterplot @@ -588,7 +647,7 @@ Useful functions: runMDS, runRDA, anova.cca, transformAssay, mergeFeaturesByRank Useful functions: [], ggplot2::geom_boxplot, ggplot2::geom_jitter, wilcox.test, lm.test, transformAssay, p.adjust -### Differential abundance analysis +### Workflow 2 1. install the latest development version of mia from GitHub. 2. Load experimental dataset from mia. 
@@ -600,7 +659,6 @@ Useful functions: [], ggplot2::geom_boxplot, ggplot2::geom_jitter, wilcox.test, Useful functions: wilcox.test, kruskal.test, ggplot, pheatmap, ComplexHeatMap::Heatmap, ancombc, aldex2, maaslin2, mergeFeaturesByRank, transformAssay, subsetByPrevalentFeatures - ## Visualization ### Multivariate ordination From 49adb3fb5fc74fa03538ad6ad45cce54ae7f081b Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Sat, 26 Aug 2023 14:54:24 +0300 Subject: [PATCH 07/26] Minor fix --- 98_exercises.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/98_exercises.Rmd b/98_exercises.Rmd index 5239733a4..910cd351b 100644 --- a/98_exercises.Rmd +++ b/98_exercises.Rmd @@ -616,7 +616,7 @@ Here, we conduct DAA with identical parameters as in the [previous exercise](#control-daa-confounders), but with a different method, namely ZicoSeq. We aim to compare the results between these two methods and draw better informed conclusions from such comparative approach. -1. Import the mia and MicrobiomeStat packages, load peerj13075 with `data` and +1. Import the mia and GUniFrac packages, load peerj13075 with `data` and store it into a variable named `tse`. 2. Agglomerate the TreeSE by genus and filter by a prevalence of 10%. 
You can perform both operations in one go with `subsetByPrevalentTaxa` by specifying From 4222694ccaeff1b7c93b4e4f0ed8a8f0a75d121b Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Sat, 26 Aug 2023 15:11:44 +0300 Subject: [PATCH 08/26] Minor fix --- 20_beta_diversity.Rmd | 2 ++ 98_exercises.Rmd | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 09a469ac6..718b8cd4c 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -54,6 +54,7 @@ To run the examples in this chapter, the following packages should be imported: * vegan: ecological distances * ggplot2: plotting * patchwork: combining plots +* dplyr: pipe operator ```{r betadiv-packages, include = FALSE} library(mia) @@ -61,6 +62,7 @@ library(scater) library(vegan) library(ggplot2) library(patchwork) +library(dplyr) ``` ## Unsupervised ordination {#unsupervised-ordination} diff --git a/98_exercises.Rmd b/98_exercises.Rmd index 910cd351b..356c791a9 100644 --- a/98_exercises.Rmd +++ b/98_exercises.Rmd @@ -185,7 +185,7 @@ Usefuls functions: DataFrame, TreeSummarizedExperiment, matrix, rownames, colnam ### Importing data Raw data of different types can be imported as a TreeSE with a number of -functions explained in chapter \@ref(#import-from-file). You can also check the +functions explained in chapter \@ref(import-from-file). You can also check the [function reference in the mia package](https://microbiome.github.io/mia/reference/index.html). 1. Get familiar with the @@ -318,7 +318,7 @@ Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAs 1. Import the mia package, load peerj13075 with `data` and store it into a variable named `tse`. 2. Transform the counts assay into relative abundances with `transformAssay` and - store it into the TreeSE as an assay named `relabund` (see chapter \ref(assay-transform)). + store it into the TreeSE as an assay named `relabund` (see chapter \@ref(assay-transform)). 3. 
Similarly, perform a clr transformation on the counts assay with a `pseudocount` of 1 and add it to the TreeSE as a new assay. 4. List the available assays by name with `assays`. From 65014ef1b02daa66de8b014ef774fde0745b16f6 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Mon, 11 Sep 2023 20:31:53 +0200 Subject: [PATCH 09/26] Streamline RDA section with new plotRDA function --- 20_beta_diversity.Rmd | 124 +++++++++++------------------------------- 1 file changed, 31 insertions(+), 93 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 718b8cd4c..129e90e9a 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -47,23 +47,6 @@ Reduction (UMAP), whereas the latter is mainly represented by distance-based Redundancy Analysis (dbRDA). We will first discuss unsupervised ordination methods and then proceed to supervised ones. -To run the examples in this chapter, the following packages should be imported: - -* mia: microbiome analysis framework -* scater: plotting reduced dimensions -* vegan: ecological distances -* ggplot2: plotting -* patchwork: combining plots -* dplyr: pipe operator - -```{r betadiv-packages, include = FALSE} -library(mia) -library(scater) -library(vegan) -library(ggplot2) -library(patchwork) -library(dplyr) -``` ## Unsupervised ordination {#unsupervised-ordination} @@ -75,10 +58,9 @@ demonstration we will analyse beta diversity in GlobalPatterns, and observe the variation between stool samples and those with a different origin. ```{r prep-tse} -# Example data +# Load mia and import sample dataset +library(mia) data("GlobalPatterns", package = "mia") - -# Data matrix (features x samples) tse <- GlobalPatterns # some beta diversity metrics are usually applied to relative abundances @@ -106,6 +88,9 @@ dimensions via an ordination method, the results of which can be stored in the and `runNMDS` functions. 
```{r runMDS} +# Load package to plot reducedDim +library(scater) + # Perform PCoA tse <- runMDS(tse, FUN = vegan::vegdist, @@ -135,31 +120,40 @@ p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep p ``` -With additional tools from the ggplot2 package, ordination methods can be -compared to find similarities between them or select the most suitable one to -visualize beta diversity in the light of the research question. +Multiple ordination plots are combined into a multi-panel plot with the +patchwork package, so that different methods can be compared to find similarities +between them or select the most suitable one to visualize beta diversity in the +light of the research question. ```{r plot-mds-nmds-comparison, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or euclidean distances on the GlobalPattern dataset."} +# Run MDS on counts assay with Euclidean distances tse <- runMDS(tse, FUN = vegan::vegdist, name = "MDS_euclidean", method = "euclidean", assay.type = "counts") +# Run NMDS on counts assay with Bray-Curtis distances tse <- runNMDS(tse, FUN = vegan::vegdist, name = "NMDS_BC") +# Run NMDS on counts assay with Euclidean distances tse <- runNMDS(tse, FUN = vegan::vegdist, name = "NMDS_euclidean", method = "euclidean") +# Generate plots for all 4 reducedDims plots <- lapply(c("PCoA_BC", "MDS_euclidean", "NMDS_BC", "NMDS_euclidean"), plotReducedDim, object = tse, colour_by = "Group") +# Load package for multi-panel plotting +library(patchwork) + +# Generate multi-panel plot ((plots[[1]] | plots[[2]]) / (plots[[3]] | plots[[4]])) + plot_layout(guides = "collect") ``` @@ -240,6 +234,9 @@ would report relative stress, which varies in the unit interval and is better if smaller. This can be calculated as shown below. 
```{r relstress} +# Load vegan package +library(vegan) + # Quantify dissimilarities in the original feature space x <- assay(tse, "relabundance") # Pick relabunance assay separately d0 <- as.matrix(vegdist(t(x), "bray")) @@ -282,10 +279,10 @@ them. The result shows how much each covariate affects beta diversity. The table below illustrates the relation between supervised and unsupervised ordination methods. -| | supervised ordination | unsupervised ordination | -|:-------------------------:|:----------------------:|:------------------------:| -| Euclidean distance | RDA | PCA | -| non-Euclidean distance | dbRDA | PCoA | +| | supervised ordination | unsupervised ordination +|:-------------------------:|:----------------------:|:------------------------: +| Euclidean distance | RDA | PCA +| non-Euclidean distance | dbRDA | PCoA We demonstrate the usage of dbRDA with the enterotype dataset, where samples correspond to patients. The colData contains the clinical status of each patient @@ -325,7 +322,7 @@ function. We see that both clinical status and age explain more than 10% of the variance, but only age shows statistical significance. ```{r rda-permanova-res} -rda_info$permanova %>% +rda_info$permanova |> knitr::kable() ``` @@ -334,79 +331,20 @@ information from the results of RDA. In this case, none of the p-values is lower than the significance threshold, and thus homogeneity is observed. ```{r rda-homogeneity-res} -rda_info$homogeneity %>% +rda_info$homogeneity |> knitr::kable() ``` Next, we proceed to visualize the weight and significance of each variable on the similarity between samples with an RDA plot, which can be generated with -the following custom function. +the `plotRDA` function from the miaViz package. 
```{r plot-rda} # Load packages for plotting function -library(stringr) -library(ggord) - -rda <- attr(reducedDim(tse2, "RDA"), "rda") - -# Covariates that are being analyzed -variable_names <- c("ClinicalStatus", "Gender", "Age") - -# Since na.exclude was used, if there were rows missing information, they were -# dropped off. Subset coldata so that it matches with rda. -coldata <- colData(tse2)[ rownames(rda$CCA$wa), ] - -# Adjust names -# Get labels of vectors -vec_lab_old <- rownames(rda$CCA$biplot) - -# Loop through vector labels -vec_lab <- sapply(vec_lab_old, FUN = function(name){ - # Get the variable name - variable_name <- variable_names[ str_detect(name, variable_names) ] - # If the vector label includes also group name - if( !any(name %in% variable_names) ){ - # Get the group names - group_name <- unique( coldata[[variable_name]] )[ - which( paste0(variable_name, unique( coldata[[variable_name]] )) == name ) ] - # Modify vector so that group is separated from variable name - new_name <- paste0(variable_name, " \U2012 ", group_name) - } else{ - new_name <- name - } - # Add percentage how much this variable explains, and p-value - new_name <- expr(paste(!!new_name, " (", - !!format(round( rda_info$permanova[variable_name, "Explained variance"]*100, 1), nsmall = 1), - "%, ",italic("P"), " = ", - !!gsub("0\\.","\\.", format(round( rda_info$permanova[variable_name, "Pr(>F)"], 3), - nsmall = 3)), ")")) - - return(new_name) -}) -# Add names -names(vec_lab) <- vec_lab_old - -# Create labels for axis -xlab <- paste0("RDA1 (", format(round( rda$CCA$eig[[1]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)") -ylab <- paste0("RDA2 (", format(round( rda$CCA$eig[[2]]/rda$CCA$tot.chi*100, 1), nsmall = 1 ), "%)") - -# Create a plot -plot <- ggord(rda, grp_in = coldata[["ClinicalStatus"]], vec_lab = vec_lab, - alpha = 0.5, - size = 4, addsize = -4, - #ext= 0.7, - txt = 3.5, repel = TRUE, - #coord_fix = FALSE - ) + - # Adjust titles and labels - guides(colour = 
guide_legend("ClinicalStatus"), - fill = guide_legend("ClinicalStatus"), - group = guide_legend("ClinicalStatus"), - shape = guide_legend("ClinicalStatus"), - x = guide_axis(xlab), - y = guide_axis(ylab)) + - theme( axis.title = element_text(size = 10) ) -plot +library(miaViz) + +# Generate RDA plot coloured by clinical status +plotRDA(tse2, "RDA", colour_by = "ClinicalStatus") ``` From the plot above, we can see that only age significantly describes From 0a3e503b806c7c49012a5b347d54b83436120580 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Mon, 11 Sep 2023 20:39:00 +0200 Subject: [PATCH 10/26] Fix rmd table in beta diversity chapter --- 20_beta_diversity.Rmd | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 129e90e9a..17d213600 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -279,10 +279,10 @@ them. The result shows how much each covariate affects beta diversity. The table below illustrates the relation between supervised and unsupervised ordination methods. -| | supervised ordination | unsupervised ordination -|:-------------------------:|:----------------------:|:------------------------: -| Euclidean distance | RDA | PCA -| non-Euclidean distance | dbRDA | PCoA +| | supervised ordination | unsupervised ordination | +|:------------------------:|:----------------------:|:------------------------:| +| Euclidean distance | RDA | PCA | +| non-Euclidean distance | dbRDA | PCoA | We demonstrate the usage of dbRDA with the enterotype dataset, where samples correspond to patients. 
The colData contains the clinical status of each patient From 092d9c2ca308765d32b11c72fffcb2f8da2e7a32 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Mon, 11 Sep 2023 21:49:20 +0200 Subject: [PATCH 11/26] Fix miaTime missing error --- 04_containers.Rmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/04_containers.Rmd b/04_containers.Rmd index 630069e8b..57b8f5880 100644 --- a/04_containers.Rmd +++ b/04_containers.Rmd @@ -68,7 +68,8 @@ Let us load example data and rename it as tse. ```{r} library(mia) -data(hitchip1006, package="miaTime") +library(miaTime) +data("hitchip1006", package = "miaTime") tse <- hitchip1006 ``` From 84b0e883f8a034d8e7ec370c83914f107eac2612 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 12 Sep 2023 07:58:31 +0200 Subject: [PATCH 12/26] Fix miaTime missing error --- 04_containers.Rmd | 1 - DESCRIPTION | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/04_containers.Rmd b/04_containers.Rmd index 57b8f5880..e3b460592 100644 --- a/04_containers.Rmd +++ b/04_containers.Rmd @@ -68,7 +68,6 @@ Let us load example data and rename it as tse. 
```{r} library(mia) -library(miaTime) data("hitchip1006", package = "miaTime") tse <- hitchip1006 ``` diff --git a/DESCRIPTION b/DESCRIPTION index 401fc3e37..42e7665b2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -40,6 +40,7 @@ Suggests: matrixStats, mia, miaViz, + miaTime, MicrobiotaProcess, MicrobiomeStat, microbiomeDataSets, From a5555135e8bcc99fdc42d2493c69998b309348af Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 12 Sep 2023 09:09:34 +0200 Subject: [PATCH 13/26] Add dendextend to DESCRIPTION --- DESCRIPTION | 1 + 1 file changed, 1 insertion(+) diff --git a/DESCRIPTION b/DESCRIPTION index 42e7665b2..4f4a081b4 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -31,6 +31,7 @@ Suggests: BiocCheck, bookdown, curatedMetagenomicData, + dendextend, fido, ggpubr, HDF5Array, From 079e23cb0de25174beaec7d8a4f7fd780bc3c930 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 12 Sep 2023 09:52:38 +0200 Subject: [PATCH 14/26] Add other missing deps --- DESCRIPTION | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/DESCRIPTION b/DESCRIPTION index 4f4a081b4..a6e7ee43b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -29,7 +29,9 @@ Suggests: ANCOMBC, benchdamic, BiocCheck, + biclust, bookdown, + cobiclust, curatedMetagenomicData, dendextend, fido, @@ -46,7 +48,9 @@ Suggests: MicrobiomeStat, microbiomeDataSets, mikropml, + Nbclust, patchwork, + pheatmap, philr, picante, rebook, From 67e32e50e09a6bb4f45514c0441ccc6ab97c8875 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 12 Sep 2023 10:24:41 +0200 Subject: [PATCH 15/26] Fix dep names --- DESCRIPTION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index a6e7ee43b..31e0d82cf 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -48,7 +48,7 @@ Suggests: MicrobiomeStat, microbiomeDataSets, mikropml, - Nbclust, + NbClust, patchwork, pheatmap, philr, From f0764e8a545a208c7739946b7d7a6bbc3eddcdc4 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 12 Sep 2023 11:03:58 
+0200 Subject: [PATCH 16/26] Add multiassay analyses deps --- DESCRIPTION | 2 ++ 1 file changed, 2 insertions(+) diff --git a/DESCRIPTION b/DESCRIPTION index 31e0d82cf..da68c2ba6 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -48,12 +48,14 @@ Suggests: MicrobiomeStat, microbiomeDataSets, mikropml, + MOFA2, NbClust, patchwork, pheatmap, philr, picante, rebook, + reticulate, rmarkdown, Rtsne, scater, From 7f8190f0d8734544469c9e038f0fe381795e32ef Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 12 Sep 2023 11:18:32 +0200 Subject: [PATCH 17/26] Remove reticulate from deps --- DESCRIPTION | 1 - 1 file changed, 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index da68c2ba6..6f3da5d1a 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -55,7 +55,6 @@ Suggests: philr, picante, rebook, - reticulate, rmarkdown, Rtsne, scater, From db78f67e72ed0d05574a605decc62363a77a9c04 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 12 Sep 2023 13:55:33 +0200 Subject: [PATCH 18/26] Add deps for extra materials --- 97_extra_materials.Rmd | 5 +---- DESCRIPTION | 5 ++++- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/97_extra_materials.Rmd b/97_extra_materials.Rmd index b35fc462d..55ee38696 100644 --- a/97_extra_materials.Rmd +++ b/97_extra_materials.Rmd @@ -232,14 +232,13 @@ plot(posterior, par="Lambda", focus.cov = rownames(X)[c(2,4)]) ## Interactive 3D Plots ```{r, message=FALSE, warning=FALSE} -# Installing libraryd packages +# Load libraries library(rgl) library(plotly) ``` ```{r setup2, warning=FALSE, message=FALSE} library(knitr) -library(rgl) knitr::knit_hooks$set(webgl = hook_webgl) ``` @@ -247,8 +246,6 @@ knitr::knit_hooks$set(webgl = hook_webgl) In this section we make a 3D version of the earlier Visualizing the most dominant genus on PCoA (see \@ref(quality-control)), with the help of the plotly [@Sievert2020]. 
```{r, message=FALSE, warning=FALSE} -# Installing the package -library(curatedMetagenomicData) # Importing necessary libraries library(curatedMetagenomicData) library(dplyr) diff --git a/DESCRIPTION b/DESCRIPTION index 6f3da5d1a..a4f850668 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -34,6 +34,7 @@ Suggests: cobiclust, curatedMetagenomicData, dendextend, + DT, fido, ggpubr, HDF5Array, @@ -53,8 +54,10 @@ Suggests: patchwork, pheatmap, philr, - picante, + picante, + plotly, rebook, + rgl, rmarkdown, Rtsne, scater, From dc6b188ae9008949b13b5e379bf613d7617c5910 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Thu, 14 Sep 2023 16:05:34 +0200 Subject: [PATCH 19/26] Improve PCoA example --- 20_beta_diversity.Rmd | 80 +++++++++++++++++++++++++++---------------- 1 file changed, 50 insertions(+), 30 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 17d213600..a59e10d8b 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -63,10 +63,17 @@ library(mia) data("GlobalPatterns", package = "mia") tse <- GlobalPatterns -# some beta diversity metrics are usually applied to relative abundances +# Beta diversity metrics like Bray-Curtis are often applied to relabundances tse <- transformAssay(tse, + assay.type = "counts", method = "relabundance") +# Other metrics like Aitchison to clr-transformed data +tse <- transformAssay(tse, + assay.type = "relabundance", + method = "clr", + pseudocount = 1) + # Add group information Feces yes/no tse$Group <- tse$SampleType == "Feces" ``` @@ -88,15 +95,12 @@ dimensions via an ordination method, the results of which can be stored in the and `runNMDS` functions. 
```{r runMDS} -# Load package to plot reducedDim -library(scater) - -# Perform PCoA +# Run PCoA on relabundance assay with Bray-Curtis distances tse <- runMDS(tse, FUN = vegan::vegdist, method = "bray", - name = "PCoA_BC", - assay.type = "relabundance") + assay.type = "relabundance", + name = "MDS_bray") ``` Sample dissimilarity can be visualized on a lower-dimensional display (typically @@ -105,12 +109,15 @@ provides tools to incorporate additional information encoded by color, shape, size and other aesthetics. Can you find any difference between the groups? ```{r plot-mds-bray-curtis, fig.cap = "MDS plot based on the Bray-Curtis distances on the GlobalPattern dataset."} +# Load package to plot reducedDim +library(scater) + # Create ggplot object -p <- plotReducedDim(tse, "PCoA_BC", +p <- plotReducedDim(tse, "MDS_bray", colour_by = "Group") # Calculate explained variance -e <- attr(reducedDim(tse, "PCoA_BC"), "eig") +e <- attr(reducedDim(tse, "MDS_bray"), "eig") rel_eig <- e / sum(e[e > 0]) # Add explained variance for each axis @@ -120,41 +127,54 @@ p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep p ``` -Multiple ordination plots are combined into a multi-panel plot with the -patchwork package, so that different methods can be compared to find similarities -between them or select the most suitable one to visualize beta diversity in the -light of the research question. +A few combinations of beta diversity metrics and assay types are conventionally +used. For instance, Bray-Curtis dissimilarity and Aitchison distance are often applied to the relative abundance and the clr assays, respectively. Besides +**beta diversity metric** and **assay type**, the **PCoA algorithm** is also a variable that should be considered. Below, we show how the choice of these three +factors can affect the resulting lower-dimensional data. 
+ +```{r mds-nmds-comparison, warning=FALSE, message=FALSE} +# Run NMDS on relabundance assay with Bray-Curtis distances +tse <- runNMDS(tse, + FUN = vegan::vegdist, + method = "bray", + assay.type = "relabundance", + name = "NMDS_bray") -```{r plot-mds-nmds-comparison, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or euclidean distances on the GlobalPattern dataset."} -# Run MDS on counts assay with Euclidean distances +# Run MDS on clr assay with Aitchison distances tse <- runMDS(tse, FUN = vegan::vegdist, - name = "MDS_euclidean", - method = "euclidean", - assay.type = "counts") + method = "aitchison", + assay.type = "clr", + pseudocount = 1, + name = "MDS_aitchison") -# Run NMDS on counts assay with Bray-Curtis distances +# Run NMDS on clr assay with Aitchison distances tse <- runNMDS(tse, FUN = vegan::vegdist, - name = "NMDS_BC") + method = "aitchison", + assay.type = "clr", + pseudocount = 1, + name = "NMDS_aitchison") +``` -# Run NMDS on counts assay with Euclidean distances -tse <- runNMDS(tse, - FUN = vegan::vegdist, - name = "NMDS_euclidean", - method = "euclidean") +Multiple ordination plots are combined into a multi-panel plot with the +patchwork package, so that different methods can be compared to find similarities +between them or select the most suitable one to visualize beta diversity in the +light of the research question. 
+ +```{r, fig.cap = "Comparison of MDS and NMDS plots based on the Bray-Curtis or Aitchison distances on the GlobalPattern dataset."} +# Load package for multi-panel plotting +library(patchwork) # Generate plots for all 4 reducedDims -plots <- lapply(c("PCoA_BC", "MDS_euclidean", "NMDS_BC", "NMDS_euclidean"), +plots <- lapply(c("MDS_bray", "NMDS_aitchison", + "NMDS_bray", "NMDS_aitchison"), plotReducedDim, object = tse, colour_by = "Group") -# Load package for multi-panel plotting -library(patchwork) - # Generate multi-panel plot -((plots[[1]] | plots[[2]]) / (plots[[3]] | plots[[4]])) + +wrap_plots(plots) + plot_layout(guides = "collect") ``` From a59b8293d0e5a4ad1a9b70ed0c6a523eebcfaf17 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Thu, 14 Sep 2023 17:17:58 +0200 Subject: [PATCH 20/26] Fix deployment --- 20_beta_diversity.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index a59e10d8b..3f6b95502 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -95,6 +95,9 @@ dimensions via an ordination method, the results of which can be stored in the and `runNMDS` functions. ```{r runMDS} +# Load package to plot reducedDim +library(scater) + # Run PCoA on relabundance assay with Bray-Curtis distances tse <- runMDS(tse, FUN = vegan::vegdist, @@ -109,9 +112,6 @@ provides tools to incorporate additional information encoded by color, shape, size and other aesthetics. Can you find any difference between the groups? 
```{r plot-mds-bray-curtis, fig.cap = "MDS plot based on the Bray-Curtis distances on the GlobalPattern dataset."} -# Load package to plot reducedDim -library(scater) - # Create ggplot object p <- plotReducedDim(tse, "MDS_bray", colour_by = "Group") From 1dd02ffe43568b71c3c5ce8cb6d2c21d15b4e0d9 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Sat, 23 Sep 2023 17:07:46 +0200 Subject: [PATCH 21/26] Implement pseudocount = TRUE and minor fixes --- 20_beta_diversity.Rmd | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 3f6b95502..dbff46dd3 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -72,7 +72,7 @@ tse <- transformAssay(tse, tse <- transformAssay(tse, assay.type = "relabundance", method = "clr", - pseudocount = 1) + pseudocount = TRUE) # Add group information Feces yes/no tse$Group <- tse$SampleType == "Feces" @@ -143,17 +143,15 @@ tse <- runNMDS(tse, # Run MDS on clr assay with Aitchison distances tse <- runMDS(tse, FUN = vegan::vegdist, - method = "aitchison", + method = "euclidean", assay.type = "clr", - pseudocount = 1, name = "MDS_aitchison") -# Run NMDS on clr assay with Aitchison distances +# Run NMDS on clr assay with Euclidean distances tse <- runNMDS(tse, FUN = vegan::vegdist, - method = "aitchison", + method = "euclidean", assay.type = "clr", - pseudocount = 1, name = "NMDS_aitchison") ``` @@ -167,7 +165,7 @@ light of the research question. library(patchwork) # Generate plots for all 4 reducedDims -plots <- lapply(c("MDS_bray", "NMDS_aitchison", +plots <- lapply(c("MDS_bray", "MDS_aitchison", "NMDS_bray", "NMDS_aitchison"), plotReducedDim, object = tse, @@ -302,7 +300,7 @@ methods. 
| | supervised ordination | unsupervised ordination | |:------------------------:|:----------------------:|:------------------------:| | Euclidean distance | RDA | PCA | -| non-Euclidean distance | dbRDA | PCoA | +| non-Euclidean distance | dbRDA | PCoA/MDS, NMDS and UMAP | We demonstrate the usage of dbRDA with the enterotype dataset, where samples correspond to patients. The colData contains the clinical status of each patient From a7a4bf8ff15deec4721019139a593a3f38a0db4f Mon Sep 17 00:00:00 2001 From: Elina297 <125908303+Elina297@users.noreply.github.com> Date: Mon, 25 Sep 2023 11:19:26 +0200 Subject: [PATCH 22/26] Update 30_differential_abundance (#348) --- 30_differential_abundance.Rmd | 1 - 1 file changed, 1 deletion(-) diff --git a/30_differential_abundance.Rmd b/30_differential_abundance.Rmd index 9eaf19382..c3dc58929 100755 --- a/30_differential_abundance.Rmd +++ b/30_differential_abundance.Rmd @@ -422,7 +422,6 @@ zicoseq_res %>% ```{r plot-zicoseq} ## x-axis is the effect size: R2 * direction of coefficient ZicoSeq.plot(ZicoSeq.obj = zicoseq_out, - meta.dat = as.data.frame(colData(tse)), pvalue.type = 'p.adj.fdr') ``` From a85760641363a7fbc597c9986accc132661d0ad9 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Mon, 25 Sep 2023 19:27:07 +0300 Subject: [PATCH 23/26] Replace pseudocount 1 with TRUE throughout book --- 20_beta_diversity.Rmd | 2 +- 23_multi-assay_analyses.Rmd | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index dbff46dd3..7d9804189 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -127,7 +127,7 @@ p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep p ``` -A few combinations of beta diversity metrics and assay types are conventionally +A few combinations of beta diversity metrics and assay types are tipically used. 
For instance, Bray-Curtis dissimilarity and Aitchison distance are often applied to the relative abundance and the clr assays, respectively. Besides **beta diversity metric** and **assay type**, the **PCoA algorithm** is also a variable that should be considered. Below, we show how the choice of these three factors can affect the resulting lower-dimensional data. diff --git a/23_multi-assay_analyses.Rmd b/23_multi-assay_analyses.Rmd index c88e1153e..683de3c18 100644 --- a/23_multi-assay_analyses.Rmd +++ b/23_multi-assay_analyses.Rmd @@ -102,7 +102,7 @@ bacterium X is present, is the concentration of metabolite Y lower or higher"? # Agglomerate microbiome data at family level mae[[1]] <- mergeFeaturesByPrevalence(mae[[1]], rank = "Family") # Does log10 transform for microbiome data -mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = 1) +mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = TRUE) # Give unique names so that we do not have problems when we are creating a plot rownames(mae[[1]]) <- getTaxonomyLabels(mae[[1]]) @@ -193,8 +193,8 @@ mae[[2]] <- transformAssay(mae[[2]], assay.type = "nmr", # Transforming biomarker data with z-transform mae[[3]] <- transformAssay(mae[[3]], assay.type = "signals", - MARGIN = "features", - method = "z", pseudocount = 1) + MARGIN = "features", + method = "z", pseudocount = TRUE) # Removing assays no longer needed assay(mae[[1]], "counts") <- NULL From af6f052670e07ce49575e567ea562988e82b80c3 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Mon, 25 Sep 2023 20:03:39 +0300 Subject: [PATCH 24/26] Add table of typical beta div combinations --- 20_beta_diversity.Rmd | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 7d9804189..6a43c34d0 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -16,11 +16,11 @@ knitr::opts_chunk$set( # Community Similarity {#community-similarity} -Whereas 
alpha diversity focuses on community variation within a community -(one sample), beta diversity quantifies the dissimilarity between communities -(multiple samples). In microbiome research, the most popular metrics of beta +Beta diversity quantifies the dissimilarity between communities (multiple +samples), as opposed to alpha diversity which focuses on variation within a +community (one sample). In microbiome research, commonly used metrics of beta diversity include the Bray-Curtis index (for compositional data), Jaccard index -(for presence / absence data, ignoring abundance information), Aitchison distance +(for presence/absence data, ignoring abundance information), Aitchison distance (Euclidean distance for clr transformed abundances, aiming to avoid the compositionality bias), and the Unifrac distance (that takes into account the phylogenetic tree information). Notably, only some of these measures are actual @@ -28,6 +28,15 @@ _distances_, as this is a mathematical concept whose definition is not satisfied by certain ecological measure, such as the Bray-Curtis index. Therefore, the terms dissimilarity and beta diversity are preferred. +| Method description | Assay type | Beta diversity metric | +|:--------------------------:|:------------------:|:---------------------:| +| Quantitative profiling | Absolute counts | Bray-Curtis | +| Relative profiling | Relative abundances | Bray-Curtis | +| Aitchison distance | clr | Euclidean | +| Robust Aitchison distance | rclr | Euclidean | +| Present/Absence similarity | Absolute counts | Jaccard | +| Phylogenetic distance | Absolute counts | Unifrac | + In practice, beta diversity is usually represented as a `dist` object, a triangular matrix where the distance between each pair of samples is encoded by a specific cell. This distance matrix can then undergo ordination, which is an @@ -128,11 +137,13 @@ p ``` A few combinations of beta diversity metrics and assay types are tipically -used. 
For instance, Bray-Curtis dissimilarity and Aitchison distance are often applied to the relative abundance and the clr assays, respectively. Besides -**beta diversity metric** and **assay type**, the **PCoA algorithm** is also a variable that should be considered. Below, we show how the choice of these three +used. For instance, Bray-Curtis dissimilarity and Euclidean distance are often +applied to the relative abundance and the clr assays, respectively. Besides +**beta diversity metric** and **assay type**, the **PCoA algorithm** is also a +variable that should be considered. Below, we show how the choice of these three factors can affect the resulting lower-dimensional data. -```{r mds-nmds-comparison, warning=FALSE, message=FALSE} +```{r mds-nmds-comparison, results='hide'} # Run NMDS on relabundance assay with Bray-Curtis distances tse <- runNMDS(tse, FUN = vegan::vegdist, @@ -181,16 +192,14 @@ relationship of features in form on a `phylo` tree. `calculateUnifrac` performs the calculation to return a `dist` object, which can again be used within `runMDS`. 
-```{r} +```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPattern dataset."} tse <- runMDS(tse, FUN = mia::calculateUnifrac, name = "Unifrac", tree = rowTree(tse), ntop = nrow(tse), assay.type = "counts") -``` -```{r plot-unifrac, fig.cap = "Unifrac distances scaled by MDS of the GlobalPattern dataset."} plotReducedDim(tse, "Unifrac", colour_by = "Group") ``` From a9f91f32eed20b0d9f9d1c3d41505f30debe86e6 Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Tue, 26 Sep 2023 17:14:33 +0300 Subject: [PATCH 25/26] Fix pseudocount bug --- 20_beta_diversity.Rmd | 4 ++-- 23_multi-assay_analyses.Rmd | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 6a43c34d0..7f137fc53 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -28,8 +28,8 @@ _distances_, as this is a mathematical concept whose definition is not satisfied by certain ecological measure, such as the Bray-Curtis index. Therefore, the terms dissimilarity and beta diversity are preferred. 
-| Method description | Assay type | Beta diversity metric | -|:--------------------------:|:------------------:|:---------------------:| +| Method description | Assay type | Beta diversity metric | +|:--------------------------:|:-------------------:|:---------------------:| | Quantitative profiling | Absolute counts | Bray-Curtis | | Relative profiling | Relative abundances | Bray-Curtis | | Aitchison distance | clr | Euclidean | diff --git a/23_multi-assay_analyses.Rmd b/23_multi-assay_analyses.Rmd index 683de3c18..3f04a1599 100644 --- a/23_multi-assay_analyses.Rmd +++ b/23_multi-assay_analyses.Rmd @@ -194,7 +194,7 @@ mae[[2]] <- transformAssay(mae[[2]], assay.type = "nmr", # Transforming biomarker data with z-transform mae[[3]] <- transformAssay(mae[[3]], assay.type = "signals", MARGIN = "features", - method = "z", pseudocount = TRUE) + method = "z", pseudocount = 1) # Removing assays no longer needed assay(mae[[1]], "counts") <- NULL From f460fc5b1a81af9ea1317e8eed9c47ac67599adc Mon Sep 17 00:00:00 2001 From: Giulio Benedetti Date: Wed, 27 Sep 2023 20:11:41 +0300 Subject: [PATCH 26/26] Update beta diversity table --- 20_beta_diversity.Rmd | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/20_beta_diversity.Rmd b/20_beta_diversity.Rmd index 7f137fc53..ac0c74de2 100644 --- a/20_beta_diversity.Rmd +++ b/20_beta_diversity.Rmd @@ -28,14 +28,16 @@ _distances_, as this is a mathematical concept whose definition is not satisfied by certain ecological measure, such as the Bray-Curtis index. Therefore, the terms dissimilarity and beta diversity are preferred. 
-| Method description | Assay type | Beta diversity metric | -|:--------------------------:|:-------------------:|:---------------------:| -| Quantitative profiling | Absolute counts | Bray-Curtis | -| Relative profiling | Relative abundances | Bray-Curtis | -| Aitchison distance | clr | Euclidean | -| Robust Aitchison distance | rclr | Euclidean | -| Present/Absence similarity | Absolute counts | Jaccard | -| Phylogenetic distance | Absolute counts | Unifrac | +| Method description | Assay type | Beta diversity metric | +|:---------------------------:|:-------------------:|:---------------------:| +| Quantitative profiling | Absolute counts | Bray-Curtis | +| Relative profiling | Relative abundances | Bray-Curtis | +| Aitchison distance | Absolute counts | Aitchison | +| Aitchison distance | clr | Euclidean | +| Robust Aitchison distance | rclr | Euclidean | +| Presence/Absence similarity | Relative abundances | Jaccard | +| Presence/Absence similarity | Absolute counts | Jaccard | +| Phylogenetic distance | Rarefied counts | Unifrac | In practice, beta diversity is usually represented as a `dist` object, a triangular matrix where the distance between each pair of samples is encoded by @@ -136,7 +138,7 @@ p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep p ``` -A few combinations of beta diversity metrics and assay types are tipically +A few combinations of beta diversity metrics and assay types are typically used. For instance, Bray-Curtis dissimilarity and Euclidean distance are often applied to the relative abundance and the clr assays, respectively. Besides **beta diversity metric** and **assay type**, the **PCoA algorithm** is also a