This repository contains scripts to perform statistical analyses on combined datasets. These scripts were used to make the figures in the paper.

Note: we strongly recommend running 04a_LogRatios-Taxa.R, the scCODA/AGP part of 10_DA-analysis/DA_analysis_run.ipynb, and the scCODA part of 10_DA-analysis/shared_ASV_run.ipynb on a high-performance computing cluster. A bash script to execute the log-ratio analysis is provided in the corresponding data directory. Code for executing the differential abundance analysis in a distributed fashion is provided in the data folder.

Script | Paper figure(s) | Short description |
---|---|---|
01_TaxonomicTree.R | Fig1A, FigS2 | Taxonomic tree of all ASVs inferred across datasets |
02_LogRatio-FirmBact.R | Fig2, FigS5 | Log-ratio of Firmicutes:Bacteroidota abundance in healthy vs IBS samples |
03_Heatmaps.R | Fig3A, FigS6A | Heatmap of relative abundances of microbial families |
04a_LogRatios-Taxa.R | Fig3B, FigS6B | Compute log-ratios between all combinations of microbial families, and save the sample x log-ratio dataframe (to be provided as input for UMAP) |
04b_UMAP.R | Fig3B, FigS6B | UMAP of log-ratios between microbial families across datasets |
05_Common-ASVs.R | Fig5A | Count ASVs that are identical across datasets (common ASVs are expected between datasets that amplified the same variable regions) |
06_QCplot.R | FigS1 | Plot number of reads per sample before/after quality filtering with DADA2 preprocessing |
07_RelativAbund.R | FigS3 | Plot relative abundance of the 5 main phyla across datasets |
08_AlphaDiversity.R | FigS4 | Shannon and Simpson α-diversity indexes in healthy vs IBS samples |
09_PCoA-BrayCurtis-BigDatasets.R | FigS7 | Compute Bray-Curtis dissimilarity in the AGP, Pozuelo and Hugerth datasets (the 3 biggest datasets) and perform PCoA |
10_DA-analysis/sccoda_reference_finding.ipynb | - | Find a suitable reference taxon for running scCODA in the other scripts of this chapter |
10_DA-analysis/DA_analysis_run.ipynb | - | Differential abundance analysis of all datasets individually (run models and create intermediate results) |
10_DA-analysis/DA_analysis_individual_data.ipynb | Fig4, FigS8, FigS9, Table2, TableS5 | Differential abundance analysis of all datasets individually (result analysis) |
10_DA-analysis/shared_ASV_run.ipynb | - | Differential abundance analysis of shared ASVs between Nagel and Pozuelo datasets (preparation for next chapter) |
11_shared-classification-analysis/11a_shared_classification.R | - | Classification analysis of shared ASVs between Nagel and Pozuelo datasets |
11_shared-classification-analysis/11b_additional_plots.R | - | Create supporting figures for the shared ASV analysis |
11_shared-classification-analysis/11c_combine_plots.R | Fig5 | Combine the figures from the previous scripts in the folder |
12_composiitonal_mean_test.R | | Apply a compositional mean test to compare IBS and healthy samples within datasets and to assess similarity across datasets |
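
To illustrate the kind of computation that 04a_LogRatios-Taxa.R is described as performing (log-ratios between all combinations of microbial families, stored as a sample x log-ratio table), here is a minimal Python sketch. It is not the repository's R code: the function name `pairwise_log_ratios`, the pseudocount of 0.5, the use of the natural logarithm, and the toy counts are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def pairwise_log_ratios(counts, pseudocount=0.5):
    """Return {sample: {"famA/famB": log-ratio}} for every unordered family pair.

    counts: dict mapping sample ID -> dict mapping family name -> read count.
    pseudocount: added to every count so that zeros do not produce log(0);
    the value 0.5 is an illustrative assumption, not the authors' choice.
    """
    # Collect every family seen in any sample, in a stable order.
    families = sorted({fam for sample in counts.values() for fam in sample})
    table = {}
    for sample_id, sample in counts.items():
        row = {}
        for fam_a, fam_b in combinations(families, 2):
            a = sample.get(fam_a, 0) + pseudocount
            b = sample.get(fam_b, 0) + pseudocount
            row[f"{fam_a}/{fam_b}"] = math.log(a / b)
        table[sample_id] = row
    return table


# Toy input (made-up counts, not data from the paper).
toy_counts = {
    "sample1": {"Bacteroidaceae": 120, "Lachnospiraceae": 60, "Ruminococcaceae": 0},
    "sample2": {"Bacteroidaceae": 30, "Lachnospiraceae": 90, "Ruminococcaceae": 15},
}
log_ratios = pairwise_log_ratios(toy_counts)
```

With F families, each sample gets F(F-1)/2 log-ratio columns, so the table grows quadratically with the number of families. This may be one reason the step benefits from a computing cluster on the full data.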