Integrate survey paper into related work section

girder · Sep 18, 2020 · 29c3ea6
1 parent ad27c1e
Showing 2 changed files with 147 additions and 37 deletions.
diff --git a/joss/paper.bib b/joss/paper.bib
@@ -104,3 +104,93 @@ @article{armitage:2015
 abstract = {The origin of missing values can be caused by different reasons and depending on these origins missing values should be considered differently and dealt with in different ways. In this research, four methods of imputation have been compared with respect to revealing their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, the effects of different strategies for controlling familywise error rate or false discovery and how they work with the different strategies for missing value imputation have been evaluated. Missing values were found to affect normality and variance of data and k-means nearest neighbour imputation was the best method tested for restoring this. Bonferroni correction was the best method for maximizing true positives and minimizing false positives and it was observed that as low as 40\% missing data could be truly missing. The range between 40 and 70\% missing values was defined as a “gray area” and therefore a strategy has been proposed that provides a balance between the optimal imputation strategy that was k-means nearest neighbor and the best approximation of positioning real zeros.},
 year = {2015}
 }
+
+@article{spicer:2017,
+  title={Navigating freely-available software tools for metabolomics analysis},
+  volume={13},
+  DOI={10.1007/s11306-017-1242-7},
+  number={9},
+  journal={Metabolomics},
+  author={Spicer, Rachel and Salek, Reza M. and Moreno, Pablo and Cañueto, Daniel and Steinbeck, Christoph},
+  year={2017},
+}
+
+@article{pluskal:2010,
+  title={MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data},
+  volume={11},
+  DOI={10.1186/1471-2105-11-395},
+  number={1},
+  journal={BMC Bioinformatics},
+  author={Pluskal, Tomáš and Castillo, Sandra and Villar-Briones, Alejandro and Orešič, Matej},
+  year={2010},
+}
+
+@article{fernandez:2014,
+  title={An R package to analyse LC/MS metabolomic data: MAIT (Metabolite Automatic Identification Toolkit)},
+  volume={30},
+  DOI={10.1093/bioinformatics/btu136},
+  number={13},
+  journal={Bioinformatics},
+  author={Fernández-Albert, Francesc and Llorach, Rafael and Andrés-Lacueva, Cristina and Perera, Alexandre},
+  year={2014},
+  pages={1937--1939},
+}
+
+@article{melamud:2010,
+  title={Metabolomic Analysis and Visualization Engine for LC-MS Data},
+  volume={82},
+  DOI={10.1021/ac1021166},
+  number={23},
+  journal={Analytical Chemistry},
+  author={Melamud, Eugene and Vastag, Livia and Rabinowitz, Joshua D.},
+  year={2010},
+  pages={9818--9826},
+}
+
+@article{clasquin:2012,
+  title={LC-MS Data Processing with MAVEN: A Metabolomic Analysis and Visualization Engine},
+  DOI={10.1002/0471250953.bi1411s37},
+  journal={Current Protocols in Bioinformatics},
+  author={Clasquin, Michelle F. and Melamud, Eugene and Rabinowitz, Joshua D.},
+  year={2012},
+}
+
+@article{giacomoni:2014,
+  title={Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics},
+  volume={31},
+  DOI={10.1093/bioinformatics/btu813},
+  number={9}, journal={Bioinformatics},
+  author={Giacomoni, F. and Le Corguille, G. and Monsoor, M. and Landi, M. and Pericard, P. and Petera, M. and Duperier, C. and Tremblay-Franco, M. and Martin, J.-F. and Jacob, D. and others},
+  year={2014},
+  pages={1493--1495},
+}
+
+@article{davidson:2016,
+  title={Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data},
+  volume={5},
+  DOI={10.1186/s13742-016-0115-8},
+  number={1},
+  journal={GigaScience},
+  author={Davidson, Robert L. and Weber, Ralf J. M. and Liu, Haoyu and Sharma-Oates, Archana and Viant, Mark R.},
+  year={2016},
+}
+
+@incollection{chong:2020,
+  title={Using MetaboAnalyst 4.0 for Metabolomics Data Analysis, Interpretation, and Integration with Other Omics Data},
+  DOI={10.1007/978-1-0716-0239-3_17},
+  booktitle={Computational Methods and Data Analysis for Metabolomics},
+  series={Methods in Molecular Biology},
+  author={Chong, Jasmine and Xia, Jianguo},
+  year={2020},
+  pages={337--360},
+}
+
+@article{tautenhahn:2012,
+  title={XCMS Online: A Web-Based Platform to Process Untargeted Metabolomic Data},
+  volume={84},
+  DOI={10.1021/ac300698c},
+  number={11},
+  journal={Analytical Chemistry},
+  author={Tautenhahn, Ralf and Patti, Gary J. and Rinehart, Duane and Siuzdak, Gary},
+  year={2012},
+  pages={5035--5039},
+}
diff --git a/joss/paper.md b/joss/paper.md
@@ -348,45 +348,65 @@ analysis methods.
 
 # Related Work
 
-One of the most commonly used tools for the analysis of metabolomics data is
-MetaboAnalyst, initially released in 2009 [@xia:2009] and currently on version 4
-[@chong:2018]. MetaboAnalyst has a wide range of capabilities including data
-processing, statistical analysis and pathway enrichment analyses.
-
-The initial motivation for the development of Viime was to readily ingest,
-integrate and analyze metabolomics data from multiple platforms--a unique
-capability compared with MetaboAnalyst. Data ingestion in Viime is highly
-interactive, enabling data files with different formats to be uploaded and
-formatted within the application. Separate datasets can be individually
-optimized prior to integration. For example, the optimal normalization,
-transformation and scaling parameters used for metabolomics data from NMR could
-be very different from those required for mass spectrometry data. In Viime, each
-dataset is processed independently, then integrated using three different
-alogorithms, including simple data fusion (i.e. concatenation), mid-level data
-fusion, and multi-block data fusion. The resulting fused datasets offer expanded
-metabolome coverage, enabling an analysis of the correlated behavior of
-metabolites detected by different platforms.
+New analytical approaches that measure an ever larger fraction of the
+metabolome are continually being developed. The data these approaches
+produce require different handling to be transformed into useful
+biological information. Recently, Spicer et al. [@spicer:2017] reviewed the
+most popular freely available software tools for metabolomics analysis and,
+based on their intended functionality, classified them into five groups:
+pre-processing, annotation, post-processing, statistical analysis, and
+workflows. Pre-processing and annotation are often specific to the type,
+make, and model of the analytical instrument used to collect the data, and
+therefore require specialized tools. Once these steps are carried out, more
+general workflow tools can be used to complete the analysis. Viime is
+intended to pick up the workflow after pre-processing and annotation and
+carry it through statistical analysis and visualization. Annotation is not
+required: unannotated or incompletely annotated data can also be
+analyzed.
+
+Spicer et al. briefly described seven popular metabolomics workflow packages
+that either had at least 50 citations on Web of Science (as of August 2016)
+or were reported in a recent survey of the Metabolomics Society. MZmine
+[@pluskal:2010] and MAIT [@fernandez:2014] focus specifically on the analysis
+of mass spectrometry data. MAVEN [@melamud:2010; @clasquin:2012] is focused
+on isotope tracer studies. Workflow4Metabolomics [@giacomoni:2014] and
+Galaxy-M [@davidson:2016] are built upon the Galaxy web-based platform and
+are composed of various modules and workflows. Among the best-known
+metabolomics workflow tools are MetaboAnalyst [@chong:2020] and XCMS Online
+[@tautenhahn:2012]. Both include MS spectral processing and provide
+statistical analysis and visualization tools that are broadly similar to
+those in Viime.
+
+An exhaustive feature comparison with these other platforms is beyond the
+scope of this paper, but a major distinguishing feature of Viime is its emphasis
+on ease of use and interactivity. Of these packages, only XCMS Online and
+MetaboAnalyst are simple, readily accessible web applications that require no
+locally installed software (e.g., R), downloads, or connection to the Galaxy
+platform. The unique user interactivity in Viime starts with the ability to
+drag and drop CSV or Excel files and interactively assign sample identifiers,
+comparison groups, metadata, and metabolites. Dynamic visualization of the
+PCA scores and loadings plots for different types of data (e.g., NMR, LC-MS,
+and GC-MS) and data treatments (e.g., normalization, scaling, and
+transformation) aids in selecting the optimal data treatment. Viime also
+enables integration between different data modalities,
+offering simple (i.e., concatenative), mid-level, and multi-block data fusion
+approaches. The resulting fused datasets offer expanded metabolome coverage,
+enabling an analysis of the correlated behavior of metabolites detected by
+different platforms.
 
 Viime offers another value-added feature during data ingestion: imputation of
-missing data. MetaboAnalyst replaces all missing values with 1/5 of the positive
-values of the corresponding column. Viime uses a more sophisticated imputation
-strategy [@armitage:2015], heuristically classifying missing data as Missing Not
-At Random (MNAR) or Missing Completely At Random (MCAR). For MNAR data, the user
-can choose to replace the values with either zeros or half of the minimum value
-of that variable, while the MCAR options include imputation by Random Forest,
-K-Nearest Neighbors, the mean value, or the median value.
-
-Finally, Viime includes powerful interactive data manipulation and visualization
-tools, improving upon tools such as MetaboAnalyst.  Both platforms enable a
-range of univariate and multivariate analyses. In Viime, the data table for both
-Wilcoxon and ANOVA yield p-values (including Tukey post-hoc values for ANOVA).
-An interactive table enables the selection of specific metabolites for further
-visualization based on user choice (e.g. to evaluate metabolites from a specific
-pathway) or based on p-values, only including metabolites showing a
-statistically significant difference between the groups. Viime's resulting
-visualizations, such as heatmaps, volcano plots, and network correlation
-diagrams, are therefore much more useful and interpretable as the
-non-informative metabolites have been removed.
+missing data. Viime uses a sophisticated imputation strategy [@armitage:2015],
+heuristically classifying missing data as Missing Not At Random (MNAR) or
+Missing Completely At Random (MCAR). For MNAR data, the user can choose to
+replace the values with either zeros or half of the minimum value of that
+variable, while the MCAR options include imputation by Random Forest, K-Nearest
+Neighbors, the mean value, or the median value.
+
+Finally, heatmaps, volcano plots, and network correlation diagrams, all of
+which offer state-of-the-art web-based interactivity, can be restricted to
+user-selected subsets of the data based on statistical significance or the
+particular interest of the investigator. This philosophy of interactivity will
+drive further development of Viime as the platform expands its capabilities
+to further types of data analysis and visualization.
 
 # Acknowledgments
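The MNAR/MCAR imputation described in the revised Related Work paragraph can be illustrated with a minimal Python sketch. This is not Viime's implementation. The single per-metabolite threshold `MNAR_THRESHOLD = 0.4` (loosely echoing the 40% figure in the Armitage et al. abstract), the hypothetical function names `classify_missing`, `impute_column`, and `impute_table`, and the omission of the Random Forest and K-Nearest Neighbors options are all assumptions made for illustration.

```python
from statistics import mean, median
from typing import Dict, List, Optional

# A column of measurements for one metabolite; None marks a missing value.
Column = List[Optional[float]]

# Hypothetical cut-off (an assumption, not Viime's actual rule): a metabolite
# missing in at least this fraction of samples is treated as MNAR (e.g. below
# the detection limit); a smaller fraction of gaps is treated as MCAR.
MNAR_THRESHOLD = 0.4


def classify_missing(values: Column, threshold: float = MNAR_THRESHOLD) -> str:
    """Label a column's missing values as 'MNAR', 'MCAR', or 'none'."""
    n_missing = sum(v is None for v in values)
    if n_missing == 0:
        return "none"
    return "MNAR" if n_missing / len(values) >= threshold else "MCAR"


def impute_column(values: Column, mnar_method: str = "half-minimum",
                  mcar_method: str = "mean") -> List[float]:
    """Fill missing values with a strategy chosen by the missingness class."""
    kind = classify_missing(values)
    observed = [v for v in values if v is not None]
    if kind == "none":
        return [float(v) for v in observed]
    if kind == "MNAR":
        # MNAR options named in the text: zero, or half of the minimum value.
        if mnar_method == "zero":
            fill = 0.0
        elif mnar_method == "half-minimum":
            fill = min(observed) / 2 if observed else 0.0
        else:
            raise ValueError(f"unknown MNAR method: {mnar_method}")
    else:
        # MCAR options shown here: column mean or median. Random Forest and
        # K-Nearest Neighbors imputation are omitted from this sketch.
        if mcar_method == "mean":
            fill = mean(observed)
        elif mcar_method == "median":
            fill = median(observed)
        else:
            raise ValueError(f"unknown MCAR method: {mcar_method}")
    return [fill if v is None else v for v in values]


def impute_table(table: Dict[str, Column], **options: str) -> Dict[str, List[float]]:
    """Impute every metabolite column independently."""
    return {name: impute_column(col, **options) for name, col in table.items()}


if __name__ == "__main__":
    table = {
        "lactate": [1.0, None, 2.0, 3.0, 4.0],   # 1 of 5 missing -> MCAR
        "glycine": [None, None, 0.5, None, 0.8],  # 3 of 5 missing -> MNAR
    }
    print(impute_table(table))
    # {'lactate': [1.0, 2.5, 2.0, 3.0, 4.0], 'glycine': [0.25, 0.25, 0.5, 0.25, 0.8]}
```

Keeping classification and imputation as separate steps mirrors the two-stage description in the text: first decide why a value is probably missing, then pick a fill strategy suited to that class.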