Integrate survey paper into related work section
waxlamp committed Sep 18, 2020
1 parent ad27c1e commit 29c3ea6
Showing 2 changed files with 147 additions and 37 deletions.
90 changes: 90 additions & 0 deletions joss/paper.bib
@@ -104,3 +104,93 @@ @article{armitage:2015
abstract = {The origin of missing values can be caused by different reasons and depending on these origins missing values should be considered differently and dealt with in different ways. In this research, four methods of imputation have been compared with respect to revealing their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, the effects of different strategies for controlling familywise error rate or false discovery and how they work with the different strategies for missing value imputation have been evaluated. Missing values were found to affect normality and variance of data and k-means nearest neighbour imputation was the best method tested for restoring this. Bonferroni correction was the best method for maximizing true positives and minimizing false positives and it was observed that as low as 40\% missing data could be truly missing. The range between 40 and 70\% missing values was defined as a “gray area” and therefore a strategy has been proposed that provides a balance between the optimal imputation strategy that was k-means nearest neighbor and the best approximation of positioning real zeros.},
year = {2015}
}

@article{spicer:2017,
title={Navigating freely-available software tools for metabolomics analysis},
volume={13},
DOI={10.1007/s11306-017-1242-7},
number={9},
journal={Metabolomics},
author={Spicer, Rachel and Salek, Reza M. and Moreno, Pablo and Cañueto, Daniel and Steinbeck, Christoph},
year={2017},
}

@article{pluskal:2010,
title={MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data},
volume={11},
DOI={10.1186/1471-2105-11-395},
number={1},
journal={BMC Bioinformatics},
author={Pluskal, Tomáš and Castillo, Sandra and Villar-Briones, Alejandro and Orešič, Matej},
year={2010},
}

@article{fernandez:2014,
title={An R package to analyse LC/MS metabolomic data: MAIT (Metabolite Automatic Identification Toolkit)},
volume={30},
DOI={10.1093/bioinformatics/btu136},
number={13},
journal={Bioinformatics},
author={Fernández-Albert, Francesc and Llorach, Rafael and Andrés-Lacueva, Cristina and Perera, Alexandre},
year={2014},
pages={1937–1939},
}

@article{melamud:2010,
title={Metabolomic Analysis and Visualization Engine for LC−MS Data},
volume={82},
DOI={10.1021/ac1021166},
number={23},
journal={Analytical Chemistry},
author={Melamud, Eugene and Vastag, Livia and Rabinowitz, Joshua D.},
year={2010},
pages={9818–9826},
}

@article{clasquin:2012,
title={LC-MS Data Processing with MAVEN: A Metabolomic Analysis and Visualization Engine},
DOI={10.1002/0471250953.bi1411s37},
journal={Current Protocols in Bioinformatics},
author={Clasquin, Michelle F. and Melamud, Eugene and Rabinowitz, Joshua D.},
year={2012},
}

@article{giacomoni:2014,
title={Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics},
volume={31},
DOI={10.1093/bioinformatics/btu813},
number={9},
journal={Bioinformatics},
author={Giacomoni, F. and Corguille, G. Le and Monsoor, M. and Landi, M. and Pericard, P. and Petera, M. and Duperier, C. and Tremblay-Franco, M. and Martin, J.-F. and Jacob, D. and others},
year={2014},
pages={1493–1495},
}

@article{davidson:2016,
title={Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data},
volume={5},
DOI={10.1186/s13742-016-0115-8},
number={1},
journal={GigaScience},
author={Davidson, Robert L. and Weber, Ralf J. M. and Liu, Haoyu and Sharma-Oates, Archana and Viant, Mark R.},
year={2016},
}

@article{chong:2020,
title={Using MetaboAnalyst 4.0 for Metabolomics Data Analysis, Interpretation, and Integration with Other Omics Data},
DOI={10.1007/978-1-0716-0239-3_17},
journal={Computational Methods and Data Analysis for Metabolomics Methods in Molecular Biology},
author={Chong, Jasmine and Xia, Jianguo},
year={2020},
pages={337–360},
}

@article{tautenhahn:2012,
title={XCMS Online: A Web-Based Platform to Process Untargeted Metabolomic Data},
volume={84},
DOI={10.1021/ac300698c},
number={11},
journal={Analytical Chemistry},
author={Tautenhahn, Ralf and Patti, Gary J. and Rinehart, Duane and Siuzdak, Gary},
year={2012},
pages={5035–5039},
}
94 changes: 57 additions & 37 deletions joss/paper.md
@@ -348,45 +348,65 @@ analysis methods.

# Related Work

One of the most commonly used tools for the analysis of metabolomics data is
MetaboAnalyst, initially released in 2009 [@xia:2009] and currently on version 4
[@chong:2018]. MetaboAnalyst has a wide range of capabilities including data
processing, statistical analysis and pathway enrichment analyses.

The initial motivation for the development of Viime was to readily ingest,
integrate and analyze metabolomics data from multiple platforms--a unique
capability compared with MetaboAnalyst. Data ingestion in Viime is highly
interactive, enabling data files with different formats to be uploaded and
formatted within the application. Separate datasets can be individually
optimized prior to integration. For example, the optimal normalization,
transformation and scaling parameters used for metabolomics data from NMR could
be very different from those required for mass spectrometry data. In Viime, each
dataset is processed independently, then integrated using three different
algorithms, including simple data fusion (i.e. concatenation), mid-level data
fusion, and multi-block data fusion. The resulting fused datasets offer expanded
metabolome coverage, enabling an analysis of the correlated behavior of
metabolites detected by different platforms.
New analytical approaches to effectively measure more and more of the metabolome
are continually being developed. The data produced from these different
approaches requires different handling in order to transform the data into
useful biological information. Recently, Spicer et al. [@spicer:2017] reviewed
the most popular freely-available software tools for metabolomics analysis.
Based on their intended functionality the tools were classified into the
following five groups: pre-processing, annotation, post-processing, statistical
analysis, and workflows. Pre-processing and annotation are often very
specific to the type, make, and model of the analytical instrument used to
collect the data and therefore require specialized tools. Once these steps are
carried out, more general workflow tools can be used to complete the analysis.
The intent of the Viime package is to pick up the workflow after pre-processing
and annotation and carry it through statistical analysis and visualization.
Note that annotation is not required: unannotated or incompletely annotated
data can also be analyzed.

Spicer et al. briefly described seven popular metabolomics workflow packages
that met a threshold of at least 50 citations on Web of Science (as of August 2016)
or were reported in a recent survey of the Metabolomics Society. The
packages MZmine [@pluskal:2010] and MAIT [@fernandez:2014] specifically focus on
the analysis of mass-spectrometry data. The MAVEN package [@melamud:2010;
@clasquin:2012] is focused on isotope tracer studies. The Workflow4Metabolomics
[@giacomoni:2014] and Galaxy-M [@davidson:2016] packages are built upon the
Galaxy web-based platform and are composed of various modules and workflows.
Among the most well-known metabolomics workflow tools are MetaboAnalyst
[@chong:2020] and XCMS Online [@tautenhahn:2012]. These are both workflow tools
which include MS spectral processing and have statistical analyses and
visualization tools that are generally similar to Viime.

An exhaustive feature comparison with these other platforms is beyond the
scope of this paper, but a major distinguishing feature of Viime is its emphasis
on ease of use and interactivity. Only XCMS and MetaboAnalyst are simple,
readily accessible web applications that require no existing package (e.g. R),
downloads or connection to the Galaxy platform. The unique user interactivity in
Viime starts with the ability to simply drag and drop CSV or Excel files and
interactively assign the sample identifiers, comparison groups, metadata, and
metabolites. Dynamic visualization of the PCA scores and loadings plots with
different types of data (e.g. NMR, LC-MS, and GC-MS) and data treatments (e.g.
normalization, scaling and transformation) aids in selecting the optimal data
treatment. Viime also enables integration between different data modalities,
offering simple (i.e., concatenative), mid-level, and multi-block data fusion
approaches. The resulting fused datasets offer expanded metabolome coverage,
enabling an analysis of the correlated behavior of metabolites detected by
different platforms.
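
The simplest of the fusion approaches mentioned above, concatenation, can be sketched in a few lines. This is an illustration only and not Viime's implementation: the function name `concatenate_datasets`, the nested-dictionary table layout, and the `platform:feature` column naming are all assumptions made for the example. Each platform's table is assumed to have been normalized, transformed, and scaled separately beforehand.

```python
from typing import Dict

# Hypothetical layout: sample id -> {metabolite name -> processed value}.
Table = Dict[str, Dict[str, float]]


def concatenate_datasets(blocks: Dict[str, Table]) -> Table:
    """Fuse per-platform tables by joining their columns on shared samples.

    `blocks` maps a platform label (e.g. "NMR") to its table. Only samples
    present in every table are kept, and each column is prefixed with its
    platform label so that identically named metabolites do not collide.
    """
    if not blocks:
        return {}
    shared = set.intersection(*(set(table) for table in blocks.values()))
    fused: Table = {}
    for sample in sorted(shared):
        row: Dict[str, float] = {}
        for platform, table in blocks.items():
            for feature, value in table[sample].items():
                row[f"{platform}:{feature}"] = value
        fused[sample] = row
    return fused


nmr = {"s1": {"lactate": 0.4}, "s2": {"lactate": -0.1}}
lcms = {"s1": {"citrate": 1.2}, "s3": {"citrate": 0.7}}
fused = concatenate_datasets({"NMR": nmr, "LC-MS": lcms})
```

In this sketch only sample `s1` survives, because it is the only sample measured on both platforms. Mid-level and multi-block fusion would need additional modeling steps that are not shown here.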

Viime offers another value-added feature during data ingestion: imputation of
missing data. MetaboAnalyst replaces all missing values with 1/5 of the positive
values of the corresponding column. Viime uses a more sophisticated imputation
strategy [@armitage:2015], heuristically classifying missing data as Missing Not
At Random (MNAR) or Missing Completely At Random (MCAR). For MNAR data, the user
can choose to replace the values with either zeros or half of the minimum value
of that variable, while the MCAR options include imputation by Random Forest,
K-Nearest Neighbors, the mean value, or the median value.

Finally, Viime includes powerful interactive data manipulation and visualization
tools, improving upon tools such as MetaboAnalyst. Both platforms enable a
range of univariate and multivariate analyses. In Viime, the data tables for both
Wilcoxon and ANOVA yield p-values (including Tukey post-hoc values for ANOVA).
An interactive table enables the selection of specific metabolites for further
visualization based on user choice (e.g. to evaluate metabolites from a specific
pathway) or based on p-values, only including metabolites showing a
statistically significant difference between the groups. Viime's resulting
visualizations, such as heatmaps, volcano plots, and network correlation
diagrams, are therefore much more useful and interpretable as the
non-informative metabolites have been removed.
missing data. Viime uses a sophisticated imputation strategy [@armitage:2015],
heuristically classifying missing data as Missing Not At Random (MNAR) or
Missing Completely At Random (MCAR). For MNAR data, the user can choose to
replace the values with either zeros or half of the minimum value of that
variable, while the MCAR options include imputation by Random Forest, K-Nearest
Neighbors, the mean value, or the median value.
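
The simpler replacement options listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not Viime's code: the function names `impute_mnar` and `impute_mcar`, the strategy labels, and the use of `None` to mark a missing value are all hypothetical. The heuristic that classifies a variable as MNAR or MCAR, and the Random Forest and K-Nearest Neighbors options, are left out.

```python
from statistics import mean, median
from typing import List, Optional

# Hypothetical representation: one variable (column), None marks a missing value.
Column = List[Optional[float]]


def impute_mnar(column: Column, strategy: str = "half-minimum") -> List[float]:
    """Fill values assumed Missing Not At Random with zero or half the minimum."""
    observed = [v for v in column if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "half-minimum":
        if not observed:
            raise ValueError("half-minimum needs at least one observed value")
        fill = min(observed) / 2
    else:
        raise ValueError(f"unknown MNAR strategy: {strategy}")
    return [fill if v is None else v for v in column]


def impute_mcar(column: Column, strategy: str = "mean") -> List[float]:
    """Fill values assumed Missing Completely At Random with the mean or median."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("mean/median imputation needs at least one observed value")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown MCAR strategy: {strategy}")
    return [fill if v is None else v for v in column]


mnar_filled = impute_mnar([4.0, None, 2.0], "half-minimum")
mcar_filled = impute_mcar([1.0, None, 3.0], "mean")
```

The two functions are kept separate because the two kinds of missingness call for different defaults: a value missing because it fell below detection is better approximated by a small number, while a value missing at random is better approximated by a typical one.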

Finally, the heatmaps, volcano plots, and network correlation diagrams, which
all offer state-of-the-art web-based interactivity, can be adjusted to include
user-selected subsets of data based on statistical significance or the
particular interest of the investigator. This philosophy of interactivity will
drive further development in Viime as the platform expands its capabilities for
further types of data analysis and visualization.

# Acknowledgments
