The R package SplineOmics
finds the significant features (hits) of
time-series -omics data by using splines and limma
for hypothesis
testing. It then clusters the hits based on the spline shape while
showing all results in summary HTML reports.
The graphical abstract below shows the full workflow streamlined by
SplineOmics
:
- 📘 Introduction
- 🔧 Installation
▶️ Usage- 📦 Dependencies
- 📚 Further Reading
- ❓ Getting Help
- 🤝 Contributing
- 💬 Feedback
- 📜 License
- 🎓 Citation
- 🌟 Contributors
- 🙏 Acknowledgements
Welcome to SplineOmics
, an R package designed to streamline the
analysis of -omics time-series data, followed by automated HTML report
generation.
If you have -omics data over time, the package will help you to run
limma
with splines, decide on which parameters to use, perform the
clustering, run GSEA and show result plots in HTML reports. Any
time-series data that is a valid input to the limma
package is also a
valid input to the SplineOmics
package (such as transcriptomics,
proteomics, phosphoproteomics, metabolomics, glycan fractional
abundances, etc.).
-
Data: A data matrix where each row is a feature (e.g., protein, metabolite, etc.) and each column is a sample taken at a specific time. The data must have no NA values, should have normally distributed features and no dependence between the samples.
-
Meta: A table with metadata on the columns/samples of the data matrix (e.g., batch, time point, etc.)
-
Annotation (optional): A table with identifiers on the rows/features of the data matrix (e.g., gene and protein name).
With SplineOmics
, you can:
-
Automatically perform exploratory data analysis:
The
explore_data()
function generates an HTML report, containing various plots, such as density, PCA, and correlation heatmap plots (example report). -
Explore various limma splines hyperparameters:
Test combinations of hyperparameters, such as different datasets,
limma
design formulas, degrees of freedom, p-value thresholds, etc., using thescreen_limma_hyperparams()
function (example report (along with the encoding)). -
Perform limma spline analysis:
Use the
run_limma_splines()
function to perform thelimma
analysis with splines once the optimal hyperparameters are identified (example report). -
Cluster significant features:
Cluster the significant features (hits) identified in the spline analysis with the
cluster_hits()
function (example report). -
Run GSEA with clustered hits:
Perform gene set enrichment analysis (GSEA) using the clustered hits with the
create_gsea_report()
function (example report).
Follow the steps below to install the SplineOmics
package from the
GitHub repository into your R environment.
- Ensure R is installed on your system. If not, download and install it from CRAN.
- RStudio is recommended for a more user-friendly experience with R. Download and install RStudio from posit.co.
Note for Windows Users:
Note that some installation paths potentially are not writable on Windows. Therefore, it can be necessary to set up a library path and use that path for the installations:
# Define the custom library path and expand the tilde (~)
custom_lib_path <- path.expand("~/Rlibs")
# Create the directory if it doesn't exist
if (!dir.exists(custom_lib_path)) {
dir.create(
custom_lib_path,
showWarnings = FALSE,
recursive = TRUE
)
}
# Set the library path to include the new directory
.libPaths(c(custom_lib_path, .libPaths()))
# Check if the new library path is added successfully
if (custom_lib_path %in% .libPaths()) {
message("Library path set to: ", custom_lib_path)
} else {
stop("Failed to set library path.")
}
Alternatively, you can run RStudio
as administrator once for the
installation (which is however generally not recommended, because it is
a security risk).
-
Open RStudio or your R console.
-
Install
BiocManager
for Bioconductor dependencies (if not already installed)
install.packages(
"BiocManager",
# lib = custom_lib_path
)
- Install Bioconductor dependencies separately using
BiocManager
library(BiocManager) # load the BiocManager package
BiocManager::install(c(
"ComplexHeatmap",
"limma"
),
force = TRUE,
# lib = custom_lib_path
)
- Install the
remotes
package for GitHub downloads (if not already installed)
install.packages(
"remotes",
# lib = custom_lib_path
)
- Install the
SplineOmics
package from GitHub and all its non-Bioconductor dependencies, usingremotes
library(remotes) # load the remotes package
remotes::install_github(
"csbg/SplineOmics", # GitHub repository
ref = "0.1.0", # Specify the tag to install
dependencies = TRUE, # Install all dependencies
force = TRUE, # Force reinstallation
upgrade = "always", # Always upgrade dependencies
# lib = custom_lib_path
)
- Verify the installation of the
SplineOmics
package
if ("SplineOmics" %in% rownames(installed.packages())) {
message("SplineOmics installed successfully.")
} else {
message("SplineOmics installation failed.")
}
If you encounter errors related to dependencies or package versions during installation, try updating your R and RStudio to the latest versions and repeat the installation steps.
For issues specifically related to the SplineOmics
package, check the
Issues section of the
GitHub repository for similar problems or to post a new issue.
Alternatively, you can run your analysis in a Docker
container. The
underlying Docker
image encapsulates the SplineOmics
package
together with the necessary environment and dependencies. This ensures
higher levels of reproducibility because the analysis is carried out in
a consistent environment, independent of the operating system and its
custom configurations.
Please note that you must have the Docker Engine
installed on your
machine. For instructions on how to install it, consult the official
Docker Engine installation
guide.
More information about Docker
containers can be found on the official
Docker page.
For instructions on downloading the image of the SplineOmics
package
and running the container, please refer to the Docker
instructions.
If you face “permission denied” issues on Linux distributions, check this vignette.
This tutorial covers a real CHO cell time-series proteomics example from start to end.
To open an R Markdown file of the tutorial in RStudio
, run:
library(SplineOmics)
open_tutorial()
To open an R Markdown file in RStudio
containing a template for
your own analysis, run:
library(SplineOmics)
open_template()
A detailed description of all arguments and outputs of all the functions in the package (exported and internal functions) can be found here.
A quick guide on how to design a limma
design formula can be found
here
An explanation of the three different limma
results is
here
Transcriptomics data must be preprocessed for limma
. You need to
provide an appropriate object, such as a voom
object, in the
rna_seq_data
argument of the SplineOmics
object (see
documentation).
Along with this, the normalized matrix (e.g., the $E
slot of the
voom
object) must be passed to the data
argument. This allows
flexibility in preprocessing; you can use any method you prefer as long
as the final object and matrix are compatible with limma. One way to
preprocess your RNA-seq data is by using the preprocess_rna_seq_data()
function included in the SplineOmics
package (see
documentation).
The glycan fractional abundance data matrix, where each row represents a type of glycan and the columns correspond to timepoints, must be transformed before analysis. This preprocessing step is essential due to the compositional nature of the data. In compositional data, an increase in the abundance of one component (glycan) necessarily results in a decrease in others, introducing a dependency among the variables that can bias the analysis. One way to address this issue is by applying the Centered Log Ratio (CLR) transformation to the data with the clr function from the compositions package:
library(compositions)
clr_transformed_data <- clr(data_matrix) # use as SplineOmics input
The results from clr transformed data can be harder to understand and
interpret however. If you prefer ease of interpretation and are fine
that the results contain some artifacts due to the compositional nature
of the data, log2 transform your data instead and use that as input for
the SplineOmics
package.
log2_transformed_data <- log2(data_matrix) # use as SplineOmics input
The SplineOmics
package relies on several other R packages for its
functionality. Below is a list of dependencies that will automatically
be installed along with SplineOmics
. If you already have these
packages installed, ensure they are up to date to avoid any
compatibility issues.
- ComplexHeatmap (>= 2.18.0): For creating complex heatmaps with advanced features.
- base64enc (>= 0.1-3): For encoding/decoding base64.
- dendextend (>= 1.17.1): For extending
dendrogram
objects, allowing for easier manipulation of dendrograms. - dplyr (>= 1.1.4): For data manipulation.
- ggplot2 (>= 3.5.1): For creating elegant data visualizations using the grammar of graphics.
- ggrepel (>= 0.9.5): For better label placement in
ggplot2
. - here (>= 1.0.1): For constructing paths to your project’s files.
- limma (>= 3.58.1): For linear models in microarray and RNA-seq analysis.
- openxlsx (>= 4.2.6.1): For reading, writing, and editing
.xlsx
files. - patchwork (>= 1.2.0): For combining multiple
ggplot
objects into a single plot. - pheatmap (>= 1.0.12): For creating aesthetically pleasing heatmaps.
- progress (>= 1.2.3): For adding progress bars to loops and apply functions.
- purrr (>= 1.0.2): For functional programming tools.
- rlang (>= 1.1.4): For working with core language features of R.
- scales (>= 1.3.0): For scale functions in visualizations.
- svglite (>= 2.1.3): For creating high-quality vector graphics (SVG).
- tibble (>= 3.2.1): For creating tidy data frames.
- tidyr (>= 1.3.1): For tidying data.
- zip (>= 2.3.1): For compressing and combining files into zip archives.
These packages are optional and are only needed for specific functionality:
- edgeR (>= 4.0.16): For preprocessing RNA-seq data in the
preprocess_rna_seq_data()
function. - clusterProfiler (>= 4.10.1): For the
run_gsea()
function (gene set enrichment analysis). - rstudioapi (>= 0.16.0): For the
open_tutorial()
andopen_template()
functions.
- Recommended: R 4.3.3 or higher
For those interested in gaining a deeper understanding of the
methodologies used in the SplineOmics
package, here are some
recommended publications:
-
Splines: To learn more about splines, you can refer to this review.
-
limma: To read about the
limma
R package, you can refer to this publication. -
PCA: To learn more about PCA, download and read this document.
-
Hierarchical clustering: To get information about hierarchical clustering, you can refer to this web article.
If you encounter a bug or have a suggestion for improving the
SplineOmics
package, we encourage you to open an
issue on our GitHub
repository. Before opening a new issue, please check to see if your
question or bug has already been reported by another user. This helps
avoid duplicate reports and ensures that we can address problems
efficiently.
For more detailed questions, discussions, or contributions regarding the
package’s use and development, please refer to the GitHub
Discussions page for
SplineOmics
.
We welcome contributions to the SplineOmics
package! Whether you’re
interested in fixing bugs, adding new features, or improving
documentation, your help is greatly appreciated.
Here’s how you can contribute:
-
Report a Bug or Request a Feature: If you encounter a bug or have an idea for a new feature, please open an issue on our GitHub repository. Before opening a new issue, check to see if the issue has already been reported or the feature requested by another user.
-
Submit a Pull Request: If you’ve developed a bug fix or a new feature that you’d like to share, submit a pull request.
-
Improve Documentation: Good documentation is crucial for any project. If you notice missing or incorrect documentation, please feel free to contribute.
Please adhere to our Code of Conduct in all your interactions with the project.
Thank you for considering contributing to SplineOmics
. Your efforts
are what make the open-source community a fantastic place to learn,
inspire, and create.
We appreciate your feedback! Besides raising issues, you can provide feedback in the following ways:
-
Direct Email: Send your feedback directly to Thomas Rauter.
-
Anonymous Feedback: Use this Google Form to provide anonymous feedback by answering questions.
Your feedback helps us improve the project and address any issues you may encounter.
This package is licensed under the MIT License: LICENSE
© 2024 Thomas Rauter. All rights reserved.
The SplineOmics
package is currently not published in a peer-reviewed
scientific journal or similar outlet. However, if this package helped
you in your work, consider citing this GitHub repository.
To cite this package, you can use the citation information provided in
the CITATION.cff
file.
You can also generate a citation in various formats using the
CITATION.cff
file by visiting the top right of this repo and clicking
on the “Cite this repository” button.
Also, if you like the package, consider giving the GitHub repository a
star. Your support helps us in the continued development and improvement
of SplineOmics
. Thank you for using our package!
- Thomas-Rauter - 🚀 Wrote the package, developed the approach together with VSchaepertoens under guidance from nfortelny and skafdasschaf.
- nfortelny - 🧠 Principal Investigator, provided guidance and support for the overall approach.
- skafdasschaf - 🔧 Helped reviewing code, delivered improvement suggestions and scientific guidance to develop the approach.
- VSchaepertoens - ✨ Developed one internal plotting function, as well as some code for the exploratory data analysis plots, and the overall approach together with Thomas-Rauter.
This work was carried out in the context of the DigiTherapeutX project, which was funded by the Austrian Science Fund (FWF). The research was conducted under the supervision of Prof. Nikolaus Fortelny, who leads the Computational Systems Biology working group at the Paris Lodron University of Salzburg, Austria. You can find more information about Prof. Fortelny’s research group here.