Merge pull request #43 from rformassspectrometry/jomain
feat: add spectraSampleIndex function
jorainer authored Apr 16, 2024
2 parents 1951d35 + 4532d99 commit 567ddfa
Showing 13 changed files with 267 additions and 66 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/check-bioc.yml
@@ -54,8 +54,8 @@ jobs:
matrix:
config:
- { os: ubuntu-latest, r: 'devel', bioc: '3.19', cont: "bioconductor/bioconductor_docker:devel", rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest" }
- { os: macOS-latest, r: 'devel', bioc: '3.19'}
- { os: windows-latest, r: 'devel', bioc: '3.19'}
- { os: macOS-latest, r: 'next', bioc: '3.19'}
- { os: windows-latest, r: 'next', bioc: '3.19'}
env:
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
RSPM: ${{ matrix.config.rspm }}
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: MsExperiment
Title: Infrastructure for Mass Spectrometry Experiments
Version: 1.5.4
Version: 1.5.5
Description: Infrastructure to store and manage all aspects related to
a complete proteomics or metabolomics mass spectrometry (MS)
experiment. The MsExperiment package provides light-weight and
1 change: 1 addition & 0 deletions NAMESPACE
@@ -14,6 +14,7 @@ export(otherData)
export(qdata)
export(readMsExperiment)
export(sampleData)
export(spectraSampleIndex)
exportClasses(MsExperiment)
exportClasses(MsExperimentFiles)
exportMethods("[")
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# MsExperiment 1.5

## MsExperiment 1.5.5

- Add `spectraSampleIndex()` function.

## MsExperiment 1.5.4

- Fix missing export of `filterSpectra`.
3 changes: 2 additions & 1 deletion R/MsExperiment-db.R
@@ -9,7 +9,8 @@ NULL
#'
#' For `MsExperiment` objects with their MS data represented by a `Spectra`
#' object that use a `MsBackendSql` backend, its sample annotations can be
#' written to the backend's SQL database with the `dbWriteSampleData` function.
#' written to the backend's SQL database with the `dbWriteSampleData()`
#' function.
#' The content of the object's `[sampleData()]` (as well as eventually present
#' *linking* between samples and spectra) will be stored in two separate
#' database tables *sample_data* and *sample_to_msms_spectrum* in the same
61 changes: 61 additions & 0 deletions R/MsExperiment-functions.R
@@ -428,3 +428,64 @@ readMsExperiment <- function(spectraFiles = character(),
x@sampleDataLinks[["spectra"]] <- sdl
x
}

#' @export
#'
#' @rdname MsExperiment
spectraSampleIndex <- function(x, duplicates = c("first", "keep")) {
duplicates <- match.arg(duplicates)
if (duplicates == "first") {
.spectra_sample_index_first(x@sampleDataLinks[["spectra"]],
length(x@spectra))
} else {
.spectra_sample_index_all(x@sampleDataLinks[["spectra"]],
length(x@spectra))
}
}

#' Return an `integer` vector with the sample index for each spectrum. If
#' a spectrum is assigned to more than one sample, the index of the first
#' is returned and a warning shown. For spectra without a sample assignment
#' `NA_integer_` is returned.
#'
#' @param x 2 column `matrix`, first column being sample index, second
#' spectra index.
#'
#' @param nspectra length of the object's `Spectra` object
#'
#' @return `integer` of length equal to the number of spectra.
#'
#' @noRd
.spectra_sample_index_first <- function(x, nspectra) {
if (length(x)) {
if (anyDuplicated(x[, 2L])) {
warning("Found at least one spectrum that is assigned to more ",
"than one sample. Will return the first sample for these. ",
"Consider using 'duplicates = \"keep\"' to retrieve all ",
"mappings.")
res <- x[match(seq_len(nspectra), x[, 2L]), 1L]
} else {
res <- rep(NA_integer_, nspectra)
res[x[, 2L]] <- x[, 1L]
}
} else res <- rep(NA_integer_, nspectra)
res
}

#' Return a `list` of integer vectors with the sample indices for each
#' spectrum. For spectra without a sample assignment `integer()` is returned.
#'
#' @param x 2 column `matrix`, first column being sample index, second
#' spectra index.
#'
#' @param nspectra length of the object's `Spectra` object
#'
#' @return `list` of length equal to the number of spectra.
#'
#' @noRd
.spectra_sample_index_all <- function(x, nspectra) {
if (length(x)) {
res <- split(x[, 1L], f = factor(x[, 2L], levels = seq_len(nspectra)))
} else res <- replicate(nspectra, integer())
unname(res)
}
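The behaviour of the two internal helpers can be illustrated on a toy link matrix with base R alone. This is a sketch under assumptions, not code from the package: `links` and `nspectra` are hypothetical stand-ins for `x@sampleDataLinks[["spectra"]]` and `length(x@spectra)`, and the `match()` call is applied unconditionally, whereas `.spectra_sample_index_first()` uses direct assignment when no spectrum is linked twice (both give the same result in that case).

```r
## Hypothetical link matrix: column 1 = sample index, column 2 = spectrum
## index. Spectrum 2 is linked to samples 1 and 2; spectrum 4 has no link.
links <- cbind(sample = c(1L, 1L, 2L, 2L),
               spectrum = c(1L, 2L, 2L, 3L))
nspectra <- 4L

## The "first" helper warns in this situation (a spectrum linked twice).
has_duplicates <- anyDuplicated(links[, 2L]) > 0

## "first": match() returns the first row whose spectrum index matches, so
## each spectrum gets the sample of its first link, or NA if unlinked.
first <- links[match(seq_len(nspectra), links[, 2L]), 1L]

## "keep": split the sample indices by spectrum; a factor with all spectrum
## indices as levels yields integer(0) for spectra without any link.
keep <- unname(split(links[, 1L],
                     f = factor(links[, 2L], levels = seq_len(nspectra))))

print(first)
print(keep)
```

For spectrum 2, `first` keeps only sample 1 while `keep` returns both samples; spectrum 4 gets `NA` and `integer(0)`, respectively.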
82 changes: 54 additions & 28 deletions R/MsExperiment.R
@@ -28,23 +28,23 @@
#' all relevant information on that sample.
#'
#' - Files to data or annotations. These are stored in the
#' `experimentFiles` slot as an instance of class `MsExperimentFiles`.
#' `@experimentFiles` slot as an instance of class `MsExperimentFiles`.
#'
#' - General metadata about the experiment, stored as a `list` in the
#' `metadata` slot.
#' `@metadata` slot.
#'
#' - Mass spectrometry data. Spectra and their metadata are stored as
#' a `[Spectra()]` object in the `spectra` slot. Chromatographic data
#' is not yet supported but will be stored as a `Chromatograms()`
#' object in the `chromatograms` slot.
#' object in the `@chromatograms` slot.
#'
#' - Quantification data is stored as `QFeatures` or
#' `SummarizedExperiment` objects in the `qdata` slot and can be accessed or
#' replaced with the `qdata` or `qdata<-` functions, respectively.
#' `SummarizedExperiment` objects in the `@qdata` slot and can be accessed or
#' replaced with the `qdata()` or `qdata<-` functions, respectively.
#'
#' - Any additional data, be it other spectra data, or proteomics
#' identification data (i.e. peptide-spectrum matches defined as
#' `PSM()` objects) can be added as elements to the list stored in
#' `PSM` objects) can be added as elements to the list stored in
#' the `otherData` slot.
#'
#' The *length* of a `MsExperiment` is defined by the number of samples (i.e.
@@ -70,24 +70,40 @@
#' Data from an `MsExperiment` object can be accessed with the dedicated
#' accessor functions:
#'
#' - `experimentFiles`, `experimentFiles<-`: gets or sets experiment files.
#' - `experimentFiles()`, `experimentFiles<-`: gets or sets experiment files.
#'
#' - `length`: get the *length* of the object which represents the number of
#' - `length()`: get the *length* of the object which represents the number of
#' samples available in the object's `sampleData`.
#'
#' - `metadata`, `metadata<-`: gets or sets the object's metadata.
#' - `metadata()`, `metadata<-`: gets or sets the object's metadata.
#'
#' - `sampleData`, `sampleData`: gets or sets the object's sample data (i.e. a
#' `DataFrame` containing sample descriptions).
#' - `sampleData()`, `sampleData<-`: gets or sets the object's sample data
#' (i.e. a `DataFrame` containing sample descriptions).
#'
#' - `spectra`, `spectra<-`: gets or sets spectra data. `spectra` returns a
#' - `spectra()`, `spectra<-`: gets or sets spectra data. `spectra()` returns a
#' [Spectra()] object, `spectra<-` takes a `Spectra` object as input and returns
#' the updated `MsExperiment`.
#'
#' - `qdata`, `qdata<-`: gets or sets the quantification data, which can be a
#' - `spectraSampleIndex()`: depending on parameter `duplicates` it returns
#' either an `integer` (`duplicates = "first"`, the default) or a `list`
#' (`duplicates = "keep"`) of length equal to the number of spectra within
#' the object with the indices of the sample(s) (in `sampleData()`) a
#' spectrum is assigned to. With `duplicates = "first"`, an `integer` with
#' the index is returned for each spectrum. If a spectrum was assigned to
#' more than one sample a warning is shown and only the first sample index
#' is returned for that spectrum. For `duplicates = "keep"`, assignments are
#' returned as a `list` of `integer` vectors, each element being the
#' index(es) of the sample(s) a spectrum is assigned to. For spectra that are
#' not linked to any sample an `NA_integer_` is returned as index for
#' `duplicates = "first"` and an empty integer (`integer()`) for
#' `duplicates = "keep"`.
#' Note that the default `duplicates = "first"` works for almost all use
#' cases, since a spectrum is generally assigned to a single sample.
#'
#' - `qdata()`, `qdata<-`: gets or sets the quantification data, which can be a
#' `QFeatures` or `SummarizedExperiment`.
#'
#' - `otherData`, `otherData<-`: gets or sets the additional data
#' - `otherData()`, `otherData<-`: gets or sets the additional data
#' types, stored as a `List` in the object's `otherData` slot.
#'
#' @section Linking sample data to other experimental data:
@@ -117,12 +133,14 @@
#' `spectra` slot.
#'
#' Links between sample data rows and any other data element are stored as
#' `integer` matrices within the `sampleDataLinks` slot of the object (see also
#' the vignette for examples and illustrations). Such links can be defined/added
#' with the `linkSampleData` function which adds a relationship between rows in
#' `sampleData` to elements in any other data within the `MsExperiment` that
#' are specified with parameter `with`. `linkSampleData` supports two different
#' ways to define the link:
#' `integer` matrices within the `@sampleDataLinks` slot of the object (see also
#' the vignette for examples and illustrations). The first column of a matrix
#' is always the index of the sample, and the second column the index of the
#' element that is linked to that sample, with one row per element.
#' Links can be defined/added with the `linkSampleData()` function which adds
#' a relationship between rows in `sampleData` and elements in any other data
#' within the `MsExperiment` that are specified with parameter `with`.
#' `linkSampleData()` supports two different ways to define the link:
#'
#' - Parameter `with` defines the data to which the link should be established.
#' To link samples to raw data files that would for example be available as a
@@ -148,6 +166,10 @@
#' Note that `linkSampleData` will **replace** a previously existing link to the
#' same data element.
#'
#' - `spectraSampleIndex()` is a convenience function that extracts for each
#' spectrum in the object's `spectra()` the index of the sample it is
#' associated with (see function's help above for more information).
#'
#' @section Subsetting and filtering:
#'
#' - `[`: `MsExperiment` objects can be subset **by samples** with `[i]`
@@ -161,12 +183,12 @@
#' arbitrary order is supported.
#' See the vignette for details and examples.
#'
#' - `filterSpectra`: subsets the `Spectra` within an `MsExperiment` using a
#' - `filterSpectra()`: subsets the `Spectra` within an `MsExperiment` using a
#' provided filter function (parameter `filter`). Parameters for the filter
#' function can be passed with parameter `...`. Any of the filter functions
#' of a [Spectra()] object can be passed with parameter `filter`. Possibly
#' present relationships between samples and spectra (*links*, see also
#' `linkSampleData`) are updated. Filtering affects only the spectra data
#' `linkSampleData()`) are updated. Filtering affects only the spectra data
#' of the object; none of the other slots and data (e.g. `sampleData`) are
#' modified.
#' The function returns an `MsExperiment` with the filtered `Spectra` object.
@@ -181,7 +203,7 @@
#' @param experimentFiles [MsExperimentFiles()] defining (external) files
#' to data or annotation.
#'
#' @param filter for `filterSpectra`: any filter function supported by
#' @param filter for `filterSpectra()`: any filter function supported by
#' [Spectra()] to filter the spectra object (such as `filterRt` or
#' `filterMsLevel`). Parameters for the filter function can be passed
#' through `...`.
@@ -202,25 +224,25 @@
#' @param sampleData `DataFrame` (or `data.frame`) with information on
#' individual samples of the experiment.
#'
#' @param sampleIndex for `linkSampleData`: `integer` with the indices of the
#' @param sampleIndex for `linkSampleData()`: `integer` with the indices of the
#' samples in `sampleData(object)` that should be linked.
#'
#' @param subsetBy for `linkSampleData`: optional `integer(1)` defining the
#' @param subsetBy for `linkSampleData()`: optional `integer(1)` defining the
#' dimension on which the subsetting of the linked data will occur.
#' Defaults to `subsetBy = 1L` thus subsetting will happen on the first
#' dimension (rows or elements).
#'
#' @param with for `linkSampleData`: `character(1)` defining the data to which
#' @param with for `linkSampleData()`: `character(1)` defining the data to which
#' samples should be linked. See section *Linking sample data to other
#' experimental data* for details.
#'
#' @param withIndex for `linkSampleData`: `integer` with the indices of the
#' @param withIndex for `linkSampleData()`: `integer` with the indices of the
#' elements in `with` to which the samples (specified by `sampleIndex`)
#' should be linked.
#'
#' @param x an `MsExperiment`.
#'
#' @param ... optional additional parameters. For `filterSpectra`: parameters
#' @param ... optional additional parameters. For `filterSpectra()`: parameters
#' to be passed to the filter function (parameter `filter`).
#'
#' @name MsExperiment
@@ -292,6 +314,10 @@
#' sampleData(b)
#' experimentFiles(b)$mzML_files
#'
#' ## The `spectraSampleIndex()` function returns, for each spectrum, the
#' ## index in the object's `sampleData` to which it is linked/assigned
#' spectraSampleIndex(mse)
#'
#' ## Subsetting with duplication of n:m sample to data relationships
#' ##
#' ## Both samples were assigned above to one "annotation" file in
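A common use of a per-spectrum sample index is to propagate sample annotations to the spectra through plain vector indexing. The sketch below does not call the package: it assumes a hypothetical `sd` data frame standing in for `sampleData()` and a hand-written `idx` vector shaped like the `duplicates = "first"` output of `spectraSampleIndex()`.

```r
## Hypothetical sample annotations standing in for sampleData(mse).
sd <- data.frame(sample_name = c("QC_1", "case_1"),
                 group = c("QC", "case"),
                 stringsAsFactors = FALSE)

## Hypothetical per-spectrum sample indices, shaped like the result of
## spectraSampleIndex(mse) with duplicates = "first"; NA marks a spectrum
## that is not linked to any sample.
idx <- c(1L, 1L, 2L, NA)

## Indexing a sample-level column with idx gives one value per spectrum;
## the NA index yields NA for the unlinked spectrum.
spectra_group <- sd$group[idx]

## tabulate() counts the spectra linked to each sample; NA entries are
## ignored.
n_per_sample <- tabulate(idx, nbins = nrow(sd))

print(spectra_group)
print(n_per_sample)
```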
2 changes: 1 addition & 1 deletion R/existMsExperimentFiles.R
@@ -1,4 +1,4 @@
#' @param object The `existMsExperimentFiles()` fonction works with
#' @param object The `existMsExperimentFiles()` function works with
#' either an instance of `MsExperimentFiles` or `MsExperiment`.
#'
#' @export