refactor: add parameter duplicates to spectraSampleIndex
jorainer committed Apr 16, 2024
1 parent 75816ac commit 4532d99
Showing 6 changed files with 156 additions and 25 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/check-bioc.yml
@@ -54,8 +54,8 @@ jobs:
matrix:
config:
- { os: ubuntu-latest, r: 'devel', bioc: '3.19', cont: "bioconductor/bioconductor_docker:devel", rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest" }
- { os: macOS-latest, r: 'devel', bioc: '3.19'}
- { os: windows-latest, r: 'devel', bioc: '3.19'}
- { os: macOS-latest, r: 'next', bioc: '3.19'}
- { os: windows-latest, r: 'next', bioc: '3.19'}
env:
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
RSPM: ${{ matrix.config.rspm }}
62 changes: 54 additions & 8 deletions R/MsExperiment-functions.R
@@ -432,14 +432,60 @@ readMsExperiment <- function(spectraFiles = character(),
#' @export
#'
#' @rdname MsExperiment
spectraSampleIndex <- function(x) {
    res <- rep(NA_integer_, length(x@spectra))
    if (length(x@sampleDataLinks[["spectra"]])) {
        if (anyDuplicated(x@sampleDataLinks[["spectra"]][, 2L]))
            stop("Can not return a single index for each spectrum. One or ",
                 "more spectra are linked to more than one sample")
        res[x@sampleDataLinks[["spectra"]][, 2L]] <-
            x@sampleDataLinks[["spectra"]][, 1L]
spectraSampleIndex <- function(x, duplicates = c("first", "keep")) {
    duplicates <- match.arg(duplicates)
    if (duplicates == "first") {
        .spectra_sample_index_first(x@sampleDataLinks[["spectra"]],
                                    length(x@spectra))
    } else {
        .spectra_sample_index_all(x@sampleDataLinks[["spectra"]],
                                  length(x@spectra))
    }
}

#' Return an `integer` vector with the sample index for each spectrum. If
#' a spectrum is assigned to more than one sample, the index of the first
#' is returned and a warning shown. For spectra without a sample assignment
#' `NA_integer_` is returned.
#'
#' @param x 2 column `matrix`, first column being sample index, second
#' spectra index.
#'
#' @param nspectra length of the object's `Spectra` object
#'
#' @return `integer` of length equal to the number of spectra.
#'
#' @noRd
.spectra_sample_index_first <- function(x, nspectra) {
    if (length(x)) {
        if (anyDuplicated(x[, 2L])) {
            warning("Found at least one spectrum that is assigned to more ",
                    "than one sample. Will return the first sample for these. ",
                    "Consider using 'duplicates = \"keep\"' to retrieve all ",
                    "mappings.")
            res <- x[match(seq_len(nspectra), x[, 2L]), 1L]
        } else {
            res <- rep(NA_integer_, nspectra)
            res[x[, 2L]] <- x[, 1L]
        }
    } else res <- rep(NA_integer_, nspectra)
    res
}

#' Return a `list` of integer vectors with the sample indices for each
#' spectrum. For spectra without a sample assignment `integer()` is returned.
#'
#' @param x 2 column `matrix`, first column being sample index, second
#' spectra index.
#'
#' @param nspectra length of the object's `Spectra` object
#'
#' @return `list` of length equal to the number of spectra.
#'
#' @noRd
.spectra_sample_index_all <- function(x, nspectra) {
    if (length(x)) {
        res <- split(x[, 1L], f = factor(x[, 2L], levels = seq_len(nspectra)))
    } else res <- replicate(nspectra, integer())
    unname(res)
}
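
The two strategies used by the helpers above can be tried on a toy link matrix. This is only a base R sketch: the `links` matrix, `nspectra` value and variable names are made up for illustration and are not part of the commit.

```r
# Hypothetical link matrix: column 1 = sample index, column 2 = spectrum
# index. Spectrum 5 is linked to both sample 1 and sample 2.
links <- cbind(c(1L, 1L, 2L, 2L), c(4L, 5L, 6L, 5L))
nspectra <- 8L

# "first" strategy: match() finds the first link of each spectrum, so a
# duplicated assignment collapses to the first sample; unlinked spectra
# yield NA.
first <- links[match(seq_len(nspectra), links[, 2L]), 1L]
first       # NA NA NA 1 1 2 NA NA

# "keep" strategy: split() on a factor that has every spectrum index as a
# level keeps all assignments; unlinked spectra yield integer(0).
keep <- unname(split(links[, 1L],
                     f = factor(links[, 2L], levels = seq_len(nspectra))))
keep[[5]]   # 1 2
```

Using `factor()` with explicit `levels` is what guarantees one list element per spectrum, including spectra that have no link at all.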
25 changes: 19 additions & 6 deletions R/MsExperiment.R
@@ -84,12 +84,21 @@
#' [Spectra()] object, `spectra<-` takes a `Spectra` data as input and returns
#' the updated `MsExperiment`.
#'
#' - `spectraSampleIndex()`: returns an `integer` vector of length equal to
#' the number of spectra within the object with the indices of
#' the sample (in `sampleData()`) a spectrum is assigned to.
#' `NA_integer_` is returned for spectra that are not assigned to a sample
#' (using `linkSampleData()`). The function will throw an error if one of
#' the spectra is assigned to more than one sample.
#' - `spectraSampleIndex()`: depending on parameter `duplicates` it returns
#' either an `integer` (`duplicates = "first"`, the default) or a `list`
#' (`duplicates = "keep"`) of length equal to the number of spectra within
#' the object with the indices of the sample(s) (in `sampleData()`) a
#' spectrum is assigned to. With `duplicates = "first"`, an `integer` with
#' the index is returned for each spectrum. If a spectrum was assigned to
#' more than one sample a warning is shown and only the first sample index
#' is returned for that spectrum. For `duplicates = "keep"`, assignments are
#' returned as a `list` of `integer` vectors, each element being the
#' index(es) of the sample(s) a spectrum is assigned to. For spectra that are
#' not linked to any sample an `NA_integer_` is returned as index for
#' `duplicates = "first"` and an empty integer (`integer()`) for
#' `duplicates = "keep"`.
#' Note that the default `duplicates = "first"` will work in almost all use
#' cases, as generally, a spectrum will be assigned to a single sample.
#'
#' - `qdata()`, `qdata<-`: gets or sets the quantification data, which can be a
#' `QFeatures` or `SummarizedExperiment`.
@@ -157,6 +166,10 @@
#' Note that `linkSampleData` will **replace** a previously existing link to the
#' same data element.
#'
#' - `spectraSampleIndex()` is a convenience function that extracts for each
#' spectrum in the object's `spectra()` the index of the sample it is
#' associated with (see function's help above for more information).
#'
#' @section Subsetting and filtering:
#'
#' - `[`: `MsExperiment` objects can be subset **by samples** with `[i]`
28 changes: 21 additions & 7 deletions man/MsExperiment.Rd
47 changes: 46 additions & 1 deletion tests/testthat/test_MsExperiment-functions.R
@@ -304,6 +304,44 @@ test_that(".update_sample_data_links_spectra works", {
    expect_true(length(spectra(res[2L])) == 1)
})

test_that(".spectra_sample_index_first works", {
    res <- .spectra_sample_index_first(matrix(nrow = 0, ncol = 2), 100)
    expect_equal(length(res), 100)
    expect_equal(res, rep(NA_integer_, 100))

    a <- cbind(c(1, 1, 1, 1, 2, 2, 2, 2), 4:11)
    res <- .spectra_sample_index_first(a, 20)
    expect_equal(length(res), 20)
    expect_equal(res[4:11], c(1, 1, 1, 1, 2, 2, 2, 2))
    expect_true(all(is.na(res[12:20])))

    a <- cbind(c(1, 1, 1, 1, 2, 2, 2, 2), c(4:10, 5))
    expect_warning(res <- .spectra_sample_index_first(a, 20), "Found at least")
    expect_equal(length(res), 20)
    expect_equal(res[4:10], c(1, 1, 1, 1, 2, 2, 2))
    expect_true(all(is.na(res[11:20])))
})

test_that(".spectra_sample_index_all works", {
    res <- .spectra_sample_index_all(matrix(nrow = 0, ncol = 2), 100)
    expect_equal(length(res), 100)
    expect_true(is.list(res))
    expect_true(all(lengths(res) == 0))
    expect_equal(res, replicate(100, integer()))

    a <- cbind(c(1, 1, 1, 1, 2, 2, 2, 2), 4:11)
    res <- .spectra_sample_index_all(a, 20)
    expect_equal(length(res), 20)
    expect_equal(res[4:11], list(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L))
    expect_true(all(lengths(res[12:20]) == 0))

    a <- cbind(c(1, 1, 1, 1, 2, 2, 2, 2), c(4:10, 5))
    res <- .spectra_sample_index_all(a, 20)
    expect_equal(length(res), 20)
    expect_equal(res[4:10], list(1L, c(1L, 2L), 1L, 1L, 2L, 2L, 2L))
    expect_true(all(lengths(res[11:20]) == 0))
})

test_that("spectraSampleIndex works", {
    a <- MsExperiment()
    expect_equal(spectraSampleIndex(a), integer())
@@ -319,7 +357,14 @@ test_that("spectraSampleIndex works", {
    expect_equal(res[!is.na(res)], c(2L, 1L, 1L, 1L))
    expect_equal(res[c(132, 2, 342, 54)], c(1, 2, 1, 1))

    res <- spectraSampleIndex(a, duplicates = "keep")
    expect_equal(res[c(132, 2, 342, 54)], list(1L, 2L, 1L, 1L))

    a@sampleDataLinks[["spectra"]] <- cbind(c(1L, 2L, 1L, 1L, 2L),
                                            c(132L, 2L, 342L, 54L, 2L))
    expect_error(spectraSampleIndex(a), "One or more")
    expect_warning(res <- spectraSampleIndex(a), "Found at least")
    expect_equal(res[c(132, 2, 342, 54)], c(1L, 2L, 1L, 1L))

    res <- spectraSampleIndex(a, duplicates = "keep")
    expect_equal(res[c(132, 2, 342, 54)], list(1L, c(2L, 2L), 1L, 1L))
})
15 changes: 14 additions & 1 deletion vignettes/MsExperiment.Rmd
@@ -404,7 +404,8 @@ The link was thus established between matching values in

```{r show_link}
sampleData(lmse)$raw_file
head(spectra(lmse)$dataOrigin)
spectra(lmse)$dataOrigin |>
    head()
```

The figure below illustrates this link. With that last call we have thus
@@ -419,6 +420,18 @@ knitr::include_graphics("imgs/Links_03.png")
lmse
```

`spectraSampleIndex()` is a convenience function that returns, for each
spectrum, the index of the sample it is linked to. By default the result is an
`integer` vector with one element per spectrum in the object: the row in the
object's `sampleData` the spectrum is linked to, or `NA_integer_` if the
spectrum is not linked to any sample.

```{r}
#' Show the sample assignment for the first few spectra
spectraSampleIndex(lmse) |>
    head()
```
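
Such an index vector can be summarized with base R. The sketch below uses a made-up vector `idx` as a stand-in for the result of `spectraSampleIndex(lmse)`; the values are purely illustrative and not taken from the example data.

```r
# Hypothetical stand-in for the integer vector from spectraSampleIndex():
# one element per spectrum, NA for a spectrum without a sample link.
idx <- c(1L, 1L, 2L, NA, 2L, 1L)

# Number of spectra per sample; unlinked spectra are counted under NA.
counts <- table(idx, useNA = "ifany")
counts

# Positions of the spectra that are linked to sample 2.
in_sample_2 <- which(idx == 2L)
in_sample_2   # 3 5
```

`which()` silently drops the `NA` comparison results, so unlinked spectra never end up in a per-sample selection.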

If we had also quantified *feature* values, we could also link them to the
samples. Below we create a simple, small `SummarizedExperiment` to represent
such quantified feature values and add that to our experiment. To show that