Merge pull request #15 from AlexsLemonade/jashapiro/8-seurat-conversion
Add functions for Seurat conversion
jashapiro authored Dec 13, 2024
2 parents 6e85ffc + 7c546fd commit 0424106
Showing 23 changed files with 908 additions and 403 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -6,3 +6,4 @@
^\.lintr$
^\.pre-commit-config.yaml$
^data-raw$
^LICENSE\.md$
1 change: 1 addition & 0 deletions .Renviron
@@ -0,0 +1 @@
RENV_EXT_ENABLED = FALSE
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
exclude: '\.Rd'

- repo: https://github.com/crate-ci/typos
rev: v1.28.1
rev: v1.28.2
hooks:
- id: typos
exclude: '\.nb\.html'
7 changes: 6 additions & 1 deletion DESCRIPTION
@@ -21,8 +21,12 @@ LazyData: true
Suggests:
testthat (>= 3.0.0),
scater,
scran,
Seurat,
splatter
splatter,
scuttle,
Matrix,
SeuratObject
Config/testthat/edition: 3
RoxygenNote: 7.3.2
Imports:
@@ -34,6 +38,7 @@ Imports:
purrr,
S4Vectors,
SingleCellExperiment,
SummarizedExperiment,
tibble,
tidyr
biocViews:
31 changes: 3 additions & 28 deletions LICENSE
@@ -1,28 +1,3 @@
BSD 3-Clause License

Copyright (c) 2024, Alex's Lemonade Stand Foundation

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
YEAR: 2024
COPYRIGHT HOLDER: Alex's Lemonade Stand Foundation
ORGANIZATION: Alex's Lemonade Stand Foundation
28 changes: 28 additions & 0 deletions LICENSE.md
@@ -0,0 +1,28 @@
# BSD 3-Clause License

Copyright (c) 2024, Alex's Lemonade Stand Foundation

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -6,9 +6,12 @@ export(calculate_silhouette)
export(calculate_stability)
export(ensembl_to_symbol)
export(extract_pc_matrix)
export(sce_to_seurat)
export(sce_to_symbols)
export(sum_duplicate_genes)
export(sweep_clusters)
import(SingleCellExperiment)
import(SummarizedExperiment)
import(methods)
importFrom(S4Vectors,`metadata<-`)
importFrom(S4Vectors,metadata)
78 changes: 52 additions & 26 deletions R/convert-gene-ids.R
@@ -13,18 +13,21 @@
#'
#'
#' @param ensembl_ids A character vector of Ensembl gene ids to translate to
#' gene symbols.
#' @param reference The reference gene list to use for translation. One of `scpca`,
#' `10x2020`, `10x2024`. The `scpca` reference is the default.
#' @param sce A SingleCellExperiment object to use as a reference for gene symbols.
#' If provided, the `reference` argument will be ignored. The `sce` object must
#' include columns with the names `gene_ids` (containing Ensembl ids) and
#' `gene_symbol` (containing the symbols) to use for conversion.
#' @param unique Whether to use unique gene symbols, as would be done if
#' data had been read in with gene symbols by Seurat. Default is FALSE.
#' @param leave_na Whether to leave NA values in the output vector.
#' If FALSE, any missing values will be replaced with the input ensembl_id value.
#' Default is FALSE.
#' gene symbols.
#' @param reference The reference gene list to use for translation. One of
#' `scpca`, `10x2020`, `10x2024`. The `scpca` reference is the default.
#' @param sce A SingleCellExperiment object to use as a reference for gene
#' symbols. If provided, the `reference` argument will be ignored. The `sce`
#' object must include columns with the names `gene_ids` (containing Ensembl
#' ids) and `gene_symbol` (containing the symbols) to use for conversion.
#' @param unique Whether to use unique gene symbols, as would be done if data
#' had been read in with gene symbols by Seurat. Default is FALSE.
#' @param leave_na Whether to leave NA values in the output vector. If FALSE,
#' any missing values will be replaced with the input ensembl_id value.
#' Default is FALSE.
#' @param seurat_compatible Whether to return a vector that is compatible with
#' Seurat, translating underscores to dashes.
#' Default is FALSE.
#'
#' @return A vector of gene symbols corresponding to the input Ensembl ids.
#' @export
@@ -51,7 +54,8 @@ ensembl_to_symbol <- function(
reference = c("scpca", "10x2020", "10x2024"),
sce = NULL,
unique = FALSE,
leave_na = FALSE) {
leave_na = FALSE,
seurat_compatible = FALSE) {
reference <- match.arg(reference)
stopifnot(
"`ensembl_ids` must be a character vector." = is.character(ensembl_ids),
@@ -85,10 +89,17 @@ ensembl_to_symbol <- function(
)
}
if (!leave_na && any(missing_symbols)) {
warning("Not all input ids have corresponding gene symbols, using input ids for missing values.")
message("Not all input ids have corresponding gene symbols, using input ids for missing values.")
gene_symbols[missing_symbols] <- ensembl_ids[missing_symbols]
}

if (seurat_compatible) {
if (any(grepl("_", gene_symbols))) {
warning("Replacing underscores ('_') with dashes ('-') in gene symbols for Seurat compatibility.")
}
gene_symbols <- gsub("_", "-", gene_symbols)
}

return(gene_symbols)
}
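
The `seurat_compatible` step in the hunk above can be tried in isolation. The following is a minimal sketch only: `make_seurat_compatible()` is a hypothetical standalone helper, not a function from this package, and it assumes the input is a plain character vector. It mirrors the logic shown in the diff (warn once if any underscore is present, then replace every underscore with a dash), using `fixed = TRUE` since `_` is a literal here.

```r
# Hypothetical helper for illustration; not part of the package API.
# Mirrors the `seurat_compatible` branch added to ensembl_to_symbol().
make_seurat_compatible <- function(gene_symbols) {
  stopifnot(
    "`gene_symbols` must be a character vector." = is.character(gene_symbols)
  )
  # Warn only when a replacement will actually happen
  if (any(grepl("_", gene_symbols, fixed = TRUE))) {
    warning(
      "Replacing underscores ('_') with dashes ('-') in gene symbols ",
      "for Seurat compatibility."
    )
  }
  # NA values pass through gsub() unchanged
  gsub("_", "-", gene_symbols, fixed = TRUE)
}

# Made-up example symbols; only the second contains an underscore
symbols <- c("TP53", "HLA_DRA", "ENSG00000000003")
converted <- suppressWarnings(make_seurat_compatible(symbols))
print(converted) # "HLA_DRA" becomes "HLA-DRA"; the other entries are unchanged
```

Doing the replacement before building a Seurat object keeps the renaming explicit in the caller's code, since Seurat itself does not accept underscores in feature names.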

@@ -112,15 +123,19 @@ ensembl_to_symbol <- function(
#' (and not disabled by the `convert_hvg` and `convert_pca` arguments).
#'
#'
#' @param sce A SingleCellExperiment object containing gene ids and gene symbols.
#' @param reference The reference gene list for conversion. One of `sce`, `scpca`,
#' `10x2020`, or `10x2024`. If `sce` (the default) the internal row data is used.
#' @param unique Whether to use unique gene symbols, as would be done if
#' data had been read in with gene symbols by Seurat. Default is FALSE.
#' @param convert_hvg Logical indicating whether to convert highly variable genes to gene symbols.
#' Default is TRUE.
#' @param convert_pca Logical indicating whether to convert PCA rotation matrix to gene symbols.
#' Default is TRUE.
#' @param sce A SingleCellExperiment object containing gene ids and gene
#' symbols.
#' @param reference The reference gene list for conversion. One of `sce`,
#' `scpca`, `10x2020`, or `10x2024`. If `sce` (the default) the internal row
#' data is used.
#' @param unique Whether to use unique gene symbols, as would be done if data
#' had been read in with gene symbols by Seurat. Default is FALSE.
#' @param convert_hvg Logical indicating whether to convert highly variable
#' genes to gene symbols. Default is TRUE.
#' @param convert_pca Logical indicating whether to convert PCA rotation matrix
#' to gene symbols. Default is TRUE.
#' @param seurat_compatible Logical indicating whether to make gene symbols
#' Seurat-compatible by replacing underscores with dashes. Default is FALSE.
#'
#' @return A SingleCellExperiment object with row names set as gene symbols.
#' @export
@@ -145,7 +160,8 @@ sce_to_symbols <- function(
reference = c("sce", "scpca", "10x2020", "10x2024"),
unique = FALSE,
convert_hvg = TRUE,
convert_pca = TRUE) {
convert_pca = TRUE,
seurat_compatible = FALSE) {
reference <- match.arg(reference)
stopifnot(
"`sce` must be a SingleCellExperiment object." = is(sce, "SingleCellExperiment"),
@@ -164,9 +180,19 @@ sce_to_symbols <- function(
}

if (reference == "sce") {
gene_symbols <- ensembl_to_symbol(ensembl_ids, sce = sce, unique = unique)
gene_symbols <- ensembl_to_symbol(
ensembl_ids,
sce = sce,
unique = unique,
seurat_compatible = seurat_compatible
)
} else {
gene_symbols <- ensembl_to_symbol(ensembl_ids, reference = reference, unique = unique)
gene_symbols <- ensembl_to_symbol(
ensembl_ids,
reference = reference,
unique = unique,
seurat_compatible = seurat_compatible
)
}
row_ids <- gene_symbols
# set Ensembl ids as original ids for later translations