Update scale_mgm() function using pooled SD and bump version.
Gene233 committed Mar 29, 2024
1 parent 8ff99c3 commit dfe6155
Showing 5 changed files with 43 additions and 11 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: smartid
Title: Scoring and Marker Selection Method Based on Modified TF-IDF
Version: 0.99.4
Version: 0.99.5
Authors@R:
  person("Jinjin", "Chen", email = "[email protected]", role = c("aut", "cre"),
    comment = c(ORCID = "0000-0001-7923-5723"))
6 changes: 5 additions & 1 deletion NEWS.md
@@ -8,7 +8,7 @@

# smartid 0.99.2

* Added test for `gs_score` function.
* Added test for `gs_score()` function.

# smartid 0.99.3

@@ -17,3 +17,7 @@
# smartid 0.99.4

* Add details for TF, IDF, IAE functions.

# smartid 0.99.5

* Update `scale_mgm()` function, using pooled SD.
32 changes: 26 additions & 6 deletions R/scale_mgm.R
@@ -1,4 +1,8 @@
#' scale by mean of group mean in case extreme unbalanced data
#' scale by mean of group mean for imbalanced data
#'
#' @details
#' \deqn{z=\frac{x-\frac{\sum_k^{n_D}(\mu_k)}{n_D}}{s_{pooled}}}
#' where \eqn{s_{pooled}=\sqrt{\frac{\sum_k^{n_D}{(n_k-1){s_k}^2}}{\sum_k^{n_D}{n_k}-k}}}
#'
#' @param expr matrix
#' @param label a vector of group label
@@ -9,12 +13,28 @@
#' @examples
#' scale_mgm(matrix(rnorm(100), 10), label = rep(letters[1:2], 5))
scale_mgm <- function(expr, label) {
  ## compute sds
  sds <- sparseMatrixStats::rowSds(expr, na.rm = TRUE)
  # sds <- sapply(unique(label), \(i)
  # sparseMatrixStats::rowSds(expr[, label == i], na.rm = TRUE)
  # ## compute overall sds
  # sds <- sparseMatrixStats::rowSds(expr, na.rm = TRUE)

  # ## compute group sds
  # sds <- vapply(unique(label), \(i)
  # sparseMatrixStats::rowSds(expr[, label == i, drop = FALSE],
  # na.rm = TRUE),
  # rep(1, nrow(expr))
  # ) # get sds of each group
  # colnames(sds) <- unique(label)
  # sds <- sparseMatrixStats::rowMeans2(sds)

  ## compute pooled sds
  sds <- vapply(
    unique(label), \(i)
    sparseMatrixStats::rowVars(expr[, label == i, drop = FALSE],
      na.rm = TRUE
    ),
    rep(1, nrow(expr))
  ) # get vars of each group
  ng <- table(label)[unique(label)] # get group sizes in the same order
  sds <- sds %*% cbind(ng - 1)
  sds <- as.numeric(sqrt(sds / sum(ng - 1)))

  ## compute group means
  mgm <- vapply(
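A minimal base-R sketch of the scaling this commit documents, assuming a dense numeric matrix with features in rows and samples in columns (as in the roxygen example). The helper name `scale_mgm_sketch()` is hypothetical and is not the smartid function. It swaps `sparseMatrixStats::rowVars()` for base `apply(..., var)` so it runs without extra packages. It pools the group variances with weights `n_k - 1` and divides by `sum(ng - 1)`, as the new code does. The rest of `scale_mgm()` after `mgm <- vapply(` is collapsed in this view, so the centring step here follows the documented formula (mean of group means) rather than the package code.

```r
# Hypothetical helper, NOT the smartid implementation: centre each row
# (feature) on the mean of its group means, then divide by the pooled SD.
scale_mgm_sketch <- function(expr, label) {
  expr <- as.matrix(expr)
  label <- as.character(label)  # avoid indexing by factor integer codes
  groups <- unique(label)

  # per-group row variances, one column per group (kept a matrix for 1 row)
  vars <- matrix(
    vapply(groups, function(g) {
      apply(expr[, label == g, drop = FALSE], 1, var, na.rm = TRUE)
    }, numeric(nrow(expr))),
    nrow = nrow(expr)
  )

  # group sizes n_k, in the same order as `groups`
  ng <- vapply(groups, function(g) sum(label == g), numeric(1))

  # pooled SD: sqrt(sum_k (n_k - 1) * s_k^2 / sum_k (n_k - 1))
  s_pooled <- sqrt(drop(vars %*% (ng - 1)) / sum(ng - 1))

  # per-group row means, then their unweighted average per row
  gmeans <- matrix(
    vapply(groups, function(g) {
      rowMeans(expr[, label == g, drop = FALSE], na.rm = TRUE)
    }, numeric(nrow(expr))),
    nrow = nrow(expr)
  )
  mgm <- rowMeans(gmeans)

  # column-major recycling applies mgm[i] and s_pooled[i] to row i
  (expr - mgm) / s_pooled
}

# usage, mirroring the roxygen example
set.seed(42)
scale_mgm_sketch(matrix(rnorm(100), 10), label = rep(letters[1:2], 5))
```

Because every group mean gets the same weight, a group of 4 samples pulls the centre as much as a group of 16. A plain grand mean would instead sit close to the larger group.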
8 changes: 6 additions & 2 deletions man/scale_mgm.Rd

Some generated files are not rendered by default.

6 changes: 5 additions & 1 deletion vignettes/smartid_Demo.Rmd
@@ -155,7 +155,11 @@ names(metadata(data_sim))

## Scale and Transform Score

Scaling is needed to find the markers specific to the group, however, standard scaling might fail due to the rare populations. Here `smartid` uses a special scaling strategy `scale_mgm()`, which can scale imbalanced data by given group labels.
Scaling is needed to find the markers specific to the group, however, standard scaling might fail due to the rare populations. Here `smartid` uses a special scaling strategy `scale_mgm()`, which can scale imbalanced data by given group labels. By doing this, we can avoid the bias towards features with larger numerical ranges during feature selection.

The scale method is depicted as below:

$$z=\frac{x-\frac{\sum_k^{n_D}(\mu_k)}{n_D}}{s_{pooled}},\ s_{pooled}=\sqrt{\frac{\sum_k^{n_D}{(n_k-1){s_k}^2}}{\sum_k^{n_D}{n_k}-k}}$$

The score will be transformed using softmax before passing to EM algorithm.

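In the new code, the pooled variance is divided by `sum(ng - 1)`, where `ng` holds the group sizes. Writing the group index explicitly as k = 1, ..., n_D, that denominator is the total sample size minus the number of groups. With that indexing, the scaling used in the roxygen docs and the vignette reads:

$$z=\frac{x-\frac{1}{n_D}\sum_{k=1}^{n_D}\mu_k}{s_{pooled}},\qquad s_{pooled}=\sqrt{\frac{\sum_{k=1}^{n_D}(n_k-1)\,s_k^2}{\sum_{k=1}^{n_D}(n_k-1)}}=\sqrt{\frac{\sum_{k=1}^{n_D}(n_k-1)\,s_k^2}{\sum_{k=1}^{n_D}n_k-n_D}}$$

For example, two groups of sizes 16 and 4 give 15 + 3 = 18 = 20 - 2 degrees of freedom.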
