Commit

Merge pull request #159 from tidymodels/RC-1.0.3
EmilHvitfeldt authored Jan 23, 2025
2 parents 95e578b + 151441c commit 693f3c4
Showing 23 changed files with 99 additions and 93 deletions.
8 changes: 3 additions & 5 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: themis
Title: Extra Recipes Steps for Dealing with Unbalanced Data
Version: 1.0.2.9000
Version: 1.0.3.9000
Authors@R: c(
person("Emil", "Hvitfeldt", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-0679-1945")),
@@ -9,7 +9,7 @@ Authors@R: c(
Description: A dataset with an uneven number of cases in each class is
said to be unbalanced. Many models produce a subpar performance on
unbalanced datasets. A dataset can be balanced by increasing the
number of minority cases using SMOTE 2011 <arXiv:1106.1813>,
number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>,
BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008
<https://ieeexplore.ieee.org/document/4633969>. Or by decreasing the
number of majority cases using NearMiss 2003
@@ -20,7 +20,7 @@ URL: https://github.com/tidymodels/themis, https://themis.tidymodels.org
BugReports: https://github.com/tidymodels/themis/issues
Depends:
R (>= 3.6),
recipes (>= 1.1.0.9000)
recipes (>= 1.1.0)
Imports:
cli,
gower,
@@ -42,8 +42,6 @@ Suggests:
ggplot2,
modeldata,
testthat (>= 3.0.0)
Remotes:
tidymodels/recipes
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Encoding: UTF-8
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,11 +1,17 @@
# themis (development version)

# themis 1.0.3

## Improvements

* Calling `?tidy.step_*()` now sends you to the documentation for `step_*()` where the outcome is documented. (#142)

* Documentation now correctly specifies majority-to-minority and minority-to-majority. (#143, #110)

* Documentation for tidy methods for all steps has been improved to describe the return value more accurately. (#148)

* All messages, warnings and errors has been translated to use {cli} package (#153, #155).

# themis 1.0.2

## Improvements
8 changes: 4 additions & 4 deletions R/adasyn.R
@@ -6,7 +6,7 @@
#' @inheritParams recipes::step_center
#' @inheritParams step_upsample
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -24,8 +24,8 @@
#' the variable used to sample.
#'
#' @details
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' All columns used in this step must be numeric with no missing data.
#'
@@ -35,7 +35,7 @@
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/bsmote.R
@@ -7,7 +7,7 @@
#' @inheritParams recipes::step_center
#' @inheritParams step_upsample
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -46,8 +46,8 @@
#' `neighbors` nearest neighbor of each example of the minority class.
#' The parameter `neighbors` controls how many of these neighbor are used.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' All columns used in this step must be numeric with no missing data.
#'
@@ -57,7 +57,7 @@
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/downsample.R
@@ -6,7 +6,7 @@
#'
#' @inheritParams recipes::step_center
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -41,8 +41,8 @@
#' For any data with factor levels occurring with the same
#' frequency as the minority level, all data will be retained.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' Keep in mind that the location of down-sampling in the step
#' may have effects. For example, if centering and scaling,
@@ -51,7 +51,7 @@
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/nearmiss.R
@@ -8,7 +8,7 @@
#' @inheritParams step_downsample
#' @inheritParams step_smote
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -27,8 +27,8 @@
#' This method retains the points from the majority class which have the
#' smallest mean distance to the k nearest points in the minority class.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' All columns used in this step must be numeric with no missing data.
#'
@@ -38,7 +38,7 @@
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/rose.R
@@ -7,7 +7,7 @@
#' @inheritParams recipes::step_center
#' @inheritParams step_upsample
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -41,16 +41,16 @@
#' could lead to blur the boundaries between the regions of the feature space
#' associated with each class.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' When used in modeling, users should strongly consider using the
#' option `skip = TRUE` so that the extra sampling is _not_
#' conducted outside of the training set.
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/smote.R
@@ -6,7 +6,7 @@
#' @inheritParams recipes::step_center
#' @inheritParams step_upsample
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -31,8 +31,8 @@
#' `neighbors` nearest neighbor of each example of the minority class.
#' The parameter `neighbors` controls how many of these neighbor are used.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' All columns used in this step must be numeric with no missing data.
#'
@@ -42,7 +42,7 @@
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/smotenc.R
@@ -8,7 +8,7 @@
#' @inheritParams recipes::step_center
#' @inheritParams step_upsample
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -33,8 +33,8 @@
#' `neighbors` nearest neighbor of each example of the minority class.
#' The parameter `neighbors` controls how many of these neighbor are used.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' Columns can be numeric and categorical with no missing data.
#'
@@ -44,7 +44,7 @@
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/tomek.R
@@ -5,7 +5,7 @@
#'
#' @inheritParams recipes::step_center
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -27,16 +27,16 @@
#' A tomek link is defined as a pair of points from different classes and are
#' each others nearest neighbors.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' When used in modeling, users should strongly consider using the
#' option `skip = TRUE` so that the extra sampling is _not_
#' conducted outside of the training set.
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/upsample.R
@@ -6,7 +6,7 @@
#'
#' @inheritParams recipes::step_center
#' @param ... One or more selector functions to choose which
#' variable is used to sample the data. See [selections()]
#' variable is used to sample the data. See [recipes::selections]
#' for more details. The selection should result in _single
#' factor variable_. For the `tidy` method, these are not
#' currently used.
@@ -41,12 +41,12 @@
#' For any data with factor levels occurring with the same
#' frequency as the majority level, all data will be retained.
#'
#' All columns in the data are sampled and returned by [juice()]
#' and [bake()].
#' All columns in the data are sampled and returned by [recipes::juice()]
#' and [recipes::bake()].
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
#' columns `terms` and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion README.Rmd
@@ -159,7 +159,7 @@ recipe(~., example_data) %>%

This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and machine learning, [join us on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- For questions and discussions about tidymodels packages, modeling, and machine learning, [join us on RStudio Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/themis/issues).

26 changes: 13 additions & 13 deletions README.md
@@ -100,14 +100,14 @@ example_data %>%
The following methods all share the tuning parameter `over_ratio`, which
is the ratio of the minority-to-majority frequencies.

| name | function | Multi-class |
|-----------------------------------------------------------------|---------------------------|--------------------|
| Random minority over-sampling with replacement | `step_upsample()` | :heavy_check_mark: |
| Synthetic Minority Over-sampling Technique | `step_smote()` | :heavy_check_mark: |
| Borderline SMOTE-1 | `step_bsmote(method = 1)` | :heavy_check_mark: |
| Borderline SMOTE-2 | `step_bsmote(method = 2)` | :heavy_check_mark: |
| Adaptive synthetic sampling approach for imbalanced learning | `step_adasyn()` | :heavy_check_mark: |
| Generation of synthetic data by Randomly Over Sampling Examples | `step_rose()` | |
| name | function | Multi-class |
|----|----|----|
| Random minority over-sampling with replacement | `step_upsample()` | :heavy_check_mark: |
| Synthetic Minority Over-sampling Technique | `step_smote()` | :heavy_check_mark: |
| Borderline SMOTE-1 | `step_bsmote(method = 1)` | :heavy_check_mark: |
| Borderline SMOTE-2 | `step_bsmote(method = 2)` | :heavy_check_mark: |
| Adaptive synthetic sampling approach for imbalanced learning | `step_adasyn()` | :heavy_check_mark: |
| Generation of synthetic data by Randomly Over Sampling Examples | `step_rose()` | |

By setting `over_ratio = 1` you bring the number of samples of all
minority classes equal to 100% of the majority class.
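
As an illustration of the `over_ratio` behaviour described in this README hunk (not part of the commit itself), here is a minimal sketch. It assumes the recipes and themis packages are installed; the `toy` data frame and its column names are hypothetical and invented for this example.

```r
library(recipes)
library(themis)

set.seed(1234)

# Hypothetical two-class data: 100 majority rows, 20 minority rows.
toy <- data.frame(
  x = rnorm(120),
  class = factor(rep(c("major", "minor"), times = c(100, 20)))
)

# over_ratio = 1 samples the minority class (with replacement) up to
# the size of the majority class.
upsampled <- recipe(class ~ x, data = toy) %>%
  step_upsample(class, over_ratio = 1) %>%
  prep() %>%
  bake(new_data = NULL)

# Class counts after up-sampling.
table(upsampled$class)
```

With `over_ratio = 0.5`, the minority class would instead be sampled up to roughly half the size of the majority class.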
@@ -143,11 +143,11 @@ Most of the the following methods all share the tuning parameter
`under_ratio`, which is the ratio of the majority-to-minority
frequencies.

| name | function | Multi-class | under_ratio |
|-------------------------------------------------|---------------------|--------------------|--------------------|
| name | function | Multi-class | under_ratio |
|----|----|----|----|
| Random majority under-sampling with replacement | `step_downsample()` | :heavy_check_mark: | :heavy_check_mark: |
| NearMiss-1 | `step_nearmiss()` | :heavy_check_mark: | :heavy_check_mark: |
| Extraction of majority-minority Tomek links | `step_tomek()` | | |
| NearMiss-1 | `step_nearmiss()` | :heavy_check_mark: | :heavy_check_mark: |
| Extraction of majority-minority Tomek links | `step_tomek()` | | |

By setting `under_ratio = 1` you bring the number of samples of all
majority classes equal to 100% of the minority class.
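
As an illustration of the `under_ratio` behaviour described in this README hunk (not part of the commit itself), here is a minimal sketch. It assumes the recipes and themis packages are installed; the `toy` data frame and its column names are hypothetical and invented for this example.

```r
library(recipes)
library(themis)

set.seed(1234)

# Hypothetical two-class data: 100 majority rows, 20 minority rows.
toy <- data.frame(
  x = rnorm(120),
  class = factor(rep(c("major", "minor"), times = c(100, 20)))
)

# under_ratio = 1 samples the majority class down to the size of the
# minority class.
downsampled <- recipe(class ~ x, data = toy) %>%
  step_downsample(class, under_ratio = 1) %>%
  prep() %>%
  bake(new_data = NULL)

# Class counts after down-sampling.
table(downsampled$class)
```

Because down-sampling discards rows, the result keeps only a subset of the original majority-class observations.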
@@ -186,7 +186,7 @@ By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and
machine learning, [join us on RStudio
Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an
issue](https://github.com/tidymodels/themis/issues).
6 changes: 4 additions & 2 deletions cran-comments.md
@@ -1,14 +1,16 @@
## Release Summary

This is the 10th CRAN release of themis.
Fixed arXiv issue in description, and should have resolved all other notes.

This is the 11th CRAN release of themis.

## R CMD check results

0 errors | 0 warnings | 0 note

## revdepcheck results

We checked 2 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.
We checked 3 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.

* We saw 0 new problems
* We failed to check 0 packages