Merge pull request #159 from tidymodels/RC-1.0.3

tidymodels · Jan 23, 2025 · 693f3c4
2 parents 95e578b + 151441c
commit 693f3c4

Showing 23 changed files with 99 additions and 93 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: themis
 Title: Extra Recipes Steps for Dealing with Unbalanced Data
-Version: 1.0.2.9000
+Version: 1.0.3.9000
 Authors@R: c(
     person("Emil", "Hvitfeldt", , "[email protected]", role = c("aut", "cre"),
            comment = c(ORCID = "0000-0002-0679-1945")),
@@ -9,7 +9,7 @@ Authors@R: c(
 Description: A dataset with an uneven number of cases in each class is
     said to be unbalanced. Many models produce a subpar performance on
     unbalanced datasets. A dataset can be balanced by increasing the
-    number of minority cases using SMOTE 2011 <arXiv:1106.1813>,
+    number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>,
     BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008
     <https://ieeexplore.ieee.org/document/4633969>. Or by decreasing the
     number of majority cases using NearMiss 2003
@@ -20,7 +20,7 @@ URL: https://github.com/tidymodels/themis, https://themis.tidymodels.org
 BugReports: https://github.com/tidymodels/themis/issues
 Depends: 
     R (>= 3.6),
-    recipes (>= 1.1.0.9000)
+    recipes (>= 1.1.0)
 Imports:
     cli,
     gower,
@@ -42,8 +42,6 @@ Suggests:
     ggplot2,
     modeldata,
     testthat (>= 3.0.0)
-Remotes:
-    tidymodels/recipes
 Config/Needs/website: tidyverse/tidytemplate
 Config/testthat/edition: 3
 Encoding: UTF-8

diff --git a/NEWS.md b/NEWS.md
@@ -1,11 +1,17 @@
 # themis (development version)
 
+# themis 1.0.3
+
+## Improvements
+
 * Calling `?tidy.step_*()` now sends you to the documentation for `step_*()` where the outcome is documented. (#142)
 
 * Documentation now correctly specifies majority-to-minority and minority-to-majority. (#143, #110)
 
 * Documentation for tidy methods for all steps has been improved to describe the return value more accurately. (#148)
 
+* All messages, warnings and errors has been translated to use {cli} package (#153, #155).
+
 # themis 1.0.2
 
 ## Improvements

diff --git a/R/adasyn.R b/R/adasyn.R
@@ -6,7 +6,7 @@
 #' @inheritParams recipes::step_center
 #' @inheritParams step_upsample
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -24,8 +24,8 @@
 #'  the variable used to sample.
 #'
 #' @details
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' All columns used in this step must be numeric with no missing data.
 #'
@@ -35,7 +35,7 @@
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/bsmote.R b/R/bsmote.R
@@ -7,7 +7,7 @@
 #' @inheritParams recipes::step_center
 #' @inheritParams step_upsample
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -46,8 +46,8 @@
 #' `neighbors` nearest neighbor of each example of the minority class.
 #' The parameter `neighbors` controls how many of these neighbor are used.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' All columns used in this step must be numeric with no missing data.
 #'
@@ -57,7 +57,7 @@
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/downsample.R b/R/downsample.R
@@ -6,7 +6,7 @@
 #'
 #' @inheritParams recipes::step_center
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -41,8 +41,8 @@
 #' For any data with factor levels occurring with the same
 #'  frequency as the minority level, all data will be retained.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' Keep in mind that the location of down-sampling in the step
 #'  may have effects. For example, if centering and scaling,
@@ -51,7 +51,7 @@
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/nearmiss.R b/R/nearmiss.R
@@ -8,7 +8,7 @@
 #' @inheritParams step_downsample
 #' @inheritParams step_smote
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -27,8 +27,8 @@
 #' This method retains the points from the majority class which have the
 #' smallest mean distance to the k nearest points in the minority class.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' All columns used in this step must be numeric with no missing data.
 #'
@@ -38,7 +38,7 @@
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/rose.R b/R/rose.R
@@ -7,7 +7,7 @@
 #' @inheritParams recipes::step_center
 #' @inheritParams step_upsample
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -41,16 +41,16 @@
 #' could lead to blur the boundaries between the regions of the feature space
 #' associated with each class.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' When used in modeling, users should strongly consider using the
 #'  option `skip = TRUE` so that the extra sampling is _not_
 #'  conducted outside of the training set.
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/smote.R b/R/smote.R
@@ -6,7 +6,7 @@
 #' @inheritParams recipes::step_center
 #' @inheritParams step_upsample
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -31,8 +31,8 @@
 #' `neighbors` nearest neighbor of each example of the minority class.
 #' The parameter `neighbors` controls how many of these neighbor are used.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' All columns used in this step must be numeric with no missing data.
 #'
@@ -42,7 +42,7 @@
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/smotenc.R b/R/smotenc.R
@@ -8,7 +8,7 @@
 #' @inheritParams recipes::step_center
 #' @inheritParams step_upsample
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -33,8 +33,8 @@
 #' `neighbors` nearest neighbor of each example of the minority class.
 #' The parameter `neighbors` controls how many of these neighbor are used.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' Columns can be numeric and categorical with no missing data.
 #'
@@ -44,7 +44,7 @@
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/tomek.R b/R/tomek.R
@@ -5,7 +5,7 @@
 #'
 #' @inheritParams recipes::step_center
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -27,16 +27,16 @@
 #' A tomek link is defined as a pair of points from different classes and are
 #' each others nearest neighbors.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' When used in modeling, users should strongly consider using the
 #'  option `skip = TRUE` so that the extra sampling is _not_
 #'  conducted outside of the training set.
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/R/upsample.R b/R/upsample.R
@@ -6,7 +6,7 @@
 #'
 #' @inheritParams recipes::step_center
 #' @param ... One or more selector functions to choose which
-#'  variable is used to sample the data. See [selections()]
+#'  variable is used to sample the data. See [recipes::selections]
 #'  for more details. The selection should result in _single
 #'  factor variable_. For the `tidy` method, these are not
 #'  currently used.
@@ -41,12 +41,12 @@
 #' For any data with factor levels occurring with the same
 #'  frequency as the majority level, all data will be retained.
 #'
-#' All columns in the data are sampled and returned by [juice()]
-#'  and [bake()].
+#' All columns in the data are sampled and returned by [recipes::juice()]
+#'  and [recipes::bake()].
 #'
 #' # Tidying
 #'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is retruned with
 #'  columns `terms` and `id`:
 #'
 #' \describe{

diff --git a/README.Rmd b/README.Rmd
@@ -159,7 +159,7 @@ recipe(~., example_data) %>%
 
 This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
 
-- For questions and discussions about tidymodels packages, modeling, and machine learning, [join us on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
+- For questions and discussions about tidymodels packages, modeling, and machine learning, [join us on RStudio Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).
 
 - If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/themis/issues).
 

diff --git a/README.md b/README.md
@@ -100,14 +100,14 @@ example_data %>%
 The following methods all share the tuning parameter `over_ratio`, which
 is the ratio of the minority-to-majority frequencies.
 
-| name                                                            | function                  | Multi-class        |
-|-----------------------------------------------------------------|---------------------------|--------------------|
-| Random minority over-sampling with replacement                  | `step_upsample()`         | :heavy_check_mark: |
-| Synthetic Minority Over-sampling Technique                      | `step_smote()`            | :heavy_check_mark: |
-| Borderline SMOTE-1                                              | `step_bsmote(method = 1)` | :heavy_check_mark: |
-| Borderline SMOTE-2                                              | `step_bsmote(method = 2)` | :heavy_check_mark: |
-| Adaptive synthetic sampling approach for imbalanced learning    | `step_adasyn()`           | :heavy_check_mark: |
-| Generation of synthetic data by Randomly Over Sampling Examples | `step_rose()`             |                    |
+| name | function | Multi-class |
+|----|----|----|
+| Random minority over-sampling with replacement | `step_upsample()` | :heavy_check_mark: |
+| Synthetic Minority Over-sampling Technique | `step_smote()` | :heavy_check_mark: |
+| Borderline SMOTE-1 | `step_bsmote(method = 1)` | :heavy_check_mark: |
+| Borderline SMOTE-2 | `step_bsmote(method = 2)` | :heavy_check_mark: |
+| Adaptive synthetic sampling approach for imbalanced learning | `step_adasyn()` | :heavy_check_mark: |
+| Generation of synthetic data by Randomly Over Sampling Examples | `step_rose()` |  |
 
 By setting `over_ratio = 1` you bring the number of samples of all
 minority classes equal to 100% of the majority class.
@@ -143,11 +143,11 @@ Most of the the following methods all share the tuning parameter
 `under_ratio`, which is the ratio of the majority-to-minority
 frequencies.
 
-| name                                            | function            | Multi-class        | under_ratio        |
-|-------------------------------------------------|---------------------|--------------------|--------------------|
+| name | function | Multi-class | under_ratio |
+|----|----|----|----|
 | Random majority under-sampling with replacement | `step_downsample()` | :heavy_check_mark: | :heavy_check_mark: |
-| NearMiss-1                                      | `step_nearmiss()`   | :heavy_check_mark: | :heavy_check_mark: |
-| Extraction of majority-minority Tomek links     | `step_tomek()`      |                    |                    |
+| NearMiss-1 | `step_nearmiss()` | :heavy_check_mark: | :heavy_check_mark: |
+| Extraction of majority-minority Tomek links | `step_tomek()` |  |  |
 
 By setting `under_ratio = 1` you bring the number of samples of all
 majority classes equal to 100% of the minority class.
@@ -186,7 +186,7 @@ By contributing to this project, you agree to abide by its terms.
 
 - For questions and discussions about tidymodels packages, modeling, and
   machine learning, [join us on RStudio
-  Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
+  Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).
 
 - If you think you have encountered a bug, please [submit an
   issue](https://github.com/tidymodels/themis/issues).

diff --git a/cran-comments.md b/cran-comments.md
@@ -1,14 +1,16 @@
 ## Release Summary
 
-This is the 10th CRAN release of themis.
+Fixed arXiv issue in description, and should have resolved all other notes.
+
+This is the 11th CRAN release of themis.
 
 ## R CMD check results
 
 0 errors | 0 warnings | 0 note
 
 ## revdepcheck results
 
-We checked 2 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.
+We checked 3 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.
 
  * We saw 0 new problems
  * We failed to check 0 packages