Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit NA-handling 🛠️ #10

Merged
merged 6 commits into from
Dec 6, 2024
Merged

Explicit NA-handling 🛠️ #10

merged 6 commits into from
Dec 6, 2024

Conversation

serkor1
Copy link
Owner

@serkor1 serkor1 commented Dec 6, 2024

About

Issue #8 concerns non-transparent NA-handling in pairwise regression and classification metrics. When developing this package it was intended that NA-values were to be handled outside of the functions using wrappers; this was the main idea up until 4bd30bd (See point 5).

Explicit and transparent NA-handling is essential for any package. Posting this Issue #8 significantly improved the quality of {SLmetrics} - thank you @EmilHvitfeldt.

Changes

  • All pair-wise regression and classification functions now uses na.rm-arguments, to explicitly handle NA-values.
  • All pair-wise regression and classification functions now uses std::vector instead of NumericVector, it appears to faster (See 'On performance')

On performance

rm(list = ls()); gc()
#>          used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 542439 29.0    1186437 63.4   726777 38.9
#> Vcells 977321  7.5    8388608 64.0  1972559 15.1
Rcpp::sourceCpp(
  "src/classification_Accuracy.cpp"
)

# 1) generate classes
# 1.1) actual classes
actual <- factor(
  sample(
    x       = letters[1:3],
    size    = 2e4,
    replace = TRUE
  )
)

# 1.2) predicted classes
predicted <-  factor(
  sample(
    x       = letters[1:3],
    size    = 2e4,
    replace = TRUE
  )
)

# 2) benchmarrk
microbenchmark::microbenchmark(
  `(without NA-handlins)` = SLmetrics::accuracy(actual, predicted),
  `(With NA handling)`    = accuracy.factor(actual, predicted)
)
#> Unit: microseconds
#>                   expr    min      lq      mean median     uq      max neval
#>  (without NA-handlins) 16.051 16.1670 107.41617 16.446 16.522 5968.758   100
#>     (With NA handling) 15.461 15.5465  15.73808 15.791 15.877   16.770   100

Created on 2024-12-06 with reprex v2.1.1

Note

This PR ONLY addresses the raised issue for .factor and .numeric methods. The cmatrix-method needs
significant rework.

This will be fixed soon (TM)

* The `accuracy()`-function now handles missings values according to na.rm
* The `accuracy()`-function uses `std::vector` to offset the loss in speed from missing values checking
* Updated tools; the modifyRcppExports is now more dynamic and uses regex instead of hardcoded functions.

NOTE: All checks passed locally.
* The `zerooneloss()`-function now accepts the argument `na.rm`
* Internal optimaztion of the function to make it faster
* The NEWS have been updated to reflect the new changes.
* All functions now has an argument `na.rm` to explicitly handle missing values.
* Migrated to std::vector instead of NumericVector
* Fixed a small bug in `accuracy.Rd`
* Unit-tests: The unit-tests now tests if misssing values are being handled according to the `na.rm`-argument
* NA-handling in functions: `rsq()`-, `ccc()`-, `mae()`-, `mape()`-, `mse()`-, `pinball()`-, `rae()`-, `rmse()`-, `rrmse()`- and `smape()`-functions now all uses `std::vector` instead of `NumericVector`, and has explicit NA-handling.

All tests passed locally.
@serkor1 serkor1 added documentation Improvements or additions to documentation enhancement New feature or request labels Dec 6, 2024
@serkor1 serkor1 changed the title Expllicit NA-handling 🛠️ Explicit NA-handling 🛠️ Dec 6, 2024
@serkor1 serkor1 merged commit 6ff6107 into development Dec 6, 2024
6 of 7 checks passed
@serkor1 serkor1 deleted the patch-1 branch December 6, 2024 06:21
serkor1 added a commit that referenced this pull request Dec 6, 2024
* [UPDATE] `accuracy()`-function

* The `accuracy()`-function now handles missings values according to na.rm
* The `accuracy()`-function uses `std::vector` to offset the loss in speed from missing values checking
* Updated tools; the modifyRcppExports is now more dynamic and uses regex instead of hardcoded functions.

NOTE: All checks passed locally.

* [UPDATE] `zerooneloss()`-function 🚀

* The `zerooneloss()`-function now accepts the argument `na.rm`
* Internal optimaztion of the function to make it faster

* [DOCUMENTATION] Updated NEWS 📚

* The NEWS have been updated to reflect the new changes.

* [UPDATE] `RMSE()`- `RMSLE()`- and `huberloss()` 🚀

* All functions now has an argument `na.rm` to explicitly handle missing values.
* Migrated to std::vector instead of NumericVector
* Fixed a small bug in `accuracy.Rd`

* [UPDATE] Unit-tests and NA-handlings 🔥

* Unit-tests: The unit-tests now tests if misssing values are being handled according to the `na.rm`-argument
* NA-handling in functions: `rsq()`-, `ccc()`-, `mae()`-, `mape()`-, `mse()`-, `pinball()`-, `rae()`-, `rmse()`-, `rrmse()`- and `smape()`-functions now all uses `std::vector` instead of `NumericVector`, and has explicit NA-handling.

All tests passed locally.

* [VERSION-BUMP] 0.1-0 ---> 0.1-1 🪜

> [!NOTE]
>
> Checks fail on PR for OSX. But all else passes.
serkor1 added a commit that referenced this pull request Dec 8, 2024
* Explicit `NA`-handling 🛠️  (#10)

* [UPDATE] `accuracy()`-function

* The `accuracy()`-function now handles missings values according to na.rm
* The `accuracy()`-function uses `std::vector` to offset the loss in speed from missing values checking
* Updated tools; the modifyRcppExports is now more dynamic and uses regex instead of hardcoded functions.

NOTE: All checks passed locally.

* [UPDATE] `zerooneloss()`-function 🚀

* The `zerooneloss()`-function now accepts the argument `na.rm`
* Internal optimaztion of the function to make it faster

* [DOCUMENTATION] Updated NEWS 📚

* The NEWS have been updated to reflect the new changes.

* [UPDATE] `RMSE()`- `RMSLE()`- and `huberloss()` 🚀

* All functions now has an argument `na.rm` to explicitly handle missing values.
* Migrated to std::vector instead of NumericVector
* Fixed a small bug in `accuracy.Rd`

* [UPDATE] Unit-tests and NA-handlings 🔥

* Unit-tests: The unit-tests now tests if misssing values are being handled according to the `na.rm`-argument
* NA-handling in functions: `rsq()`-, `ccc()`-, `mae()`-, `mape()`-, `mse()`-, `pinball()`-, `rae()`-, `rmse()`-, `rrmse()`- and `smape()`-functions now all uses `std::vector` instead of `NumericVector`, and has explicit NA-handling.

All tests passed locally.

* [VERSION-BUMP] 0.1-0 ---> 0.1-1 🪜

> [!NOTE]
>
> Checks fail on PR for OSX. But all else passes.

* [BUG-FIX] `plot()`-method for `ROC()` and `prROC()` 🔨 (#11)

* The plots are now correctly displaying lines when `panels = FALSE`
* NEWS.md updated accordingly.
serkor1 added a commit that referenced this pull request Dec 8, 2024
* [UPDATE] `accuracy()`-function

* The `accuracy()`-function now handles missings values according to na.rm
* The `accuracy()`-function uses `std::vector` to offset the loss in speed from missing values checking
* Updated tools; the modifyRcppExports is now more dynamic and uses regex instead of hardcoded functions.

NOTE: All checks passed locally.

* [UPDATE] `zerooneloss()`-function 🚀

* The `zerooneloss()`-function now accepts the argument `na.rm`
* Internal optimaztion of the function to make it faster

* [DOCUMENTATION] Updated NEWS 📚

* The NEWS have been updated to reflect the new changes.

* [UPDATE] `RMSE()`- `RMSLE()`- and `huberloss()` 🚀

* All functions now has an argument `na.rm` to explicitly handle missing values.
* Migrated to std::vector instead of NumericVector
* Fixed a small bug in `accuracy.Rd`

* [UPDATE] Unit-tests and NA-handlings 🔥

* Unit-tests: The unit-tests now tests if misssing values are being handled according to the `na.rm`-argument
* NA-handling in functions: `rsq()`-, `ccc()`-, `mae()`-, `mape()`-, `mse()`-, `pinball()`-, `rae()`-, `rmse()`-, `rrmse()`- and `smape()`-functions now all uses `std::vector` instead of `NumericVector`, and has explicit NA-handling.

All tests passed locally.

* [VERSION-BUMP] 0.1-0 ---> 0.1-1 🪜

> [!NOTE]
>
> Checks fail on PR for OSX. But all else passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant