Refactor datatype logic #68

mayer79 · 2024-07-21T18:54:31Z

This is a large PR that refactors the way missRanger() deals with variables that cannot be directly modeled by ranger(). The new implementation is slightly stricter, but also safer.

It is an important step towards out-of-sample application (#58).

Here is a summary:

- Columns of special types, such as date/time, can no longer be imputed.
- pmm() is stricter: xtrain and xtest must both be numeric, logical, or factor (with identical levels).
- ranger >= 0.16.0 is now required.
- More compact vignettes.
- Many relevant ranger() arguments are now explicit arguments of missRanger() to improve tab completion:
  - num.trees = 500
  - mtry = NULL
  - min.node.size = NULL
  - min.bucket = NULL
  - max.depth = NULL
  - replace = TRUE
  - sample.fraction = if (replace) 1 else 0.632
  - case.weights = NULL
  - num.threads = NULL
  - save.memory = FALSE
- Slightly more information is printed before fitting.
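To illustrate what the stricter pmm() contract means in practice, here is a minimal base-R sketch of predictive mean matching that enforces the type rules from the summary. It is not the missRanger implementation: the name pmm_sketch() is hypothetical, and the nearest-neighbour search (a plain order() over absolute differences) and the uniform random draw among the k closest donors are assumptions made for illustration.

```r
# Hypothetical sketch of predictive mean matching (PMM) with the stricter
# input rules listed in the PR summary. Base R only; not missRanger's code.
pmm_sketch <- function(xtrain, xtest, ytrain, k = 1L, seed = NULL) {
  # Accepted input types: numeric, logical, or factor
  ok_type <- function(x) is.numeric(x) || is.logical(x) || is.factor(x)
  stopifnot(
    ok_type(xtrain), ok_type(xtest),
    length(xtrain) == length(ytrain),
    !anyNA(xtest),
    k >= 1L
  )
  # If either side is a factor, both must be factors with identical levels
  if (is.factor(xtrain) || is.factor(xtest)) {
    if (!(is.factor(xtrain) && is.factor(xtest)) ||
        !identical(levels(xtrain), levels(xtest))) {
      stop("xtrain and xtest must both be factors with identical levels")
    }
  }
  # Keep only training rows with both a prediction and an observed value
  keep <- !is.na(xtrain) & !is.na(ytrain)
  if (!any(keep)) stop("no complete training rows")
  # Compare on a numeric scale: factor -> integer codes, logical -> 0/1
  xtrain <- as.numeric(xtrain[keep])
  ytrain <- ytrain[keep]
  xtest <- as.numeric(xtest)
  k <- min(k, length(xtrain))
  if (!is.null(seed)) set.seed(seed)
  # For each test prediction, draw one of the k training rows with the
  # closest predictions and return that row's observed value
  donor <- vapply(xtest, function(x) {
    nearest <- order(abs(xtrain - x))[seq_len(k)]
    nearest[sample.int(k, 1L)]
  }, integer(1))
  ytrain[donor]
}
```

With k = 1 the match is deterministic: pmm_sketch(c(1, 2, 3, 4), c(1.1, 3.9), c(10, 20, 30, 40)) returns c(10, 40). Passing factors whose levels differ, e.g. factor(c("a", "b")) versus factor("a", levels = c("a", "b", "c")), stops with an error instead of silently comparing integer codes that refer to different categories.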

codecov-commenter · 2024-07-21T18:55:53Z


Codecov Report

Attention: Patch coverage is 87.17949% with 15 lines in your changes missing coverage. Please review.

Project coverage is 87.83%. Comparing base (20247e3) to head (77037c1).
Report is 11 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| R/missRanger.R | 84.94% | 14 Missing ⚠️ |
| R/pmm.R | 94.11% | 1 Missing ⚠️ |


@@             Coverage Diff             @@
##             main      #68       +/-   ##
===========================================
- Coverage   97.95%   87.83%   -10.13%     
===========================================
  Files           5        5               
  Lines         245      263       +18     
===========================================
- Hits          240      231        -9     
- Misses          5       32       +27


mayer79 added 24 commits July 19, 2024 16:57

- 3a55fb1 Add unit tests for ranger()
- 5f28dd7 More clearer pmm()
- a55bdef Add four explicit args passed to ranger()
- eaad720 Clearer info in summary()
- 8cf73ce Revise vignettes
- 2cad4d2 Expose more ranger() arguments
- b503fbf Avoid subsetting NULL object in case.weights
- 4895508 Typo in error message
- c0fb921 More formula parsing to helper functions
- 9f9e4af Update unit tests
- 27d7236 More unit tests for ranger()
- 3efdaab to_impute is sorted -> unit tests
- f9747c3 More picky pmm()
- 46f3042 More compact examples
- 37facde Add outlook with changes planned for Version 3
- 7917581 Slightly better tests for ranger()
- f9439f5 WIP: replace the convert/revert logic
- c538126 Add tests for helpers
- d5518e7 Update vignettes
- b05f15a Minimal ranger version set
- 727c484 Better messages
- bb31537 Update tests for imputeUnivariate()
- 47a239f Revise message returns
- aa29d20 Revise unit tests for missRanger()

mayer79 added 4 commits July 21, 2024 20:57

- 77037c1 Slightly different example in missRanger()
- 12e75f2 add unit test for verbosity
- a5e5947 improve test coverage for pmm()
- b80ea4a Trying to not fail a random error in mac unit tests

mayer79 merged commit 2af3e44 into main Jul 21, 2024
7 checks passed

mayer79 deleted the refactor-datatype-logic branch July 21, 2024 20:08
