docs: clarify how variables are filtered out in step_corr() #1518
I was curious about how step_corr() filters out variables and found one answer from Max here: https://forum.posit.co/t/how-does-step-corr-pick-which-variable-to-keep/46496/3
I thought I'd add some of that prose to the help(step_corr) page in this PR so others know a little about how it works.
I was also curious about how it actually chooses which variable to drop, so I went into the code itself: https://github.com/tidymodels/recipes/blob/602dd48ecfe95535457ee422d856054a45833df4/R/corr.R#L248C1-L257C36
It took a while to figure out that the choice really is kind of random (it is based on the "meaningless" numeric value assigned to the levels of a factor). I didn't want to put all the details of this matrix filtering into the help documentation, so I left code comments in there in case anyone else wants to read and understand it. Feel free to suggest removing these comments if they seem unnecessary.
Thanks!