factor lump for ordinal factors #28
Comments
Hmmmm, this seems like it will need quite a different algorithm to |
I have a related use case where I want to symmetrically collapse ordered factors, grouping the top and bottom n levels. Say, converting responses on a Likert survey item into groups of hi-med-lo. Rafael's proposal might partially meet my needs. It feels like a new function separate from |
Sounds like there is some renewed interest in this. As a general approach, the same formals can be kept as
The formals would boil down to:
ord_lump(f, n, prop, from = c("left", "right"), other_level = "Other")
with the following behaviour:
(x <- factor(LETTERS[1:8], ordered = TRUE))
#> [1] A B C D E F G H
#> Levels: A < B < C < D < E < F < G < H
ord_lump(x, 3, from = "right")
#> [1] A B C Other Other Other Other Other
#> Levels: A < B < C < Other
ord_lump(x, -3, from = "right")
#> [1] A B C D E Other Other Other
#> Levels: A < B < C < D < E < Other
ord_lump(x, 3, from = "left")
#> [1] Other Other Other D E F G H
#> Levels: Other < D < E < F < G < H
ord_lump(x, prop = 0.25, from = "left")
#> [1] Other Other C D E F G H
#> Levels: Other < C < D < E < F < G < H
# unordered factor
y <- factor(LETTERS[1:8])
ord_lump(y, 3, from = "left")
#> [1] Other Other Other D E F G H
#> Levels: Other D E F G H
@sfirke's use case might be tricky to implement intuitively, since the lumping would need to occur from both sides, and a tie-breaking rule chosen when the middle interval is even. Maybe best left as a two-step process relying on a lumping function?
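The two-step idea can be sketched in base R. The helper below, lump_side(), is hypothetical (it is not part of forcats and is not the ord_lump() proposed above). It assumes that n is the number of outermost levels to merge on the chosen side, which is only one possible reading of the proposal.

```r
# Hypothetical helper, not part of forcats: merge the `n` outermost levels on
# one side of a factor into a single level, preserving the level order.
# Assumption: `n` counts the levels to merge, not the levels to keep.
lump_side <- function(f, n, from = c("left", "right"), other_level = "Other") {
  from <- match.arg(from)
  stopifnot(is.factor(f), n >= 1, n < nlevels(f))
  lv <- levels(f)
  k <- length(lv)
  idx <- if (from == "left") seq_len(n) else (k - n + 1):k
  lv[idx] <- other_level
  # Assigning duplicated labels to levels() merges those levels
  levels(f) <- lv
  f
}

x <- factor(LETTERS[1:8], levels = LETTERS[1:8], ordered = TRUE)

# Step 1: merge the three lowest levels; step 2: merge the three highest
lo  <- lump_side(x, 3, from = "left",  other_level = "lo")
hml <- lump_side(lo, 3, from = "right", other_level = "hi")
hml
```

Chaining two one-sided calls sidesteps the tie-breaking question, because the caller decides how many levels go to each end; the middle levels (D and E here) are left untouched.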
I could see that Then using data from the above example, here's how I imagine it:
If that's too complex, I think I could hack what I want together from multiple chained |
I was interested in this behaviour for tabular data, where I want to be able to collapse levels with small counts into the level closest to them (as a privacy/disclosure-control protection), and found this discussion. In case someone else comes here with a similar problem, please see below for some crude code.
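For readers with the same disclosure-control problem, here is a minimal base R sketch of one way to merge small levels into their neighbours. It is not the code from this comment. The name merge_small_levels(), the min_count and sep arguments, and the merging rule are all assumptions: levels are swept in order and merged rightward until a group reaches min_count, and a short trailing group is folded into the group before it.

```r
# Hypothetical helper, not from the thread: merge adjacent levels, in level
# order, until every merged group holds at least `min_count` observations.
merge_small_levels <- function(f, min_count, sep = "-") {
  stopifnot(is.factor(f), min_count >= 1)
  counts <- tabulate(as.integer(f), nbins = nlevels(f))  # NAs are ignored
  group <- integer(length(counts))
  g <- 1L
  running <- 0L
  for (i in seq_along(counts)) {
    group[i] <- g
    running <- running + counts[i]
    if (running >= min_count) {  # group is large enough, start a new one
      g <- g + 1L
      running <- 0L
    }
  }
  # A trailing group that is still too small is folded into the previous one
  last <- max(group)
  if (last > 1L && sum(counts[group == last]) < min_count) {
    group[group == last] <- last - 1L
  }
  # Label each group by its first and last original level, e.g. "B-D"
  labs <- vapply(split(levels(f), group), function(l) {
    if (length(l) == 1L) l else paste(l[1], l[length(l)], sep = sep)
  }, character(1))
  levels(f) <- unname(labs[as.character(group)])
  f
}

band <- factor(c("A", "A", "A", "B", "C", "C", "C", "C", "D"),
               levels = c("A", "B", "C", "D"), ordered = TRUE)
out <- merge_small_levels(band, min_count = 3)
table(out)
```

In this example B (one observation) and D (one observation) are both absorbed into a single "B-D" group with C. If the whole factor has fewer than min_count observations, everything ends up in one group.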
Hi Hadley,
Do you have thoughts on creating an analogue (or generic) of fct_lump() for ordinal factors?
The utility I am looking for is the ability to keep contiguous levels together. A simple solution for this could be to lump ordinal levels directionally (either from the left or right), instead of by frequency.
Here is an example of the current behaviour:
I imagine that in many cases, lumping categories a and c such that they are placed above b, d, and e will not create a desired result.
Here is what I have in mind for ord_lump():
If you think this is worth it, I can propose a PR. An alternative would be to work the change into fct_lump by adding type = c("freq", "left", "right") into the formals. Let me know what you think.
Cheers,
Rafael