Hello, I'm encountering unexpected behavior from cur_group_id(). In particular, I expected that operations within a mutate() call surrounding cur_group_id() would treat the output of cur_group_id() as a vector along the rows of the tibble.
In the below reprex, calling duplicated(cur_group_id()) in the mutate call gives a different result than calling cur_group_id() in the mutate call, extracting the result, and then calling duplicated.
library(tidyverse)
data(mtcars)
# Pick a column to group by (that has repeated values)
mt_tibble = as_tibble(mtcars) |>
  group_by(cyl)

# Call 'duplicated' outside the 'mutate' call
mt_tibble |>
  mutate(gid = cur_group_id()) |>
  pull(gid) |>
  duplicated() |>
  table()
## FALSE  TRUE
##     3    29

# Call 'duplicated' inside 'mutate' call
mt_tibble |>
  mutate(gid = duplicated(cur_group_id())) |>
  pull(gid) |>
  table()
## FALSE
##    32
I'm interested to know if I'm fundamentally misusing cur_group_id(), since I'm not very experienced with it. I also think that if this is considered expected behavior, it is probably counterintuitive for many users, and it seems inconsistent with other dplyr behavior (for example, mutate(n = n() * 2) is a perfectly valid operation that elementwise doubles the values output by n()).
cur_group_id() returns a single value, the current group id. The duplicated(cur_group_id()) expression is called 3 times, once for each cyl group, so it basically gets called as duplicated(1), duplicated(2), and duplicated(3), all of which return FALSE, and then that FALSE is recycled to the size of each group.
So nothing is really wrong here, but I don't think this is a common usage of duplicated() or cur_group_id().
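To make the per-group evaluation concrete, here is a minimal sketch, assuming the same grouped mtcars tibble as in the reprex. The names lens, flagged, and dup are illustrative only. It checks that cur_group_id() and n() each produce a single value per group, then shows one possible way to get the row-wise result: store the id in a column, drop the grouping, and call duplicated() on the whole column.

```r
library(dplyr)

mt_tibble <- as_tibble(mtcars) |>
  group_by(cyl)

# Inside a grouped verb, both helpers yield one value per group;
# mutate() would then recycle that value to the group's size.
lens <- mt_tibble |>
  summarise(len_gid = length(cur_group_id()), len_n = length(n()))
# len_gid and len_n are 1 for every cyl group

# To flag repeats along the rows, materialise the id first, drop the
# grouping, and only then call duplicated() on the full column.
flagged <- mt_tibble |>
  mutate(gid = cur_group_id()) |>
  ungroup() |>
  mutate(dup = duplicated(gid))

table(flagged$dup)
```

This also suggests why mutate(n = n() * 2) looks elementwise: n() * 2 is computed once per group as a single number and then recycled across that group's rows, which is the same mechanism that turns duplicated(cur_group_id()) into FALSE everywhere.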
Thanks!
Best,
-Nick
Session info (using the latest dplyr 1.1.2):