Unexpected differences in `cur_group_id()` operations in vs. outside `mutate` calls #6889

Nick-Eagles · 2023-07-21T20:17:49Z

Hello, I'm encountering unexpected behavior from cur_group_id(). In particular, I expected that functions wrapped around cur_group_id() inside a mutate() call would treat its output as a vector with one element per row of the tibble.

In the reprex below, calling duplicated(cur_group_id()) inside the mutate call gives a different result than calling cur_group_id() inside the mutate call, extracting the result, and then calling duplicated() on it.

library(tidyverse)
data(mtcars)

# Pick a column to group by (one that has repeated values)
mt_tibble = as_tibble(mtcars) |>
    group_by(cyl)

# Call 'duplicated' outside the 'mutate' call
mt_tibble |>
    mutate(gid = cur_group_id()) |>
    pull(gid) |>
    duplicated() |>
    table()

## FALSE  TRUE 
##     3    29

# Call 'duplicated' inside 'mutate' call
mt_tibble |>
    mutate(gid = duplicated(cur_group_id())) |>
    pull(gid) |>
    table()

## FALSE 
##    32

I'd like to know whether I'm fundamentally misusing cur_group_id(), since I'm not very experienced with it. If this is considered expected behavior, though, I think it is probably counterintuitive for many users, and it seems inconsistent with other dplyr behavior. For example, mutate(n = n() * 2) is a perfectly valid operation that elementwise doubles the values output by n().

Thanks!

Best,

-Nick

Session info (using the latest dplyr 1.1.2):

─ Session info ────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 Patched (2023-07-21 r84719)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       US/Eastern
 date     2023-07-21
 pandoc   3.1.1 @ /jhpce/shared/jhpce/core/conda/miniconda3-4.11.0/envs/svnR-4.3/bin/pandoc

─ Packages ────────────────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cli           3.6.1   2023-03-23 [2] CRAN (R 4.3.0)
 colorspace    2.1-0   2023-01-23 [2] CRAN (R 4.3.0)
 dplyr       * 1.1.2   2023-04-20 [2] CRAN (R 4.3.0)
 fansi         1.0.4   2023-01-22 [2] CRAN (R 4.3.0)
 forcats     * 1.0.0   2023-01-29 [2] CRAN (R 4.3.0)
 generics      0.1.3   2022-07-05 [2] CRAN (R 4.3.0)
 ggplot2     * 3.4.2   2023-04-03 [2] CRAN (R 4.3.0)
 glue          1.6.2   2022-02-24 [2] CRAN (R 4.3.0)
 gtable        0.3.3   2023-03-21 [2] CRAN (R 4.3.0)
 hms           1.1.3   2023-03-21 [2] CRAN (R 4.3.0)
 lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.3.0)
 lubridate   * 1.9.2   2023-02-10 [2] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.3.0)
 munsell       0.5.0   2018-06-12 [2] CRAN (R 4.3.0)
 pillar        1.9.0   2023-03-22 [2] CRAN (R 4.3.0)
 pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.3.0)
 purrr       * 1.0.1   2023-01-10 [2] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [2] CRAN (R 4.3.0)
 readr       * 2.1.4   2023-02-10 [2] CRAN (R 4.3.0)
 rlang         1.1.1   2023-04-28 [2] CRAN (R 4.3.0)
 scales        1.2.1   2022-08-20 [2] CRAN (R 4.3.0)
 sessioninfo * 1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
 stringi       1.7.12  2023-01-11 [2] CRAN (R 4.3.0)
 stringr     * 1.5.0   2022-12-02 [2] CRAN (R 4.3.0)
 tibble      * 3.2.1   2023-03-20 [2] CRAN (R 4.3.0)
 tidyr       * 1.3.0   2023-01-24 [2] CRAN (R 4.3.0)
 tidyselect    1.2.0   2022-10-10 [2] CRAN (R 4.3.0)
 tidyverse   * 2.0.0   2023-02-22 [2] CRAN (R 4.3.0)
 timechange    0.2.0   2023-01-11 [2] CRAN (R 4.3.0)
 tzdb          0.4.0   2023-05-12 [2] CRAN (R 4.3.0)
 utf8          1.2.3   2023-01-31 [2] CRAN (R 4.3.0)
 vctrs         0.6.3   2023-06-14 [2] CRAN (R 4.3.1)
 withr         2.5.0   2022-03-03 [2] CRAN (R 4.3.0)

 [1] /users/neagles/R/4.3
 [2] /jhpce/shared/jhpce/core/conda/miniconda3-4.11.0/envs/svnR-4.3/R/4.3/lib64/R/site-library
 [3] /jhpce/shared/jhpce/core/conda/miniconda3-4.11.0/envs/svnR-4.3/R/4.3/lib64/R/library


DavisVaughan · 2023-07-22T15:19:55Z

cur_group_id() returns a single value, the current group id. The duplicated(cur_group_id()) expression is called 3 times, once for each cyl group, so it basically gets called as duplicated(1), duplicated(2), and duplicated(3), all of which return FALSE, and then that FALSE is recycled to the size of each group.

So nothing is really wrong here, but I don't think this is a common usage of duplicated() or cur_group_id().
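The per-group evaluation described above can be sketched as follows. This is a minimal illustration rather than an official recommendation: the names by_cyl, inside, outside, id_length, dup_inside, and dup_outside are made up for the example, and it assumes the goal is to flag repeated group ids across all rows of the tibble.

```r
library(dplyr)

by_cyl <- as_tibble(mtcars) |>
    group_by(cyl)

# Inside a grouped mutate(), each expression is evaluated once per group.
# cur_group_id() has length 1 on every evaluation, so duplicated() only
# ever sees a single value and returns FALSE, which is then recycled to
# the size of the group.
inside <- by_cyl |>
    mutate(
        id_length = length(cur_group_id()),
        dup_inside = duplicated(cur_group_id())
    ) |>
    ungroup()

# To compare ids across all rows, store them first, drop the grouping,
# and only then call duplicated() on the full column.
outside <- by_cyl |>
    mutate(gid = cur_group_id()) |>
    ungroup() |>
    mutate(dup_outside = duplicated(gid))

table(inside$dup_inside)
table(outside$dup_outside)
```

The same recycling applies to n(): it also returns a single value per group, so n() * 2 is computed once per group and then recycled, which only looks elementwise.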

This question may be a better fit for Posit Community: https://community.rstudio.com/

DavisVaughan closed this as completed Jul 22, 2023
