row_number() does not respect .by= in mutate #7075

Closed
ggrothendieck opened this issue Aug 26, 2024 · 1 comment

@ggrothendieck

row_number() ignores .by= in mutate

library(dplyr)
dat <- data.frame(x = head(letters, 6), y = LETTERS[1:2])
dat %>% 
  mutate(z = first(row_number()), .by = y)
##   x y z
## 1 a A 1
## 2 b B 1
## 3 c A 1
## 4 d B 1
## 5 e A 1
## 6 f B 1

I would have expected the same output as

dat %>% 
  mutate(r = row_number()) %>%
  mutate(z = first(r), .by = y) %>%
  select(-r)
##   x y z
## 1 a A 1
## 2 b B 2
## 3 c A 1
## 4 d B 2
## 5 e A 1
## 6 f B 2
@DavisVaughan
Member

Everything looks to be working as expected here.

In the two-step version, the first mutate() is ungrouped, so row_number() generates 1:6. The second mutate() calls first() twice: once on c(1, 3, 5) (the A group) and once on c(2, 4, 6) (the B group). So you get 1 and 2 as your results, each recycled to the group size.

dat %>% 
  mutate(r = row_number()) %>%
  mutate(z = first(r), .by = y)
#>   x y r z
#> 1 a A 1 1
#> 2 b B 2 2
#> 3 c A 3 1
#> 4 d B 4 2
#> 5 e A 5 1
#> 6 f B 6 2

In the single-step version, row_number() is evaluated once per group of y: for the A group you get c(1, 2, 3), and for the B group you again get c(1, 2, 3). Taking first() of each of those vectors gives 1, which is why you see 1 everywhere.

dat %>% 
  mutate(z = first(row_number()), .by = y)
#>   x y z
#> 1 a A 1
#> 2 b B 1
#> 3 c A 1
#> 4 d B 1
#> 5 e A 1
#> 6 f B 1
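
To make the two evaluation orders concrete, here is a rough base R sketch of what each version effectively does. It is only an illustration of the semantics, not how dplyr is implemented; the names two_step and one_step are made up for this example.

```r
dat <- data.frame(x = head(letters, 6), y = LETTERS[1:2])

# Two-step version: number the rows of the whole data frame first,
# then split those numbers by group.
two_step <- split(seq_len(nrow(dat)), dat$y)
two_step
#> $A
#> [1] 1 3 5
#>
#> $B
#> [1] 2 4 6

# Single-step version: split into groups first,
# then number the rows within each group.
one_step <- lapply(split(dat, dat$y), function(g) seq_len(nrow(g)))
one_step
#> $A
#> [1] 1 2 3
#>
#> $B
#> [1] 1 2 3
```

Taking the first element of each vector gives 1 and 2 in the two-step case, and 1 and 1 in the single-step case, matching the outputs above.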

This doesn't have anything to do with .by; it is also how group_by() has always worked with row_number():

dat %>% 
  group_by(y) %>%
  mutate(z = row_number())
#> # A tibble: 6 × 3
#> # Groups:   y [2]
#>   x     y         z
#>   <chr> <chr> <int>
#> 1 a     A         1
#> 2 b     B         1
#> 3 c     A         2
#> 4 d     B         2
#> 5 e     A         3
#> 6 f     B         3
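
If the goal is the originally expected result (the position of each group's first row in the full data) in a single step, one option that should work is dplyr's cur_group_rows(), which returns the current group's row indices in the ungrouped data. This is a sketch rather than something suggested in this thread, and it assumes dplyr 1.1.0 or later for .by; the name res is just for this example.

```r
library(dplyr)

dat <- data.frame(x = head(letters, 6), y = LETTERS[1:2])

# cur_group_rows() gives the rows of the full data that belong to the
# current group, so first() picks the position of each group's first row.
res <- dat %>%
  mutate(z = first(cur_group_rows()), .by = y)
res
#>   x y z
#> 1 a A 1
#> 2 b B 2
#> 3 c A 1
#> 4 d B 2
#> 5 e A 1
#> 6 f B 2
```

This avoids the temporary r column from the two-step version while keeping the per-group evaluation model described above.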
