dbplyr across() behavior differs from dplyr in a grouped context #1493

Closed
lschneiderbauer opened this issue Apr 30, 2024 · 1 comment · Fixed by #1494

When across() is used in a grouped context, everything() selects only the non-grouping variables. This is also documented in the code example of group_cols():

# Remove the grouping variables from mutate selections:
gdf %>% mutate_at(vars(-group_cols()), `/`, 100)
# -> No longer necessary with across()
gdf %>% mutate(across(everything(), ~ . / 100))

Unsurprisingly, that is also what happens with dplyr:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <-
  tibble(
    id = c(1),
    y = c("a"),
    z = c("a")
  )

df |> 
  summarize(
    across(everything(), \(x) "test"),
    .by = id
  )
#> # A tibble: 1 × 3
#>      id y     z    
#>   <dbl> <chr> <chr>
#> 1     1 test  test

Created on 2024-04-30 with reprex v2.1.0

With the exact same verbs on the dbplyr backend, everything() instead selects every column, including the grouping column:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

df <-
  tibble(
    id = c(1),
    y = c("a"),
    z = c("a")
  )


tbl_lazy(df, dbplyr::simulate_dbi()) |> 
  summarize(
    across(everything(), \(x) "test"),
    .by = id
  )
#> <SQL>
#> SELECT 'test' AS `id`, 'test' AS `y`, 'test' AS `z`
#> FROM `df`
#> GROUP BY `id`

Created on 2024-04-30 with reprex v2.1.0

I would expect this to result in:

#> <SQL>
#> SELECT `id`, 'test' AS `y`, 'test' AS `z`
#> FROM `df`
#> GROUP BY `id`

mgirlich (Collaborator) commented May 2, 2024

Thanks for reporting. dbplyr already handled groups in across() correctly, but not for summarise(.by = ).
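For readers on a dbplyr version without the fix, the comment above suggests two possible ways around the problem. The following is only a sketch, not taken from the fix itself. It reuses the reprex data and assumes that an explicit group_by() behaves as described in the comment, and that naming the columns in across() avoids the issue. The variable names lf, grouped_sql and explicit_sql are made up for illustration.

```r
library(dplyr)
library(dbplyr)

# Same data as in the report, wrapped as a lazy table on a simulated connection
df <- tibble(id = 1, y = "a", z = "a")
lf <- tbl_lazy(df, simulate_dbi())

# Option 1 (assumption based on the comment above): use group_by() instead of
# .by, so that everything() skips the grouping column
grouped_sql <- lf |>
  group_by(id) |>
  summarise(across(everything(), \(x) "test")) |>
  sql_render() |>
  as.character()

# Option 2 (assumption): keep .by, but list the non-grouping columns explicitly
explicit_sql <- lf |>
  summarise(across(c(y, z), \(x) "test"), .by = id) |>
  sql_render() |>
  as.character()

cat(grouped_sql, "\n\n", explicit_sql, "\n", sep = "")
```

If either variant works as assumed, the printed SQL selects `id` as a plain column rather than overwriting it with 'test'.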
