Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine select inline criteria to keep arranged computed columns #1446

Merged
merged 6 commits into from
Mar 4, 2024

Conversation

ejneer
Copy link
Contributor

@ejneer ejneer commented Feb 7, 2024

Fixes #1437

This will stop arranged computed columns from being inlined away from subqueries. Of course the subquery ORDER BY warning will be thrown, but I think that is expected behavior.

Before:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

db <- lazy_frame(x = 1, y = 2) 

db %>% 
  mutate(z = 1) %>% 
  arrange(x, z) %>%
  select(x)
#> <SQL>
#> SELECT `x`
#> FROM `df`
#> ORDER BY `x`, `z`

Created on 2024-02-07 with reprex v2.0.2

After:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

db <- lazy_frame(x = 1, y = 2) 

db %>% 
  mutate(z = 1) %>% 
  arrange(x, z) %>%
  select(x)
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#> <SQL>
#> SELECT `x`
#> FROM (
#>   SELECT `df`.*, 1.0 AS `z`
#>   FROM `df`
#> ) AS `q01`

Created on 2024-02-07 with reprex v2.0.2

R/verb-select.R Outdated Show resolved Hide resolved
R/verb-select.R Outdated Show resolved Hide resolved
@hadley hadley requested a review from mgirlich February 14, 2024 18:43
@hadley
Copy link
Member

hadley commented Feb 14, 2024

Thanks for taking a stab at this!

@hadley
Copy link
Member

hadley commented Feb 20, 2024

@mgirlich could you please take a look at this PR too?

R/verb-select.R Outdated Show resolved Hide resolved
is_bijective_projection <- function(vars, names_prev) {
vars <- unname(vars)
identical(sort(vars), names_prev)
select_can_be_inlined <- function(lazy_query, vars) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few tweaks here to hopefully make the logic a bit clearer.

lf <- lazy_frame(x = 1)

# shouldn't inline
out <- lf %>% mutate(z = 2) %>% arrange(x, z) %>% select(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is testing the right thing because when I run it, I see:

SELECT `x`
FROM (
  SELECT `df`.*, 2.0 AS `z`
  FROM `df`
) AS `q01`
Warning message:
ORDER BY is ignored in subqueries without LIMIT
ℹ Do you need to move arrange() later in the pipeline or use window_order() instead? 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe that is the expected behaviour? But definitely worth a snapshot test, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is expected. I've re-added the snapshot I removed in my prior commit.

@hadley hadley merged commit 0739c75 into tidyverse:main Mar 4, 2024
13 checks passed
@hadley
Copy link
Member

hadley commented Mar 4, 2024

Thanks for your work on this @ejneer — it's much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

select() fails after specific sequence of dplyr commands
3 participants