Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking changes in dbplyr 2.5.0 #1480

Closed
catalamarti opened this issue Mar 20, 2024 · 9 comments
Closed

Breaking changes in dbplyr 2.5.0 #1480

catalamarti opened this issue Mar 20, 2024 · 9 comments

Comments

@catalamarti
Copy link

hi @hadley @mgirlich

the last release of dbplyr (2.5.0) broke the following packages:

It is a difficult fix that requires to come up with a different approach and implement it in all the packages before they are kick out of cran. Unless dbplyr goes back to the prior behavior on time.

I will try to explain with a series of reprex:

library(DBI)
#> Warning: package 'DBI' was built under R version 4.2.3
library(CDMConnector)
#> Warning: package 'CDMConnector' was built under R version 4.2.3
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(duckdb)


con <- DBI::dbConnect(duckdb(), path = ":memory:")
test_data <- data.frame(person = 1L,
                        date_1 = as.Date("2001-01-01"))
db_test_data <- copy_to(con, test_data, overwrite = TRUE)
db_test_data <- db_test_data  %>%
  dplyr::mutate(date_2 = date_1 + years(1)) 

year_req <- 1

# without CDMConnector::dateadd it works
priorHistoryDates <- glue::glue('date_1 + years({year_req})') %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{year_req}"))
db_test_data %>% 
  mutate(!!!priorHistoryDates)
#> # Source:   SQL [1 x 4]
#> # Database: DuckDB v0.9.2 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#>   person date_1     date_2     date_with_prior_history_1
#>    <int> <date>     <date>     <date>                   
#> 1      1 2001-01-01 2002-01-01 2002-01-01

# with CDMConnector::dateadd it does not
priorHistoryDates <- glue::glue('CDMConnector::dateadd("date_2",
                      {year_req}, interval = "year")') %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{year_req}"))
db_test_data %>% 
  mutate(!!!priorHistoryDates)
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation
#> Backtrace:
#>      ▆
#>   1. ├─base::tryCatch(...)
#>   2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   3. │   ├─base (local) tryCatchOne(...)
#>   4. │   │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   5. │   └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>   6. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   7. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   8. ├─base::withCallingHandlers(...)
#>   9. ├─base::saveRDS(...)
#>  10. ├─base::do.call(...)
#>  11. ├─base (local) `<fn>`(...)
#>  12. ├─global `<fn>`(input = base::quote("soot-cub_reprex.R"))
#>  13. │ └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#>  14. │   └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#>  15. │     └─knitr:::process_file(text, output)
#>  16. │       ├─knitr:::handle_error(...)
#>  17. │       │ └─base::withCallingHandlers(...)
#>  18. │       ├─base::withCallingHandlers(...)
#>  19. │       ├─knitr:::process_group(group)
#>  20. │       └─knitr:::process_group.block(group)
#>  21. │         └─knitr:::call_block(x)
#>  22. │           └─knitr:::block_exec(params)
#>  23. │             └─knitr:::eng_r(options)
#>  24. │               ├─knitr:::in_input_dir(...)
#>  25. │               │ └─knitr:::in_dir(input_dir(), expr)
#>  26. │               └─knitr (local) evaluate(...)
#>  27. │                 └─evaluate::evaluate(...)
#>  28. │                   └─evaluate:::evaluate_call(...)
#>  29. │                     ├─evaluate (local) handle(...)
#>  30. │                     │ └─base::try(f, silent = TRUE)
#>  31. │                     │   └─base::tryCatch(...)
#>  32. │                     │     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  33. │                     │       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  34. │                     │         └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  35. │                     ├─base::withCallingHandlers(...)
#>  36. │                     ├─base::withVisible(value_fun(ev$value, ev$visible))
#>  37. │                     └─knitr (local) value_fun(ev$value, ev$visible)
#>  38. │                       └─knitr (local) fun(x, options = options)
#>  39. │                         ├─base::withVisible(knit_print(x, ...))
#>  40. │                         ├─knitr::knit_print(x, ...)
#>  41. │                         └─rmarkdown:::knit_print.tbl_sql(x, ...)
#>  42. │                           ├─context$df_print(x)
#>  43. │                           └─dbplyr:::print.tbl_sql(x)
#>  44. │                             ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
#>  45. │                             │ ├─base::cat(paste0(..., "\n"), sep = "")
#>  46. │                             │ └─base::paste0(..., "\n")
#>  47. │                             ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
#>  48. │                             └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
#>  49. │                               └─pillar:::format_tbl(...)
#>  50. │                                 └─pillar::tbl_format_setup(...)
#>  51. │                                   ├─pillar:::tbl_format_setup_dispatch(...)
#>  52. │                                   └─pillar:::tbl_format_setup.tbl(...)
#>  53. │                                     └─pillar:::df_head(x, n + 1)
#>  54. │                                       ├─base::as.data.frame(head(x, n))
#>  55. │                                       └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
#>  56. │                                         ├─base::as.data.frame(collect(x, n = n))
#>  57. │                                         ├─dplyr::collect(x, n = n)
#>  58. │                                         └─dbplyr:::collect.tbl_sql(x, n = n)
#>  59. │                                           └─dbplyr::db_sql_render(x$src$con, x, cte = cte)
#>  60. │                                             ├─dbplyr::db_sql_render(con, sql, ..., sql_options = sql_options)
#>  61. │                                             └─dbplyr:::db_sql_render.DBIConnection(con, sql, ..., sql_options = sql_options)
#>  62. │                                               ├─dbplyr::sql_render(sql, con = con, ..., sql_options = sql_options)
#>  63. │                                               └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., sql_options = sql_options)
#>  64. │                                                 ├─dbplyr::sql_render(...)
#>  65. │                                                 └─dbplyr:::sql_render.lazy_query(...)
#>  66. │                                                   ├─dbplyr::sql_build(query, con = con, sql_options = sql_options)
#>  67. │                                                   └─dbplyr:::sql_build.lazy_select_query(query, con = con, sql_options = sql_options)
#>  68. │                                                     └─dbplyr:::get_select_sql(...)
#>  69. │                                                       └─dbplyr:::translate_select_sql(con, select)
#>  70. │                                                         └─dbplyr::translate_sql_(...)
#>  71. │                                                           └─base::lapply(...)
#>  72. │                                                             └─dbplyr (local) FUN(X[[i]], ...)
#>  73. │                                                               ├─dbplyr::escape(eval_tidy(x, mask), con = con)
#>  74. │                                                               └─rlang::eval_tidy(x, mask)
#>  75. └─CDMConnector::dateadd
#>  76.   └─cli::cli_abort(...)
#>  77.     └─rlang::abort(...)
db_test_data %>% 
  mutate(!!!priorHistoryDates) |> 
  dplyr::show_query()
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::show_query(db_test_data %>% mutate(!!!priorHistoryDates))
#>   2. ├─dbplyr:::show_query.tbl_lazy(db_test_data %>% mutate(!!!priorHistoryDates))
#>   3. │ └─dbplyr::remote_query(x, cte = cte, sql_options = sql_options)
#>   4. │   └─dbplyr::db_sql_render(remote_con(x), x, cte = cte, sql_options = sql_options)
#>   5. │     ├─dbplyr::db_sql_render(con, sql, ..., sql_options = sql_options)
#>   6. │     └─dbplyr:::db_sql_render.DBIConnection(con, sql, ..., sql_options = sql_options)
#>   7. │       ├─dbplyr::sql_render(sql, con = con, ..., sql_options = sql_options)
#>   8. │       └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., sql_options = sql_options)
#>   9. │         ├─dbplyr::sql_render(...)
#>  10. │         └─dbplyr:::sql_render.lazy_query(...)
#>  11. │           ├─dbplyr::sql_build(query, con = con, sql_options = sql_options)
#>  12. │           └─dbplyr:::sql_build.lazy_select_query(query, con = con, sql_options = sql_options)
#>  13. │             └─dbplyr:::get_select_sql(...)
#>  14. │               └─dbplyr:::translate_select_sql(con, select)
#>  15. │                 └─dbplyr::translate_sql_(...)
#>  16. │                   └─base::lapply(...)
#>  17. │                     └─dbplyr (local) FUN(X[[i]], ...)
#>  18. │                       ├─dbplyr::escape(eval_tidy(x, mask), con = con)
#>  19. │                       └─rlang::eval_tidy(x, mask)
#>  20. └─CDMConnector::dateadd
#>  21.   └─cli::cli_abort(...)
#>  22.     └─rlang::abort(...)

# without explictly calling CDMConnector
priorHistoryDates <- glue::glue('dateadd("date_2",
                      {year_req}, interval = "year")') %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{year_req}"))
db_test_data %>% 
  mutate(!!!priorHistoryDates)
#> Warning: Named arguments ignored for SQL dateadd
#> Error in `collect()`:
#> ! Failed to collect lazy table.
#> Caused by error:
#> ! Parser Error: syntax error at or near "AS"
#> LINE 3:   dateadd('date_2', 1.0, 'year' AS "interval") AS date_...
#>                                         ^
#> Backtrace:
#>      ▆
#>   1. ├─base::tryCatch(...)
#>   2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   3. │   ├─base (local) tryCatchOne(...)
#>   4. │   │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   5. │   └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>   6. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   7. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   8. ├─base::withCallingHandlers(...)
#>   9. ├─base::saveRDS(...)
#>  10. ├─base::do.call(...)
#>  11. ├─base (local) `<fn>`(...)
#>  12. ├─global `<fn>`(input = base::quote("soot-cub_reprex.R"))
#>  13. │ └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#>  14. │   └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#>  15. │     └─knitr:::process_file(text, output)
#>  16. │       ├─knitr:::handle_error(...)
#>  17. │       │ └─base::withCallingHandlers(...)
#>  18. │       ├─base::withCallingHandlers(...)
#>  19. │       ├─knitr:::process_group(group)
#>  20. │       └─knitr:::process_group.block(group)
#>  21. │         └─knitr:::call_block(x)
#>  22. │           └─knitr:::block_exec(params)
#>  23. │             └─knitr:::eng_r(options)
#>  24. │               ├─knitr:::in_input_dir(...)
#>  25. │               │ └─knitr:::in_dir(input_dir(), expr)
#>  26. │               └─knitr (local) evaluate(...)
#>  27. │                 └─evaluate::evaluate(...)
#>  28. │                   └─evaluate:::evaluate_call(...)
#>  29. │                     ├─evaluate (local) handle(...)
#>  30. │                     │ └─base::try(f, silent = TRUE)
#>  31. │                     │   └─base::tryCatch(...)
#>  32. │                     │     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  33. │                     │       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  34. │                     │         └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  35. │                     ├─base::withCallingHandlers(...)
#>  36. │                     ├─base::withVisible(value_fun(ev$value, ev$visible))
#>  37. │                     └─knitr (local) value_fun(ev$value, ev$visible)
#>  38. │                       └─knitr (local) fun(x, options = options)
#>  39. │                         ├─base::withVisible(knit_print(x, ...))
#>  40. │                         ├─knitr::knit_print(x, ...)
#>  41. │                         └─rmarkdown:::knit_print.tbl_sql(x, ...)
#>  42. │                           ├─context$df_print(x)
#>  43. │                           └─dbplyr:::print.tbl_sql(x)
#>  44. │                             ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
#>  45. │                             │ ├─base::cat(paste0(..., "\n"), sep = "")
#>  46. │                             │ └─base::paste0(..., "\n")
#>  47. │                             ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
#>  48. │                             └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
#>  49. │                               └─pillar:::format_tbl(...)
#>  50. │                                 └─pillar::tbl_format_setup(...)
#>  51. │                                   ├─pillar:::tbl_format_setup_dispatch(...)
#>  52. │                                   └─pillar:::tbl_format_setup.tbl(...)
#>  53. │                                     └─pillar:::df_head(x, n + 1)
#>  54. │                                       ├─base::as.data.frame(head(x, n))
#>  55. │                                       └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
#>  56. │                                         ├─base::as.data.frame(collect(x, n = n))
#>  57. │                                         ├─dplyr::collect(x, n = n)
#>  58. │                                         └─dbplyr:::collect.tbl_sql(x, n = n)
#>  59. │                                           ├─base::withCallingHandlers(...)
#>  60. │                                           ├─dbplyr::db_collect(...)
#>  61. │                                           └─dbplyr:::db_collect.DBIConnection(...)
#>  62. │                                             ├─DBI::dbSendQuery(con, sql)
#>  63. │                                             └─duckdb::dbSendQuery(con, sql)
#>  64. │                                               └─duckdb (local) .local(conn, statement, ...)
#>  65. │                                                 └─duckdb:::rapi_prepare(conn@conn_ref, statement)
#>  66. └─base::.handleSimpleError(...)
#>  67.   └─dbplyr (local) h(simpleError(msg, call))
#>  68.     └─cli::cli_abort("Failed to collect lazy table.", parent = cnd)
#>  69.       └─rlang::abort(...)
db_test_data %>% 
  mutate(!!!priorHistoryDates) |> 
  dplyr::show_query()
#> Warning: Named arguments ignored for SQL dateadd
#> <SQL>
#> SELECT
#>   q01.*,
#>   dateadd('date_2', 1.0, 'year' AS "interval") AS date_with_prior_history_1
#> FROM (
#>   SELECT test_data.*, date_1 + TO_YEARS(CAST(1.0 AS INTEGER)) AS date_2
#>   FROM test_data
#> ) q01

Created on 2024-03-20 with reprex v2.0.2

dateadd it is a function that we have in CDMConnector to provide the translation to add a number to a date in different dbms:
https://github.com/darwin-eu/CDMConnector/blob/105b9d3df7e3fb2e66a14928e51557c22e8881ca/R/dateadd.R#L25-L65

The long term plan is to contribute to dbplyr as @ablack3 did here #1357 so long term this utility functions are not needed. But for the moment while it is not implemented in all dbms we need a work around to make this work.

@edgararuiz
Copy link
Collaborator

sparklyr is having the same issues when printing to console, so far I've tracked it down to head() producing the table.* call. It looks like when dbplyr pulls the first row, I guess to get the column names, that call triggers either a head() call or some lower level call in that same vein, which in turn outputs the error

Here is the issue in sparklyr: sparklyr/sparklyr#3429

Reprex:

library(sparklyr)
packageVersion("dbplyr")
#> [1] '2.5.0'
sc <- spark_connect("local", version = "3.3.0") 
tbl_mtcars <- copy_to(sc, mtcars)
dbplyr::remote_query(head(tbl_mtcars, 1))
#> <SQL> SELECT `mtcars`.`*`
#> FROM `mtcars`
#> LIMIT 1

@mgirlich
Copy link
Collaborator

mgirlich commented Mar 22, 2024

@edgararuiz Are you saying that spark only supports * when it's not qualified? I.e. this would work

SELECT `*`
FROM `mtcars`
LIMIT 1

but not this

SELECT `mtcars`.`*`
FROM `mtcars`
LIMIT 1

(the relevant change would then come from #1278)

@mgirlich
Copy link
Collaborator

@catalamarti I think your reprex boils down to

library(DBI)
library(CDMConnector)
library(dplyr)
library(duckdb)


con <- DBI::dbConnect(duckdb(), path = ":memory:")
test_data <- data.frame(person = 1L,
                        date_1 = as.Date("2001-01-01"))
db_test_data <- copy_to(con, test_data, overwrite = TRUE)
db_test_data <- db_test_data  %>%
  dplyr::mutate(date_2 = date_1 + years(1)) 

db_test_data %>% 
  mutate(date_with_prior_history_1 = CDMConnector::dateadd("date_2", 1, interval = "year"))

but if you replace the last line with

db_test_data %>% 
  mutate(date_with_prior_history_1 = !!CDMConnector::dateadd("date_2", 1, interval = "year"))

it works. This is similar to what you do in your examples. Would that work for you?
@hadley this only worked in dbplyr < 2.5.0 due to qualifying the function with the package, i.e. using CDMConnector::dateadd instead of dateadd.

@catalamarti
Copy link
Author

hi @mgirlich so yes your approach works and this works for the usual use of the function.
But in packages where I want to put several variables in the same mutate (to generate efficient sql) depending on external variables I can no longer use !!! function as it breaks.
reprex:

library(DBI)
library(CDMConnector)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(duckdb)


con <- DBI::dbConnect(duckdb(), path = ":memory:")
test_data <- data.frame(person = 1L,
                        date_1 = as.Date("2001-01-01"))
db_test_data <- copy_to(con, test_data, overwrite = TRUE)
db_test_data <- db_test_data  %>%
  dplyr::mutate(date_2 = date_1 + years(1)) 

# This works
db_test_data %>% 
  mutate(date_with_prior_history_1 = !!CDMConnector::dateadd("date_2", 1, interval = "year"))
#> # Source:   SQL [1 x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#>   person date_1     date_2              date_with_prior_history_1
#>    <int> <date>     <dttm>              <dttm>                   
#> 1      1 2001-01-01 2002-01-01 00:00:00 2003-01-01 00:00:00

# This doesn't
x <- rlang::parse_exprs("CDMConnector::dateadd('date_2', 1, interval = 'year')") %>%
  rlang::set_names(glue::glue("date_with_prior_history_1"))
db_test_data %>% 
  mutate(!!!x)
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation

# Use case
ph <- 1:3
x <- glue::glue("CDMConnector::dateadd('date_2', {ph}, interval = 'year')") %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{ph}"))
db_test_data %>% 
  mutate(!!!x)
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation

Created on 2024-03-22 with reprex v2.1.0

Sorry for not providing more context before

@mgirlich
Copy link
Collaborator

You can simply wrap the thing with local():

x <- rlang::parse_exprs("local(CDMConnector::dateadd('date_2', 1, interval = 'year'))") %>%
  rlang::set_names(glue::glue("date_with_prior_history_1"))
db_test_data %>% 
  mutate(!!!x)

@catalamarti
Copy link
Author

Thank you very much @mgirlich that works for us, we will amend the packages.
FYI: @edward-burn @ablack3 @ilovemane @MimiYuchenGuo ...

@edgararuiz
Copy link
Collaborator

@mgirlich - Correct, the back ticks cause an issue. But at this time, I made a change in sparklyr that prevents the failure. I matched sparklyr's dbQuoteIdentifier() to what the other methods do, which is to return x as-is if x is SQL class. This is probably why other backends did not have that issue.

https://github.com/sparklyr/sparklyr/blob/da3b235546503de00c22e3e7fed5b9ebd94e4180/R/dbi_spark_connection.R#L63

@hadley
Copy link
Member

hadley commented Apr 1, 2024

@catalamarti many apologies for this screw up — I had spotted the problems in my revdep checks but I incorrectly attributed them to a problem that I thought I had fixed in CDMConnector. I know it's miserable to receive this sort of email from CRAN and I'm sorry my mistake has caused extra work for you 😞.

(And apologies for taking so long to respond to this thread; in hindsight it was clearly a bad idea to submit dbplyr to CRAN just before I left for two weeks vacation 😬)

@catalamarti
Copy link
Author

Thank you very much @mgirlich @hadley for me we can close the issue.

@hadley hadley closed this as completed Apr 10, 2024
meztez added a commit to Boostao/SSGBM-VRI-BEM that referenced this issue May 10, 2024
meztez added a commit to Boostao/SSGBM-VRI-BEM that referenced this issue May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants