Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbplyr::in_schema is not working #523

Closed
ablack3 opened this issue Mar 7, 2023 · 2 comments
Closed

dbplyr::in_schema is not working #523

ablack3 opened this issue Mar 7, 2023 · 2 comments
Labels
bug an unexpected problem or unintended behavior DBI 🗃️
Milestone

Comments

@ablack3
Copy link

ablack3 commented Mar 7, 2023

I would like to use dplyr::tbl to create a reference to a table in bigquery. I want to be able to use dbplyr::in_schema to select both the dataset and table I want. Even though bigquery connections have a default dataset it is possible to query tables in datasets (aka schemas) other than the default.

library(bigrquery)
bq_auth(path = Sys.getenv("BIGQUERY_SERVICE_ACCOUNT_JSON_PATH"))

cdm_schema <- Sys.getenv("BIGQUERY_CDM_SCHEMA")
write_schema <- Sys.getenv("BIGQUERY_SCRATCH_SCHEMA")

con <- DBI::dbConnect(
  bigrquery::bigquery(),
  project = Sys.getenv("BIGQUERY_PROJECT_ID"),
  dataset = cdm_schema
)

DBI::dbGetQuery(con, "select count(*) as n from person")
#> # A tibble: 1 × 1
#>        n
#>    <int>
#> 1 116352

# try to create a tbl reference using fully qualified dataset.table name
dplyr::tbl(con, DBI::Id(schema = "synpuf_110k", table = "person"))
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> Error in `signal_reason()`:
#> ! Invalid dataset ID "`synpuf_110k`". Dataset IDs must be alphanumeric (plus underscores and dashes) and must be at most 1024 characters long. [invalid]

#> Backtrace:
#>      ▆
#>   1. ├─dplyr::tbl(con, DBI::Id(schema = "synpuf_110k", table = "person"))
#>   2. └─dplyr:::tbl.DBIConnection(...)
#>   3.   ├─dplyr::tbl(...)
#>   4.   └─dbplyr:::tbl.src_dbi(...)
#>   5.     └─dbplyr::tbl_sql(c(subclass, "dbi"), src = src, from = from, ...)
#>   6.       ├─vars %||% dbplyr_query_fields(src$con, from_sql)
#>   7.       └─dbplyr:::dbplyr_query_fields(src$con, from_sql)
#>   8.         └─dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
#>   9.           ├─rlang::eval_bare(expr((!!fun)(con, ...)))
#>  10.           ├─dplyr::db_query_fields(con, ...)
#>  11.           └─bigrquery:::db_query_fields.BigQueryConnection(con, ...)
#>  12.             └─bigrquery::bq_table_fields(tb)
#>  13.               └─bigrquery::bq_table_meta(x, fields = "schema")
#>  14.                 └─bigrquery:::bq_get(url, query = list(fields = fields))
#>  15.                   └─bigrquery:::process_request(req, raw = raw)
#>  16.                     └─bigrquery:::bq_check_response(status, type, content)
#>  17.                       └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#>  18.                         └─rlang::abort(message, class = paste0("bigrquery_", reason))

dplyr::tbl(con, dbplyr::sql("synpuf_110k.person"))
#> Error in `signal_reason()`:
#> ! Syntax error: Expected end of input but got identifier "synpuf_110k" at [1:1] [invalidQuery]

#> Backtrace:
#>      ▆
#>   1. ├─dplyr::tbl(con, dbplyr::sql("synpuf_110k.person"))
#>   2. └─dplyr:::tbl.DBIConnection(con, dbplyr::sql("synpuf_110k.person"))
#>   3.   ├─dplyr::tbl(...)
#>   4.   └─dbplyr:::tbl.src_dbi(...)
#>   5.     └─dbplyr::tbl_sql(c(subclass, "dbi"), src = src, from = from, ...)
#>   6.       ├─vars %||% dbplyr_query_fields(src$con, from_sql)
#>   7.       └─dbplyr:::dbplyr_query_fields(src$con, from_sql)
#>   8.         └─dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
#>   9.           ├─rlang::eval_bare(expr((!!fun)(con, ...)))
#>  10.           ├─dplyr::db_query_fields(con, ...)
#>  11.           └─bigrquery:::db_query_fields.BigQueryConnection(con, ...)
#>  12.             └─bigrquery:::bq_query_fields(sql, con@billing, default_dataset = ds)
#>  13.               └─bigrquery:::bq_post(url, body = bq_body(body, ...), query = list(fields = "statistics(query(schema(fields)))"))
#>  14.                 └─bigrquery:::process_request(req)
#>  15.                   └─bigrquery:::bq_check_response(status, type, content)
#>  16.                     └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#>  17.                       └─rlang::abort(message, class = paste0("bigrquery_", reason))

dplyr::tbl(con, dbplyr::in_schema("synpuf_110k", "person"))
#> Error in `signal_reason()`:
#> ! Invalid dataset ID "`synpuf_110k`". Dataset IDs must be alphanumeric (plus underscores and dashes) and must be at most 1024 characters long. [invalid]

#> Backtrace:
#>      ▆
#>   1. ├─dplyr::tbl(con, dbplyr::in_schema("synpuf_110k", "person"))
#>   2. └─dplyr:::tbl.DBIConnection(...)
#>   3.   ├─dplyr::tbl(...)
#>   4.   └─dbplyr:::tbl.src_dbi(...)
#>   5.     └─dbplyr::tbl_sql(c(subclass, "dbi"), src = src, from = from, ...)
#>   6.       ├─vars %||% dbplyr_query_fields(src$con, from_sql)
#>   7.       └─dbplyr:::dbplyr_query_fields(src$con, from_sql)
#>   8.         └─dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
#>   9.           ├─rlang::eval_bare(expr((!!fun)(con, ...)))
#>  10.           ├─dplyr::db_query_fields(con, ...)
#>  11.           └─bigrquery:::db_query_fields.BigQueryConnection(con, ...)
#>  12.             └─bigrquery::bq_table_fields(tb)
#>  13.               └─bigrquery::bq_table_meta(x, fields = "schema")
#>  14.                 └─bigrquery:::bq_get(url, query = list(fields = fields))
#>  15.                   └─bigrquery:::process_request(req, raw = raw)
#>  16.                     └─bigrquery:::bq_check_response(status, type, content)
#>  17.                       └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#>  18.                         └─rlang::abort(message, class = paste0("bigrquery_", reason))

# this works
DBI::dbGetQuery(con, "select count(*) as n from `synpuf_110k`.`person`")
#> # A tibble: 1 × 1
#>        n
#>    <int>
#> 1 116352

# db_query_fields seems to be the problem
dplyr::db_query_fields(con, "synpuf_110k.person")
#>  [1] "person_id"                   "gender_concept_id"          
#>  [3] "year_of_birth"               "month_of_birth"             
#>  [5] "day_of_birth"                "birth_DATETIME"             
#>  [7] "race_concept_id"             "ethnicity_concept_id"       
#>  [9] "location_id"                 "provider_id"                
#> [11] "care_site_id"                "person_source_value"        
#> [13] "gender_source_value"         "gender_source_concept_id"   
#> [15] "race_source_value"           "race_source_concept_id"     
#> [17] "ethnicity_source_value"      "ethnicity_source_concept_id"
dplyr::db_query_fields(con, dbplyr::sql("synpuf_110k.person"))
#> Error in `signal_reason()`:
#> ! Syntax error: Expected end of input but got identifier "synpuf_110k" at [1:1] [invalidQuery]

#> Backtrace:
#>     ▆
#>  1. ├─dplyr::db_query_fields(con, dbplyr::sql("synpuf_110k.person"))
#>  2. └─bigrquery:::db_query_fields.BigQueryConnection(con, dbplyr::sql("synpuf_110k.person"))
#>  3.   └─bigrquery:::bq_query_fields(sql, con@billing, default_dataset = ds)
#>  4.     └─bigrquery:::bq_post(url, body = bq_body(body, ...), query = list(fields = "statistics(query(schema(fields)))"))
#>  5.       └─bigrquery:::process_request(req)
#>  6.         └─bigrquery:::bq_check_response(status, type, content)
#>  7.           └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#>  8.             └─rlang::abort(message, class = paste0("bigrquery_", reason))
dplyr::db_query_fields(con, dbplyr::sql("`synpuf_110k`.`person`"))
#> Error in `signal_reason()`:
#> ! Syntax error: Expected end of input but got identifier `synpuf_110k` at [1:1] [invalidQuery]

#> Backtrace:
#>     ▆
#>  1. ├─dplyr::db_query_fields(con, dbplyr::sql("`synpuf_110k`.`person`"))
#>  2. └─bigrquery:::db_query_fields.BigQueryConnection(con, dbplyr::sql("`synpuf_110k`.`person`"))
#>  3.   └─bigrquery:::bq_query_fields(sql, con@billing, default_dataset = ds)
#>  4.     └─bigrquery:::bq_post(url, body = bq_body(body, ...), query = list(fields = "statistics(query(schema(fields)))"))
#>  5.       └─bigrquery:::process_request(req)
#>  6.         └─bigrquery:::bq_check_response(status, type, content)
#>  7.           └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#>  8.             └─rlang::abort(message, class = paste0("bigrquery_", reason))

DBI::dbDisconnect(con)

Created on 2023-03-07 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-03-07
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  askpass       1.1        2019-01-13 [1] CRAN (R 4.2.0)
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  bigrquery   * 1.4.1      2022-10-27 [1] CRAN (R 4.2.0)
#>  bit           4.0.5      2022-11-15 [1] CRAN (R 4.2.0)
#>  bit64         4.0.5      2020-08-30 [1] CRAN (R 4.2.0)
#>  brio          1.1.3      2021-11-30 [1] CRAN (R 4.2.0)
#>  cli           3.6.0      2023-01-09 [1] CRAN (R 4.2.0)
#>  curl          5.0.0      2023-01-12 [1] CRAN (R 4.2.0)
#>  DBI           1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr        2.3.0.9000 2023-02-14 [1] Github (tidyverse/dbplyr@9607491)
#>  digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.0)
#>  dplyr         1.1.0      2023-01-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.0)
#>  fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.6.1      2023-02-06 [1] CRAN (R 4.2.0)
#>  gargle        1.2.1      2022-09-08 [1] CRAN (R 4.2.0)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.0)
#>  httr          1.4.5      2023-02-24 [1] CRAN (R 4.2.2)
#>  jsonlite      1.8.4      2022-12-06 [1] CRAN (R 4.2.0)
#>  knitr         1.42       2023-01-25 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  openssl       2.0.5      2022-12-06 [1] CRAN (R 4.2.0)
#>  pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp          1.0.10     2023-01-22 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.0.6      2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.20       2023-01-19 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  tibble        3.1.8      2022-07-22 [1] CRAN (R 4.2.0)
#>  tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
#>  utf8          1.2.3      2023-01-31 [1] CRAN (R 4.2.0)
#>  vctrs         0.5.2      2023-01-23 [1] CRAN (R 4.2.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.37       2023-01-31 [1] CRAN (R 4.2.0)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@hadley hadley added feature a feature request or enhancement DBI 🗃️ labels Nov 2, 2023
@hadley
Copy link
Member

hadley commented Nov 7, 2023

Looks like most of this is already working:

library(bigrquery)
options(gargle_oauth_email = TRUE)

con <- DBI::dbConnect(bigrquery::bigquery(), project = bq_test_project())

df <- dplyr::tbl(con, DBI::Id(catalog = "publicdata", schema = "samples", table = "natality"))
#> ℹ The bigrquery package is using a cached token for '[email protected]'.

df <- dplyr::tbl(con, dbplyr::in_catalog("publicdata", "samples", "natality"))

df <- dplyr::tbl(con, dbplyr::sql("publicdata.samples.natality"))
#> Error in `db_query_fields.DBIConnection()`:
#> ! Can't query fields.
#> Caused by error in `bq_job_wait()` at bigrquery/R/dbi-result.R:21:3:
#> ! Job 'gargle-169921.job_WTy1stCGOc8I7KZ9v4f9NwfmlxQU.US' failed
#> ✖ Syntax error: Expected keyword JOIN but got ")" at [2:34] [invalidQuery]

Created on 2023-11-07 with reprex v2.0.2

@hadley hadley added bug an unexpected problem or unintended behavior and removed feature a feature request or enhancement labels Nov 7, 2023
@hadley hadley added this to the v1.5.0 milestone Nov 7, 2023
@hadley
Copy link
Member

hadley commented Nov 7, 2023

Pretty sure this needs a dbplyr fix: tidyverse/dbplyr#1396

@hadley hadley closed this as completed Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior DBI 🗃️
Projects
None yet
Development

No branches or pull requests

2 participants