-
Notifications
You must be signed in to change notification settings - Fork 115
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Co-authored-by: Hadley Wickham <[email protected]>
- Loading branch information
Showing
4 changed files
with
369 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,142 @@ | ||
--- | ||
output: hugodown::hugo_document | ||
|
||
slug: dbplyr-2-4-0 | ||
title: dbplyr 2.4.0 | ||
date: 2023-10-26 | ||
author: Hadley Wickham | ||
description: > | ||
dbplyr 2.4.0 brings improvements to SQL generation, better control over the | ||
generated SQL, some new translations, and a bunch of backend specific improvements. | ||
photo: | ||
url: https://unsplash.com/photos/AJqaubLEaN4 | ||
author: Parker Hilton | ||
|
||
# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other" | ||
categories: [package] | ||
tags: [dplyr, dbplyr] | ||
--- | ||
|
||
```{=html} | ||
<!-- | ||
* also include something about dbplyr 2.3.1? | ||
* support for `join_by()` | ||
* many bugs introduced in 2.3.0 fixed | ||
TODO: | ||
* [x] Look over / edit the post's title in the yaml | ||
* [x] Edit (or delete) the description; note this appears in the Twitter card | ||
* [x] Pick category and tags (see existing with `hugodown::tidy_show_meta()`) | ||
* [x] Find photo & update yaml metadata | ||
* [x] Create `thumbnail-sq.jpg`; height and width should be equal | ||
* [x] Create `thumbnail-wd.jpg`; width should be >5x height | ||
* [x] `hugodown::use_tidy_thumbnails()` | ||
* [x] Add intro sentence, e.g. the standard tagline for the package | ||
* [x] `usethis::use_tidy_thanks()` | ||
--> | ||
``` | ||
We're chuffed to announce the release of [dbplyr](http://dbplyr.tidyverse.org/) 2.4.0. | ||
dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you. | ||
|
||
You can install it from CRAN with: | ||
|
||
```{r, eval = FALSE} | ||
install.packages("dbplyr") | ||
``` | ||
|
||
This blog post will highlight some of the most important new features: eliminating subqueries when using multiple unions in a row, getting more control on the generated SQL, and a handful of new translations. | ||
As usual, release comes with a large number of improvements to translations for individual backends; see the full list in the [release notes](https://github.com/tidyverse/dbplyr/releases/tag/v2.4.0) | ||
|
||
```{r setup} | ||
library(dbplyr) | ||
library(dplyr, warn.conflicts = FALSE) | ||
``` | ||
|
||
## SQL optimisation | ||
|
||
dbplyr now produces fewer subqueries when combining tables with `union()` and `union_all()` resulting in shorter, more readable, and, in some cases, faster SQL. | ||
|
||
```{r} | ||
lf1 <- lazy_frame(x = 1, y = "a", .name = "lf1") | ||
lf2 <- lazy_frame(x = 1, y = "b", .name = "lf2") | ||
lf3 <- lazy_frame(x = 1, z = "c", .name = "lf3") | ||
lf1 |> | ||
union(lf2) |> | ||
union(lf3) | ||
``` | ||
|
||
(As usual in these blog posts, I'm using `lazy_frame()` to focus on the SQL generation, without having to set up a dummy database.) | ||
|
||
Similarly, a `semi/anti_join()` on a filtered table now avoids a subquery: | ||
|
||
```{r} | ||
lf1 |> | ||
semi_join(lf3 |> filter(z == "c"), join_by(x)) | ||
``` | ||
|
||
## SQL generation | ||
|
||
The new argument `sql_options` for `show_query()` and `remote_query()` gives you more control on the generated SQL. | ||
|
||
- By default dbplyr uses `*` to select all columns of a table, but with `use_star = FALSE` all columns are selected explicitly: | ||
|
||
```{r} | ||
lf3 <- lazy_frame(x = 1, y = 2, z = 3, .name = "lf3") | ||
lf3 |> | ||
mutate(a = 4) | ||
lf3 |> | ||
mutate(a = 4) |> | ||
show_query(sql_options = sql_options(use_star = FALSE)) | ||
``` | ||
|
||
- If you prefer common table expressions (CTE) over subqueries use `cte = TRUE`: | ||
|
||
```{r} | ||
nested_query <- lf3 |> | ||
mutate(z = z + 1) |> | ||
left_join(lf2, by = join_by(x, y)) | ||
nested_query | ||
nested_query |> | ||
show_query(sql_options = sql_options(cte = TRUE)) | ||
``` | ||
|
||
- And if you want that all columns in a join are qualified with the table name and not only the ambiguous ones use `qualify_all_columns = TRUE`: | ||
|
||
```{r} | ||
qualify_columns <- lf2 |> | ||
left_join(lf3, by = join_by(x, y)) | ||
qualify_columns | ||
qualify_columns |> | ||
show_query(sql_options = sql_options(qualify_all_columns = TRUE)) | ||
``` | ||
|
||
## New translations | ||
|
||
`str_detect()`, `str_starts()` and `str_ends()` with fixed patterns are translated to `INSTR()`: | ||
|
||
```{r} | ||
lf1 |> | ||
filter( | ||
stringr::str_detect(x, stringr::fixed("abc")), | ||
stringr::str_starts(x, stringr::fixed("a")) | ||
) | ||
``` | ||
|
||
And `nzchar()` and `runif()` are now translated to their SQL equivalents: | ||
|
||
```{r} | ||
lf1 |> | ||
filter(nzchar(x)) |> | ||
mutate(z = runif()) | ||
``` | ||
|
||
## Acknowledgements | ||
|
||
The vast majority of this release (particularly the SQL optimisations) are from [Maximilian Girlich](https://github.com/mgirlich); thanks so much for continued work on this package! | ||
And a big thanks go to the 84 other folks who helped out by filing issues and contributing code: [\@abalter](https://github.com/abalter), [\@ablack3](https://github.com/ablack3), [\@andreassoteriadesmoj](https://github.com/andreassoteriadesmoj), [\@apalacio9502](https://github.com/apalacio9502), [\@avsdev-cw](https://github.com/avsdev-cw), [\@bairdj](https://github.com/bairdj), [\@bastistician](https://github.com/bastistician), [\@brownj31](https://github.com/brownj31), [\@But2ene](https://github.com/But2ene), [\@carlganz](https://github.com/carlganz), [\@catalamarti](https://github.com/catalamarti), [\@CEH-SLU](https://github.com/CEH-SLU), [\@chriscardillo](https://github.com/chriscardillo), [\@DavisVaughan](https://github.com/DavisVaughan), [\@DaZaM82](https://github.com/DaZaM82), [\@donour](https://github.com/donour), [\@edgararuiz](https://github.com/edgararuiz), [\@eduardszoecs](https://github.com/eduardszoecs), [\@eipi10](https://github.com/eipi10), [\@ejneer](https://github.com/ejneer), [\@erikvona](https://github.com/erikvona), [\@fh-afrachioni](https://github.com/fh-afrachioni), [\@fh-mthomson](https://github.com/fh-mthomson), [\@gui-salome](https://github.com/gui-salome), [\@hadley](https://github.com/hadley), [\@halpo](https://github.com/halpo), [\@homer3018](https://github.com/homer3018), [\@iangow](https://github.com/iangow), [\@jdlom](https://github.com/jdlom), [\@jennal-datacenter](https://github.com/jennal-datacenter), [\@JeremyPasco](https://github.com/JeremyPasco), [\@jiemakel](https://github.com/jiemakel), [\@jingydz](https://github.com/jingydz), [\@johnbaums](https://github.com/johnbaums), [\@joshseiv](https://github.com/joshseiv), [\@jrandall](https://github.com/jrandall), [\@khkk378](https://github.com/khkk378), [\@kmishra9](https://github.com/kmishra9), [\@kongdd](https://github.com/kongdd), [\@krlmlr](https://github.com/krlmlr), [\@krprasangdas](https://github.com/krprasangdas), [\@KRRLP-PL](https://github.com/KRRLP-PL), [\@lentinj](https://github.com/lentinj), [\@lgaborini](https://github.com/lgaborini), [\@lhabegger](https://github.com/lhabegger), [\@lorenzolightsgdwarf](https://github.com/lorenzolightsgdwarf), [\@lschneiderbauer](https://github.com/lschneiderbauer), [\@marianschmidt](https://github.com/marianschmidt), [\@matthewjnield](https://github.com/matthewjnield), [\@mgirlich](https://github.com/mgirlich), [\@MichaelChirico](https://github.com/MichaelChirico), [\@misea](https://github.com/misea), [\@mjbroerman](https://github.com/mjbroerman), [\@moodymudskipper](https://github.com/moodymudskipper), [\@multimeric](https://github.com/multimeric), [\@nannerhammix](https://github.com/nannerhammix), [\@nikolasharing](https://github.com/nikolasharing), [\@nviets](https://github.com/nviets), [\@nviraj](https://github.com/nviraj), [\@oobd](https://github.com/oobd), [\@pboesu](https://github.com/pboesu), [\@pepijn-devries](https://github.com/pepijn-devries), [\@rbcavanaugh](https://github.com/rbcavanaugh), [\@rcepka](https://github.com/rcepka), [\@robertkck](https://github.com/robertkck), [\@samssann](https://github.com/samssann), [\@SayfSaid](https://github.com/SayfSaid), [\@scottporter](https://github.com/scottporter), [\@shearerpmm](https://github.com/shearerpmm), [\@srikanthtist](https://github.com/srikanthtist), [\@stemangiola](https://github.com/stemangiola), [\@stephenashton-dhsc](https://github.com/stephenashton-dhsc), [\@stevepowell99](https://github.com/stevepowell99), [\@TBlackmore](https://github.com/TBlackmore), [\@thomashulst](https://github.com/thomashulst), [\@thothal](https://github.com/thothal), [\@tilo-aok](https://github.com/tilo-aok), [\@tisseuil](https://github.com/tisseuil), [\@tonyk7440](https://github.com/tonyk7440), [\@TSchiefer](https://github.com/TSchiefer), [\@Tsemharb](https://github.com/Tsemharb), [\@tuge98](https://github.com/tuge98), [\@vadim-cherepanov](https://github.com/vadim-cherepanov), and [\@wdenton](https://github.com/wdenton). |
Oops, something went wrong.