dbplyr 2.4.0 (#659)
Co-authored-by: Hadley Wickham <[email protected]>
mgirlich and hadley authored Oct 26, 2023
1 parent 1b8f6ed commit 6df5686
Showing 4 changed files with 369 additions and 0 deletions.
142 changes: 142 additions & 0 deletions content/blog/dbplyr-2-4-0/index.Rmd
@@ -0,0 +1,142 @@
---
output: hugodown::hugo_document

slug: dbplyr-2-4-0
title: dbplyr 2.4.0
date: 2023-10-26
author: Hadley Wickham
description: >
    dbplyr 2.4.0 brings improvements to SQL generation, better control over the
    generated SQL, some new translations, and a bunch of backend specific improvements.
photo:
  url: https://unsplash.com/photos/AJqaubLEaN4
  author: Parker Hilton

# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other"
categories: [package]
tags: [dplyr, dbplyr]
---

```{=html}
<!--
* also include something about dbplyr 2.3.1?
* support for `join_by()`
* many bugs introduced in 2.3.0 fixed
TODO:
* [x] Look over / edit the post's title in the yaml
* [x] Edit (or delete) the description; note this appears in the Twitter card
* [x] Pick category and tags (see existing with `hugodown::tidy_show_meta()`)
* [x] Find photo & update yaml metadata
* [x] Create `thumbnail-sq.jpg`; height and width should be equal
* [x] Create `thumbnail-wd.jpg`; width should be >5x height
* [x] `hugodown::use_tidy_thumbnails()`
* [x] Add intro sentence, e.g. the standard tagline for the package
* [x] `usethis::use_tidy_thanks()`
-->
```
We're chuffed to announce the release of [dbplyr](http://dbplyr.tidyverse.org/) 2.4.0.
dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.

You can install it from CRAN with:

```{r, eval = FALSE}
install.packages("dbplyr")
```

This blog post will highlight some of the most important new features: eliminating subqueries when using multiple unions in a row, getting more control over the generated SQL, and a handful of new translations.
As usual, the release comes with a large number of improvements to translations for individual backends; see the full list in the [release notes](https://github.com/tidyverse/dbplyr/releases/tag/v2.4.0).

```{r setup}
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
```

## SQL optimisation

dbplyr now produces fewer subqueries when combining tables with `union()` and `union_all()`, resulting in shorter, more readable, and, in some cases, faster SQL.

```{r}
lf1 <- lazy_frame(x = 1, y = "a", .name = "lf1")
lf2 <- lazy_frame(x = 1, y = "b", .name = "lf2")
lf3 <- lazy_frame(x = 1, z = "c", .name = "lf3")
lf1 |>
  union(lf2) |>
  union(lf3)
```

(As usual in these blog posts, I'm using `lazy_frame()` to focus on the SQL generation, without having to set up a dummy database.)
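
The same chaining works with `union_all()`. Here is a minimal sketch, assuming three hypothetical lazy tables (`tbl_a`, `tbl_b`, `tbl_c`, invented for illustration). It prints the query and also captures the SQL as a plain string with `remote_query()`, so you can check on your own dbplyr version how many subqueries are generated:

```{r}
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical tables, for illustration only
tbl_a <- lazy_frame(x = 1, y = "a", .name = "tbl_a")
tbl_b <- lazy_frame(x = 2, y = "b", .name = "tbl_b")
tbl_c <- lazy_frame(x = 3, y = "c", .name = "tbl_c")

# Chain two union_all() calls in a row
stacked <- tbl_a |>
  union_all(tbl_b) |>
  union_all(tbl_c)

# Print the generated SQL
stacked |> show_query()

# Capture the SQL as a character string for programmatic inspection
stacked_sql <- as.character(remote_query(stacked))
```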

Similarly, a `semi_join()` or `anti_join()` on a filtered table now avoids a subquery:

```{r}
lf1 |>
  semi_join(lf3 |> filter(z == "c"), join_by(x))
```
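
The `anti_join()` case has the same call shape. A small sketch, assuming two hypothetical lazy tables (`orders` and `returns`, names chosen purely for illustration); it renders the query and stores the SQL as a string so you can inspect where the filter ends up:

```{r}
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical tables, for illustration only
orders <- lazy_frame(id = 1, amount = 10, .name = "orders")
returns <- lazy_frame(id = 1, reason = "damaged", .name = "returns")

# Keep orders that have no matching "damaged" return
not_damaged <- orders |>
  anti_join(returns |> filter(reason == "damaged"), join_by(id))

# Print the generated SQL
not_damaged |> show_query()

# Capture the SQL as a character string for inspection
anti_sql <- as.character(remote_query(not_damaged))
```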

## SQL generation

The new `sql_options` argument for `show_query()` and `remote_query()` gives you more control over the generated SQL.

- By default dbplyr uses `*` to select all columns of a table, but with `use_star = FALSE` all columns are selected explicitly:

```{r}
lf3 <- lazy_frame(x = 1, y = 2, z = 3, .name = "lf3")
lf3 |>
  mutate(a = 4)
lf3 |>
  mutate(a = 4) |>
  show_query(sql_options = sql_options(use_star = FALSE))
```

- If you prefer common table expressions (CTEs) over subqueries, use `cte = TRUE`:

```{r}
nested_query <- lf3 |>
  mutate(z = z + 1) |>
  left_join(lf2, by = join_by(x, y))
nested_query
nested_query |>
  show_query(sql_options = sql_options(cte = TRUE))
```

- And if you want all columns in a join to be qualified with the table name, not only the ambiguous ones, use `qualify_all_columns = TRUE`:

```{r}
qualify_columns <- lf2 |>
  left_join(lf3, by = join_by(x, y))
qualify_columns
qualify_columns |>
  show_query(sql_options = sql_options(qualify_all_columns = TRUE))
```
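
The three options above can also be set together in a single `sql_options()` call and reused across queries. A minimal sketch, assuming two hypothetical lazy tables (`sales` and `stores`, invented for this example); the same options object is passed to both `show_query()` and `remote_query()`:

```{r}
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical tables, for illustration only
sales <- lazy_frame(store_id = 1, total = 10, .name = "sales")
stores <- lazy_frame(store_id = 1, city = "a", .name = "stores")

# One options object: CTEs instead of subqueries, explicit column lists,
# and every column in a join qualified with its table name
opts <- sql_options(cte = TRUE, use_star = FALSE, qualify_all_columns = TRUE)

# mutate() before the join forces a nested query, so the CTE option has an effect
sales_by_store <- sales |>
  mutate(total = total * 2) |>
  left_join(stores, by = join_by(store_id))

# Print the SQL with the chosen options
sales_by_store |> show_query(sql_options = opts)

# Or capture it as a string, e.g. for logging
opts_sql <- as.character(remote_query(sales_by_store, sql_options = opts))
```

Keeping the options in one object makes it easy to apply the same SQL style everywhere you render queries.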

## New translations

`str_detect()`, `str_starts()` and `str_ends()` with fixed patterns are translated to `INSTR()`:

```{r}
lf1 |>
  filter(
    stringr::str_detect(x, stringr::fixed("abc")),
    stringr::str_starts(x, stringr::fixed("a"))
  )
```

And `nzchar()` and `runif()` are now translated to their SQL equivalents:

```{r}
lf1 |>
  filter(nzchar(x)) |>
  mutate(z = runif())
```

## Acknowledgements

The vast majority of this release (particularly the SQL optimisations) is from [Maximilian Girlich](https://github.com/mgirlich); thanks so much for your continued work on this package!
And a big thanks go to the 84 other folks who helped out by filing issues and contributing code: [\@abalter](https://github.com/abalter), [\@ablack3](https://github.com/ablack3), [\@andreassoteriadesmoj](https://github.com/andreassoteriadesmoj), [\@apalacio9502](https://github.com/apalacio9502), [\@avsdev-cw](https://github.com/avsdev-cw), [\@bairdj](https://github.com/bairdj), [\@bastistician](https://github.com/bastistician), [\@brownj31](https://github.com/brownj31), [\@But2ene](https://github.com/But2ene), [\@carlganz](https://github.com/carlganz), [\@catalamarti](https://github.com/catalamarti), [\@CEH-SLU](https://github.com/CEH-SLU), [\@chriscardillo](https://github.com/chriscardillo), [\@DavisVaughan](https://github.com/DavisVaughan), [\@DaZaM82](https://github.com/DaZaM82), [\@donour](https://github.com/donour), [\@edgararuiz](https://github.com/edgararuiz), [\@eduardszoecs](https://github.com/eduardszoecs), [\@eipi10](https://github.com/eipi10), [\@ejneer](https://github.com/ejneer), [\@erikvona](https://github.com/erikvona), [\@fh-afrachioni](https://github.com/fh-afrachioni), [\@fh-mthomson](https://github.com/fh-mthomson), [\@gui-salome](https://github.com/gui-salome), [\@hadley](https://github.com/hadley), [\@halpo](https://github.com/halpo), [\@homer3018](https://github.com/homer3018), [\@iangow](https://github.com/iangow), [\@jdlom](https://github.com/jdlom), [\@jennal-datacenter](https://github.com/jennal-datacenter), [\@JeremyPasco](https://github.com/JeremyPasco), [\@jiemakel](https://github.com/jiemakel), [\@jingydz](https://github.com/jingydz), [\@johnbaums](https://github.com/johnbaums), [\@joshseiv](https://github.com/joshseiv), [\@jrandall](https://github.com/jrandall), [\@khkk378](https://github.com/khkk378), [\@kmishra9](https://github.com/kmishra9), [\@kongdd](https://github.com/kongdd), [\@krlmlr](https://github.com/krlmlr), [\@krprasangdas](https://github.com/krprasangdas), [\@KRRLP-PL](https://github.com/KRRLP-PL), 
[\@lentinj](https://github.com/lentinj), [\@lgaborini](https://github.com/lgaborini), [\@lhabegger](https://github.com/lhabegger), [\@lorenzolightsgdwarf](https://github.com/lorenzolightsgdwarf), [\@lschneiderbauer](https://github.com/lschneiderbauer), [\@marianschmidt](https://github.com/marianschmidt), [\@matthewjnield](https://github.com/matthewjnield), [\@mgirlich](https://github.com/mgirlich), [\@MichaelChirico](https://github.com/MichaelChirico), [\@misea](https://github.com/misea), [\@mjbroerman](https://github.com/mjbroerman), [\@moodymudskipper](https://github.com/moodymudskipper), [\@multimeric](https://github.com/multimeric), [\@nannerhammix](https://github.com/nannerhammix), [\@nikolasharing](https://github.com/nikolasharing), [\@nviets](https://github.com/nviets), [\@nviraj](https://github.com/nviraj), [\@oobd](https://github.com/oobd), [\@pboesu](https://github.com/pboesu), [\@pepijn-devries](https://github.com/pepijn-devries), [\@rbcavanaugh](https://github.com/rbcavanaugh), [\@rcepka](https://github.com/rcepka), [\@robertkck](https://github.com/robertkck), [\@samssann](https://github.com/samssann), [\@SayfSaid](https://github.com/SayfSaid), [\@scottporter](https://github.com/scottporter), [\@shearerpmm](https://github.com/shearerpmm), [\@srikanthtist](https://github.com/srikanthtist), [\@stemangiola](https://github.com/stemangiola), [\@stephenashton-dhsc](https://github.com/stephenashton-dhsc), [\@stevepowell99](https://github.com/stevepowell99), [\@TBlackmore](https://github.com/TBlackmore), [\@thomashulst](https://github.com/thomashulst), [\@thothal](https://github.com/thothal), [\@tilo-aok](https://github.com/tilo-aok), [\@tisseuil](https://github.com/tisseuil), [\@tonyk7440](https://github.com/tonyk7440), [\@TSchiefer](https://github.com/TSchiefer), [\@Tsemharb](https://github.com/Tsemharb), [\@tuge98](https://github.com/tuge98), [\@vadim-cherepanov](https://github.com/vadim-cherepanov), and [\@wdenton](https://github.com/wdenton).