Upkeep 2024-09 #479

Merged
merged 5 commits into from
Sep 5, 2024
27 changes: 14 additions & 13 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
@@ -10,7 +10,9 @@ on:
pull_request:
branches: [main, master]

name: R-CMD-check
name: R-CMD-check.yaml

permissions: read-all

jobs:
R-CMD-check:
@@ -25,24 +27,22 @@ jobs:
- {os: macos-latest, r: 'release'}

- {os: windows-latest, r: 'release'}
# Use 3.6 to trigger usage of RTools35
- {os: windows-latest, r: '3.6'}
# use 4.1 to check with rtools40's older compiler
- {os: windows-latest, r: '4.1'}

- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
- {os: ubuntu-latest, r: 'oldrel-2'}
- {os: ubuntu-latest, r: 'oldrel-3'}
- {os: ubuntu-latest, r: 'oldrel-4'}
# use 4.0 or 4.1 to check with rtools40's older compiler
- {os: windows-latest, r: 'oldrel-4'}

- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
- {os: ubuntu-latest, r: 'oldrel-2'}
- {os: ubuntu-latest, r: 'oldrel-3'}
- {os: ubuntu-latest, r: 'oldrel-4'}

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
R_KEEP_PKG_SOURCE: yes

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/setup-pandoc@v2

@@ -60,3 +60,4 @@ jobs:
- uses: r-lib/actions/check-r-package@v2
with:
upload-snapshots: true
build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
8 changes: 5 additions & 3 deletions .github/workflows/pkgdown.yaml
@@ -9,7 +9,9 @@ on:
types: [published]
workflow_dispatch:

name: pkgdown
name: pkgdown.yaml

permissions: read-all

jobs:
pkgdown:
@@ -22,7 +24,7 @@ jobs:
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/setup-pandoc@v2

@@ -41,7 +43,7 @@ jobs:

- name: Deploy to GitHub pages 🚀
if: github.event_name != 'pull_request'
uses: JamesIves/github-pages-deploy-action@v4.4.1
uses: JamesIves/github-pages-deploy-action@v4.5.0
with:
clean: false
branch: gh-pages
12 changes: 9 additions & 3 deletions .github/workflows/pr-commands.yaml
@@ -4,7 +4,9 @@ on:
issue_comment:
types: [created]

name: Commands
name: pr-commands.yaml

permissions: read-all

jobs:
document:
@@ -13,8 +15,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/pr-fetch@v2
with:
@@ -50,8 +54,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/pr-fetch@v2
with:
23 changes: 17 additions & 6 deletions .github/workflows/test-coverage.yaml
@@ -6,7 +6,9 @@ on:
pull_request:
branches: [main, master]

name: test-coverage
name: test-coverage.yaml

permissions: read-all

jobs:
test-coverage:
@@ -15,36 +17,45 @@ jobs:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::covr
extra-packages: any::covr, any::xml2
needs: coverage

- name: Test coverage
run: |
covr::codecov(
cov <- covr::package_coverage(
quiet = FALSE,
clean = FALSE,
install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
)
covr::to_cobertura(cov)
shell: Rscript {0}

- uses: codecov/codecov-action@v4
with:
fail_ci_if_error: ${{ github.event_name != 'pull_request' && true || false }}
file: ./cobertura.xml
plugin: noop
disable_search: true
token: ${{ secrets.CODECOV_TOKEN }}

- name: Show testthat output
if: always()
run: |
## --------------------------------------------------------------------
find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true
shell: bash

- name: Upload test results
if: failure()
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: coverage-test-failures
path: ${{ runner.temp }}/package
8 changes: 4 additions & 4 deletions DESCRIPTION
@@ -15,8 +15,8 @@ Description: Provides a data.table backend for 'dplyr'. The goal of
License: MIT + file LICENSE
URL: https://dtplyr.tidyverse.org, https://github.com/tidyverse/dtplyr
BugReports: https://github.com/tidyverse/dtplyr/issues
Depends:
R (>= 3.6)
Depends:
R (>= 4.0)
Imports:
cli (>= 3.4.0),
data.table (>= 1.13.0),
@@ -35,10 +35,10 @@ Suggests:
testthat (>= 3.1.2),
tidyr (>= 1.1.0),
waldo (>= 0.3.1)
VignetteBuilder:
VignetteBuilder:
knitr
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Encoding: UTF-8
Roxygen: {library(tidyr); list(markdown = TRUE)}
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -17,9 +17,11 @@ S3method(distinct,dtplyr_step)
S3method(do,dtplyr_step)
S3method(dt_call,dtplyr_step)
S3method(dt_call,dtplyr_step_assign)
S3method(dt_call,dtplyr_step_call)
S3method(dt_call,dtplyr_step_first)
S3method(dt_call,dtplyr_step_join)
S3method(dt_call,dtplyr_step_modify)
S3method(dt_call,dtplyr_step_mutate)
S3method(dt_call,dtplyr_step_set)
S3method(dt_call,dtplyr_step_subset)
S3method(dt_has_computation,dtplyr_step)
1 change: 1 addition & 0 deletions R/step-call.R
@@ -16,6 +16,7 @@ step_call <- function(parent, fun, args = list(), vars = parent$vars, in_place =
)
}

#' @export
dt_call.dtplyr_step_call <- function(x, needs_copy = x$needs_copy) {
call2(x$fun, dt_call(x$parent, needs_copy), !!!x$args)
}
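
A note on the `#' @export` tag added in this hunk: on an S3 method, roxygen2 turns that tag into the matching `S3method(dt_call,dtplyr_step_call)` line seen in the NAMESPACE diff, which registers the method for dispatch. The sketch below only illustrates the dispatch pattern in base R; `describe_step()` and the `demo_step*` classes are hypothetical stand-ins, not dtplyr's real generic or classes.

```r
# Hypothetical generic, standing in for an internal generic such as dt_call().
describe_step <- function(x, ...) {
  UseMethod("describe_step")
}

# Fallback method for the parent class.
describe_step.demo_step <- function(x, ...) {
  "generic step"
}

# More specific method for a subclass; it is tried first because the
# subclass name comes first in the class vector.
describe_step.demo_step_call <- function(x, ...) {
  paste("call step using", x$fun)
}

step  <- structure(list(fun = "drop_na"), class = c("demo_step_call", "demo_step"))
other <- structure(list(), class = "demo_step")

describe_step(step)   # dispatches to the demo_step_call method
describe_step(other)  # falls back to the demo_step method
```

In a script, methods defined in the global environment are found automatically; inside a package, the `S3method()` directive in NAMESPACE is what makes the method visible to dispatch.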
3 changes: 1 addition & 2 deletions R/step-join.R
@@ -82,7 +82,7 @@ dt_call.dtplyr_step_join <- function(x, needs_copy = x$needs_copy) {
anti = call2("[", lhs, call2("!", rhs), on = on),
semi = call2("[", lhs, call2("unique", call2("[", lhs, rhs, which = TRUE, nomatch = NULL, on = on)))
)

if (x$style == "full") {
default_suffix <- c(".x", ".y")
if (!identical(x$suffix, default_suffix)) {
@@ -133,7 +133,6 @@ right_join.dtplyr_step <- function(x, y, ..., by = NULL, copy = FALSE, suffix =
step_join(x, y, by, style = "right", copy = copy, suffix = suffix)
}


#' @importFrom dplyr inner_join
#' @export
inner_join.dtplyr_step <- function(x, y, ..., by = NULL, copy = FALSE, suffix = c(".x", ".y")) {
1 change: 1 addition & 0 deletions R/step-mutate.R
@@ -26,6 +26,7 @@ step_mutate <- function(parent, new_vars = list(), use_braces = FALSE, by = new_
out
}

#' @export
dt_call.dtplyr_step_mutate <- function(x, needs_copy = x$needs_copy) {
# i is always empty because we never mutate a subset
if (is_empty(x$new_vars)) {
28 changes: 14 additions & 14 deletions README.Rmd
@@ -17,8 +17,8 @@ knitr::opts_chunk$set(

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/dtplyr)](https://cran.r-project.org/package=dtplyr)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/workflows/R-CMD-check/badge.svg)](https://github.com/tidyverse/dtplyr/actions)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/dtplyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr?branch=main)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/dtplyr/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr)
<!-- badges: end -->

## Overview
@@ -52,7 +52,7 @@ library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
```

Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.
Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.

```{r}
mtcars2 <- lazy_dt(mtcars)
@@ -61,35 +61,35 @@ mtcars2 <- lazy_dt(mtcars)
You can preview the transformation (including the generated data.table code) by printing the result:

```{r}
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k))
```

But generally you should reserve this only for debugging, and use `as.data.table()`, `as.data.frame()`, or `as_tibble()` to indicate that you're done with the transformation and want to access the results:

```{r}
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
as_tibble()
```

## Why is dtplyr slower than data.table?

There are two primary reasons that dtplyr will always be somewhat slower than data.table:

* Each dplyr verb must do some work to convert dplyr syntax to data.table
syntax. This takes time proportional to the complexity of the input code,
* Each dplyr verb must do some work to convert dplyr syntax to data.table
syntax. This takes time proportional to the complexity of the input code,
not the input _data_, so should be a negligible overhead for large datasets.
[Initial benchmarks][benchmark] suggest that the overhead should be under
[Initial benchmarks][benchmark] suggest that the overhead should be under
1ms per dplyr call.

* To match dplyr semantics, `mutate()` does not modify in place by default.
* To match dplyr semantics, `mutate()` does not modify in place by default.
This means that most expressions involving `mutate()` must make a copy
that would not be necessary if you were using data.table directly.
(You can opt out of this behaviour in `lazy_dt()` with `immutable = FALSE`).
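
As a note on the README text in this hunk, the copy-versus-in-place trade-off can be sketched as follows. This is a minimal illustration under the assumption that data.table, dtplyr and dplyr are installed; the table `dt` and its columns are made up for the example, and `show_query()` is used only to display the generated data.table code.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

dt <- data.table(x = 1:3)

# Default (immutable = TRUE): the input is never modified, so mutate()
# has to work on a copy of the data.
safe <- lazy_dt(dt) %>% mutate(y = x * 2)
show_query(safe)
res_safe <- as.data.table(safe)
ncol_after_safe <- ncol(dt)   # `dt` still has only its original column

# Opt out: the generated code is allowed to modify `dt` by reference,
# which avoids the copy.
fast <- lazy_dt(dt, immutable = FALSE) %>% mutate(y = x * 2)
show_query(fast)
res_fast <- as.data.table(fast)
ncol_after_fast <- ncol(dt)   # `dt` itself has now gained the new column
```

The in-place variant is faster on large tables but changes the caller's object, which is why it is opt-in.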
19 changes: 10 additions & 9 deletions README.md
@@ -7,9 +7,9 @@

[![CRAN
status](https://www.r-pkg.org/badges/version/dtplyr)](https://cran.r-project.org/package=dtplyr)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/workflows/R-CMD-check/badge.svg)](https://github.com/tidyverse/dtplyr/actions)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml)
[![Codecov test
coverage](https://codecov.io/gh/tidyverse/dtplyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr?branch=main)
coverage](https://codecov.io/gh/tidyverse/dtplyr/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr)
<!-- badges: end -->

## Overview
@@ -47,6 +47,7 @@ other goodies that it provides:

``` r
library(data.table)
#> Warning: package 'data.table' was built under R version 4.4.1
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
```
@@ -62,10 +63,10 @@ You can preview the transformation (including the generated data.table
code) by printing the result:

``` r
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k))
#> Source: local data table [3 x 2]
#> Call: `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)),
@@ -85,11 +86,11 @@ But generally you should reserve this only for debugging, and use
you’re done with the transformation and want to access the results:

``` r
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
as_tibble()
#> # A tibble: 3 × 2
#> cyl l100k
2 changes: 1 addition & 1 deletion tests/testthat/_snaps/step-call.md
@@ -23,6 +23,6 @@
collect(drop_na(dt, "z"))
Condition
Error in `drop_na()`:
! Can't subset columns that don't exist.
! Can't select columns that don't exist.
x Column `z` doesn't exist.

2 changes: 1 addition & 1 deletion tests/testthat/test-step-join.R
@@ -346,10 +346,10 @@ test_that("performs cartesian joins as needed", {
test_that("performs cross join", {
df1 <- data.frame(x = 1:2, y = "a", stringsAsFactors = FALSE)
df2 <- data.frame(x = 3:4)
expected <- dplyr::cross_join(df1, df2) %>% as_tibble()

dt1 <- lazy_dt(df1, "dt1")
dt2 <- lazy_dt(df2, "dt2")
expected <- left_join(df1, df2, by = character()) %>% as_tibble()

expect_snapshot(left_join(dt1, dt2, by = character()))
expect_equal(left_join(dt1, dt2, by = character()) %>% collect(), expected)
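
For readers comparing the two `expected` lines in this hunk: `dplyr::cross_join()` (available from dplyr 1.1.0) is the dedicated verb for the Cartesian product that `by = character()` was previously used to request. A small sketch on plain data frames, reusing the inputs from the test and assuming dplyr 1.1.0 or later is installed:

```r
library(dplyr, warn.conflicts = FALSE)

df1 <- data.frame(x = 1:2, y = "a", stringsAsFactors = FALSE)
df2 <- data.frame(x = 3:4)

# Pair every row of df1 with every row of df2.
crossed <- cross_join(df1, df2)

nrow(crossed)   # one row per (df1 row, df2 row) pair
names(crossed)  # the clashing `x` columns receive the default suffixes
```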