
Allow epi_slide to access ref_time_value #318


Merged: 20 commits, Jun 16, 2023

Conversation


nmdefries commented May 19, 2023

This implements Ryan's suggestion for how to calculate ref_time_value, and makes sure epi_slide computations of all types can access ref_time_value and the group key using the same approaches as in #313 and #317.

I plan to remove the n_mandatory_f_args arg to assert_sufficient_f_args in another PR.

Closes #171.

@nmdefries force-pushed the ndefries/epi-slide-rtv branch from 5d14ddd to 8bfb4d8 on May 23, 2023 21:01

nmdefries commented May 23, 2023

There was some concern over the efficiency of finding time_values and joining them onto the data, then sorting the data.

In the old implementation, we were already doing a direct sort; I've just moved that step a little later in the new version, so its computational complexity hasn't changed.

In the new code here, we do need to identify dates not already in the epi_df. I used a standard %in% for this, since we are already doing the same kind of search elsewhere, so again the complexity is unchanged.

The main addition, i.e. the one potentially slow step that we aren't already doing elsewhere in the epi_slide function, is the handful of bind_rows calls, which copy some data around. I haven't compared this approach to doing a merge/join.
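As a rough sketch of the padding idea described above (all names here are illustrative assumptions, not the actual epiprocess code), the missing dates can be identified with a `%in%` test and attached with bind_rows:

```r
library(dplyr)
library(tibble)

# Illustrative sketch only, not the actual implementation: pad the data
# with placeholder rows for requested time values that are absent, so
# every ref_time_value has a row, and flag padding with `.real = FALSE`
# so it can be filtered back out after the slide computation.
pad_time_values <- function(x, all_time_values) {
  missing_tv <- all_time_values[!(all_time_values %in% x$time_value)]
  padding <- tibble(
    geo_value = x$geo_value[[1L]],
    time_value = missing_tv,
    binary = NA_real_,
    .real = FALSE
  )
  bind_rows(mutate(x, .real = TRUE), padding) %>%
    arrange(time_value)
}
```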


nmdefries commented May 23, 2023

Doing some benchmarking, the slow steps of the new approach are dropping the .real column and filtering by the .real column. Improved (~10x faster) in 5564c0d.
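For intuition (a hypothetical comparison, not the actual change made in 5564c0d), the dplyr verbs for filtering on and dropping a bookkeeping column carry per-call overhead that base subsetting avoids:

```r
# Hypothetical benchmark sketch; `x` is assumed to carry a logical
# `.real` column marking the non-padding rows.
bench::mark(
  dplyr = x %>% dplyr::filter(.real) %>% dplyr::select(-.real),
  base  = { out <- x[x$.real, ]; out$.real <- NULL; out },
  check = FALSE
)
```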

@nmdefries marked this pull request as ready for review May 24, 2023 16:07

nmdefries commented May 24, 2023

When testing that computations can access ref_time_value, I noticed that slide returns date columns in numeric format (e.g. 18672). The Date class is dropped during unlist. Not sure it's worth changing; we don't really expect users to return dates, and they can convert back to Date format themselves.
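For example, this is base R behavior, easy to verify in a session:

```r
# `unlist()` drops the Date class, leaving days since 1970-01-01:
unlist(list(as.Date("2021-02-14")))
#> [1] 18672

# Users can convert back to Date themselves:
as.Date(18672, origin = "1970-01-01")
#> [1] "2021-02-14"
```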

Opened issue #321


nmdefries commented May 30, 2023

The input data

set.seed(100)
n <- 5000L
x <- tibble::tribble(
  ~geo_value, ~time_value, ~binary,
  "x", ceiling(runif(n, 0, 200)), runif(n, 0, 100)^2
) %>%
  tidyr::unnest(c(time_value, binary))

# Randomize order
x <- x[sample(nrow(x)), ] %>% as_epi_df()

Looking at a computation that doesn't require access to ref_time_value so that we can compare between the new and old versions (the new logic calculates ref_time_value even if it isn't used):

bench::mark(
  ref_value_calculated = {epi_slide(x, f = function(x, g, t) sum(x$binary),
                                    before = 2,
                                    new_col_name = "sum_binary")},
  old = {old_epi_slide(x, f = function(x, g, t) sum(x$binary),
                       before = 2,
                       new_col_name = "sum_binary")},
  iterations = 10)
# A tibble: 2 × 13
#   expression                min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
# 1 ref_value_calculated   40.8ms   41.9ms      22.8    3.46MB     9.78     7     3
# 2 old                    20.5ms   21.1ms      46.6    2.46MB     5.17     9     1

For this computation, the new code is ~2x slower. With larger data sizes, the difference in runtime gets smaller (new version ~30% slower with 50k rows).


@brookslogan left a comment


Looks pretty good, and I like the more bite-sized test data with more extensive tests! The major things that need action, I think, are:

@nmdefries

I won't be able to finish the last few issues here before my PTO. Will pick those up when I get back.

@nmdefries

Need to update the documentation for the f arg to epi_slide and add epi_slide to the NEWS blurb about ref_time_value being available.

Also add some unrelated comments regarding time-value counting details.

Since we're not potentially dealing with R6 objects, we can just use
`as_data_mask(.x)`. We can also flatten the data mask by installing `.x`,
`.group_key`, and `.ref_time_value` in the same way that the pronouns are
installed (except keeping their original classes rather than converting them to
pronouns), instead of adding another level to the data mask's environment chain.
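A minimal sketch of that flattened-mask idea using rlang (illustrative only; the function name and exact structure here are assumptions, not the package's code):

```r
library(rlang)

# Build a data mask straight from the tibble (no R6 objects involved),
# then install `.x`, `.group_key`, and `.ref_time_value` directly in the
# mask, keeping their original classes instead of wrapping as pronouns.
make_slide_mask <- function(.x, .group_key, .ref_time_value) {
  mask <- as_data_mask(.x)
  mask$.x <- .x
  mask$.group_key <- .group_key
  mask$.ref_time_value <- .ref_time_value
  mask
}

# A captured user expression `quo` would then be evaluated as:
#   eval_tidy(quo, data = make_slide_mask(.x, .group_key, .ref_time_value))
```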
@brookslogan self-requested a review June 16, 2023 19:53

brookslogan commented Jun 16, 2023

An interesting performance experiment: I thought we might speed things up by changing x[x$.real,] to x[x[[".real"]],], since the latter is supposed to be faster based on some R6 benchmarks that include comparisons like these, but it's actually slower and doubles the number of garbage collections!

bench::mark(
  grped %>% epi_slide(cases_7dav = mean(cases), before=6),
  grped %>% curr_epi_slide(cases_7dav = mean(cases), before=6)
)
#> # A tibble: 2 × 13
#>   expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr>    <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 grped %>% ep… 3.13s  3.13s     0.319      26MB     6.38     1    20      3.13s
#> 2 grped %>% cu… 2.96s  2.96s     0.338      26MB     3.04     1     9      2.96s
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
#> Warning message:
#> Some expressions had a GC in every iteration; so filtering is disabled. 

@brookslogan left a comment


Thanks for clearing up some of my confusion. I've applied some minor updates and am going to merge. Docs and NEWS might still need a little work; I'll try to tackle that after the merge.
