Add specialized epix_slide for epi_slide_opt #611

brookslogan · 2025-02-26T21:15:06Z

Checklist

Please:

- Make sure this PR is against "dev", not "main" (unless this is a release PR).
- Request a review from one of the current main reviewers: brookslogan, nmdefries.
- Make sure to bump the version number in DESCRIPTION. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
- Describe changes made in NEWS.md, making sure breaking changes (backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 1.7.2, then write your changes under the 1.8 heading).
- See DEVELOPMENT.md for more information on the development process.

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Resolves Implement smart sparse archive -> archive slide #609.

TODO

Address review comments
Address change in epi_slide_opt(.before = Inf) behavior from refactor (CHECK check not running at the moment despite some others running?)

brookslogan · 2025-02-26T21:40:27Z

- handle all possible .f, before
- finish docs, tests
- [-] port map_ea at least, maybe base on iterators
  - handle in "Add separate epi_archive constructor based on a "iterator"/list/df of snapshots" #173 or beyond
- in-source todos
- check whether frollapply on a function that errors on ref time values not in ref_time_values will error in the current scheme; consider adding a skipping layer for it or documenting

comments and some nit rewrites

14% or 17% time reduction for 7-day averaging on a couple of test archives.

…chars Something still seems different; some Date arithmetic stuck out, and attempts to avoid it helped somewhat, but this still seems slower for archives than the pre-unified approach.

Switch `vec_c` -> `c`, as `vec_c` is slower on 3 scalar `Date`s, and it's probably pretty uncommon to use `time_value`s for which `c` is incompatible with `vec_c` (though possible; maybe tibbles with year, week, wday, like from MMWRweek, though these would probably break inside the archive DT).
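For anyone wanting to re-check the `c` versus `vec_c` trade-off on scalar `Date`s locally, here is a minimal sketch. The dates and repetition count are arbitrary choices for illustration, not taken from the PR's benchmarks, and the timings will vary by machine and package versions.

```r
library(vctrs)

# Three scalar Dates, mirroring the "3 scalar Dates" case discussed above.
d1 <- as.Date("2025-01-01")
d2 <- as.Date("2025-01-08")
d3 <- as.Date("2025-01-15")

via_c <- c(d1, d2, d3)          # base R; dispatches to c.Date
via_vec_c <- vec_c(d1, d2, d3)  # vctrs

# Both approaches should produce the same Date values.
stopifnot(inherits(via_vec_c, "Date"), all(via_c == via_vec_c))

# Rough timing loop (hypothetical rep count; not the PR's benchmark setup).
n_reps <- 10000L
time_c <- system.time(for (i in seq_len(n_reps)) c(d1, d2, d3))
time_vec_c <- system.time(for (i in seq_len(n_reps)) vec_c(d1, d2, d3))
print(c(c = time_c[["elapsed"]], vec_c = time_vec_c[["elapsed"]]))
```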

Revert earlier change to try to reduce time inside `time_minus_time_in_n_steps`, as `c.Date` is more costly. This also works back towards generality from the previous `vec_c` -> `c` change.

Also change a map -> lapply so we can get srcref to internal error immediately, add a missed .window_size-missingness check, delete some helper functions that were sort of helpful but also tacked on significant run time when used in a natural way.

The error message says that slide computations can't be rownamed data frames. This is bad wording (having rownames != having non-automatic rownames), and we're not actually enforcing any restriction of the sort anyway. We drop/ignore the rownames though (via dplyr/tibble). In `epix_slide`, the allowance of non-automatic rownames was deliberate; in `epi_slide`, it also seems likely the way to go; one might convert to `data.frame` and have some filter-like operation return non-automatic rownames.
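To illustrate the distinction between automatic and non-automatic rownames, here is a small sketch using only base R and tibble. The data frame and filter are made up for illustration and are not part of the package.

```r
library(tibble)

# A fresh data.frame has automatic (compact integer) row names.
df <- data.frame(x = 1:3)

# A filter-like operation on a data.frame keeps the original row labels,
# which makes the row names non-automatic ("2", "3").
filtered <- df[df$x > 1, , drop = FALSE]

# Converting to a tibble drops the row names again.
as_tib <- as_tibble(filtered)

print(has_rownames(df))        # automatic row names do not count
print(has_rownames(filtered))  # non-automatic row names do
print(has_rownames(as_tib))
print(rownames(filtered))
```

This is the situation described above: a slide computation that converts to `data.frame` and filters can return non-automatic rownames without the user ever setting any.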

Use `... =` in `purrr::partial` to specify where future arguments are placed. This prevents some confusing error messages on

```
archive_cases_dv_subset %>% epi_slide_opt(percent_cli, frollmean, 0, .window_size = 7)
```

but does not solve all issues that can happen with unnamed `...`.
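For context, this is how the `... =` placement works in `purrr::partial` in general. The sketch below uses `list` as a stand-in function; it is not the code from this PR.

```r
library(purrr)

# By default, future arguments are appended after the pre-filled ones.
append_later <- partial(list, 1, 2)
str(append_later("foo"))

# An empty `... = ` marks where future arguments should be spliced in,
# here between the two pre-filled arguments.
splice_middle <- partial(list, 1, ... = , 2)
str(splice_middle("foo"))
```

Being explicit about the placement avoids relying on the default position when the wrapped function also takes unnamed arguments.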

dsweber2 assigned brookslogan Feb 26, 2025

brookslogan force-pushed the lcb/archive-agg branch 6 times, most recently from 2949de6 to 661cece Compare February 28, 2025 05:33

brookslogan mentioned this pull request Feb 28, 2025

Add abstraction to allow for easier implementation of streaming PID recalibration #616

Open

brookslogan force-pushed the lcb/archive-agg branch 15 times, most recently from 57dd9ea to 6dcf722 Compare March 7, 2025 01:30

brookslogan force-pushed the lcb/archive-agg branch from 3a147d0 to 6d801a2 Compare March 7, 2025 21:46

brookslogan mentioned this pull request Mar 10, 2025

[meta] data.table and dplyr footguns #618

Open

brookslogan force-pushed the lcb/archive-agg branch 3 times, most recently from 1fbe7b8 to dd84924 Compare March 10, 2025 19:52

brookslogan marked this pull request as ready for review March 10, 2025 20:00

brookslogan added 4 commits April 4, 2025 17:01

Merge pull request #652 from cmu-delphi/archive-agg-nits

d1ba68e

Simplify tbl_diff2 with tbl_fast_anti_join

71116cb

fix(tbl_fast_anti_join): include non-ukey, non-val cols in result

1c5f3c9

Comment, clean, style, fix @keywords in archive opt slide & helpers

5fc1dc5

brookslogan force-pushed the lcb/archive-agg branch from 09a1af5 to 5fc1dc5 Compare April 6, 2025 23:41

brookslogan and others added 3 commits April 6, 2025 23:43

docs: document (GHA)

436fbba

Fix incomplete rename refactor

5422363

perf(epi_slide_opt.epi_archive): more [ -> vec_slice changes

f48fbf0

brookslogan force-pushed the lcb/archive-agg branch from 665d558 to f48fbf0 Compare April 7, 2025 18:16

brookslogan added 14 commits April 7, 2025 12:53

docs(patch.R): fix comment grammar

ad0ec0f

fix: add missing importFrom

172c3c1

Refactor and speed up branch of vec_approx_equal NA/NaN testing

7a5708f

Test unifying epi_slide_opt inner comps between edf and archive

fdee9de

Refactor unified epi_slide_opt_one_epikey for clarity

68065c6

Clean up slide range logic

f064578

perf(unit_time_delta): faster arg matching

5295f1b

refactor: handle time +/- Inf in helper function

c568774

Attempt rewriting combined opt slide logic to match old archive perf …

80966e2

perf: avoid c.Date when possible

4ec96be

perf: old, faster slide edge-trimming args are actually usable

cc1a563

fix(time_plus_slide_window_arg): on non-integerish, non-Inf y

4792aea

brookslogan force-pushed the lcb/archive-agg branch from e42d6b4 to 62375f8 Compare April 10, 2025 22:29

brookslogan added 6 commits April 10, 2025 17:20

refactor(epi_slide_opt): helper function&variable naming, arg ordering

68de47b

More internal renames and documentation

4763e36

perf: c -> list when specifying slide out date min, max

184f98a

fix(epi_slide_opt): on out_filter_time_set narrowing to 0

a82d48d

fix: purrr::partial missing future arg placement

d962101
