-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add specialized epix_slide for epi_slide_opt #611
base: dev
Are you sure you want to change the base?
Conversation
|
2949de6
to
661cece
Compare
9903efd
to
e908168
Compare
For re-use in an epix_epi_slide_opt
fcfd9c2
to
5d24262
Compare
5d24262
to
40e8778
Compare
1fbe7b8
to
dd84924
Compare
7c3d243
to
8149aa2
Compare
We actually coincidentally caught the tibble-to-DT key-setting issue with code intended for addressing a separate issue with `data.table`s passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Various nits and some questions. The logic looks fine to me, but given the complexity of this change, someone else should review as well.
"problem with {rlang::expr_label(rlang::caller_arg(f))}", | ||
"i" = "`f` must be one of `data.table`'s rolling functions (`frollmean`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
"problem with {rlang::expr_label(rlang::caller_arg(f))}", | |
"i" = "`f` must be one of `data.table`'s rolling functions (`frollmean`, | |
"problem with {rlang::expr_label(rlang::caller_arg(.f))}", | |
"i" = "`.f` must be one of `data.table`'s rolling functions (`frollmean`, |
#' @keywords internal | ||
across_ish_names_info <- function(.x, time_type, col_names_quo, .f_namer, | ||
.window_size, .align, .prefix, .suffix, .new_col_names) { | ||
# The position of a given column can be differ between input `.x` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
# The position of a given column can be differ between input `.x` and | |
# The position of a given column can differ between input `.x` and |
#' | ||
#' * On `epi_df`s, it will take care of looping over `geo_value`s, temporarily | ||
#' filling in time gaps with `NA`s and other work needed to ensure there are | ||
#' exactly n consecutive time steps per computation, and has some other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
#' exactly n consecutive time steps per computation, and has some other | |
#' exactly `n` consecutive time steps per computation, and has some other |
#' @description | ||
#' | ||
#' `epi_slide_opt` calculates n-time-step rolling means&sums, | ||
#' cumulative/"running" means&sums, or other operations supported by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
#' cumulative/"running" means&sums, or other operations supported by | |
#' cumulative/"running" means&sums, and other operations supported by |
# Most input validation + handle NULL earlier_snapshot. This is a small function so | ||
# use faster validation variants: | ||
if (!is_tibble(later_tbl)) { | ||
cli_abort("`later_tbl` must be a tibble") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: please add class names to errors.
group_updates <- group_values %>% | ||
nest(.by = version, .key = "subtbl") %>% | ||
arrange(version) | ||
# TODO move nesting inside the helper? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: some TODOs and "XXX"s left. Please either implement, or convert to issues and link to from code.
# in slide outputs in the range t1-after..t2+before, and to compute those | ||
# slide values, we need to look at the input snapshot from | ||
# t1-after-before..t2+before+after. nolint: commented_code_linter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I understand how we arrived at t1-after..t2+before
, but where did the t1-after-before....
come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose before = 6, after = 0
and there's an update to the 1d value at time t alone. We need to compute the trailing 7dav for t..t+6. But in order to compute the trailing 7dav for time t, we need 1d data from t-6..t. So we end up needing the union of the ranges t-6..t, t-5..t+1, t-4..t+2, ..., t..t+6, which gives us t-6..t+6.
if (f_from_package == "data.table") { | ||
for (col_i in seq_along(in_colnames)) { | ||
if (before == Inf) { | ||
slide[[out_colnames[[col_i]]]] <- | ||
f_dots_baked(slide[[in_colnames[[col_i]]]], seq_len(slide_nrow), adaptive = TRUE) | ||
} else { | ||
out_col <- f_dots_baked(slide[[in_colnames[[col_i]]]], before + after + 1L) | ||
if (after != 0L) { | ||
# data.table always puts NAs at tails, even with na.rm = TRUE; chop | ||
# off extra NAs from beginning and place missing NAs at end: | ||
out_col <- c(out_col[seq(after + 1L, slide_nrow)], rep(NA, after)) | ||
} | ||
slide[[out_colnames[[col_i]]]] <- out_col | ||
} | ||
} | ||
} else if (f_from_package == "slider") { | ||
for (col_i in seq_along(in_colnames)) { | ||
slide[[out_colnames[[col_i]]]] <- f_dots_baked(slide[[in_colnames[[col_i]]]], before = before, after = after) | ||
} | ||
} else { | ||
cli_abort("epiprocess internal error: `f_from_package` was {format_chr_deparse(f_from_package)}") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: data.table
functions can process more than one column at a time and I assume it's faster. So we don't need the column loop into the data.table
branch here.
if (f_from_package == "data.table") { | |
for (col_i in seq_along(in_colnames)) { | |
if (before == Inf) { | |
slide[[out_colnames[[col_i]]]] <- | |
f_dots_baked(slide[[in_colnames[[col_i]]]], seq_len(slide_nrow), adaptive = TRUE) | |
} else { | |
out_col <- f_dots_baked(slide[[in_colnames[[col_i]]]], before + after + 1L) | |
if (after != 0L) { | |
# data.table always puts NAs at tails, even with na.rm = TRUE; chop | |
# off extra NAs from beginning and place missing NAs at end: | |
out_col <- c(out_col[seq(after + 1L, slide_nrow)], rep(NA, after)) | |
} | |
slide[[out_colnames[[col_i]]]] <- out_col | |
} | |
} | |
} else if (f_from_package == "slider") { | |
for (col_i in seq_along(in_colnames)) { | |
slide[[out_colnames[[col_i]]]] <- f_dots_baked(slide[[in_colnames[[col_i]]]], before = before, after = after) | |
} | |
} else { | |
cli_abort("epiprocess internal error: `f_from_package` was {format_chr_deparse(f_from_package)}") | |
} | |
if (f_from_package == "data.table") { | |
if (before == Inf) { | |
slide[, out_colnames] <- | |
f_dots_baked(slide[, in_colnames], seq_len(slide_nrow), adaptive = TRUE) | |
} else { | |
out_cols <- f_dots_baked(slide[, in_colnames], before + after + 1L) | |
if (after != 0L) { | |
# data.table always puts NAs at tails, even with na.rm = TRUE; chop | |
# off extra NAs from beginning and place missing NAs at end: | |
out_cols <- purrr::map(out_cols, function(.x) { | |
c(.x[seq(after + 1L, slide_nrow)], rep(NA, after)) | |
}) | |
} | |
slide[, out_colnames] <- out_cols | |
} | |
} else if (f_from_package == "slider") { | |
for (col_i in seq_along(in_colnames)) { | |
slide[[out_colnames[[col_i]]]] <- f_dots_baked(slide[[in_colnames[[col_i]]]], before = before, after = after) | |
} | |
} else { | |
cli_abort("epiprocess internal error: `f_from_package` was {format_chr_deparse(f_from_package)}") | |
} |
if (after != 0L) { | ||
# data.table always puts NAs at tails, even with na.rm = TRUE; chop | ||
# off extra NAs from beginning and place missing NAs at end: | ||
out_col <- c(out_col[seq(after + 1L, slide_nrow)], rep(NA, after)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: This doesn't respect data.table
's fill
param. If a computation uses fill
, these values should not be NA
. I think what we really want to do is move the first after
values to the end of the result.
epi_slide_opt.epi_df
has the same behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I think I was worried about some garbage-rather-than-NA being in the tails which is why I didn't just copy that approach, but on second look that worry seems unfounded.
prev_inp_snapshot <<- inp_snapshot | ||
prev_out_snapshot <<- out_snapshot | ||
out_diff$version <- version | ||
out_diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: this always returns a compactified archive. Will users ever want a non-compactified result? If we add a compactify
arg or inherit compactification status from the original archive, the user could toggle whether we return out_snapshot
here instead of out_diff
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps; some thoughts:
- Recording all full
out_snapshot
s seems like it will potentially give a very hefty result, and they could get that same result by something likeepix_slide(~ .x)
on the result or a helper function doing something like that. - Recording just
out_snapshot
restricted to times that were ininp_update
plus any times that are required for validity (e.g., an change in just the time t 1d value producing an update to the trailing 7dav for t+1..t+6 as well as t) seems maybe more useful. I'm not sure how hard/slow this would be; I had some issues trying to selectively applycompactify_tol
to the 7dav/etc. cols and set aside that idea.
Checklist
Please:
PR).
brookslogan, nmdefries.
DESCRIPTION
. Always incrementthe patch version number (the third number), unless you are making a
release PR from dev to main, in which case increment the minor version
number (the second number).
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
1.7.2, then write your changes under the 1.8 heading).
process.
Change explanations for reviewer
Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch