Description
- Intersects with "Draft and discuss naming schemes for `epix_slide` parameters, output" #163, but the desired approach here is more settled, and any renaming decided on there could be done later.
- Relates to a similar feature we'd like in `epi_slide`, but this one is more important + easier to implement: "Add way to refer to `ref_time_value` within `f` arg to `epi_slide`" #171.
- Implementation approaches used here may be applicable to "Make tidy sliding play nicely with data frame inputs" #55, and vice versa.
- Interacts with "Replace `n` in `epi[x]_slide` with `before`, `after`" #161.
Currently, `epix_slide` does not make the current `ref_time_value` (`t` in the current code) available to the supplied `f`. This is a problem, e.g., when forecasting: the targets for `ref_time_value` should be observations at `ref_time_value + aheads`, but because of data reporting latency, data are only available for `time_value`s through `ref_time_value - latency`. We are left with either incorrectly pretending that `max(.x$time_value)` is the same as `ref_time_value`, or making some guess or inconvenient calculation of the latency so we can say `ref_time_value = max(.x$time_value) + latency_guess_or_calculation`.
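To make the off-by-latency problem concrete, here is a small base-R sketch. The dates, the 2-day latency, and the 7-day ahead are made-up illustrative values, not taken from any real data source.

```r
# Illustrative values only: a forecast made on ref_time_value, with data
# reported at a 2-day latency and a 7-day-ahead target.
ref_time_value <- as.Date("2022-06-10")
latency <- 2
ahead <- 7

# The window of data actually available at ref_time_value: time_values only
# go up to ref_time_value - latency.
window <- data.frame(
  time_value = seq(as.Date("2022-06-01"), ref_time_value - latency, by = "day")
)

# Correct target date, anchored at ref_time_value.
correct_target <- ref_time_value + ahead

# Incorrect target date, from pretending max(time_value) is ref_time_value.
naive_target <- max(window$time_value) + ahead

correct_target - naive_target
#> Time difference of 2 days
```

The naive target lands exactly `latency` days early, which is why `f` needs the real `ref_time_value` rather than a reconstruction from the window.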
Instead, we should make `ref_time_value` available to `f`. Approaches to passing:
- Metadata of `.x`: doesn't work, since we don't provide an `epi_df`. [But it would have the same issue of potentially being overlooked.]
- Attributes of `.x`: works across all forms of `f`, but is easily overlooked.
- Extra args/pronouns: probably the best user interface, but more complicated to implement. Notes below assume that we take this approach.
Current `epi[x]_slide` `f` functions take:
- window data
- group key (1-row tibble with values of the grouping cols)
- any additional args the user would like to pass via dots
Since the `epi_slide` implementation might be trickier and take longer, it makes sense to make `epix_slide`'s `f` functions take the following:
- window data
- group key (1-row tibble with values of the grouping cols)
- ref_time_value
- any additional args the user would like to pass via dots
(If instead we implement the corresponding `epi_slide` change simultaneously, we could place `ref_time_value` before the group key if we felt it made more sense. I don't think either ordering is dominant in this case, though. Often we group by geo, so the group key followed by `ref_time_value` is roughly a geo value followed by a time value, which seems natural given how we order `epi_df` columns. However, sometimes we group by nothing, in which case the group key followed by `ref_time_value` would be something useless followed by something useful, which is a bit unnatural.)
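Under the proposed argument order, a user's `f` could anchor forecast targets at `ref_time_value` directly instead of at `max(.x$time_value)`. The sketch below is hypothetical: it assumes `f` is called as `f(window_data, group_key, ref_time_value, ...)`, and `aheads` stands in for an extra argument the user passes via dots.

```r
# Hypothetical user f under the proposed epix_slide argument order:
#   f(window data, group key, ref_time_value, ...)
# `aheads` arrives through the dots.
forecast_targets <- function(x, g, ref_time_value, aheads) {
  data.frame(
    geo_value = g$geo_value,
    ahead = aheads,
    # Targets are anchored at ref_time_value, not at the latest available
    # time_value, so reporting latency doesn't shift them.
    target_date = ref_time_value + aheads,
    last_observed = max(x$time_value)
  )
}

window <- data.frame(time_value = as.Date("2022-06-01") + 0:7,
                     cases = c(3, 4, 4, 6, 5, 7, 8, 9))
out <- forecast_targets(window, data.frame(geo_value = "ca"),
                        as.Date("2022-06-10"), aheads = c(7, 14))
```

Here `last_observed` is 2022-06-08 while the targets are 2022-06-17 and 2022-06-24, so the 2-day gap between the data and `ref_time_value` no longer leaks into the targets.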
--- Implementation ideas: ---
Passing `ref_time_value` as a third arg when `f` is a function may require:
- Updating the docs to indicate that `f` must take 3 positional args + the user's custom args.
- Ideally, validating that the user's `f` follows this format. See "Validate `f` function in `epi[x]_slide`; give better feedback if doesn't take enough args" #168.
- Changing the signature of `comp_one_group` to `function(.data_group, g, <rest of the current args>)` and calling `f(.data_group, g, ref_time_value = t, ...)`.
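A rough base-R sketch of the function-`f` path: a simplified stand-in for `comp_one_group` plus an argument-count check in the spirit of #168. Both `validate_slide_f()` and `comp_one_group_sketch()` are hypothetical names, and the real `comp_one_group` takes more arguments. One design point the sketch surfaces: passing `ref_time_value = t` by name fails for a user `f` like `function(x, g, t, ...)`-without-dots, e.g. `function(x, g, t, lag_days)`, because R cannot match the name `ref_time_value` to a parameter called `t`; passing it positionally works for any third-parameter name.

```r
# Hypothetical check in the spirit of #168: can f accept at least three
# arguments (window data, group key, ref_time_value)?
validate_slide_f <- function(f) {
  arg_names <- names(formals(args(f)))
  if (!("..." %in% arg_names) && length(arg_names) < 3) {
    stop("`f` must accept at least 3 arguments: the window data, ",
         "the group key, and ref_time_value.", call. = FALSE)
  }
  invisible(f)
}

# Hypothetical, simplified stand-in for comp_one_group: g is the group key,
# t the current ref_time_value, and the dots are the user's extra args.
comp_one_group_sketch <- function(.data_group, g, f, t, ...) {
  # Pass t positionally so it binds to f's third parameter whatever its name.
  f(.data_group, g, t, ...)
}

f_ok <- function(x, g, t, lag_days) t - lag_days
validate_slide_f(f_ok)

window <- data.frame(time_value = as.Date("2022-06-08"))
key <- data.frame(geo_value = "ca")
res <- comp_one_group_sketch(window, key, f_ok, as.Date("2022-06-10"),
                             lag_days = 2)

# An f with too few parameters is rejected up front.
bad_msg <- tryCatch(validate_slide_f(function(x, g) nrow(x)),
                    error = function(e) conditionMessage(e))

# Passing ref_time_value by name fails for this f: its third parameter is
# called t and it has no `...` to absorb the extra name.
named_msg <- tryCatch(
  f_ok(window, key, ref_time_value = as.Date("2022-06-10"), lag_days = 2),
  error = function(e) conditionMessage(e)
)
```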
Passing `ref_time_value` when `f` is a formula might require:
- Nothing beyond the above: maybe we would already be able to reference it in the formula as `..3` or `ref_time_value`.
- Somehow introducing a new data pronoun or data mask (using `rlang::as_data_{pronoun,mask}` + other `rlang` stuff?), still using `purrr::map_dfr`.
- Some approach not using `purrr::map_dfr`, with more freedom.
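A quick check of the first two options with `rlang`. `rlang::as_function()` turns a formula into a function with formals `(..., .x = ..1, .y = ..2, . = ..1)`, so a third argument is reachable as `..3` whether it is passed by position or by name, but a bare `ref_time_value` in the formula body is not bound. The data-mask part at the end is one assumed way to bind that name via `rlang::as_data_mask()` and `rlang::eval_tidy()`, not a settled design.

```r
library(rlang)

# Formula lambdas: formals are (..., .x = ..1, .y = ..2, . = ..1), so there
# is no named slot for a third argument.
f_pos <- as_function(~ ..3 - max(.x$time_value))
f_bare <- as_function(~ ref_time_value - max(.x$time_value))

window <- data.frame(time_value = as.Date(c("2022-06-07", "2022-06-08")))
key <- data.frame(geo_value = "ca")
rtv <- as.Date("2022-06-10")

# ..3 works whether ref_time_value is passed positionally or by name.
lag_pos <- f_pos(window, key, rtv)
lag_named <- f_pos(window, key, ref_time_value = rtv)

# A bare ref_time_value is not bound inside the lambda, so this errors
# (assuming no ref_time_value object is visible from the formula's env).
bare_result <- tryCatch(f_bare(window, key, ref_time_value = rtv),
                        error = function(e) "not found")

# Data-mask route: window columns plus ref_time_value are in scope, so the
# expression can use the bare name.
mask <- as_data_mask(c(as.list(window), list(ref_time_value = rtv)))
mask_result <- eval_tidy(quote(ref_time_value - max(time_value)), data = mask)
```

So "maybe `..3`" holds, while "maybe `ref_time_value`" would need something like the data-mask option.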
Passing `ref_time_value` when `f` is missing and `...` contains summarize-like operations:
- Somehow introduce a new data pronoun or data mask?
- Change the execution environment, e.g., by setting the environment of `f` to a new child of its current environment, and binding `ref_time_value` within that child environment. (Careful: check whether we need to do something to copy `f` to avoid modifying the real `f`'s environment.)
- [Make `cur_group()` work and mimic it with, e.g., `cur_ref_time_value()`.]
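A base-R sketch of the execution-environment option, with hypothetical names (`call_with_ref_time_value()`, `summarize_window()`). Each call creates a fresh child of `f`'s enclosing environment holding `ref_time_value`, and `environment<-` is applied to a local binding of `f`. Because R closures have value semantics, assigning a new environment to a closure that is also referenced elsewhere yields a modified copy, so the caller's `f` keeps its original environment; the last check in the test confirms this rather than assuming it.

```r
# Hypothetical helper: call f with ref_time_value bound in a new child of
# f's enclosing environment, without touching the caller's f.
call_with_ref_time_value <- function(f, ref_time_value, ...) {
  child_env <- new.env(parent = environment(f))
  assign("ref_time_value", ref_time_value, envir = child_env)
  f_local <- f                       # second reference to the same closure
  environment(f_local) <- child_env  # R copies the closure; f is unchanged
  f_local(...)
}

# A summarizing function that uses ref_time_value as a free variable, as if
# the sliding machinery provided it.
summarize_window <- function(x) {
  data.frame(n_days = nrow(x),
             days_behind = as.numeric(ref_time_value - max(x$time_value)))
}
original_env <- environment(summarize_window)

window <- data.frame(time_value = as.Date("2022-06-06") + 0:2)
out <- call_with_ref_time_value(summarize_window, as.Date("2022-06-10"), window)
```

For the `f`-missing case, the captured `...` expressions are quosures rather than a closure; the analogous move there would presumably be rebinding each quosure's environment (e.g., with `rlang::quo_set_env()`), which this sketch does not cover.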