Add way to refer to ref_time_value within f arg to epix_slide #170

Closed
@brookslogan

Description

Currently, epix_slide does not make the current ref_time_value (t in the current code) available to the supplied f. This is a problem when forecasting, for example: the targets for ref_time_value should be observations at ref_time_value + aheads, but because of data reporting latency, data are only available for time_values through ref_time_value - latency. We are left with either incorrectly pretending that max(.x$time_value) is the same as ref_time_value, or guessing or inconveniently calculating the latency so that we can say ref_time_value = max(.x$time_value) + latency_guess_or_calculation.
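
To make the latency problem concrete, here is a small base-R sketch with made-up numbers. The dates, the 2-day latency, the 7-day ahead, and all variable names are illustrative assumptions, not epiprocess internals:

```r
# Hypothetical setup for one slide computation (all values are assumptions).
ref_time_value <- as.Date("2022-06-10")
latency <- 2L  # assumed reporting latency, in days
ahead <- 7L    # assumed forecast horizon, in days

# Because of latency, the window only has data through ref_time_value - latency.
x <- data.frame(
  time_value = seq(ref_time_value - latency - 4L, ref_time_value - latency, by = "day"),
  value = c(10, 12, 11, 13, 15)
)

# The target date we want vs. the one we get by pretending that
# max(x$time_value) is the ref_time_value.
correct_target_date <- ref_time_value + ahead
naive_target_date <- max(x$time_value) + ahead

# The naive target date is early by exactly the latency.
target_date_error <- as.integer(correct_target_date - naive_target_date)
```

Without access to ref_time_value inside f, the only way to get correct_target_date is to know or guess latency separately.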

Instead, we should make ref_time_value available to f. Approaches to passing:

  • Metadata of .x: doesn't work, since we don't provide an epi_df. [It would also have the same issue as attributes: potentially being overlooked.]
  • Attributes of .x: works across all forms of f, but is easily overlooked.
  • Extra args/pronouns: probably the best user interface, but more complicated to implement. Notes below assume that we take this approach.

Current epi[x]_slide f functions take:

  • window data
  • group key (1-row tibble with values of the grouping cols)
  • any additional args the user would like to pass via dots

Since the epi_slide implementation might be trickier and take longer, it makes sense to start with epix_slide and have its f functions take the following:

  • window data
  • group key (1-row tibble with values of the grouping cols)
  • ref_time_value
  • any additional args the user would like to pass via dots
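
As a sketch of what a user-supplied f could look like under this proposed argument order (this is the proposal, not the current epix_slide interface; forecast_f, the 7-day ahead, and the mean-based "prediction" are hypothetical, and a plain data.frame stands in for the 1-row tibble group key):

```r
# Hypothetical f with the proposed signature: window data, group key,
# ref_time_value, then any extra args via dots.
forecast_f <- function(x, group_key, ref_time_value, ...) {
  data.frame(
    geo_value = group_key$geo_value,
    forecast_date = ref_time_value,
    target_date = ref_time_value + 7L,  # assumed 7-day ahead
    # Latency can now be computed rather than guessed.
    latency = as.integer(ref_time_value - max(x$time_value)),
    prediction = mean(x$value)          # placeholder "model"
  )
}

# Stand-ins for what the slide machinery would pass in for one computation.
window <- data.frame(
  time_value = as.Date("2022-06-01") + 0:4,
  value = c(10, 12, 11, 13, 14)
)
group_key <- data.frame(geo_value = "ca")

out <- forecast_f(window, group_key, as.Date("2022-06-07"))
```

Here f never needs to assume that max(x$time_value) equals ref_time_value.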

(If we instead implement the corresponding epi_slide change simultaneously, we could place ref_time_value before the group key if we felt it made more sense. I don't think either ordering is dominant, though. Often we group by geo, so group key followed by ref_time_value is sort of a geo value followed by sort of a time value, which seems natural given how we order epi_df columns. However, sometimes we'll group by nothing, in which case group key followed by ref_time_value would be something useless followed by something useful, which is a bit unnatural.)

--- Implementation ideas: ---

Passing ref_time_value as a third arg when f is a function may require:

Passing ref_time_value when f is a formula might require:

  • Nothing beyond the above: maybe we would already be able to reference it in the formula as ..3 or ref_time_value
  • Somehow introducing a new data pronoun or data mask (using rlang::as_data_{pronoun,mask} + other rlang stuff?), still using purrr::map_dfr.
  • Some approach not using purrr::map_dfr with more freedom.
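
For the first option above, a minimal sketch of how a formula can reach a third positional argument, assuming the formula is converted with rlang::as_function() and that the slide machinery passes ref_time_value third (neither is verified against the epix_slide code path here). Note that only ..3 works this way; the name ref_time_value would need one of the other options:

```r
library(rlang)

# as_function() turns a formula into a function of `...`, so positional
# arguments beyond .x and .y can be reached as ..3, ..4, and so on.
f <- as_function(
  ~ data.frame(
    ref_time_value = ..3,
    latency = as.integer(..3 - max(.x$time_value))
  )
)

# Stand-ins for the window data and group key of one computation.
window <- data.frame(time_value = as.Date("2022-06-01") + 0:4)
group_key <- data.frame(geo_value = "ca")

out <- f(window, group_key, as.Date("2022-06-07"))
```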

Passing ref_time_value when f is missing & ... contains summarize-like operations:

  • Somehow introduce a new data pronoun or data mask?
  • Change the execution environment, e.g., by setting the environment of f to a new child of itself, and binding ref_time_value within that child environment. (Careful, check whether we need to do something to copy f to avoid modifying the real f's environment.)
  • [Make cur_group() work and mimic it with, e.g., cur_ref_time_value().]
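
The child-environment option can be sketched in base R as follows. with_ref_time_value and user_f are hypothetical names for illustration. On the "careful" note: in this sketch, `environment(f) <- env` acts on the helper's local copy of f, so the caller's function keeps its original environment, though that should still be checked in the real implementation:

```r
# Hypothetical helper: return a copy of f whose environment is a new child
# of f's original environment, with ref_time_value bound in that child.
with_ref_time_value <- function(f, ref_time_value) {
  env <- new.env(parent = environment(f))
  env$ref_time_value <- ref_time_value
  environment(f) <- env  # modifies the local copy only
  f
}

# A user function that refers to ref_time_value as a free variable.
user_f <- function(x, group_key) {
  as.integer(ref_time_value - max(x$time_value))
}
original_env <- environment(user_f)

bound_f <- with_ref_time_value(user_f, as.Date("2022-06-07"))

window <- data.frame(time_value = as.Date("2022-06-01") + 0:4)
latency <- bound_f(window, NULL)

# The original function's environment is untouched.
unchanged <- identical(environment(user_f), original_env)
```

One trade-off of this approach is that a variable named ref_time_value in the user's own enclosing environment would be shadowed by the injected binding.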

Metadata

Labels

P1: medium priority
op-semantics: Operational semantics; many potentially breaking changes here
