Skip to content

Provide option to complete/skip incomplete windows, fill gaps in epi[x]_slide windows #256

Closed
@brookslogan

Description

@brookslogan

[Refined scope: see these comments: A, B. We can then split off any other ideas into a separate issue.]

It's very easy to write an erroneous 7-day-average or 7-day-sum slide: e.g., if we epi_slide with the natural mean or sum, we get the sum of less than n * {n key values in group} things for

  • the first n - 1L time_values
  • wherever there's a gap in availability (e.g., a skipped time_value in a/all geo_values, an unavailable geo_value across a wide span of time_values, etc.)

Ideas:

  • Have some sort of option to auto-complete the window data, as writing this is a little tedious. Furthermore, we probably need this to be the default behavior for epi_slide or some sort of messaging to go on unless the user explicitly says that they're okay without completion. When this completion is enabled, we should balk if the key columns don't form a unique key. Counterpart for epix_slide likely needs to complete a lot of leading edges through the ref_time_value, inclusive, and balk if there's anything beyond it (e.g., if archive is holding forecasts).
  • Have an option to skip over incomplete(-in-another-sense) windows for early ref_time_values by filtering down ref_time_values (perhaps as an alternative to the default ref_time_values or manual specification, perhaps as a separate option that would allow for this filtering to be applied even to manually-supplied ref_time_values). (An epi_archive equivalent might not be so simple due to the possibility of later versions extending the time series into the past or removing time_values from the past, so min(time_value) is not a fixed property of an archive group.) --- This seems closest to .complete=TRUE for slider::slide_index*, but I'm not sure we would want to ignore gaps like these functions, and there might be details to check regarding when the time_value set available varies across groups and/or "epikey" (non-time, non-version key col) values.
  • Have option to skip over or return some missing result indicator when there are incomplete windows in any sense (either because we are at the very beginning/end of the time series and the window extends past the edge, or because there are gaps).
  • Reject epi_dfs/epi_archives with gaps / gaps in slide computation inputs unless the user explicitly allows them. (For the user to easily address these gaps, we want Add a complete method for epi_dfs #250 + something for epi_archives.)

We might not really need all of these; some of them make the others unnecessary or not as useful; some brainstorming still is needed here.

Metadata

Metadata

Assignees

Labels

P1medium priorityenhancementNew feature or requestop-semanticsOperational semantics; many potentially breaking changes here

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions