Skip to content

Provide option to complete/skip incomplete windows, fill gaps in epi[x]_slide windows #256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brookslogan opened this issue Dec 5, 2022 · 5 comments
Assignees
Labels
enhancement New feature or request op-semantics Operational semantics; many potentially breaking changes here P1 medium priority

Comments

@brookslogan
Copy link
Contributor

brookslogan commented Dec 5, 2022

[Refined scope: see these comments: A, B. We can then split off any other ideas into a separate issue.]

It's very easy to write an erroneous 7-day-average or 7-day-sum slide: e.g., if we epi_slide with the natural mean or sum, we get the sum of less than n * {n key values in group} things for

  • the first n - 1L time_values
  • wherever there's a gap in availability (e.g., a skipped time_value in a/all geo_values, an unavailable geo_value across a wide span of time_values, etc.)

Ideas:

  • Have some sort of option to auto-complete the window data, as writing this is a little tedious. Furthermore, we probably need this to be the default behavior for epi_slide or some sort of messaging to go on unless the user explicitly says that they're okay without completion. When this completion is enabled, we should balk if the key columns don't form a unique key. Counterpart for epix_slide likely needs to complete a lot of leading edges through the ref_time_value, inclusive, and balk if there's anything beyond it (e.g., if archive is holding forecasts).
  • Have an option to skip over incomplete(-in-another-sense) windows for early ref_time_values by filtering down ref_time_values (perhaps as an alternative to the default ref_time_values or manual specification, perhaps as a separate option that would allow for this filtering to be applied even to manually-supplied ref_time_values). (An epi_archive equivalent might not be so simple due to the possibility of later versions extending the time series into the past or removing time_values from the past, so min(time_value) is not a fixed property of an archive group.) --- This seems closest to .complete=TRUE for slider::slide_index*, but I'm not sure we would want to ignore gaps like these functions, and there might be details to check regarding when the time_value set available varies across groups and/or "epikey" (non-time, non-version key col) values.
  • Have option to skip over or return some missing result indicator when there are incomplete windows in any sense (either because we are at the very beginning/end of the time series and the window extends past the edge, or because there are gaps).
  • Reject epi_dfs/epi_archives with gaps / gaps in slide computation inputs unless the user explicitly allows them. (For the user to easily address these gaps, we want Add a complete method for epi_dfs #250 + something for epi_archives.)

We might not really need all of these; some of them make the others unnecessary or not as useful; some brainstorming still is needed here.

@brookslogan brookslogan added enhancement New feature or request P1 medium priority op-semantics Operational semantics; many potentially breaking changes here labels Dec 5, 2022
@brookslogan
Copy link
Contributor Author

One aspect of this (perhaps overlooked in the avoid bullet points) can be attacked via group_by(....., .drop=FALSE) with epix_slide, which is in the lcb/make_epix_slide_more_reframe_like branch.

@brookslogan
Copy link
Contributor Author

See also #322, which we might want to tackle simultaneously.

@brookslogan
Copy link
Contributor Author

brookslogan commented Jun 28, 2023

We also planned to go back to n, align, before simultaneously. [epi_slide would take all three of these args; epix_slide would only take n. One corner case of epix_slide would be if the archive includes forecasts, but this is not a major use case so it could potentially be ignored / receive little thought and filed off into another issue.]

@brookslogan
Copy link
Contributor Author

For the first pass, I think we are probably just thinking of two options:

  • Complete the inputs to slide computations, by default.
  • Alternatively, don't complete them.

For future iterations, we might consider some of the other options above.

@dshemetov
Copy link
Contributor

Addressed in #477 by having epi_slide complete the dates in in every geo group by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request op-semantics Operational semantics; many potentially breaking changes here P1 medium priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants