Consider adding version-lag-subsetting/windowing operation #272

Open
@brookslogan


Currently, epi_archives are intended to track the full version history of every observation in a data set. However, sometimes we do not care much about revisions past some "anchor version lag", i.e., updates with version > time_value + anchor_version_lag. Including this data can use more RAM (or fail to fit in available RAM), and it makes the data harder to inspect via the DT unless such filtering is done manually (and a simple filter only works for applying a max lag; see below). We may also want to integrate this as a parameter to epix_slide with all_versions=TRUE, so that we're not unnecessarily creating near-copies of the DT when using long time windows on recent ref time values.
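To make the max-lag case concrete, here is a minimal sketch in plain Python rather than the package's own R / data.table code. It assumes the archive can be modeled as a list of update rows with `geo_value`, `time_value`, `version`, and `value` fields; the function name `filter_max_version_lag` and the sample data are hypothetical, not part of epiprocess.

```python
from datetime import date, timedelta

def filter_max_version_lag(rows, max_lag_days):
    """Keep only updates issued within max_lag_days of their time_value.

    Hypothetical helper: drops rows with version > time_value + max lag.
    """
    max_lag = timedelta(days=max_lag_days)
    return [r for r in rows if r["version"] <= r["time_value"] + max_lag]

# Made-up updates for a single observation (illustration only).
rows = [
    {"geo_value": "ca", "time_value": date(2021, 1, 1), "version": date(2021, 1, 2), "value": 10},
    {"geo_value": "ca", "time_value": date(2021, 1, 1), "version": date(2021, 1, 5), "value": 12},
    {"geo_value": "ca", "time_value": date(2021, 1, 1), "version": date(2021, 2, 1), "value": 13},
]

# With a 7-day max lag, the revision issued a month later is dropped.
kept = filter_max_version_lag(rows, max_lag_days=7)
```

Because each row is kept or dropped on its own, no merging of versions is needed in this case.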

A simple max-version-lag can be applied by filtering rows in the obvious manner. However, version-lag-subsetting, a min-version-lag, or a min version will require smashing together / merging version data. This can probably be accomplished by interpreting the lag subset / lag filter as a "coarsening" of the version column: partition the data based on the coarsened version; smash versions together within each partition member by using epix_as_of(corresponding_coarsened_version, ......), a rolling join, or a fromLast unique; tag the result with the appropriate coarsened version; then recombine.
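The coarsening idea for a min-version-lag might look roughly like the following sketch. It is plain Python, not the package's R code, and it makes several assumptions: rows are dicts with `geo_value`, `time_value`, `version`, and `value` fields; the coarsened version is `max(version, time_value + min lag)`; and coarsened versions later than the archive's latest version (`versions_end` here) are dropped rather than invented. The name `apply_min_version_lag` is hypothetical, and of the options listed above this corresponds to the "fromLast unique" variant.

```python
from datetime import date, timedelta

def apply_min_version_lag(rows, min_lag_days, versions_end):
    """Merge all updates issued before time_value + min lag into one row.

    Hypothetical helper: coarsen each version, then keep the last update
    per (geo_value, time_value, coarsened version).
    """
    min_lag = timedelta(days=min_lag_days)
    latest = {}
    # Process updates in version order so later updates overwrite earlier ones.
    for r in sorted(rows, key=lambda r: r["version"]):
        coarse = max(r["version"], r["time_value"] + min_lag)
        if coarse > versions_end:
            continue  # would claim a version the archive has not reached yet
        key = (r["geo_value"], r["time_value"], coarse)
        latest[key] = {**r, "version": coarse}  # tag with coarsened version
    return sorted(
        latest.values(),
        key=lambda r: (r["geo_value"], r["time_value"], r["version"]),
    )

# Made-up updates (illustration only).
rows = [
    {"geo_value": "ca", "time_value": date(2021, 1, 1), "version": date(2021, 1, 2), "value": 10},
    {"geo_value": "ca", "time_value": date(2021, 1, 1), "version": date(2021, 1, 3), "value": 11},
    {"geo_value": "ca", "time_value": date(2021, 1, 1), "version": date(2021, 1, 5), "value": 12},
    {"geo_value": "ca", "time_value": date(2021, 1, 1), "version": date(2021, 1, 10), "value": 13},
    {"geo_value": "ca", "time_value": date(2021, 1, 8), "version": date(2021, 1, 9), "value": 5},
]

# With a 4-day min lag, the first three updates collapse into one row at
# version 2021-01-05; the 2021-01-08 observation has not reached lag 4 yet.
coarsened = apply_min_version_lag(rows, min_lag_days=4, versions_end=date(2021, 1, 10))
```

This sketch does not remove updates that become redundant after merging (for example, a later row repeating the merged value); a real implementation would likely also want that compaction step.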
