Description
Currently, `epi_archive`s are intended to track the full version history of every observation in a data set. However, sometimes we do not care as much about revisions past some "anchor version lag", i.e., those with `version > time_value + anchor_version_lag`. Including this data can use more RAM (or fail to fit in available RAM), and it makes the data harder to inspect via the `DT` without doing such filtering manually (and a simple filter only works for applying a max lag; see below). We may also want to integrate this as a parameter to `epix_slide` with `all_versions=TRUE`, so we're not unnecessarily creating near-copies of the `DT` when using long time windows on recent ref time values.
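
To make the cutoff condition concrete, here is a minimal sketch on a toy `data.table` standing in for an archive's `DT`. The toy data and the 7-day value are made up for illustration, and `anchor_version_lag` is just the name used in this issue, not an existing `epiprocess` argument.

```r
library(data.table)

# Toy stand-in for an archive's DT: one row per (geo_value, time_value, version).
dt <- data.table(
  geo_value  = "ca",
  time_value = as.Date("2024-01-01") + c(0, 0, 0, 1, 1),
  version    = as.Date("2024-01-01") + c(1, 3, 40, 2, 5),
  value      = c(10, 12, 13, 20, 21)
)
setkey(dt, geo_value, time_value, version)

# Hypothetical parameter (name taken from the issue text), in days.
anchor_version_lag <- 7L

# Max-lag filter: keep only updates issued within `anchor_version_lag` days
# of their time_value; the revision issued 40 days later is dropped.
dt_max_lag <- dt[version <= time_value + anchor_version_lag]
```

This only covers the max-lag case; dropping early versions instead would lose the values that were in force before the first kept update.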
A simple max-version-lag can be performed by filtering rows in the obvious manner. However, version-lag-subsetting, a min-version-lag, or a min version will require smashing together / merging version data. This can probably be accomplished by interpreting the lag subset / lag filter as a "coarsening" of the version column, partitioning the data based on the coarsened version, smashing versions together on each partition member (using `epix_as_of(corresponding_coarsened_version, ......)`, a rolling join, or a `fromLast` `unique`), tagging the result with the appropriate coarsened version, then recombining.
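
A rough sketch of the coarsening idea, using the `fromLast` `unique` option on a toy `data.table`. `coarsen_versions()` and its `coarsen` argument are hypothetical names for illustration, not part of `epiprocess`. The sketch assumes the `DT` stores one row per update keyed by `(geo_value, time_value, version)`, and that the coarsening only ever rounds versions up.

```r
library(data.table)

# Toy stand-in for an archive's DT.
dt <- data.table(
  geo_value  = "ca",
  time_value = as.Date("2024-01-01") + c(0, 0, 0, 1, 1),
  version    = as.Date("2024-01-01") + c(1, 3, 40, 2, 5),
  value      = c(10, 12, 13, 20, 21)
)
setkey(dt, geo_value, time_value, version)

# Hypothetical helper: `coarsen(time_value, version)` maps each version to the
# coarsened version it should be reported under.
coarsen_versions <- function(dt, coarsen) {
  out <- copy(dt)
  out[, coarsened_version := coarsen(time_value, version)]
  # Rounding down would make data appear before it was actually issued.
  stopifnot(all(out$coarsened_version >= out$version))
  # Sort so the latest original version comes last within each group.
  setkey(out, geo_value, time_value, version)
  # Smash versions together: keep the last update per coarsened version.
  out <- unique(out, by = c("geo_value", "time_value", "coarsened_version"),
                fromLast = TRUE)
  # Tag with the coarsened version and restore the key.
  out[, version := coarsened_version][, coarsened_version := NULL]
  setkey(out, geo_value, time_value, version)
  out[]
}

# Min-version-lag of 3 days: updates issued sooner than that are merged into
# the version at time_value + 3.
dt_min_lag <- coarsen_versions(dt, function(time_value, version) {
  pmax(version, time_value + 3)
})
```

A min version would use something like `pmax(version, min_version)` as the coarsening instead. Note that this sketch does not remove rows that end up repeating the value of the previous coarsened version, so the result may not be a minimal set of updates.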