Add separate epi_archive constructor based on a "iterator"/list/df of snapshots #173

Open
brookslogan opened this issue Jul 27, 2022 · 3 comments
Labels
enhancement New feature or request P1 medium priority

Comments

@brookslogan
Contributor

See #172 for background. This would work around some of the caveats noted there by:

  • Detecting row deletions, probably just representing them as an update row with all non-key columns NA.
  • For each explicit version, checking whether a snapshot for next_after(version) was provided. If not, inserting a snapshot that revises all observations to NA, tagged with version next_after(version). This is pretty inefficient space-wise; we may want to change the design of epi_archive to make more efficient options available.
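
A rough sketch of the two steps above, in Python only for illustration (this is not the epiprocess API). Everything named here is an assumption: versions are integers so that `next_after(version)` is `version + 1`, a snapshot is a dict from a hypothetical `(geo_value, time_value)` key to a single value, and `None` stands in for NA. The sketch also skips the all-NA fill after the last provided version, which is a choice made here, not something the proposal states.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]                         # hypothetical key: (geo_value, time_value)
Snapshot = Dict[Key, Optional[float]]         # None stands in for NA
UpdateRow = Tuple[Key, int, Optional[float]]  # (key, version, value)


def next_after(version: int) -> int:
    """Assumption: versions are integers, so the successor is version + 1."""
    return version + 1


def snapshots_to_updates(snapshots: Dict[int, Snapshot]) -> List[UpdateRow]:
    """Turn {version: snapshot} into a list of update rows."""
    updates: List[UpdateRow] = []
    state: Snapshot = {}  # what the update rows emitted so far imply
    versions = sorted(snapshots)
    for i, version in enumerate(versions):
        curr = snapshots[version]
        # New or changed observations become ordinary update rows.
        for key in sorted(curr):
            if key not in state or state[key] != curr[key]:
                updates.append((key, version, curr[key]))
                state[key] = curr[key]
        # Row deletions become update rows with the non-key column set to NA.
        for key in sorted(state):
            if key not in curr and state[key] is not None:
                updates.append((key, version, None))
                state[key] = None
        # If the next version has no snapshot, revise everything to NA there.
        is_last = i == len(versions) - 1
        gap_version = next_after(version)
        if not is_last and gap_version not in snapshots:
            for key in sorted(state):
                if state[key] is not None:
                    updates.append((key, gap_version, None))
                    state[key] = None
    return updates
```

Note that after an all-NA fill, the next provided snapshot has to re-emit every observation, which is the space inefficiency mentioned above. A deleted row and a row whose value is genuinely NA also become indistinguishable under this representation.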
@brookslogan brookslogan added enhancement New feature or request P3 very low priority labels Jul 27, 2022
@brookslogan brookslogan changed the title Add special constructor function for an epi_archive based on a list/data.table of snapshots Add separate epi_archive constructor based on a list/data.table of snapshots Jul 27, 2022
@brookslogan brookslogan changed the title Add separate epi_archive constructor based on a list/data.table of snapshots Add separate epi_archive constructor based on a "iterator"/list/data.table of snapshots Mar 2, 2023
@brookslogan brookslogan added P1 medium priority and removed P3 very low priority labels Mar 2, 2023
@brookslogan brookslogan changed the title Add separate epi_archive constructor based on a "iterator"/list/data.table of snapshots Add separate epi_archive constructor based on a "iterator"/list/df of snapshots Mar 2, 2023
@brookslogan
Contributor Author

We can be a lot more memory-friendly than an in-memory list/df of snapshots by reading in only one or a few snapshots at a time and compactifying along the way. If we compactified a lot, we could also suggest saving the compactified version to disk. More complicated things regarding proposed "updating" archives are out of scope for this initial feature.
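
To make the idea concrete, here is a minimal sketch, again in Python for illustration only and not the iterator code mentioned later in this thread. It assumes snapshots arrive in increasing version order as `(version, snapshot)` pairs from some lazy source, and that a snapshot is a dict from a hypothetical `(geo_value, time_value)` key to a value. Only the running latest state and the current snapshot are held in memory; rows that repeat the previous version's value are dropped as they stream past. Deletion handling is left out to keep it short.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Key = Tuple[str, int]                         # hypothetical key: (geo_value, time_value)
Snapshot = Dict[Key, Optional[float]]
UpdateRow = Tuple[Key, int, Optional[float]]  # (key, version, value)


def build_compact(
    snapshots: Iterable[Tuple[int, Snapshot]],
) -> Tuple[List[UpdateRow], int]:
    """Consume snapshots one at a time; return (update rows, rows seen)."""
    latest: Snapshot = {}  # most recent value per key
    updates: List[UpdateRow] = []
    rows_seen = 0
    for version, snapshot in snapshots:
        for key, value in snapshot.items():
            rows_seen += 1
            # Keep a row only if it is new or differs from the latest value.
            if key not in latest or latest[key] != value:
                latest[key] = value
                updates.append((key, version, value))
    return updates, rows_seen


def lazy_snapshots() -> Iterator[Tuple[int, Snapshot]]:
    """Stand-in for reading one snapshot per version from disk."""
    yield 1, {("ca", 1): 10.0, ("tx", 1): 5.0}
    yield 2, {("ca", 1): 10.0, ("tx", 1): 6.0}
    yield 3, {("ca", 1): 10.0, ("tx", 1): 6.0}


updates, rows_seen = build_compact(lazy_snapshots())
kept_fraction = len(updates) / rows_seen
```

A small `kept_fraction` would be the signal for suggesting that the user save the compactified result to disk; what threshold to use is left open here.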

@brookslogan
Contributor Author

brookslogan commented Mar 5, 2025

Status: I have the iterator code ready in a separate repo, plus a version of map_ea. Incorporating it will still take some work: documenting, testing, and refactoring to reduce duplication with existing code. The extent of the refactoring, and how much to split off for later, is TBD.

@brookslogan
Contributor Author

Going to punt this down the road in favor of doc updates. See map_ea here, iterators here (iterb approach seems probably preferable), and tbl_{diff2,patch} in #611.

@brookslogan brookslogan removed their assignment Mar 11, 2025