In developing epix_rbind(), @dsweber2 identified some gnarly edge cases because NAs are overloaded in epi_archives, especially if we allow epi_archives holding partial version histories. We don't currently allow partial version histories, and some other operations (besides the epix_merge() output ambiguity) may malfunction if we try to use them.
Augmenting the epi_archive format may help disambiguate sources of missingness in diff data.
One potential scheme: add an extra column per signal (like NA codes in the API) indicating, for each NA measurement, whether:
- it's an explicit NA
- it's a missing row, or
- it should be LOCF'd from the previous version
Or maybe something like one of the original epi_archive formats considered: flagging every measurement (NA or otherwise) with:
- add this measurement
- change this measurement
- remove this measurement
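A rough sketch of replaying such flags follows, again in plain Python with hypothetical names (`apply_flagged_updates`, rows shaped `(key, version, flag, value)`). One possible design choice, assumed here, is to reject any flag that contradicts the current state, which would surface a partial version history instead of silently misreading it.

```python
def apply_flagged_updates(updates):
    """Replay (key, version, flag, value) rows in version order, checking each flag."""
    snapshot = {}
    for key, version, flag, value in sorted(updates, key=lambda u: u[1]):
        present = key in snapshot
        if flag == "add" and not present:
            snapshot[key] = value
        elif flag == "change" and present:
            snapshot[key] = value  # value may be None, i.e. an explicit NA
        elif flag == "remove" and present:
            del snapshot[key]
        else:
            # e.g. "change" with no earlier "add": history is incomplete
            raise ValueError(
                f"inconsistent {flag!r} for {key} at version {version}"
            )
    return snapshot


key = ("ca", "2023-01-01")  # assumed (geo_value, time_value) pair
full_history = [
    (key, 1, "add", 5.0),
    (key, 2, "change", None),
]
partial_history = full_history[1:]  # the version 1 "add" is missing
```

With `full_history` the replay ends with an explicit NA for the key; with `partial_history` it raises, because a "change" arrives for a measurement that was never added.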
Some things to think about:
- How do we determine these flags? I'm not sure we get add/change/remove information out of issues queries from epidatr, for example. And we need to think about various epix_*() functions as well.
- Would the user ever need to specify these flags, or would we be able to add these on automatically?
- What if we try combining version histories randomly from different parts of history, or haphazardly alternating between doing rbinds and merges to arrive at a full archive? Will we need some extra metadata? Will it blow the entire thing up? Can we put some restrictions in to disallow any tricky cases?
- Performance implications: time and memory cost of dealing with this. Do/can we make it opt-in if it's expensive?
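
On the first question, one case where the flags could be added automatically is when full snapshots are available for two consecutive versions. The sketch below assumes exactly that, which may not hold for issues queries; `flag_changes` and the dict-based snapshots are hypothetical, and NA is represented as `None`.

```python
def flag_changes(old, new):
    """Compare two full snapshots ({key: value}) and emit (key, flag, value) rows."""
    rows = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            rows.append((key, "add", new[key]))
        elif key not in new:
            rows.append((key, "remove", None))
        elif old[key] != new[key]:
            rows.append((key, "change", new[key]))
        # unchanged measurements produce no row, keeping the diff compact
    return rows


old = {"ca": 1.0, "ny": 2.0, "tx": 3.0}
new = {"ca": 1.0, "ny": None, "wa": 4.0}
rows = flag_changes(old, new)
# rows == [("ny", "change", None), ("tx", "remove", None), ("wa", "add", 4.0)]
```

The cost is one pass over the union of keys per version pair, which gives a starting point for the time and memory question as well.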