-
Notifications
You must be signed in to change notification settings - Fork 8
group_by()
for epi_archive
objects
#64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I feel like the Pro is pretty major, and the Con is pretty minor. But maybe I'm not representative. My potential confusion: For On the other hand, the grouping/not behaviour in |
I think the drawback Ryan is pointing out is not that we need to
[This con] would be [re]solved by moving away from R6+data.table for epi_archive and instead using list+lazy_dt, or having epi_archive be a non-reference-semantics wrapper on top of an R6 EpiArchive backend. We decided this move was low priority though. |
Addressing/ameliorating the Con:
|
Oh, I see. Thanks Logan! |
@lcbrooks @dajmcdon What is your current thinking on this? Based on our conversation yesterday, it seems like we want to abide by the rule:
So that means that we could go with something like Logan's "medium" solution: Are we OK with the fact that we don't clone the underlying Thoughts? |
Thinking about this clone-based "medium" solution: Just regular There's a bit of complication due to the combination of S3 and R6. A couple of decisions to make:
Top contenders are
A1 involves more S3 implementations, but could also catch some issues when we don't want to directly inherit epi_archive behavior. B1 is closest to dplyr, but might or might not confuse R6 users with the triple class. C2 might be a little confusing with group methods listed for ungrouped archives. I can't see a clear winner here. Maybe A1 or B1 over C2. |
@lcbrooks Thanks Logan for the super detailed proposal and analysis! I'm in favor of B1. The only negative you point out is that it has a triple class, but I don't really see this as a negative to be honest. Tagging @dajmcdon to see if he has any further thoughts, but barring that, I think to get the ball rolling, you could go ahead and implement this. Seems like it could be a good idea to address #67 in the same PR since it's a related issue. |
Returning to this. I was trying to write a justification for B1 in a comment, and in doing so convinced myself to switch to A1, because looking at, e.g., the backcasting preprocessors and compactify, I think we're going to want to force ourselves&users be precise about when we're dealing with an ungrouped vs. a grouped |
I have what looks like a working implementation on lcb/grouped_epi_archive. I originally hoped to keep the
Installation issue
|
Musings on Having library(data.table)
DT = data.table(a = 2:6, b=3:7)
c = 1:5 # potentially uses ALTREP
d = c(1,5,2,6,3) # probably no ALTREP
old_DT_address = address(DT)
old_a_address = address(DT$a)
set(DT, , c("b", "c", "d"), list(NULL, c, d))
address(DT) == old_DT_address
#> [1] TRUE
address(DT$a) == old_a_address
#> [1] TRUE
address(DT$c) == address(c)
#> [1] FALSE
address(DT$d) == address(d)
#> [1] FALSE Created on 2022-09-14 with reprex v2.0.2 It would stay more true to the So maybe we have --- Conclusion: --- |
Should we implement
group_by()
as public method in theepi_archive
object? We could do that since it's an R6 object, and whengroup_by()
is called, we could just have it set a private field calledgroup_keys
to whatever variables are passed.The only downstream behavior this would affect is the
as_of()
andslide()
methods for theepi_archive
object. (These are its own public methods.) This would either return a groupepi_df
snapshot, or do a grouped sliding computation, respectively.Pros:
epi_archive
objectx
to do:geo_value
. This would be precisely analogous to how we would do it on anepi_df
object withepi_slide()
. Currently we have to set the grouping variables as an argument toepix_slide()
.Cons:
epix_slide()
, the objectx
above would still be grouped. This is because R6 objects obey reference semantics. So we'd have toungroup()
it, which may be a bit counterintuitive since it's not howdplyr
verbs work on data frames or tibbles.@lcbrooks @dajmcdon @jacobbien What do you guys think?
The text was updated successfully, but these errors were encountered: