Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider deprecating/moving .f, .new_col_name in epi_slide(); data-masking ... and vector outputs in epix_slide() #629

Open
brookslogan opened this issue Mar 12, 2025 · 4 comments
Assignees

Comments

@brookslogan
Copy link
Contributor

The slide function documentation is wordy and complicated, and the code is very complicated.

Consider making epi_slide() support only the tidyeval case, removing .f and .new_col_name. Maybe consider moving .window_size before ... so one could do edf %>% epi_slide(28, md_rate = median(rate)). Would play better with tibble(outcol1 = ...., outcol2 = ....) tidyeval feature, as there is no need to add an extra comma to make .f missing. Would also play better if we eventually add across support. Potential drawbacks:

  • Tidyeval overhead may mean this is significantly slower than alternatives.
  • Requires some arcane syntax for more complicated computations; my_growth_rate = { <multiple statements> }.

In epix_slide(), function and formula .f and outputting data frames, not ordinary vectors, is more common, though data-masking and ordinary-vector output might be used for some max time_value / reporting latency calculations. We could potentially force function/formula usage.

It'd also be good to do this at the same time as, or after, renaming epi_slide() and epix_slide() to more distinct names.

@brookslogan
Copy link
Contributor Author

brookslogan commented Mar 12, 2025

Alternatively, consider moving .f later in the epi_slide() args rather than removing entirely, so its documentation appears later, and must be called by name if used.

We also might be able to standardize around tidy eval by focusing on the unnamed-tibble data-masking option, and having epix_slide() also (re)move .f in favor of ... containing

archive %>%
  epix_slide({
    <lots of code involving .x>
    tibble(out_col_1 = 1:5, out_col_2 = 2:6)
  })

@brookslogan brookslogan changed the title Consider deprecating .f, .new_col_name in epi_slide(); data-masking ... and vector outputs in epix_slide() Consider deprecating/moving .f, .new_col_name in epi_slide(); data-masking ... and vector outputs in epix_slide() Mar 12, 2025
@brookslogan
Copy link
Contributor Author

Nat noted that it may be okay for epi_slide() to be more complicated since it should only be resorted to if epi_slide_opt&co. don't work. We should revisit renaming ideas for these.

@dshemetov
Copy link
Contributor

dshemetov commented Mar 12, 2025

Agree with Nat's note that epi_slide() seems more aimed at general (and thus more advanced) use cases, so it's not clear to me that this is worth it. FWIW, I also tend to prefer the .f interface instead of tidyeval, mostly because it's much harder to reason about from a developer point of view, and harder to do programmatic things with the columns, which require across-type abstractions, which seem even harder to reason about as a developer.

@brookslogan
Copy link
Contributor Author

brookslogan commented Mar 17, 2025

Related: these messages about when the generality of epi_slide() is actually needed, and it's not actually the rolling-median case (there's epi_slide_opt() w/ frollapply, median for that which seemed much faster in a one-off test, though I guess IMO it's not as nice to write). It's the more advanced cases as @dshemetov notes.

Random notes:

  • We don't have proper across support in epi[x]_slide(); you have to route through a dplyr verb that supports it, which performs across overhead on every computation, which is slow, less natural to write, and could be done with .f as well.
  • You can actually could very similar to .f using tidyeval:
tibble(a = 1:5, b = 2:6, c = 3:7) %>%
  summarize({
    .x <- pick(everything())
    tibble(
      min_a = min(.x$a),
      min_bpc = min(.x$b + .x$c)
    )
  })

though we also don't support pick() either in epi_slide() and instead supply .x & some other pronoun-ish things directly, so... maybe not convenient for people who use tidyeval for everything either. Plus we require a leading comma before any unnamed-tibble computation like the above, so that .f will be missing. (Part of the motivation for this candidate change would be to remove .f so that that leading comma isn't required.)

Related: #478

Limiting the epix_slide() interface still hasn't really been discussed. Might not make as much sense if we're not limiting epi_slide().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants