Commit 1d5d525

docs(epi_slide): iterate on intro, examples, motivation
1 parent 01ab5f4 commit 1d5d525

3 files changed, +171 -90 lines changed

R/slide.R

+87 -45
@@ -1,19 +1,27 @@
 #' More general form of [`epi_slide_opt`] for rolling/running computations
 #'
-#' Check first whether you can use [`epi_slide_mean`], [`epi_slide_sum`], or the
-#' medium-generality [`epi_slide_opt`] instead, as they are faster and more
-#' convenient to use. You typically only need to use `epi_slide()` if you have a
-#' computation that depends on multiple columns simultaneously, outputs multiple
-#' columns simultaneously, or produces non-numeric output.
+#' Most rolling/running computations can be handled by [`epi_slide_mean`],
+#' [`epi_slide_sum`], or the medium-generality [`epi_slide_opt`] functions
+#' instead, which are much faster. You typically only need to consider
+#' `epi_slide()` if you have a computation that depends on multiple columns
+#' simultaneously, outputs multiple columns simultaneously, or produces
+#' non-numeric output. For example, this computation depends on multiple
+#' columns:
 #'
 #' ```
-#' # Create new column `cases_7dmed` that contains a 7-day trailing median of cases
-#' epi_slide(edf, cases_7dmed = median(cases), .window_size = 7)
+#' cases_deaths_subset %>%
+#'   epi_slide(
+#'     cfr_estimate_v0 = death_rate_7d_av[[22]]/case_rate_7d_av[[1]],
+#'     .window_size = 22
+#'   ) %>%
+#'   print(n = 30)
 #' ```
 #'
-#' For two very common use cases, we provide optimized functions that are much
-#' faster than `epi_slide`: `epi_slide_mean()` and `epi_slide_sum()`. We
-#' recommend using these functions when possible.
+#' (Here, the value 22 was selected using `epi_cor()` and averaging across
+#' `geo_value`s. See
+#' \href{https://www.medrxiv.org/content/10.1101/2024.12.27.24319518v1}{this
+#' manuscript} for some warnings & information on using similar
+#' types of CFR estimators.)
 #'
 #' See `vignette("epi_df")` for more examples.
 #'
@@ -34,15 +42,19 @@
 #'
 #' - Don't provide `.f`, and instead use one or more
 #'   [`dplyr::summarize`]-esque ["data-masking"][rlang::args_data_masking]
-#'   expressions in `...`, e.g., `cases_7dmed = median(cases)`. This is usually
-#'   the most convenient way to use `epi_slide`. See examples.
+#'   expressions in `...`, e.g., `cfr_estimate_v0 =
+#'   death_rate_7d_av[[22]]/case_rate_7d_av[[1]]`. This way is sometimes more
+#'   convenient, but also has the most computational overhead.
 #'
-#' - Provide a formula in `.f`, e.g., `~ median(.x$cases)`. In this formula,
-#'   `.x` is an `epi_df` containing data for a single time window as described
-#'   above, taken from the original `.x` fed into `epi_slide()`.
+#' - Provide a formula in `.f`, e.g., `~
+#'   .x$death_rate_7d_av[[22]]/.x$case_rate_7d_av[[1]]`. In this formula, `.x`
+#'   is an `epi_df` containing data for a single time window as described above,
+#'   taken from the original `.x` fed into `epi_slide()`.
 #'
-#' - Provide a function in `.f`. The function should be of the form `function(x,
-#'   g, t)` or `function(x, g, t, <additional configuration arguments>)`, where:
+#' - Provide a function in `.f`, e.g., `function(x, g, t)
+#'   x$death_rate_7d_av[[22]]/x$case_rate_7d_av[[1]]`. The function should be of
+#'   the form `function(x, g, t)` or `function(x, g, t, <additional
+#'   configuration arguments>)`, where:
 #'
 #'   - `x` is a data frame with the same column names as the original object,
 #'     minus any grouping variables, with only the windowed data for one
@@ -60,7 +72,8 @@
 #' The values of `g` and `t` are also available to data-masking expression and
 #' formula-based computations as `.group_key` and `.ref_time_value`,
 #' respectively. Formula computations also let you use `.y` or `.z`,
-#' respectively.
+#' respectively, as additional names for these same quantities (similar to
+#' [`dplyr::group_modify`]).
 #'
 #' @param ... Additional arguments to pass to the function or formula specified
 #' via `.f`. Alternatively, if `.f` is missing, then the `...` is interpreted
@@ -73,6 +86,26 @@
 #' be given names that clash with the existing columns of `.x`.
 #'
 #' @details
+#'
+#' ## Motivation and lower-level alternatives
+#'
+#' `epi_slide()` is focused on preventing errors and providing a convenient
+#' interface. If you need computational speed, many computations can be optimized
+#' by one of the following:
+#'
+#' * Performing core sliding operations with `epi_slide_opt()` with
+#'   `frollapply`, and using potentially-grouped `mutate()`s to transform or
+#'   combine the results.
+#'
+#' * Grouping by `geo_value` and any `other_keys`; [`complete()`]ing with
+#'   `full_seq()` to fill in time gaps; `arrange()`ing by `time_value`s within
+#'   each group; using `mutate()` with vectorized operations and shift operators
+#'   like `dplyr::lead()` and `dplyr::lag()` to perform the core operations,
+#'   being careful to give the desired results for the least and most recent
+#'   `time_value`s (often `NA`s for the least recent); ungrouping; and
+#'   `filter()`ing back down to only rows that existed before the `complete()`
+#'   stage if necessary.
+#'
 #' ## Advanced uses of `.f` via tidy evaluation
 #'
 #' If specifying `.f` via tidy evaluation, in addition to the standard [`.data`]
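
As a rough illustration of the second lower-level alternative listed under "Motivation and lower-level alternatives" in this hunk (group, `complete()` with `full_seq()`, `arrange()`, `mutate()` with `lag()`, ungroup, `filter()`), here is a minimal sketch on toy data. The data frame `toy`, its values, the helper column `.real`, and the 2-day `lag_days` are all hypothetical and chosen only for illustration; the documented example uses a 22-day window instead. The sketch assumes daily data and the `dplyr` and `tidyr` packages, and it is not taken from the package source.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values); "ca" has no row for 2020-01-03.
toy <- tibble(
  geo_value = c(rep("ak", 5), rep("ca", 4)),
  time_value = as.Date("2020-01-01") + c(0:4, 0, 1, 3, 4),
  case_rate = c(10, 20, 40, 50, 80, 5, 10, 30, 40),
  death_rate = c(1, 2, 4, 5, 8, 1, 1, 3, 2)
)

# Hypothetical lag in days: deaths at time t are divided by cases at t - lag_days.
lag_days <- 2

result <- toy %>%
  # Mark the rows that exist before gap-filling so they can be recovered later.
  mutate(.real = TRUE) %>%
  group_by(geo_value) %>%
  # Fill in missing days within each geo_value; new rows get NA in other columns.
  complete(time_value = full_seq(time_value, period = 1)) %>%
  arrange(time_value, .by_group = TRUE) %>%
  # Vectorized shift: the first lag_days rows of each group become NA.
  mutate(cfr_estimate_v0 = death_rate / lag(case_rate, lag_days)) %>%
  ungroup() %>%
  # Drop the rows that were only added by complete().
  filter(!is.na(.real)) %>%
  select(-.real)

print(result)
```

With these toy values, the `ak` estimates are `NA, NA, 0.4, 0.25, 0.2`. For `ca`, the filled-in gap on 2020-01-03 propagates as `NA` into the estimate two days later, giving `NA, NA, 0.3, NA`. How gaps should be treated is a design choice that this sketch does not settle.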
@@ -96,34 +129,43 @@
 #' @examples
 #' library(dplyr)
 #'
-#' # Get the 7-day trailing standard deviation of cases and the 7-day trailing mean of cases
-#' cases_deaths_subset %>%
+#' # Generate some simple time-varying CFR estimates:
+#' with_cfr_estimates <- cases_deaths_subset %>%
 #'   epi_slide(
-#'     cases_7sd = sd(cases, na.rm = TRUE),
-#'     cases_7dav = mean(cases, na.rm = TRUE),
-#'     .window_size = 7
-#'   ) %>%
-#'   select(geo_value, time_value, cases, cases_7sd, cases_7dav)
-#' # Note that epi_slide_mean could be used to more quickly calculate cases_7dav.
+#'     cfr_estimate_v0 = death_rate_7d_av[[22]]/case_rate_7d_av[[1]],
+#'     .window_size = 22
+#'   )
+#' with_cfr_estimates %>%
+#'   print(n = 30)
+#' # (Here, the value 22 was selected using `epi_cor()` and averaging across
+#' # `geo_value`s. See
+#' # https://www.medrxiv.org/content/10.1101/2024.12.27.24319518v1 for some
+#' # warnings & information using CFR estimators along these lines.)
 #'
-#' # In addition to the [`dplyr::mutate`]-like syntax, you can feed in a function or
-#' # formula in a way similar to [`dplyr::group_modify`]:
-#' my_summarizer <- function(window_data) {
-#'   window_data %>%
-#'     summarize(
-#'       cases_7sd = sd(cases, na.rm = TRUE),
-#'       cases_7dav = mean(cases, na.rm = TRUE)
-#'     )
+#' # In addition to the [`dplyr::mutate`]-like syntax, you can feed in a
+#' # function or formula in a way similar to [`dplyr::group_modify`]; these
+#' # often run much more quickly:
+#' my_computation <- function(window_data) {
+#'   tibble(
+#'     cfr_estimate_v0 = window_data$death_rate_7d_av[[nrow(window_data)]] /
+#'       window_data$case_rate_7d_av[[1]]
+#'   )
 #' }
-#' cases_deaths_subset %>%
+#' with_cfr_estimates2 <- cases_deaths_subset %>%
 #'   epi_slide(
-#'     ~ my_summarizer(.x),
-#'     .window_size = 7
-#'   ) %>%
-#'   select(geo_value, time_value, cases, cases_7sd, cases_7dav)
-#'
-#'
-#'
+#'     ~ my_computation(.x),
+#'     .window_size = 22
+#'   )
+#' with_cfr_estimates3 <- cases_deaths_subset %>%
+#'   epi_slide(
+#'     function(window_data, g, t) {
+#'       tibble(
+#'         cfr_estimate_v0 = window_data$death_rate_7d_av[[nrow(window_data)]] /
+#'           window_data$case_rate_7d_av[[1]]
+#'       )
+#'     },
+#'     .window_size = 22
+#'   )
 #'
 #'
 #' #### Advanced: ####
@@ -586,9 +628,9 @@ get_before_after_from_window <- function(window_size, align, time_type) {
 #'
 #' `epi_slide_opt` allows you to use any [data.table::froll] or
 #' [slider::summary-slide] function. If none of the specialized functions here
-#' work, you can use `data.table::frollapply` with your own function. See
-#' [`epi_slide`] if you need to work with multiple columns at once or output a
-#' custom type.
+#' work, you can use `data.table::frollapply` together with a non-rolling
+#' function (e.g., `median`). See [`epi_slide`] if you need to work with
+#' multiple columns at once or output a custom type.
 #'
 #' @template basic-slide-params
 #' @param .col_names <[`tidy-select`][dplyr_tidy_select]> An unquoted column
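
To make the revised wording in this hunk concrete, the following minimal sketch shows `data.table::frollapply` combined with a non-rolling function (`median`) on a plain numeric vector. The vector `cases`, its values, and the 3-day window are hypothetical and chosen only for illustration; the sketch does not call `epi_slide_opt()` itself.

```r
library(data.table)

# Hypothetical daily case counts for a single location.
cases <- c(3, 1, 4, 1, 5, 9, 2, 6)

# 3-day trailing (right-aligned) median; leading incomplete windows are NA.
cases_3dmed <- frollapply(cases, 3, median)

print(cases_3dmed)
# NA NA 3 1 4 5 5 6
```

Within `epiprocess`, the same pairing would presumably be supplied to `epi_slide_opt()` by passing `frollapply` as `.f` and forwarding `FUN = median` through `...`; that exact call is an assumption and is not verified here.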

man/epi_slide.Rd

+81 -42
Some generated files are not rendered by default.

man/epi_slide_opt.Rd

+3 -3
Some generated files are not rendered by default.
