|
1 | 1 | #' More general form of [`epi_slide_opt`] for rolling/running computations
|
2 | 2 | #'
|
3 |
| -#' Check first whether you can use [`epi_slide_mean`], [`epi_slide_sum`], or the |
4 |
| -#' medium-generality [`epi_slide_opt`] instead, as they are faster and more |
5 |
| -#' convenient to use. You typically only need to use `epi_slide()` if you have a |
6 |
| -#' computation that depends on multiple columns simultaneously, outputs multiple |
7 |
| -#' columns simultaneously, or produces non-numeric output. |
| 3 | +#' Most rolling/running computations can be handled by [`epi_slide_mean`], |
| 4 | +#' [`epi_slide_sum`], or the medium-generality [`epi_slide_opt`] functions |
| 5 | +#' instead, which are much faster. You typically only need to consider |
| 6 | +#' `epi_slide()` if you have a computation that depends on multiple columns |
| 7 | +#' simultaneously, outputs multiple columns simultaneously, or produces |
| 8 | +#' non-numeric output. For example, this computation depends on multiple |
| 9 | +#' columns: |
8 | 10 | #'
|
9 | 11 | #' ```
|
10 |
| -#' # Create new column `cases_7dmed` that contains a 7-day trailing median of cases |
11 |
| -#' epi_slide(edf, cases_7dmed = median(cases), .window_size = 7) |
| 12 | +#' cases_deaths_subset %>% |
| 13 | +#' epi_slide( |
| 14 | +#' cfr_estimate_v0 = death_rate_7d_av[[22]]/case_rate_7d_av[[1]], |
| 15 | +#' .window_size = 22 |
| 16 | +#' ) %>% |
| 17 | +#' print(n = 30) |
12 | 18 | #' ```
|
13 | 19 | #'
|
14 |
| -#' For two very common use cases, we provide optimized functions that are much |
15 |
| -#' faster than `epi_slide`: `epi_slide_mean()` and `epi_slide_sum()`. We |
16 |
| -#' recommend using these functions when possible. |
| 20 | +#' (Here, the value 22 was selected using `epi_cor()` and averaging across |
| 21 | +#' `geo_value`s. See |
| 22 | +#' \href{https://www.medrxiv.org/content/10.1101/2024.12.27.24319518v1}{this |
| 23 | +#' manuscript} for some warnings and information about using similar
| 24 | +#' types of CFR estimators.) |
17 | 25 | #'
|
18 | 26 | #' See `vignette("epi_df")` for more examples.
|
19 | 27 | #'
|
|
34 | 42 | #'
|
35 | 43 | #' - Don't provide `.f`, and instead use one or more
|
36 | 44 | #' [`dplyr::summarize`]-esque ["data-masking"][rlang::args_data_masking]
|
37 |
| -#' expressions in `...`, e.g., `cases_7dmed = median(cases)`. This is usually |
38 |
| -#' the most convenient way to use `epi_slide`. See examples. |
| 45 | +#' expressions in `...`, e.g., `cfr_estimate_v0 = |
| 46 | +#' death_rate_7d_av[[22]]/case_rate_7d_av[[1]]`. This way is sometimes more |
| 47 | +#' convenient, but also has the most computational overhead. |
39 | 48 | #'
|
40 |
| -#' - Provide a formula in `.f`, e.g., `~ median(.x$cases)`. In this formula, |
41 |
| -#' `.x` is an `epi_df` containing data for a single time window as described |
42 |
| -#' above, taken from the original `.x` fed into `epi_slide()`. |
| 49 | +#' - Provide a formula in `.f`, e.g., `~ |
| 50 | +#' .x$death_rate_7d_av[[22]]/.x$case_rate_7d_av[[1]]`. In this formula, `.x` |
| 51 | +#' is an `epi_df` containing data for a single time window as described above, |
| 52 | +#' taken from the original `.x` fed into `epi_slide()`. |
43 | 53 | #'
|
44 |
| -#' - Provide a function in `.f`. The function should be of the form `function(x, |
45 |
| -#' g, t)` or `function(x, g, t, <additional configuration arguments>)`, where: |
| 54 | +#' - Provide a function in `.f`, e.g., `function(x, g, t) |
| 55 | +#' x$death_rate_7d_av[[22]]/x$case_rate_7d_av[[1]]`. The function should be of |
| 56 | +#' the form `function(x, g, t)` or `function(x, g, t, <additional |
| 57 | +#' configuration arguments>)`, where: |
46 | 58 | #'
|
47 | 59 | #' - `x` is a data frame with the same column names as the original object,
|
48 | 60 | #' minus any grouping variables, with only the windowed data for one
|
|
60 | 72 | #' The values of `g` and `t` are also available to data-masking expression and
|
61 | 73 | #' formula-based computations as `.group_key` and `.ref_time_value`,
|
62 | 74 | #' respectively. Formula computations also let you use `.y` or `.z`,
|
63 |
| -#' respectively. |
| 75 | +#' respectively, as additional names for these same quantities (similar to |
| 76 | +#' [`dplyr::group_modify`]). |
64 | 77 | #'
|
65 | 78 | #' @param ... Additional arguments to pass to the function or formula specified
|
66 | 79 | #' via `.f`. Alternatively, if `.f` is missing, then the `...` is interpreted
|
|
73 | 86 | #' be given names that clash with the existing columns of `.x`.
|
74 | 87 | #'
|
75 | 88 | #' @details
|
| 89 | +#' |
| 90 | +#' ## Motivation and lower-level alternatives |
| 91 | +#' |
| 92 | +#' `epi_slide()` is focused on preventing errors and providing a convenient |
| 93 | +#' interface. If you need computational speed, many computations can be optimized |
| 94 | +#' by one of the following: |
| 95 | +#' |
| 96 | +#' * Performing core sliding operations using `epi_slide_opt()` with
| 97 | +#' `frollapply`, and using potentially-grouped `mutate()`s to transform or |
| 98 | +#' combine the results. |
| 99 | +#' |
| 100 | +#' * Grouping by `geo_value` and any `other_keys`; [`complete()`]ing with |
| 101 | +#' `full_seq()` to fill in time gaps; `arrange()`ing by `time_value`s within |
| 102 | +#' each group; using `mutate()` with vectorized operations and shift operators |
| 103 | +#' like `dplyr::lead()` and `dplyr::lag()` to perform the core operations, |
| 104 | +#' being careful to give the desired results for the least and most recent |
| 105 | +#' `time_value`s (often `NA`s for the least recent); ungrouping; and |
| 106 | +#' `filter()`ing back down to only rows that existed before the `complete()` |
| 107 | +#' stage if necessary. |
| 108 | +#' |
76 | 109 | #' ## Advanced uses of `.f` via tidy evaluation
|
77 | 110 | #'
|
78 | 111 | #' If specifying `.f` via tidy evaluation, in addition to the standard [`.data`]
|
|
96 | 129 | #' @examples
|
97 | 130 | #' library(dplyr)
|
98 | 131 | #'
|
99 |
| -#' # Get the 7-day trailing standard deviation of cases and the 7-day trailing mean of cases |
100 |
| -#' cases_deaths_subset %>% |
| 132 | +#' # Generate some simple time-varying CFR estimates: |
| 133 | +#' with_cfr_estimates <- cases_deaths_subset %>% |
101 | 134 | #' epi_slide(
|
102 |
| -#' cases_7sd = sd(cases, na.rm = TRUE), |
103 |
| -#' cases_7dav = mean(cases, na.rm = TRUE), |
104 |
| -#' .window_size = 7 |
105 |
| -#' ) %>% |
106 |
| -#' select(geo_value, time_value, cases, cases_7sd, cases_7dav) |
107 |
| -#' # Note that epi_slide_mean could be used to more quickly calculate cases_7dav. |
| 135 | +#' cfr_estimate_v0 = death_rate_7d_av[[22]]/case_rate_7d_av[[1]], |
| 136 | +#' .window_size = 22 |
| 137 | +#' ) |
| 138 | +#' with_cfr_estimates %>% |
| 139 | +#' print(n = 30) |
| 140 | +#' # (Here, the value 22 was selected using `epi_cor()` and averaging across |
| 141 | +#' # `geo_value`s. See |
| 142 | +#' # https://www.medrxiv.org/content/10.1101/2024.12.27.24319518v1 for some |
| 143 | +#' # warnings and information about using CFR estimators along these lines.)
108 | 144 | #'
|
109 |
| -#' # In addition to the [`dplyr::mutate`]-like syntax, you can feed in a function or |
110 |
| -#' # formula in a way similar to [`dplyr::group_modify`]: |
111 |
| -#' my_summarizer <- function(window_data) { |
112 |
| -#' window_data %>% |
113 |
| -#' summarize( |
114 |
| -#' cases_7sd = sd(cases, na.rm = TRUE), |
115 |
| -#' cases_7dav = mean(cases, na.rm = TRUE) |
116 |
| -#' ) |
| 145 | +#' # In addition to the [`dplyr::mutate`]-like syntax, you can feed in a |
| 146 | +#' # function or formula in a way similar to [`dplyr::group_modify`]; these |
| 147 | +#' # often run much more quickly: |
| 148 | +#' my_computation <- function(window_data) { |
| 149 | +#' tibble( |
| 150 | +#' cfr_estimate_v0 = window_data$death_rate_7d_av[[nrow(window_data)]] / |
| 151 | +#' window_data$case_rate_7d_av[[1]] |
| 152 | +#' ) |
117 | 153 | #' }
|
118 |
| -#' cases_deaths_subset %>% |
| 154 | +#' with_cfr_estimates2 <- cases_deaths_subset %>% |
119 | 155 | #' epi_slide(
|
120 |
| -#' ~ my_summarizer(.x), |
121 |
| -#' .window_size = 7 |
122 |
| -#' ) %>% |
123 |
| -#' select(geo_value, time_value, cases, cases_7sd, cases_7dav) |
124 |
| -#' |
125 |
| -#' |
126 |
| -#' |
| 156 | +#' ~ my_computation(.x), |
| 157 | +#' .window_size = 22 |
| 158 | +#' ) |
| 159 | +#' with_cfr_estimates3 <- cases_deaths_subset %>% |
| 160 | +#' epi_slide( |
| 161 | +#' function(window_data, g, t) { |
| 162 | +#' tibble( |
| 163 | +#' cfr_estimate_v0 = window_data$death_rate_7d_av[[nrow(window_data)]] / |
| 164 | +#' window_data$case_rate_7d_av[[1]] |
| 165 | +#' ) |
| 166 | +#' }, |
| 167 | +#' .window_size = 22 |
| 168 | +#' ) |
127 | 169 | #'
|
128 | 170 | #'
|
129 | 171 | #' #### Advanced: ####
|
@@ -586,9 +628,9 @@ get_before_after_from_window <- function(window_size, align, time_type) {
|
586 | 628 | #'
|
587 | 629 | #' `epi_slide_opt` allows you to use any [data.table::froll] or
|
588 | 630 | #' [slider::summary-slide] function. If none of the specialized functions here
|
589 |
| -#' work, you can use `data.table::frollapply` with your own function. See |
590 |
| -#' [`epi_slide`] if you need to work with multiple columns at once or output a |
591 |
| -#' custom type. |
| 631 | +#' work, you can use `data.table::frollapply` together with a non-rolling |
| 632 | +#' function (e.g., `median`). See [`epi_slide`] if you need to work with |
| 633 | +#' multiple columns at once or output a custom type. |
592 | 634 | #'
|
593 | 635 | #' @template basic-slide-params
|
594 | 636 | #' @param .col_names <[`tidy-select`][dplyr_tidy_select]> An unquoted column
|