Updated epi_slide to use before and after and added checks #188

Merged
merged 43 commits into from
Nov 13, 2022
43 commits
bb39209
Some cleanup of slide; still incomplete.
kenmawer Jul 26, 2022
5984d8d
Still needs changes as before and after numbers are wrong.
kenmawer Jul 26, 2022
b68af0b
Changed bad formatting.
kenmawer Jul 26, 2022
b3229f2
Still needs refactoring.
kenmawer Jul 27, 2022
d18d98c
Redocumented with changes; still needs changes.
kenmawer Jul 27, 2022
121f9d2
Bad changes that break things.
kenmawer Jul 29, 2022
35811f1
Seems like merge is broken.
kenmawer Jul 29, 2022
37b3815
Merge branch 'main' of https://github.com/cmu-delphi/epiprocess into …
kenmawer Jul 29, 2022
f6e8795
Merge branch 'km-slide-n-replace' of https://github.com/dajmcdon/epip…
kenmawer Jul 29, 2022
ee10963
Seems broken beyond repair.
kenmawer Jul 29, 2022
05d84ca
Fixed tests.
kenmawer Jul 29, 2022
846b6ca
Fixed improper use of n.
kenmawer Jul 29, 2022
d55e6b8
This finally runs without errors.
kenmawer Jul 29, 2022
1158c8a
Note that epix_slide still hasn't been updated, and some epi_slide do…
kenmawer Jul 29, 2022
bbf5d6b
Need to ensure tests pass.
kenmawer Aug 5, 2022
b55d411
This shouldn't be here.
kenmawer Aug 5, 2022
b22ace3
Removed repetitive code and added more tests.
kenmawer Aug 6, 2022
feea2f4
Merge branch 'main' into km-slide-n-replace2.1
kenmawer Aug 8, 2022
1038e15
Ran document after updating to epidatr.
kenmawer Aug 9, 2022
6e2b207
Addressed first two comments.
kenmawer Aug 9, 2022
77b5bb9
Replaced `n` in details.
kenmawer Aug 9, 2022
db99a67
Updated some poorly typed documentation and an imporperly refactored …
kenmawer Aug 9, 2022
0456aff
Cleared unclear documentation and removed redundancy with slide's code.
kenmawer Aug 10, 2022
950ee8c
Added a test for blank `after`.
kenmawer Aug 10, 2022
d43cede
Refactored edf with grouped.
kenmawer Aug 10, 2022
039f33f
More fixes.
kenmawer Aug 10, 2022
93738aa
Updated `align`.
kenmawer Aug 10, 2022
8c601f8
Fixed inconsistency with test formatting.
kenmawer Aug 10, 2022
cfe2b55
Updated compactify on a vignette, added two tests for NA and put a te…
kenmawer Aug 10, 2022
26836c4
This should not be here.
kenmawer Aug 15, 2022
8ec50dd
Added example of centre alignment.
kenmawer Aug 15, 2022
ca5c4ee
I forgot to document.
kenmawer Aug 15, 2022
ff6b0c1
Made `n` more descriptive.
kenmawer Aug 16, 2022
94aa234
Updated documentation.
kenmawer Aug 16, 2022
88eae27
Fixed up mixup with alignments.
kenmawer Aug 17, 2022
2f88b85
Replaced "rolling" with "running".
kenmawer Aug 17, 2022
7995dfe
Pulled changes to take out conflicts on .Rd.
kenmawer Aug 18, 2022
b0b2450
Implemented first point.
kenmawer Aug 19, 2022
5cd8ea9
IDK what's going on with the warning message printing...
kenmawer Aug 19, 2022
9f5ee8c
Require >=1 of `before`,`after`; ensure `time_step` receives integer
lcbrooks Aug 23, 2022
0fec3ae
Format `epi_slide` roxygen examples
lcbrooks Aug 23, 2022
d9682da
Fix some outdated docs, refine wording on others
lcbrooks Aug 23, 2022
0d3ea1b
Fix broken reference in roxygen docs
lcbrooks Aug 23, 2022
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -37,7 +37,8 @@ Imports:
tidyr,
tidyselect,
tsibble,
utils
utils,
vctrs
Suggests:
covidcast,
epidatr,
@@ -46,7 +47,6 @@ Suggests:
outbreaks,
rmarkdown,
testthat (>= 3.0.0),
vctrs,
waldo (>= 0.3.1),
withr
VignetteBuilder:
2 changes: 1 addition & 1 deletion R/growth_rate.R
@@ -73,7 +73,7 @@
#' implicitly defined by the `x` variable; for example, if `x` is a vector of
#' `Date` objects, `h = 7`, and the reference point is January 7, then the
#' sliding window contains all data in between January 1 and 14 (matching the
#' behavior of `epi_slide()` with `n = 2 * h` and `align = "center"`).
#' behavior of `epi_slide()` with `before = h - 1` and `after = h`).
#'
#' @section Additional Arguments:
#' For the global methods, "smooth_spline" and "trend_filter", additional
8 changes: 6 additions & 2 deletions R/outliers.R
@@ -128,6 +128,10 @@ detect_outlr = function(x = seq_along(y), y,
#' `y`).
#' @param y Signal values.
#' @param n Number of time steps to use in the rolling window. Default is 21.
#' This value is centrally aligned. When `n` is an odd number, the rolling
#' window extends from `(n-1)/2` time steps before each design point to `(n-1)/2`
#' time steps after. When `n` is even, then the rolling range extends from
#' `n/2-1` time steps before to `n/2` time steps after.
#' @param log_transform Should a log transform be applied before running outlier
#' detection? Default is `FALSE`. If `TRUE`, and zeros are present, then the
#' log transform will be padded by 1.
@@ -179,7 +183,7 @@ detect_outlr_rm = function(x = seq_along(y), y, n = 21,

# Calculate lower and upper thresholds and replacement value
z = z %>%
epi_slide(fitted = median(y), n = n, align = "center") %>%
epi_slide(fitted = median(y), before = floor((n-1)/2), after = ceiling((n-1)/2)) %>%
dplyr::mutate(resid = y - fitted) %>%
roll_iqr(n = n,
detection_multiplier = detection_multiplier,
@@ -332,7 +336,7 @@ roll_iqr = function(z, n, detection_multiplier, min_radius,
if (typeof(z$y) == "integer") as_type = as.integer
else as_type = as.numeric

epi_slide(z, roll_iqr = stats::IQR(resid), n = n, align = "center") %>%
epi_slide(z, roll_iqr = stats::IQR(resid), before = floor((n-1)/2), after = ceiling((n-1)/2)) %>%
dplyr::mutate(
lower = pmax(min_lower,
fitted - pmax(min_radius, detection_multiplier * roll_iqr)),
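
The `floor((n-1)/2)` / `ceiling((n-1)/2)` split used in the two `epi_slide()` calls above can be checked in isolation. The following is a minimal base-R sketch; `centered_window()` is a hypothetical helper written for illustration and is not part of epiprocess.

```r
# Hypothetical helper (not part of epiprocess): split a centrally aligned
# window of `n` time steps into the `before` and `after` counts used in the
# diff above. The reference point itself accounts for the remaining step.
centered_window <- function(n) {
  stopifnot(length(n) == 1L, !is.na(n), n >= 1)
  list(
    before = floor((n - 1) / 2),
    after  = ceiling((n - 1) / 2)
  )
}

odd  <- centered_window(21)  # before = 10, after = 10
even <- centered_window(14)  # before = 6,  after = 7

# In both cases the window covers exactly `n` time steps.
odd$before + odd$after + 1    # 21
even$before + even$after + 1  # 14
```

For even `n`, the extra time step falls after the reference point, which matches the `n/2-1` before and `n/2` after wording in the updated `@param n` documentation.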
187 changes: 113 additions & 74 deletions R/slide.R
@@ -6,40 +6,44 @@
#'
#' @param x The `epi_df` object under consideration.
#' @param f Function or formula to slide over variables in `x`. To "slide" means
#' to apply a function or formula over a running window of `n` time steps
#' (where one time step is typically one day or one week; see details for more
#' explanation). If a function, `f` should take `x`, an `epi_df` with the same
#' to apply a function or formula over a rolling window of time steps.
#' The window is determined by the `before` and `after` parameters described
#' below. One time step is typically one day or one week; see details for more
#' explanation. If a function, `f` should take `x`, an `epi_df` with the same
#' names as the non-grouping columns, followed by `g` to refer to the one row
#' tibble with one column per grouping variable that identifies the group,
#' and any number of named arguments (which will be taken from `...`). If a
#' formula, `f` can operate directly on columns accessed via `.x$var`, as
#' in `~ mean(.x$var)` to compute a mean of a column var over a sliding
#' window of n time steps. As well, `.y` may be used in the formula to refer
#' window. As well, `.y` may be used in the formula to refer
#' to the groupings that would be described by `g` if `f` was a function.
#' @param ... Additional arguments to pass to the function or formula specified
#' via `f`. Alternatively, if `f` is missing, then the current argument is
#' interpreted as an expression for tidy evaluation. See details.
#' @param n Number of time steps to use in the running window. For example, if
#' `n = 7`, one time step is one day, and the alignment is "right", then to
#' produce a value on January 7 we apply the given function or formula to data
#' in between January 1 and 7.
#' @param before,after How far `before` and `after` each `ref_time_value` should
#' the sliding window extend? At least one of these two arguments must be
#' provided; the other's default will be 0. Any value provided for either
#' argument must be a single, non-`NA`, non-negative,
#' [integer-compatible][vctrs::vec_cast] number of time steps. Endpoints of the
#' window are inclusive. Common settings:
#' * For trailing/right-aligned windows from `ref_time_value - time_step(k)`
#' to `ref_time_value`: either pass `before=k` by itself, or pass `before=k,
#' after=0`.
#' * For center-aligned windows from `ref_time_value - time_step(k)` to
#' `ref_time_value + time_step(k)`: pass `before=k, after=k`.
#' * For leading/left-aligned windows from `ref_time_value` to `ref_time_value
#' + time_step(k)`: either pass `after=k` by itself, or pass `before=0,
#' after=k`.
#' See "Details:" about the definition of a time step, (non)treatment of
#' missing rows within the window, and avoiding warnings about
#' `before`&`after` settings for a certain uncommon use case.
#' @param ref_time_values Time values for sliding computations, meaning, each
#' element of this vector serves as the reference time point for one sliding
#' window. If missing, then this will be set to all unique time values in the
#' underlying data table, by default.
#' @param align One of "right", "center", or "left", indicating the alignment of
#' the sliding window relative to the reference time point. If the alignment
#' is "center" and `n` is even, then one more time point will be used after
#' the reference time point than before. Default is "right".
#' @param before Positive integer less than `n`, specifying the number of time
#' points to use in the sliding window strictly before the reference time
#' point. For example, setting `before = n-1` would be the same as setting
#' `align = "right"`. The `before` argument allows for more flexible
#' specification of alignment than the `align` parameter, and if specified,
#' overrides `align`.
#' @param time_step Optional function used to define the meaning of one time
#' step, which if specified, overrides the default choice based on the
#' `time_value` column. This function must take a positive integer and return
#' `time_value` column. This function must take a non-negative integer and return
#' an object of class `lubridate::period`. For example, we can use `time_step
#' = lubridate::hours` in order to set the time step to be one hour (this
#' would only be meaningful if `time_value` is of class `POSIXct`).
@@ -59,28 +63,44 @@
#' @return An `epi_df` object given by appending a new column to `x`, named
#' according to the `new_col_name` argument.
#'
#' @details To "slide" means to apply a function or formula over a running
#' window of `n` time steps, where the unit (the meaning of one time step) is
#' @details To "slide" means to apply a function or formula over a rolling
#' window of time steps, where the window is anchored at a reference time and
#' its left and right endpoints are given by the `before` and `after` arguments.
#' The unit (the meaning of one time step) is
#' implicitly defined by the way the `time_value` column treats addition and
#' subtraction; for example, if the time values are coded as `Date` objects,
#' then one time step is one day, since `as.Date("2022-01-01") + 1` equals
#' `as.Date("2022-01-02")`. Alternatively, the time step can be set explicitly
#' using the `time_step` argument (which if specified would override the
#' default choice based on `time_value` column). If less than `n` time steps
#' are available at any given reference time value, then `epi_slide()` still
#' default choice based on `time_value` column). If there are not enough time
#' steps available to complete the window at any given reference time, then
#' `epi_slide()` still
#' attempts to perform the computation anyway (it does not require a complete
#' window). The issue of what to do with partial computations (those run on
#' incomplete windows) is therefore left up to the user, either through the
#' specified function or formula `f`, or through post-processing.
#'
#' If `f` is missing, then an expression for tidy evaluation can be specified,
#' for example, as in:
#' specified function or formula `f`, or through post-processing. For a
#' centrally-aligned slide of `n` `time_value`s in a sliding window, set
#' `before = (n-1)/2` and `after = (n-1)/2` when the number of `time_value`s
#' in a sliding window is odd and `before = n/2-1` and `after = n/2` when
#' `n` is even.
#'
#' Sometimes, we want to experiment with various trailing or leading window
#' widths and compare the slide outputs. In the (uncommon) case where
#' zero-width windows are considered, manually pass both the `before` and
#' `after` arguments in order to prevent potential warnings. (E.g., `before=k`
#' with `k=0` and `after` missing may produce a warning. To avoid warnings,
#' use `before=k, after=0` instead; otherwise, it looks too much like a
#' leading window was intended, but the `after` argument was forgotten or
#' misspelled.)
#'
#' If `f` is missing, then an expression for tidy evaluation can be specified,
#' for example, as in:
#' ```
#' epi_slide(x, cases_7dav = mean(cases), n = 7)
#' epi_slide(x, cases_7dav = mean(cases), before = 6)
#' ```
#' which would be equivalent to:
#' ```
#' epi_slide(x, function(x, ...) mean(x$cases), n = 7,
#' epi_slide(x, function(x, ...) mean(x$cases), before = 6,
#' new_col_name = "cases_7dav")
#' ```
#' Thus, to be clear, when the computation is specified via an expression for
@@ -92,32 +112,45 @@
#' @importFrom rlang .data .env !! enquo enquos sym
#' @export
#' @examples
#' # slide a 7-day trailing average formula on cases
#' jhu_csse_daily_subset %>%
#' # slide a 7-day trailing average formula on cases
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
#' epi_slide(cases_7dav = mean(cases), before = 6) %>%
#' # rmv a nonessential var. to ensure new col is printed
#' dplyr::select(-death_rate_7d_av)
#'
#' # slide a 7-day leading average
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
#' epi_slide(cases_7dav = mean(cases), after = 6) %>%
#' # rmv a nonessential var. to ensure new col is printed
#' dplyr::select(-death_rate_7d_av)
#'
#' # slide a 7-day centre-aligned average
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
#' epi_slide(cases_7dav = mean(cases), n = 7,
#' align = "right") %>%
#' epi_slide(cases_7dav = mean(cases), before = 3, after = 3) %>%
#' # rmv a nonessential var. to ensure new col is printed
#' dplyr::select(-death_rate_7d_av)
#'
#' # slide a left-aligned 7-day average
#' jhu_csse_daily_subset %>%
#'
#' # slide a 14-day centre-aligned average
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
#' epi_slide(cases_7dav = mean(cases), n = 7,
#' align = "left") %>%
#' epi_slide(cases_7dav = mean(cases), before = 6, after = 7) %>%
#' # rmv a nonessential var. to ensure new col is printed
#' dplyr::select(-death_rate_7d_av)
#'
#' # nested new columns
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
#' epi_slide(a = data.frame(cases_2dav = mean(cases),
#' cases_2dma = mad(cases)),
#' n = 2, as_list_col = TRUE)
epi_slide = function(x, f, ..., n, ref_time_values,
align = c("right", "center", "left"), before, time_step,
#'
#' # nested new columns
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
#' epi_slide(a = data.frame(cases_2dav = mean(cases),
#' cases_2dma = mad(cases)),
#' before = 1, as_list_col = TRUE)
epi_slide = function(x, f, ..., before, after, ref_time_values,
time_step,
new_col_name = "slide_value", as_list_col = FALSE,
names_sep = "_", all_rows = FALSE) {

# Check we have an `epi_df` object
if (!inherits(x, "epi_df")) Abort("`x` must be of class `epi_df`.")

@@ -133,44 +166,50 @@ epi_slide = function(x, f, ..., n, ref_time_values,
ref_time_values = ref_time_values[ref_time_values %in%
unique(x$time_value)]
}

# If before is missing, then use align to set up alignment
if (missing(before)) {
align = match.arg(align)
if (align == "right") {
before_num = n-1
after_num = 0
}
else if (align == "center") {
before_num = floor((n-1)/2)
after_num = ceiling((n-1)/2)

# Validate and pre-process `before`, `after`:
if (!missing(before)) {
before <- vctrs::vec_cast(before, integer())
if (length(before) != 1L || is.na(before) || before < 0L) {
Abort("`before` must be length-1, non-NA, non-negative")
}
else {
before_num = 0
after_num = n-1
}
if (!missing(after)) {
after <- vctrs::vec_cast(after, integer())
if (length(after) != 1L || is.na(after) || after < 0L) {
Abort("`after` must be length-1, non-NA, non-negative")
}
}

# Otherwise set up alignment based on passed before value
else {
if (before < 0 || before > n-1) {
Abort("`before` must be in between 0 and n-1`.")
if (missing(before)) {
if (missing(after)) {
Abort("Either or both of `before`, `after` must be provided.")
} else if (after == 0L) {
Warn("`before` missing, `after==0`; maybe this was intended to be some
non-zero-width trailing window, but since `before` appears to be
missing, it's interpreted as a zero-width window (`before=0,
after=0`).")
}

before_num = before
after_num = n-1-before
before <- 0L
} else if (missing(after)) {
if (before == 0L) {
Warn("`before==0`, `after` missing; maybe this was intended to be some
non-zero-width leading window, but since `after` appears to be
missing, it's interpreted as a zero-width window (`before=0,
after=0`).")
}
after <- 0L
}

# If a custom time step is specified, then redefine units
# If a custom time step is specified, then redefine units
if (!missing(time_step)) {
before_num = time_step(before_num)
after_num = time_step(after_num)
before <- time_step(before)
after <- time_step(after)
}

# Now set up starts and stops for sliding/hopping
time_range = range(unique(x$time_value))
starts = in_range(ref_time_values - before_num, time_range)
stops = in_range(ref_time_values + after_num, time_range)
starts = in_range(ref_time_values - before, time_range)
stops = in_range(ref_time_values + after, time_range)

if( length(starts) == 0 || length(stops) == 0 ) {
Abort("The starting and/or stopping times for sliding are out of bounds with respect to the range of times in your data. Check your settings for ref_time_values and align (and before, if specified).")
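
To illustrate how `before` and `after` translate into window endpoints, here is a minimal base-R sketch of the start/stop computation in the hunk above. `clamp_to_range()` is a hypothetical stand-in for the package-internal `in_range()`, which is assumed here to clip values to the observed time range; the data are made up for illustration.

```r
# Hypothetical stand-in for the internal `in_range()` helper; assumed to clip
# each element of `x` to the closed interval [rng[1], rng[2]].
clamp_to_range <- function(x, rng) pmin(pmax(x, rng[1]), rng[2])

# Made-up daily time values; with Date objects, one time step is one day.
time_values <- seq(as.Date("2022-01-01"), as.Date("2022-01-10"), by = "day")
ref_time_values <- time_values

# A 7-day trailing window: 6 steps before the reference time, 0 after.
before <- 6L
after <- 0L

time_range <- range(time_values)
starts <- clamp_to_range(ref_time_values - before, time_range)
stops <- clamp_to_range(ref_time_values + after, time_range)

# Endpoints are inclusive, so the window size is stop - start + 1.
windows <- data.frame(
  ref = ref_time_values,
  start = starts,
  stop = stops,
  n_in_window = as.integer(stops - starts) + 1L
)
windows
```

Under these assumptions the window for January 7 spans January 1 to 7, while the window for January 3 is clipped to January 1 to 3. This is the incomplete-window case that the `@details` section leaves to `f` or to post-processing.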
6 changes: 5 additions & 1 deletion man/detect_outlr_rm.Rd
