
Commit 2fa0d67

committed
lots of reference updates
1 parent 46354ea commit 2fa0d67


63 files changed, +604 -344 lines changed

DEVELOPMENT.md

+4-2
Original file line number | Diff line number | Diff line change
@@ -35,12 +35,14 @@ R -e 'devtools::document()'
3535
R -e 'pkgdown::build_site()'
3636
```
3737

38-
Note that sometimes the caches from either `pkgdown` or `knitr` can cause difficulties. To clear those, run `make`, with either `clean_knitr`, `clean_site`, or `clean` (which does both).
38+
Note that sometimes the caches from either `pkgdown` or `knitr` can cause
39+
difficulties. To clear those, run `make`, with either `clean_knitr`,
40+
`clean_site`, or `clean` (which does both).
3941

4042
If you work without RStudio and want to iterate on documentation, you might
4143
find [this
4244
script](https://gist.github.com/gadenbuie/d22e149e65591b91419e41ea5b2e0621)
43-
helpful.
45+
helpful. For updating references, you will need to manually call `pkgdown::build_reference()`.
4446

4547
## Versioning
4648

R/arx_classifier.R

+2-4
@@ -166,9 +166,7 @@ arx_classifier <- function(
166166
}
167167
forecast_date <- args_list$forecast_date %||% forecast_date_default
168168
target_date <- args_list$target_date %||% (forecast_date + args_list$ahead)
169-
preds <- forecast(
170-
wf,
171-
) %>%
169+
preds <- forecast(wf) %>%
172170
as_tibble() %>%
173171
select(-time_value)
174172

@@ -347,7 +345,7 @@ arx_class_epi_workflow <- function(
347345
#' be created using growth rates (as the predictors are) or lagged
348346
#' differences. The second case is closer to the requirements for the
349347
#' [2022-23 CDC Flusight Hospitalization Experimental Target](https://github.com/cdcepi/Flusight-forecast-data/blob/745511c436923e1dc201dea0f4181f21a8217b52/data-experimental/README.md).
350-
#' See the Classification Vignette for details of how to create a reasonable
348+
#' See the [Classification chapter from the forecasting book](https://cmu-delphi.github.io/delphi-tooling-book/arx-classifier.html) for details of how to create a reasonable
351349
#' baseline for this case. Selecting `"growth_rate"` (the default) uses
352350
#' [epiprocess::growth_rate()] to create the outcome using some of the
353351
#' additional arguments below. Choosing `"lag_difference"` instead simply

R/arx_forecaster.R

+24-18
@@ -1,26 +1,29 @@
11
#' Direct autoregressive forecaster with covariates
22
#'
33
#' This is an autoregressive forecasting model for
4-
#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct" forecasting, meaning
5-
#' that it estimates a model for a particular target horizon.
4+
#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct"
5+
#' forecasting, meaning that it estimates a model for a particular target
6+
#' horizon of `outcome` based on the lags of the `predictors`. See the [Get
7+
#' started vignette](../articles/epipredict.html) for some worked examples and
8+
#' [Custom epi_workflows vignette](../articles/custom_epiworkflows.html) for a
9+
#' recreation using a custom `epi_workflow()`.
610
#'
711
#'
812
#' @param epi_data An `epi_df` object
9-
#' @param outcome A character (scalar) specifying the outcome (in the
10-
#' `epi_df`).
13+
#' @param outcome A character (scalar) specifying the outcome (in the `epi_df`).
1114
#' @param predictors A character vector giving column(s) of predictor variables.
12-
#' This defaults to the `outcome`. However, if manually specified, only those variables
13-
#' specifically mentioned will be used. (The `outcome` will not be added.)
14-
#' By default, equals the outcome. If manually specified, does not add the
15-
#' outcome variable, so make sure to specify it.
16-
#' @param trainer A `{parsnip}` model describing the type of estimation.
17-
#' For now, we enforce `mode = "regression"`.
18-
#' @param args_list A list of customization arguments to determine
19-
#' the type of forecasting model. See [arx_args_list()].
15+
#' This defaults to the `outcome`. However, if manually specified, only those
16+
#' variables specifically mentioned will be used. (The `outcome` will not be
17+
#' added.) By default, equals the outcome. If manually specified, does not
18+
#' add the outcome variable, so make sure to specify it.
19+
#' @param trainer A `{parsnip}` model describing the type of estimation. For
20+
#' now, we enforce `mode = "regression"`.
21+
#' @param args_list A list of customization arguments to determine the type of
22+
#' forecasting model. See [arx_args_list()].
2023
#'
21-
#' @return A list with (1) `predictions` an `epi_df` of predicted values
22-
#' and (2) `epi_workflow`, a list that encapsulates the entire estimation
23-
#' workflow
24+
#' @return An `arx_fcast`, with the fields `predictions` and `epi_workflow`.
25+
#' `predictions` is an `epi_df` of predicted values while `epi_workflow` is
26+
#' the fit workflow used to make those predictions.
2427
#' @export
2528
#' @seealso [arx_fcast_epi_workflow()], [arx_args_list()]
2629
#'
@@ -29,15 +32,18 @@
2932
#' dplyr::filter(time_value >= as.Date("2021-12-01"))
3033
#'
3134
#' out <- arx_forecaster(
32-
#' jhu, "death_rate",
35+
#' jhu,
36+
#' "death_rate",
3337
#' c("case_rate", "death_rate")
3438
#' )
3539
#'
36-
#' out <- arx_forecaster(jhu, "death_rate",
40+
#' out <- arx_forecaster(jhu,
41+
#' "death_rate",
3742
#' c("case_rate", "death_rate"),
3843
#' trainer = quantile_reg(),
3944
#' args_list = arx_args_list(quantile_levels = 1:9 / 10)
4045
#' )
46+
#' out
4147
arx_forecaster <- function(
4248
epi_data,
4349
outcome,
@@ -60,7 +66,7 @@ arx_forecaster <- function(
6066
forecast_date <- args_list$forecast_date %||% forecast_date_default
6167

6268

63-
preds <- forecast(wf, forecast_date = forecast_date) %>%
69+
preds <- forecast(wf) %>%
6470
as_tibble() %>%
6571
select(-time_value)
6672

R/epi_recipe.R

+7-5
@@ -232,9 +232,10 @@ is_epi_recipe <- function(x) {
232232

233233

234234

235-
#' Add an `epi_recipe` to a workflow
235+
#' Given an `epi_recipe`, add it to, remove it from, or update it in an
236+
#' `epi_workflow`
236237
#'
237-
#' @seealso [workflows::add_recipe()]
238+
#' @description
238239
#' - `add_recipe()` specifies the terms of the model and any preprocessing that
239240
#' is required through the usage of a recipe.
240241
#'
@@ -244,9 +245,9 @@ is_epi_recipe <- function(x) {
244245
#' recipe with the new one.
245246
#'
246247
#' @details
247-
#' `add_epi_recipe` has the same behaviour as
248-
#' [workflows::add_recipe()] but sets a different
249-
#' default blueprint to automatically handle [epiprocess::epi_df][epiprocess::as_epi_df] data.
248+
#' `add_epi_recipe()` has the same behaviour as [workflows::add_recipe()] but
249+
#' sets a different default blueprint to automatically handle
250+
#' `epiprocess::epi_df()` data.
250251
#'
251252
#' @param x A `workflow` or `epi_workflow`
252253
#'
@@ -265,6 +266,7 @@ is_epi_recipe <- function(x) {
265266
#' `x`, updated with a new recipe preprocessor.
266267
#'
267268
#' @export
269+
#' @seealso [workflows::add_recipe()]
268270
#' @examples
269271
#' jhu <- covid_case_death_rates %>%
270272
#' filter(time_value > "2021-08-01")

R/epi_workflow.R

+45-35
@@ -1,19 +1,20 @@
11
#' Create an epi_workflow
22
#'
33
#' This is a container object that unifies preprocessing, fitting, prediction,
4-
#' and post-processing for predictive modeling on epidemiological data. It extends
5-
#' the functionality of a [workflows::workflow()] to handle the typical panel
6-
#' data structures found in this field. This extension is handled completely
7-
#' internally, and should be invisible to the user. For all intents and purposes,
8-
#' this operates exactly like a [workflows::workflow()]. For more details
9-
#' and numerous examples, see there.
4+
#' and post-processing for predictive modeling on epidemiological data. It
5+
#' extends the functionality of a [workflows::workflow()] to handle the typical
6+
#' panel data structures found in this field. This extension is handled
7+
#' completely internally, and should be invisible to the user. For all intents
8+
#' and purposes, this operates exactly like a [workflows::workflow()]. For some
9+
#' `{epipredict}` specific examples, see the [custom epiworkflows
10+
#' vignette](../articles/custom_epiworkflows.html).
1011
#'
1112
#' @inheritParams workflows::workflow
1213
#' @param postprocessor An optional postprocessor to add to the workflow.
1314
#' Currently only `frosting` is allowed, using `add_frosting()`.
1415
#'
1516
#' @return A new `epi_workflow` object.
16-
#' @seealso workflows::workflow
17+
#' @seealso [workflows::workflow()]
1718
#' @importFrom rlang is_null
1819
#' @importFrom stats predict
1920
#' @importFrom generics fit
@@ -62,9 +63,9 @@ is_epi_workflow <- function(x) {
6263
#' Fit an `epi_workflow` object
6364
#'
6465
#' @description
65-
#' This is the `fit()` method for an `epi_workflow` object that
66+
#' This is the `fit()` method for an `epi_workflow()` object that
6667
#' estimates parameters for a given model from a set of data.
67-
#' Fitting an `epi_workflow` involves two main steps, which are
68+
#' Fitting an `epi_workflow()` involves two main steps, which are
6869
#' preprocessing the data and fitting the underlying parsnip model.
6970
#'
7071
#' @inheritParams workflows::fit.workflow
@@ -79,7 +80,7 @@ is_epi_workflow <- function(x) {
7980
#' @return The `epi_workflow` object, updated with a fit parsnip
8081
#' model in the `object$fit$fit` slot.
8182
#'
82-
#' @seealso workflows::fit-workflow
83+
#' @seealso [workflows::fit-workflow()]
8384
#'
8485
#' @name fit-epi_workflow
8586
#' @export
@@ -111,20 +112,20 @@ fit.epi_workflow <- function(object, data, ..., control = workflows::control_wor
111112
#' Predict from an epi_workflow
112113
#'
113114
#' @description
114-
#' This is the `predict()` method for a fit epi_workflow object. The nice thing
115-
#' about predicting from an epi_workflow is that it will:
115+
#' This is the `predict()` method for a fit epi_workflow object. The 3 steps that this implements are:
116116
#'
117-
#' - Preprocess `new_data` using the preprocessing method specified when the
118-
#' workflow was created and fit. This is accomplished using
119-
#' [hardhat::forge()], which will apply any formula preprocessing or call
120-
#' [recipes::bake()] if a recipe was supplied.
117+
#' - Preprocessing `new_data` using the preprocessing method specified when the
118+
#' epi_workflow was created and fit. This is accomplished using
119+
#' `recipes::bake()` if a recipe was supplied. Note that this is a slightly
120+
#' different `bake` operation than the one occurring during the fit. Any `step`
121+
#' that has `skip = TRUE` isn't applied during prediction; for example in
122+
#' `step_epi_naomit()`, `all_outcomes()` isn't `NA` omitted, since doing so
123+
#' would drop the exact `time_values` we are trying to predict.
121124
#'
122-
#' - Call [parsnip::predict.model_fit()] for you using the underlying fit
125+
#' - Calling `parsnip::predict.model_fit()` for you using the underlying fit
123126
#' parsnip model.
124127
#'
125-
#' - Ensure that the returned object is an [epiprocess::epi_df][epiprocess::as_epi_df] where
126-
#' possible. Specifically, the output will have `time_value` and
127-
#' `geo_value` columns as well as the prediction.
128+
#' - Applying `slather()` to any frosting that has been included in the `epi_workflow`.
128129
#'
129130
#' @param object An epi_workflow that has been fit by
130131
#' [workflows::fit.workflow()]
@@ -136,7 +137,7 @@ fit.epi_workflow <- function(object, data, ..., control = workflows::control_wor
136137
#'
137138
#' @return
138139
#' A data frame of model predictions, with as many rows as `new_data` has.
139-
#' If `new_data` is an `epi_df` or a data frame with `time_value` or
140+
#' If `new_data` is an `epi_df()` or a data frame with `time_value` or
140141
#' `geo_value` columns, then the result will have those as well.
141142
#'
142143
#' @name predict-epi_workflow
@@ -177,6 +178,11 @@ predict.epi_workflow <- function(object, new_data, type = NULL, opts = list(), .
177178

178179
#' Augment data with predictions
179180
#'
181+
#' `augment()`, unlike `forecast()`, has the goal of modifying the training
182+
#' data, rather than just producing new forecasts. It does a prediction on
183+
#' `new_data`, which will produce a prediction for most `time_values`, and then
184+
#' adds `.pred` as a column to `new_data` and returns the resulting join.
185+
#'
180186
#' @param x A trained epi_workflow
181187
#' @param new_data An epi_df of predictors
182188
#' @param ... Arguments passed on to the predict method.
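
The new `augment()` description in this hunk amounts to "predict on `new_data`, then join `.pred` back on". As a rough base-R analogy only (it uses `lm()` as a stand-in for a fitted `epi_workflow`, and the toy data and the join by `merge()` are assumptions for illustration, not the package's implementation):

```r
# Toy panel data (hypothetical values) with the usual key columns.
train <- data.frame(
  geo_value = rep(c("ca", "ny"), each = 5),
  time_value = rep(as.Date("2021-01-01") + 0:4, times = 2),
  x = c(1:5, 2:6)
)
train$y <- 2 * train$x + 1

# Stand-in model: ordinary least squares instead of a fitted epi_workflow.
fit <- lm(y ~ x, data = train)

# Predict on the data, keeping the key columns next to `.pred`.
preds <- data.frame(
  train[c("geo_value", "time_value")],
  .pred = unname(predict(fit, newdata = train))
)

# Join the predictions back onto the data by key.
augmented <- merge(train, preds, by = c("geo_value", "time_value"))
head(augmented)
```

The result keeps every original column and gains `.pred`, which is the contrast with `forecast()` that the roxygen text draws.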
@@ -228,26 +234,30 @@ print.epi_workflow <- function(x, ...) {
228234
}
229235

230236

231-
#' Produce a forecast from an epi workflow
237+
#' Produce a forecast from just an epi workflow
238+
#'
239+
#' `forecast.epi_workflow` predicts by restricting the training data to the
240+
#' latest available data, and predicting on that. It binds together
241+
#' `get_test_data()` and `predict()`.
232242
#'
233243
#' @param object An epi workflow.
234-
#' @param ... Not used.
235-
#' @param n_recent Integer or NULL. If filling missing data with locf = TRUE,
236-
#' how far back are we willing to tolerate missing data? Larger values allow
237-
#' more filling. The default NULL will determine this from the the recipe. For
238-
#' example, suppose n_recent = 3, then if the 3 most recent observations in any
239-
#' geo_value are all NA’s, we won’t be able to fill anything, and an error
240-
#' message will be thrown. (See details.)
241-
#' @param forecast_date By default, this is set to the maximum time_value in x.
242-
#' But if there is data latency such that recent NA's should be filled, this may
243-
#' be after the last available time_value.
244244
#'
245245
#' @return A forecast tibble.
246246
#'
247247
#' @export
248-
forecast.epi_workflow <- function(object, ..., n_recent = NULL, forecast_date = NULL) {
249-
rlang::check_dots_empty()
250-
248+
#' @examples
249+
#' jhu <- covid_case_death_rates %>%
250+
#' filter(time_value > "2021-08-01")
251+
#'
252+
#' r <- epi_recipe(jhu) %>%
253+
#' step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
254+
#' step_epi_ahead(death_rate, ahead = 7) %>%
255+
#' step_epi_naomit()
256+
#'
257+
#' epi_workflow(r, parsnip::linear_reg()) %>%
258+
#' fit(jhu) %>%
259+
#' forecast()
260+
forecast.epi_workflow <- function(object) {
251261
if (!object$trained) {
252262
cli_abort(c(
253263
"You cannot `forecast()` a {.cls workflow} that has not been trained.",

R/extrapolate_quantiles.R

+14-2
@@ -1,4 +1,16 @@
1-
#' Summarize a distribution with a set of quantiles
1+
#' Extrapolate the quantiles to new quantile levels
2+
#'
3+
#' This both interpolates between quantile levels already defined in `x` and
4+
#' extrapolates quantiles outside their bounds. The interpolation method is
5+
#' determined by the `quantile` argument `middle`, which can be either `"cubic"`
6+
#' for a (Hyman) cubic spline interpolation, or `"linear"` for simple linear
7+
#' interpolation.
8+
#'
9+
#' There is only one extrapolation method for values greater than the largest
10+
#' known quantile level or smaller than the smallest known quantile level. It
11+
#' assumes a roughly exponential tail, whose decay rate and offset are derived
12+
#' from the slope of the two most extreme quantile levels on a logistic scale.
13+
#' See the internal function `tail_extrapolate()` for the exact implementation.
214
#'
315
#' @param x a `distribution` vector
416
#' @param probs a vector of probabilities at which to calculate quantiles
@@ -26,7 +38,7 @@
2638
#' dist_normal(c(10, 2), c(5, 10)),
2739
#' dist_quantiles(list(1:4, 8:11), list(c(.2, .4, .6, .8)))
2840
#' )
29-
#' extrapolate_quantiles(dstn, probs = c(.25, 0.5, .75))
41+
#' extrapolate_quantiles(dstn, probs = c(0.0001, 0.25, 0.5, 0.75, 0.99999))
3042
extrapolate_quantiles <- function(x, probs, replace_na = TRUE, ...) {
3143
UseMethod("extrapolate_quantiles")
3244
}
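
To make the two interpolation options in the new description concrete, here is a small base-R sketch. It covers only interpolation between known quantile levels; the tail extrapolation is not reproduced because its exact form lives in the internal `tail_extrapolate()`. The quantile values below are hypothetical, and `approx()` / `splinefun(method = "hyman")` are used as plausible stand-ins rather than as a statement of what the package calls internally.

```r
# Known quantile levels (as in the roxygen example) and hypothetical values.
q_levels <- c(0.2, 0.4, 0.6, 0.8)
q_values <- c(1, 2, 4, 8)

# "linear": piecewise-linear interpolation between the known levels.
linear_q <- approx(x = q_levels, y = q_values, xout = 0.5)$y

# "cubic": monotone (Hyman-filtered) cubic spline through the same points.
hyman_fun <- splinefun(x = q_levels, y = q_values, method = "hyman")
cubic_q <- hyman_fun(0.5)

c(linear = linear_q, cubic = cubic_q)
```

The Hyman filter matters for quantiles because it keeps the interpolant monotone, so interpolated quantiles cannot cross.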

R/flatline_forecaster.R

+32-9
@@ -1,18 +1,41 @@
11
#' Predict the future with today's value
22
#'
3-
#' This is a simple forecasting model for
4-
#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It uses the most recent
5-
#' observation as the
6-
#' forecast for any future date, and produces intervals based on the quantiles
7-
#' of the residuals of such a "flatline" forecast over all available training
8-
#' data.
3+
#' @description This is a simple forecasting model for
4+
#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It uses the most recent
5+
#' observation as the forecast for any future date, and produces intervals
6+
#' based on the quantiles of the residuals of such a "flatline" forecast over
7+
#' all available training data.
98
#'
109
#' By default, the predictive intervals are computed separately for each
11-
#' combination of key values (`geo_value` + any additional keys) in the
12-
#' `epi_data` argument.
10+
#' combination of key values (`geo_value` + any additional keys) in the
11+
#' `epi_data` argument.
1312
#'
1413
#' This forecaster is very similar to that used by the
15-
#' [COVID19ForecastHub](https://covid19forecasthub.org)
14+
#' [COVID19ForecastHub](https://covid19forecasthub.org)
15+
#'
16+
#' @details
17+
#' Here is (roughly) the code for the `flatline_forecaster()` applied to the
18+
#' `case_rate` for `epidatasets::covid_case_death_rates`.
19+
#'
20+
#' ```{r}
21+
#' jhu <- covid_case_death_rates %>%
22+
#' filter(time_value > "2021-11-01", geo_value %in% c("ak", "ca", "ny"))
23+
#' r <- epi_recipe(covid_case_death_rates) %>%
24+
#' step_epi_ahead(case_rate, ahead = 7, skip = TRUE) %>%
25+
#' recipes::update_role(case_rate, new_role = "predictor") %>%
26+
#' recipes::add_role(all_of(key_colnames(jhu)), new_role = "predictor")
27+
#'
28+
#' f <- frosting() %>%
29+
#' layer_predict() %>%
30+
#' layer_residual_quantiles() %>%
31+
#' layer_add_forecast_date() %>%
32+
#' layer_add_target_date() %>%
33+
#' layer_threshold(starts_with(".pred"))
34+
#'
35+
#' eng <- linear_reg() %>% set_engine("flatline")
36+
#' wf <- epi_workflow(r, eng, f) %>% fit(jhu)
37+
#' preds <- forecast(wf)
38+
#' ```
1639
#'
1740
#' @param epi_data An [epiprocess::epi_df][epiprocess::as_epi_df]
1841
#' @param outcome A scalar character for the column name we wish to predict.

R/frosting.R

+2-1
@@ -1,4 +1,5 @@
1-
#' Add frosting to a workflow
1+
#' Given a `frosting()`, add it to, remove it from, or update it in an
2+
#' `epi_workflow`
23
#'
34
#' @param x A workflow
45
#' @param frosting A frosting object created using `frosting()`.

R/get_test_data.R

+4-5
@@ -1,10 +1,9 @@
11
#' Get test data for prediction based on longest lag period
22
#'
3-
#' Based on the longest lag period in the recipe,
4-
#' `get_test_data()` creates an [epi_df][epiprocess::as_epi_df]
5-
#' with columns `geo_value`, `time_value`
6-
#' and other variables in the original dataset,
7-
#' which will be used to create features necessary to produce forecasts.
3+
#' If `predict()` is given the full training dataset, it will produce a forecast
4+
#' for every day which has enough data. For most cases, this is far more
5+
#' forecasts than is necessary. `get_test_data()` is designed to restrict the given dataset to the minimum amount needed to produce a forecast on the `forecast_date`.
6+
#' Primarily this is based on the longest lag period in the recipe.
87
#'
98
#' The minimum required (recent) data to produce a forecast is equal to
109
#' the maximum lag requested (on any predictor) plus the longest horizon
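
The restriction that `get_test_data()` performs can be pictured with a base-R sketch. Everything here is an assumption for illustration: `keep_recent()` is a hypothetical helper, the window is taken as the maximum lag plus an optional extra term standing in for the horizon that the sentence above (cut off by the diff context) refers to, and a single `geo_value` is used so no per-location handling is needed.

```r
# Hypothetical helper: keep only rows inside the most recent window.
keep_recent <- function(df, max_lag, extra_horizon = 0) {
  cutoff <- max(df$time_value) - (max_lag + extra_horizon)
  df[df$time_value >= cutoff, , drop = FALSE]
}

# Toy daily data for one location (hypothetical values).
toy <- data.frame(
  geo_value = "ca",
  time_value = as.Date("2021-12-01") + 0:30,
  death_rate = seq_len(31) / 10
)

# With a longest lag of 14 days, only the last 15 days are needed.
recent <- keep_recent(toy, max_lag = 14)
range(recent$time_value)
```

Passing the full `toy` data to a predict step would instead yield a prediction for every day with enough history, which is the behaviour the new description warns about.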
