Commit d5c816a

Merge pull request #431 from cmu-delphi/docsDraft
Docs overhaul
2 parents 0e86519 + 6047d4f commit d5c816a

100 files changed (+3369, -3007 lines)

DESCRIPTION

Lines changed: 2 additions & 1 deletion
@@ -39,7 +39,7 @@ Imports:
     lifecycle,
     lubridate,
     magrittr,
-    recipes (>= 1.0.4),
+    recipes (>= 1.1.1),
     rlang (>= 1.1.0),
     stats,
     tibble,
@@ -53,6 +53,7 @@ Suggests:
     epidatr (>= 1.0.0),
     fs,
     grf,
+    here,
     knitr,
     poissonreg,
     purrr,

DEVELOPMENT.md

Lines changed: 6 additions & 3 deletions
@@ -35,10 +35,13 @@ R -e 'devtools::document()'
 R -e 'pkgdown::build_site()'
 ```
 
+Note that sometimes the caches from either `pkgdown` or `knitr` can cause
+difficulties. To clear those, run `make` with one of `clean_knitr`,
+`clean_site`, or `clean` (which does both).
+
 If you work without R Studio and want to iterate on documentation, you might
-find [this
-script](https://gist.github.com/gadenbuie/d22e149e65591b91419e41ea5b2e0621)
-helpful.
+find `Rscript pkgdown-watch.R` helpful. For updating references, you will need
+to manually call `pkgdown::build_reference()`.
 
 ## Versioning
 
Makefile

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+##
+# epipredict docs build
+#
+
+# knitr doesn't actually clean its own cache properly; this just deletes any of
+# the article knitr caches in vignettes or the base directory
+clean_knitr:
+	rm -r *_cache; rm -r vignettes/*_cache
+clean_site:
+	Rscript -e "pkgdown::clean_cache(); pkgdown::clean_site()"
+# this combines both of the above
+clean: clean_knitr clean_site
+
+# end
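
The effect of the `clean_knitr` recipe can be tried in a throwaway directory before running it in a real checkout. The sketch below is not part of the commit: the directory names are hypothetical, and it uses `rm -rf` instead of the recipe's `rm -r` (an assumption) so that a glob with no matching cache does not make the command fail.

```sh
# Hypothetical sandbox standing in for the package root; nothing here touches
# a real checkout.
demo="$(mktemp -d)"
mkdir -p "$demo/README_cache" "$demo/vignettes/epipredict_cache" "$demo/vignettes/keep_me"

cd "$demo" || exit 1

# Same two globs as the clean_knitr recipe. The -f flag is an added assumption:
# it keeps rm quiet and successful when a glob matches nothing.
rm -rf -- *_cache vignettes/*_cache

# Only the non-cache directory is left.
ls vignettes   # prints: keep_me
```

With plain `rm -r`, an unmatched glob is passed through literally and `rm` exits with an error, which would stop `make clean` before it reaches `clean_site`. Whether that is worth guarding against is a judgment call for the maintainers.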

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -190,6 +190,7 @@ export(nested_quantiles)
 export(new_default_epi_recipe_blueprint)
 export(new_epi_recipe_blueprint)
 export(pivot_longer)
+export(pivot_quantiles)
 export(pivot_quantiles_longer)
 export(pivot_quantiles_wider)
 export(pivot_wider)

NEWS.md

Lines changed: 2 additions & 1 deletion
@@ -38,6 +38,7 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicat
 - Replace `dist_quantiles()` with `hardhat::quantile_pred()`
 - Allow `quantile()` to threshold to an interval if desired (#434)
 - `arx_forecaster()` detects if there's enough data to predict
+- Add `observed_response` to `autoplot` so that forecasts can be plotted against the values they're predicting
 
 ## Bug fixes
 
@@ -69,7 +70,7 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicat
 - training window step debugged
 - `min_train_window` argument removed from canned forecasters
 - add forecasters
-- implement postprocessing
+- implement post-processing
 - vignettes avaliable
 - arx_forecaster
 - pkgdown

R/arx_classifier.R

Lines changed: 103 additions & 7 deletions
@@ -1,8 +1,106 @@
 #' Direct autoregressive classifier with covariates
 #'
-#' This is an autoregressive classification model for
-#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct" forecasting, meaning
-#' that it estimates a class at a particular target horizon.
+#'
+#' @description
+#' This is an autoregressive classification model for continuous data. It does
+#' "direct" forecasting, meaning that it estimates a class at a particular
+#' target horizon.
+#'
+#' @details
+#' The `arx_classifier()` is an autoregressive classification model for `epi_df`
+#' data that is used to predict a discrete class for each case under
+#' consideration. It is a direct forecaster in that it estimates the classes
+#' at a specific horizon or ahead value.
+#'
+#' To get a sense of how the `arx_classifier()` works, let's consider a simple
+#' example with minimal inputs. For this, we will use the built-in
+#' `covid_case_death_rates` that contains confirmed COVID-19 cases and deaths
+#' from JHU CSSE for all states over Dec 31, 2020 to Dec 31, 2021. From this,
+#' we'll take a subset of data for five states over June 4, 2021 to December
+#' 31, 2021. Our objective is to predict whether the case rates are increasing
+#' when considering the 0, 7 and 14 day case rates:
+#'
+#' ```{r}
+#' jhu <- covid_case_death_rates %>%
+#'   filter(
+#'     time_value >= "2021-06-04",
+#'     time_value <= "2021-12-31",
+#'     geo_value %in% c("ca", "fl", "tx", "ny", "nj")
+#'   )
+#'
+#' out <- arx_classifier(jhu, outcome = "case_rate", predictors = "case_rate")
+#'
+#' out$predictions
+#' ```
+#'
+#' The key takeaway from the predictions is that there are two prediction
+#' classes, `(-Inf, 0.25]` and `(0.25, Inf)`: the classes to predict must be
+#' discrete. The discretization of the real-valued outcome is controlled by
+#' the `breaks` argument, which defaults to `0.25`. Such breaks will be
+#' automatically extended to cover the entire real line. For example, the
+#' default break of `0.25` is silently extended to `breaks = c(-Inf, .25,
+#' Inf)` and, therefore, results in two classes: `(-Inf, 0.25]` and `(0.25,
+#' Inf)`. These two classes are used to discretize the outcome. The conversion
+#' of the outcome to such classes is handled internally. So if discrete
+#' classes already exist for the outcome in the `epi_df`, then we recommend
+#' coding a classifier from scratch using the `epi_workflow` framework for more
+#' control.
+#'
+#' The `trainer` is a `parsnip` model describing the type of estimation such
+#' that `mode = "classification"` is enforced. The two typical trainers that
+#' are used are `parsnip::logistic_reg()` for two classes or
+#' `parsnip::multinom_reg()` for more than two classes.
+#'
+#' ```{r}
+#' workflows::extract_spec_parsnip(out$epi_workflow)
+#' ```
+#'
+#' From the parsnip model specification, we can see that the trainer used is
+#' logistic regression, which is expected for our binary outcome. More
+#' complicated trainers like `parsnip::naive_Bayes()` or
+#' `parsnip::rand_forest()` may also be used (however, we will stick to the
+#' basics in this gentle introduction to the classifier).
+#'
+#' If you use the default trainer of logistic regression for binary
+#' classification and you decide against using the default break of 0.25, then
+#' you should only input one break so that there are two classification bins
+#' to properly dichotomize the outcome. For example, let's set a break of 0.5
+#' instead of relying on the default of 0.25. We can do this by passing 0.5 to
+#' the `breaks` argument in `arx_class_args_list()` as follows:
+#'
+#' ```{r}
+#' out_break_0.5 <- arx_classifier(
+#'   jhu,
+#'   outcome = "case_rate",
+#'   predictors = "case_rate",
+#'   args_list = arx_class_args_list(
+#'     breaks = 0.5
+#'   )
+#' )
+#'
+#' out_break_0.5$predictions
+#' ```
+#' Indeed, we can observe that the two `.pred_class` values are now
+#' `(-Inf, 0.5]` and `(0.5, Inf)`. See `help(arx_class_args_list)` for other
+#' available modifications.
+#'
+#' Additional arguments that may be supplied to `arx_class_args_list()` include
+#' the expected `lags` and `ahead` arguments for an autoregressive-type model.
+#' These have default values of 0, 7, and 14 days for the lags of the
+#' predictors and 7 days ahead of the forecast date for predicting the
+#' outcome. There is also `n_training` to indicate the upper bound for the
+#' number of training rows per key. If you would like some practice with using
+#' this, then remove the filtering command to obtain data within "2021-06-04"
+#' and "2021-12-31" and instead set `n_training` to be the number of days
+#' between these two dates, inclusive of the end points. The end results
+#' should be the same. In addition to `n_training`, there are `forecast_date`
+#' and `target_date` to specify the date the forecast is created and the date
+#' it targets, respectively. We will not dwell on such arguments here as they
+#' are not unique to this classifier or absolutely essential to understanding
+#' how it operates. The remaining arguments will be discussed organically, as
+#' they are needed to serve our purposes. For information on any remaining
+#' arguments that are not discussed here, please see the function
+#' documentation for a complete list and their definitions.
 #'
 #' @inheritParams arx_forecaster
 #' @param outcome A character (scalar) specifying the outcome (in the
@@ -68,9 +166,7 @@ arx_classifier <- function(
   }
   forecast_date <- args_list$forecast_date %||% forecast_date_default
   target_date <- args_list$target_date %||% (forecast_date + args_list$ahead)
-  preds <- forecast(
-    wf,
-  ) %>%
+  preds <- forecast(wf) %>%
     as_tibble() %>%
     select(-time_value)
 
@@ -249,7 +345,7 @@ arx_class_epi_workflow <- function(
 #' be created using growth rates (as the predictors are) or lagged
 #' differences. The second case is closer to the requirements for the
 #' [2022-23 CDC Flusight Hospitalization Experimental Target](https://github.com/cdcepi/Flusight-forecast-data/blob/745511c436923e1dc201dea0f4181f21a8217b52/data-experimental/README.md).
-#' See the Classification Vignette for details of how to create a reasonable
+#' See the [Classification chapter from the forecasting book](https://cmu-delphi.github.io/delphi-tooling-book/arx-classifier.html) for details of how to create a reasonable
 #' baseline for this case. Selecting `"growth_rate"` (the default) uses
 #' [epiprocess::growth_rate()] to create the outcome using some of the
 #' additional arguments below. Choosing `"lag_difference"` instead simply

R/arx_forecaster.R

Lines changed: 29 additions & 22 deletions
@@ -1,26 +1,28 @@
 #' Direct autoregressive forecaster with covariates
 #'
 #' This is an autoregressive forecasting model for
-#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct" forecasting, meaning
-#' that it estimates a model for a particular target horizon.
+#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct"
+#' forecasting, meaning that it estimates a model for a particular target
+#' horizon of the `outcome` based on the lags of the `predictors`. See the [Get
+#' started vignette](../articles/epipredict.html) for some worked examples and
+#' [Custom epi_workflows vignette](../articles/custom_epiworkflows.html) for a
+#' recreation using a custom `epi_workflow()`.
 #'
 #'
 #' @param epi_data An `epi_df` object
-#' @param outcome A character (scalar) specifying the outcome (in the
-#' `epi_df`).
+#' @param outcome A character (scalar) specifying the outcome (in the `epi_df`).
 #' @param predictors A character vector giving column(s) of predictor variables.
-#' This defaults to the `outcome`. However, if manually specified, only those variables
-#' specifically mentioned will be used. (The `outcome` will not be added.)
-#' By default, equals the outcome. If manually specified, does not add the
-#' outcome variable, so make sure to specify it.
-#' @param trainer A `{parsnip}` model describing the type of estimation.
-#' For now, we enforce `mode = "regression"`.
-#' @param args_list A list of customization arguments to determine
-#' the type of forecasting model. See [arx_args_list()].
+#' This defaults to the `outcome`. However, if manually specified, only those
+#' variables specifically mentioned will be used, and the `outcome` will not be
+#' added.
+#' @param trainer A `{parsnip}` model describing the type of estimation. For
+#' now, we enforce `mode = "regression"`.
+#' @param args_list A list of customization arguments to determine the type of
+#' forecasting model. See [arx_args_list()].
 #'
-#' @return A list with (1) `predictions` an `epi_df` of predicted values
-#' and (2) `epi_workflow`, a list that encapsulates the entire estimation
-#' workflow
+#' @return An `arx_fcast`, with the fields `predictions` and `epi_workflow`.
+#' `predictions` is a `tibble` of predicted values while `epi_workflow` is
+#' the fitted workflow used to make those predictions.
 #' @export
 #' @seealso [arx_fcast_epi_workflow()], [arx_args_list()]
 #'
@@ -29,15 +31,18 @@
 #'   dplyr::filter(time_value >= as.Date("2021-12-01"))
 #'
 #' out <- arx_forecaster(
-#'   jhu, "death_rate",
+#'   jhu,
+#'   "death_rate",
 #'   c("case_rate", "death_rate")
 #' )
 #'
-#' out <- arx_forecaster(jhu, "death_rate",
+#' out <- arx_forecaster(jhu,
+#'   "death_rate",
 #'   c("case_rate", "death_rate"),
 #'   trainer = quantile_reg(),
 #'   args_list = arx_args_list(quantile_levels = 1:9 / 10)
 #' )
+#' out
 arx_forecaster <- function(
     epi_data,
     outcome,
@@ -60,7 +65,7 @@ arx_forecaster <- function(
   forecast_date <- args_list$forecast_date %||% forecast_date_default
 
 
-  preds <- forecast(wf, forecast_date = forecast_date) %>%
+  preds <- forecast(wf) %>%
     as_tibble() %>%
     select(-time_value)
 
@@ -262,10 +267,12 @@ arx_fcast_epi_workflow <- function(
 #' @param quantile_levels Vector or `NULL`. A vector of probabilities to produce
 #' prediction intervals. These are created by computing the quantiles of
 #' training residuals. A `NULL` value will result in point forecasts only.
-#' @param symmetrize Logical. The default `TRUE` calculates
-#' symmetric prediction intervals. This argument only applies when
-#' residual quantiles are used. It is not applicable with
-#' `trainer = quantile_reg()`, for example.
+#' @param symmetrize Logical. The default `TRUE` calculates symmetric prediction
+#' intervals. This argument only applies when residual quantiles are used. It
+#' is not applicable with `trainer = quantile_reg()`, for example. Typically, one
+#' would only want non-symmetric quantiles when increasing trajectories are
+#' quite different from decreasing ones, such as a strictly positive variable
+#' near zero.
 #' @param nonneg Logical. The default `TRUE` enforces nonnegative predictions
 #' by hard-thresholding at 0.
 #' @param quantile_by_key Character vector. Groups residuals by listed keys
