Docs overhaul #431

Open
wants to merge 45 commits into
base: dev
2100811
second readthrough of README.Rmd
dsweber2 Jan 23, 2025
70ca389
make for manually clearing the caches
dsweber2 Jan 23, 2025
69cc61d
styler
dsweber2 Jan 23, 2025
7e7eaf6
missed an image in the man version
dsweber2 Jan 23, 2025
90709dc
getting started page
dsweber2 Jan 24, 2025
83012fa
fixing rebase problem
dsweber2 Jan 24, 2025
61da679
linewrapping
dsweber2 Jan 24, 2025
b27cafb
pushing only the dev docs
dsweber2 Jan 24, 2025
6f66c6a
isn't building the readme
dsweber2 Jan 24, 2025
30513f2
readme.rmd red/yellow -> blue/black
dsweber2 Jan 24, 2025
a4df870
training on only the shown subset
dsweber2 Jan 27, 2025
5067e24
autoplot new data
dsweber2 Jan 27, 2025
07d7435
using new autoplot
dsweber2 Jan 27, 2025
1cabd55
getting started first draft
dsweber2 Jan 31, 2025
2cfa098
much more complete guts example, branching flatline fixes
dsweber2 Feb 7, 2025
59da948
fix for flatline discovered, rename guts
dsweber2 Feb 7, 2025
a67d7c1
docs, styler
dsweber2 Feb 10, 2025
e8854a8
passing check & news
dsweber2 Feb 10, 2025
ae332f8
revising custom_epiworkflows
dsweber2 Feb 10, 2025
682f2db
some more editing
dsweber2 Feb 11, 2025
6622308
finished custom_workflows, reviewing backtesting
dsweber2 Feb 11, 2025
07b47f9
backtesting rmd rewrite
dsweber2 Feb 25, 2025
17bfd1b
dropping CAN backtesting example b/c ~no revisions
dsweber2 Feb 25, 2025
f49720f
formatting
dsweber2 Feb 25, 2025
c6e55de
|> in backtesting, dropped a section in get started
dsweber2 Feb 25, 2025
3cd58ac
landing page wording and get code running
nmdefries Feb 28, 2025
34c42a3
landing page again but in Rmd
nmdefries Mar 1, 2025
2f14ff8
consistent naming, 7dav pull instead of manually
dsweber2 Mar 3, 2025
d91587c
going back to just using the API call
dsweber2 Mar 4, 2025
015d585
recipes version, include epiprocess in the rmds
dsweber2 Mar 4, 2025
1a984e7
rebuild landing page
nmdefries Mar 4, 2025
7312a38
first half epipredict.Rmd
nmdefries Mar 4, 2025
48154eb
custom header, dropping arx_classifier smooth-qr
dsweber2 Mar 4, 2025
f075b5d
follow up on first half of epipredict.Rmd
dsweber2 Mar 5, 2025
7d7bd28
avoid [link] parsing
dsweber2 Mar 5, 2025
fd27a61
reorganize reference page
dsweber2 Mar 5, 2025
8dca98b
postprocessing -> post-processing
dsweber2 Mar 5, 2025
792a3be
lots of reference updates
dsweber2 Mar 6, 2025
15f83d9
forecast needs `...` as a generic
dsweber2 Mar 7, 2025
e0bcba8
include climate, only calculate necessary days
dsweber2 Mar 14, 2025
c5a37a9
Adding short blurb on cdc_flatline
dsweber2 Mar 18, 2025
d6edae8
extra details for symmetrize
dsweber2 Mar 18, 2025
b786750
epipredict.Rmd
nmdefries Mar 28, 2025
af932cf
backtesting.rmd
nmdefries Apr 3, 2025
d83d9d8
first half custom_epiworkflows.Rmd
nmdefries Apr 4, 2025
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -39,7 +39,7 @@ Imports:
lifecycle,
lubridate,
magrittr,
recipes (>= 1.0.4),
recipes (>= 1.1.1),
rlang (>= 1.1.0),
stats,
tibble,
@@ -53,6 +53,7 @@ Suggests:
epidatr (>= 1.0.0),
fs,
grf,
here,
knitr,
poissonreg,
purrr,
6 changes: 5 additions & 1 deletion DEVELOPMENT.md
@@ -35,10 +35,14 @@ R -e 'devtools::document()'
R -e 'pkgdown::build_site()'
```

Note that sometimes the caches from either `pkgdown` or `knitr` can cause
difficulties. To clear those, run `make`, with either `clean_knitr`,
`clean_site`, or `clean` (which does both).
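
For contributors who do not have `make` available, the knitr half of that cleanup can be approximated from R. The following is only a sketch: `clean_knitr_caches()` is a hypothetical helper, not part of epipredict or pkgdown, and it assumes the caches follow the `*_cache` naming used by the Makefile targets.

```r
# Hypothetical helper (an assumption, not part of the package): remove knitr
# cache directories matching `*_cache` in the project root and in `vignettes/`.
clean_knitr_caches <- function(root = ".") {
  patterns <- file.path(root, c("*_cache", file.path("vignettes", "*_cache")))
  caches <- Sys.glob(patterns)
  # recursive = TRUE is needed because the caches are directories
  unlink(caches, recursive = TRUE)
  # return the removed paths invisibly so callers can inspect them
  invisible(caches)
}
```

Calling `clean_knitr_caches()` from the package root would then play the role of `make clean_knitr`; the pkgdown side still needs the `clean_site` target or the equivalent pkgdown calls.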

If you work without RStudio and want to iterate on documentation, you might
find [this
script](https://gist.github.com/gadenbuie/d22e149e65591b91419e41ea5b2e0621)
helpful.
helpful. For updating references, you will need to manually call `pkgdown::build_reference()`.

## Versioning

14 changes: 14 additions & 0 deletions Makefile
@@ -0,0 +1,14 @@
##
# epipredict docs build
#

# knitr doesn't actually clean its own cache properly; this just deletes any of
# the article knitr caches in vignettes or the base
clean_knitr:
	rm -r *_cache; rm -r vignettes/*_cache
clean_site:
	Rscript -e "pkgdown::clean_cache(); pkgdown::clean_site()"
# this combines both of the above
clean: clean_knitr clean_site

# end
3 changes: 2 additions & 1 deletion NEWS.md
@@ -37,6 +37,7 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicat
- Replace `dist_quantiles()` with `hardhat::quantile_pred()`
- Allow `quantile()` to threshold to an interval if desired (#434)
- `arx_forecaster()` detects if there's enough data to predict
- Add `plot_data` to `autoplot` so that forecasts can be plotted against the values they're predicting

## Bug fixes

@@ -68,7 +69,7 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicat
- training window step debugged
- `min_train_window` argument removed from canned forecasters
- add forecasters
- implement postprocessing
- implement post-processing
- vignettes available
- arx_forecaster
- pkgdown
110 changes: 103 additions & 7 deletions R/arx_classifier.R
@@ -1,8 +1,106 @@
#' Direct autoregressive classifier with covariates
#'
#' This is an autoregressive classification model for
#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct" forecasting, meaning
#' that it estimates a class at a particular target horizon.
#'
#' @description
#' This is an autoregressive classification model for continuous data. It does
#' "direct" forecasting, meaning that it estimates a class at a particular
#' target horizon.
#'
#' @details
#' The `arx_classifier()` is an autoregressive classification model for `epi_df`
#' data that is used to predict a discrete class for each case under
#' consideration. It is a direct forecaster in that it estimates the classes
#' at a specific horizon or ahead value.
#'
#' To get a sense of how the `arx_classifier()` works, let's consider a simple
#' example with minimal inputs. For this, we will use the built-in
#' `covid_case_death_rates` that contains confirmed COVID-19 cases and deaths
#' from JHU CSSE for all states over Dec 31, 2020 to Dec 31, 2021. From this,
#' we'll take a subset of data for five states over June 4, 2021 to December
#' 31, 2021. Our objective is to predict whether the case rates are increasing
#' when considering the 0, 7 and 14 day case rates:
#'
#' ```{r}
#' jhu <- covid_case_death_rates %>%
#' filter(
#' time_value >= "2021-06-04",
#' time_value <= "2021-12-31",
#' geo_value %in% c("ca", "fl", "tx", "ny", "nj")
#' )
#'
#' out <- arx_classifier(jhu, outcome = "case_rate", predictors = "case_rate")
#'
#' out$predictions
#' ```
#'
#' The key takeaway from the predictions is that there are two prediction
#' classes: `(-Inf, 0.25]` and `(0.25, Inf)`. This is because for our goal of
#' classification the classes must be discrete. The discretization of the
#' real-valued outcome is controlled by the `breaks` argument, which defaults
#' to `0.25`. Such breaks will be automatically extended to cover the entire
#' real line. For example, the default break of `0.25` is silently extended to
#' `breaks = c(-Inf, .25, Inf)` and, therefore, results in two classes:
#' `(-Inf, 0.25]` and `(0.25, Inf)`. These two classes are used to discretize
#' the outcome. The conversion of the outcome to such classes is handled
#' internally. So if discrete classes already exist for the outcome in the
#' `epi_df`, then we recommend coding a classifier from scratch using the
#' `epi_workflow` framework for more control.
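#'
#' To build intuition for how a single break partitions the real line, the
#' following sketch uses base R's `cut()` on made-up values. This is an
#' illustration only and an assumption about the general idea, not the
#' internal implementation of `arx_classifier()`; the names `full_breaks`,
#' `toy_values`, and `toy_classes` are introduced here, and the interval
#' labels printed by `cut()` may be formatted differently from those in
#' `.pred_class`.
#'
#' ```{r}
#' # a single user-supplied break, extended to cover the whole real line
#' breaks <- 0.25
#' full_breaks <- sort(unique(c(-Inf, breaks, Inf)))
#'
#' # made-up outcome values, not taken from any dataset
#' toy_values <- c(-0.4, 0.1, 0.25, 0.3, 1.2)
#'
#' # right-closed intervals: 0.25 itself falls in the lower class
#' toy_classes <- cut(toy_values, breaks = full_breaks)
#' table(toy_classes)
#' ```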
#'
#' The `trainer` is a `parsnip` model describing the type of estimation such
#' that `mode = "classification"` is enforced. The two typical trainers that
#' are used are `parsnip::logistic_reg()` for two classes or
#' `parsnip::multinom_reg()` for more than two classes.
#'
#' ```{r}
#' workflows::extract_spec_parsnip(out$epi_workflow)
#' ```
#'
#' From the parsnip model specification, we can see that the trainer used is
#' logistic regression, which is expected for our binary outcome. More
#' complicated trainers like `parsnip::naive_Bayes()` or
#' `parsnip::rand_forest()` may also be used (however, we will stick to the
#' basics in this gentle introduction to the classifier).
#'
#' If you use the default trainer of logistic regression for binary
#' classification and you decide against using the default break of 0.25, then
#' you should only input one break so that there are two classification bins
#' to properly dichotomize the outcome. For example, let's set a break of 0.5
#' instead of relying on the default of 0.25. We can do this by passing 0.5 to
#' the `breaks` argument in `arx_class_args_list()` as follows:
#'
#' ```{r}
#' out_break_0.5 <- arx_classifier(
#' jhu,
#' outcome = "case_rate",
#' predictors = "case_rate",
#' args_list = arx_class_args_list(
#' breaks = 0.5
#' )
#' )
#'
#' out_break_0.5$predictions
#' ```
#' Indeed, we can observe that the two `.pred_class` values are now
#' `(-Inf, 0.5]` and `(0.5, Inf)`. See `help(arx_class_args_list)` for other
#' available modifications.
#'
#' Additional arguments that may be supplied to `arx_class_args_list()` include
#' the expected `lags` and `ahead` arguments for an autoregressive-type model.
#' These have default values of 0, 7, and 14 days for the lags of the
#' predictors and 7 days ahead of the forecast date for predicting the
#' outcome. There is also `n_training` to indicate the upper bound for the
#' number of training rows per key. If you would like some practice with using
#' this, then remove the filtering command to obtain data within "2021-06-04"
#' and "2021-12-31" and instead set `n_training` to be the number of days
#' between these two dates, inclusive of the end points. The end results
#' should be the same. In addition to `n_training`, there are `forecast_date`
#' and `target_date` to specify the date on which the forecast is created and
#' the date it is intended for, respectively. We will not dwell on such
#' arguments here as they
#' are not unique to this classifier or absolutely essential to understanding
#' how it operates. The remaining arguments will be discussed organically, as
#' they are needed to serve our purposes. For information on any remaining
#' arguments that are not discussed here, please see the function
#' documentation for a complete list and their definitions.
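#'
#' As a small aid for the `n_training` exercise above, the number of days
#' between the two dates, inclusive of both end points, can be computed in
#' base R. This is only a sketch; `start_date`, `end_date`, and `n_days` are
#' names introduced here for illustration.
#'
#' ```{r}
#' # inclusive day count between the two filter dates
#' start_date <- as.Date("2021-06-04")
#' end_date <- as.Date("2021-12-31")
#' n_days <- as.integer(end_date - start_date) + 1L
#' n_days
#' ```
#'
#' The resulting value could then be supplied as
#' `arx_class_args_list(n_training = n_days)`.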
#'
#' @inheritParams arx_forecaster
#' @param outcome A character (scalar) specifying the outcome (in the
@@ -68,9 +166,7 @@ arx_classifier <- function(
}
forecast_date <- args_list$forecast_date %||% forecast_date_default
target_date <- args_list$target_date %||% (forecast_date + args_list$ahead)
preds <- forecast(
wf,
) %>%
preds <- forecast(wf) %>%
as_tibble() %>%
select(-time_value)

@@ -249,7 +345,7 @@ arx_class_epi_workflow <- function(
#' be created using growth rates (as the predictors are) or lagged
#' differences. The second case is closer to the requirements for the
#' [2022-23 CDC Flusight Hospitalization Experimental Target](https://github.com/cdcepi/Flusight-forecast-data/blob/745511c436923e1dc201dea0f4181f21a8217b52/data-experimental/README.md).
#' See the Classification Vignette for details of how to create a reasonable
#' See the [Classification chapter from the forecasting book](https://cmu-delphi.github.io/delphi-tooling-book/arx-classifier.html) for details of how to create a reasonable
#' baseline for this case. Selecting `"growth_rate"` (the default) uses
#' [epiprocess::growth_rate()] to create the outcome using some of the
#' additional arguments below. Choosing `"lag_difference"` instead simply
53 changes: 31 additions & 22 deletions R/arx_forecaster.R
@@ -1,26 +1,29 @@
#' Direct autoregressive forecaster with covariates
#'
#' This is an autoregressive forecasting model for
#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct" forecasting, meaning
#' that it estimates a model for a particular target horizon.
#' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct"
#' forecasting, meaning that it estimates a model for a particular target
#' horizon of `outcome` based on the lags of the `predictors`. See the [Get
#' started vignette](../articles/epipredict.html) for some worked examples and
#' [Custom epi_workflows vignette](../articles/custom_epiworkflows.html) for a
#' recreation using a custom `epi_workflow()`.
#'
#'
#' @param epi_data An `epi_df` object
#' @param outcome A character (scalar) specifying the outcome (in the
#' `epi_df`).
#' @param outcome A character (scalar) specifying the outcome (in the `epi_df`).
#' @param predictors A character vector giving column(s) of predictor variables.
#' This defaults to the `outcome`. However, if manually specified, only those variables
#' specifically mentioned will be used. (The `outcome` will not be added.)
#' By default, equals the outcome. If manually specified, does not add the
#' outcome variable, so make sure to specify it.
#' @param trainer A `{parsnip}` model describing the type of estimation.
#' For now, we enforce `mode = "regression"`.
#' @param args_list A list of customization arguments to determine
#' the type of forecasting model. See [arx_args_list()].
#' This defaults to the `outcome`. If manually specified, only those
#' variables specifically mentioned will be used; the `outcome` will not be
#' added automatically, so make sure to include it if it is wanted.
#' @param trainer A `{parsnip}` model describing the type of estimation. For
#' now, we enforce `mode = "regression"`.
#' @param args_list A list of customization arguments to determine the type of
#' forecasting model. See [arx_args_list()].
#'
#' @return A list with (1) `predictions` an `epi_df` of predicted values
#' and (2) `epi_workflow`, a list that encapsulates the entire estimation
#' workflow
#' @return An `arx_fcast`, with the fields `predictions` and `epi_workflow`.
#' `predictions` is an `epi_df` of predicted values, while `epi_workflow` is
#' the fitted workflow used to make those predictions.
#' @export
#' @seealso [arx_fcast_epi_workflow()], [arx_args_list()]
#'
@@ -29,15 +32,18 @@
#' dplyr::filter(time_value >= as.Date("2021-12-01"))
#'
#' out <- arx_forecaster(
#' jhu, "death_rate",
#' jhu,
#' "death_rate",
#' c("case_rate", "death_rate")
#' )
#'
#' out <- arx_forecaster(jhu, "death_rate",
#' out <- arx_forecaster(jhu,
#' "death_rate",
#' c("case_rate", "death_rate"),
#' trainer = quantile_reg(),
#' args_list = arx_args_list(quantile_levels = 1:9 / 10)
#' )
#' out
arx_forecaster <- function(
epi_data,
outcome,
@@ -60,7 +66,7 @@ arx_forecaster <- function(
forecast_date <- args_list$forecast_date %||% forecast_date_default


preds <- forecast(wf, forecast_date = forecast_date) %>%
preds <- forecast(wf) %>%
as_tibble() %>%
select(-time_value)

@@ -262,10 +268,13 @@ arx_fcast_epi_workflow <- function(
#' @param quantile_levels Vector or `NULL`. A vector of probabilities to produce
#' prediction intervals. These are created by computing the quantiles of
#' training residuals. A `NULL` value will result in point forecasts only.
#' @param symmetrize Logical. The default `TRUE` calculates
#' symmetric prediction intervals. This argument only applies when
#' residual quantiles are used. It is not applicable with
#' `trainer = quantile_reg()`, for example.
#' @param symmetrize Logical. The default `TRUE` calculates symmetric prediction
#' intervals. This argument only applies when residual quantiles are used. It
#' is not applicable with `trainer = quantile_reg()`, for example. Symmetry is
#' achieved by including both the residuals and their negation. Typically, one
#' would only want non-symmetric quantiles when increasing trajectories are
#' quite different from decreasing ones, such as a strictly positive variable
#' near zero.
#' @param nonneg Logical. The default `TRUE` enforces nonnegative predictions
#' by hard-thresholding at 0.
#' @param quantile_by_key Character vector. Groups residuals by listed keys