| 1 | +--- |
| 2 | +output: github_document |
| 3 | +--- |
| 4 | + |
| 5 | +<!-- README.md is generated from README.Rmd. Please edit that file --> |
| 6 | + |
| 7 | +```{r, include = FALSE} |
| 8 | +knitr::opts_chunk$set( |
| 9 | + collapse = TRUE, |
| 10 | + comment = "#>", |
| 11 | + fig.path = "man/figures/README-", |
| 12 | + out.width = "100%" |
| 13 | +) |
| 14 | +ggplot2::theme_set(ggplot2::theme_bw()) |
| 15 | +``` |
| 16 | + |
| 17 | +# epiprocess |
| 18 | + |
| 19 | +## TODO: Condense these paragraphs |
| 20 | + |
| 21 | +The [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) package works |
| 22 | +with epidemiological time series data to provide situational |
| 23 | +awareness, processing, and transformations in preparation for modeling, and |
| 24 | +version-faithful model backtesting. It contains: |
| 25 | + |
| 26 | +- `epi_df`, a class for working with epidemiological time series data which |
| 27 | +behaves like a tibble (and can be manipulated with |
| 28 | +[`{dplyr}`](https://dplyr.tidyverse.org/)-esque "verbs") but with some |
| 29 | +additional structure; |
| 30 | +- `epi_archive`, a class for working with the version history of such time series data; |
| 31 | +- sample epidemiological data in these formats. |
| 32 | + |
| 33 | +This package is provided by the Delphi group at Carnegie Mellon University. The |
| 34 | +Delphi group provides many tools and also hosts the Delphi Epidata API, which provides access to a wide |
| 35 | +range of epidemiological data sets, including COVID-19 data, flu data, and more. |
| 36 | +This package is designed to work seamlessly with the data in the Delphi Epidata |
| 37 | +API, which can be accessed using the `epidatr` package. |
| 38 | + |
| 39 | +It is part of a broader suite of packages that includes |
| 40 | +[`{epipredict}`](https://cmu-delphi.github.io/epipredict/), |
| 41 | +[`{epidatr}`](https://cmu-delphi.github.io/epidatr/), |
| 42 | +[`{rtestim}`](https://dajmcdon.github.io/rtestim/), and |
| 43 | +[`{epidatasets}`](https://cmu-delphi.github.io/epidatasets/), for accessing, |
| 44 | +analyzing, and forecasting epidemiological time series data. We have expanded |
| 45 | +documentation and demonstrations for some of these packages available in an |
| 46 | +online "book" format [here](https://cmu-delphi.github.io/delphi-tooling-book/). |
| 47 | + |
| 48 | +## Motivation |
| 49 | + |
| 50 | +[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) and |
| 51 | +[`{epipredict}`](https://cmu-delphi.github.io/epipredict/) are designed to lower |
| 52 | +the barrier to entry and implementation cost for epidemiological time series |
| 53 | +analysis and forecasting. Epidemiologists and forecasting groups have repeatedly |
| 54 | +and separately had to rush to implement this type of functionality in a much |
| 55 | +more ad hoc manner; we are trying to save such effort in the future by providing |
| 56 | +well-documented, tested, and general packages that can be called for many common |
| 57 | +tasks instead. |
| 58 | + |
| 59 | +## Installation |
| 60 | + |
| 61 | +To install: |
| 62 | + |
| 63 | +```{r, eval=FALSE} |
| 64 | +# Stable version |
| 65 | +pak::pkg_install("cmu-delphi/epiprocess@main") |
| 66 | + |
| 67 | +# Dev version |
| 68 | +pak::pkg_install("cmu-delphi/epiprocess@dev") |
| 69 | +``` |
| 70 | + |
| 71 | +The package is not yet on CRAN. |
| 72 | + |
| 73 | +## Usage |
| 74 | + |
| 75 | +Once `epiprocess` and `epidatr` are installed, you can use the following code to |
| 76 | +get started: |
| 77 | + |
| 78 | +```{r, results=FALSE, warning=FALSE, message=FALSE} |
| 79 | +library(epiprocess) |
| 80 | +library(epidatr) |
| 81 | +library(dplyr) |
| 82 | +library(magrittr) |
| 83 | +``` |
| 84 | + |
| 85 | +Get COVID-19 confirmed cumulative case data from JHU CSSE for California, |
| 86 | +Florida, New York, and Texas, from March 1, 2020 to January 31, 2022: |
| 87 | + |
| 88 | +```{r cache=TRUE} |
| 89 | +df <- pub_covidcast( |
| 90 | + source = "jhu-csse", |
| 91 | + signals = "confirmed_cumulative_num", |
| 92 | + geo_type = "state", |
| 93 | + time_type = "day", |
| 94 | + geo_values = "ca,fl,ny,tx", |
| 95 | + time_values = epirange(20200301, 20220131), |
| 96 | +) %>% |
| 97 | + select(geo_value, time_value, cases_cumulative = value) |
| 98 | +df |
| 99 | +``` |
| 100 | + |
| 101 | +Convert the data to an `epi_df` object and sort by `geo_value` and `time_value`. You |
| 102 | +can work with the `epi_df` object like a tibble using dplyr, for example to derive daily cases from the cumulative counts: |
| 103 | + |
| 104 | +```{r} |
| 105 | +edf <- df %>% |
| 106 | + as_epi_df() %>% |
| 107 | + arrange_canonical() %>% |
| 108 | + group_by(geo_value) %>% |
| 109 | + mutate(cases_daily = cases_cumulative - lag(cases_cumulative, default = 0)) |
| 110 | +edf |
| 111 | +``` |
| 112 | + |
| 113 | +Autoplot the confirmed cumulative cases for each `geo_value`: |
| 114 | + |
| 115 | +```{r} |
| 116 | +edf %>% |
| 117 | + autoplot(cases_cumulative) |
| 118 | +``` |
| 119 | + |
| 120 | +Compute the 7-day moving average of the confirmed daily cases for each `geo_value`: |
| 121 | + |
| 122 | +```{r} |
| 123 | +edf %>% |
| 124 | + group_by(geo_value) %>% |
| 125 | + epi_slide_mean(cases_daily, .window_size = 7, na.rm = TRUE) |
| 126 | +``` |
| 127 | + |
| 128 | +Compute the growth rate of the confirmed cumulative cases for each `geo_value`: |
| 129 | + |
| 130 | +```{r} |
| 131 | +edf %>% |
| 132 | + group_by(geo_value) %>% |
| 133 | + mutate(cases_growth = growth_rate(x = time_value, y = cases_cumulative, method = "rel_change", h = 7)) |
| 134 | +``` |
| 135 | + |
| 136 | +Detect outliers in the confirmed daily cases for each `geo_value`: |
| 137 | + |
| 138 | +```{r} |
| 139 | +edf %>% |
| 140 | + group_by(geo_value) %>% |
| 141 | + mutate(outlier_info = detect_outlr(x = time_value, y = cases_daily)) %>% |
| 142 | + ungroup() |
| 143 | +``` |
| 144 | + |
| 145 | +Add a column to the `epi_df` object with the daily deaths for each `geo_value` and |
| 146 | +compute the correlations between cases and deaths for each `geo_value`: |
| 147 | + |
| 148 | +```{r cache=TRUE} |
| 149 | +df <- pub_covidcast( |
| 150 | + source = "jhu-csse", |
| 151 | + signals = "deaths_incidence_num", |
| 152 | + geo_type = "state", |
| 153 | + time_type = "day", |
| 154 | + geo_values = "ca,fl,ny,tx", |
| 155 | + time_values = epirange(20200301, 20220131), |
| 156 | +) %>% |
| 157 | + select(geo_value, time_value, deaths_daily = value) %>% |
| 158 | + as_epi_df() %>% |
| 159 | + arrange_canonical() |
| 160 | +edf <- inner_join(edf, df, by = c("geo_value", "time_value")) |
| 161 | +edf %>% |
| 162 | + group_by(geo_value) %>% |
| 163 | + epi_slide_mean(deaths_daily, .window_size = 7, na.rm = TRUE) %>% |
| 164 | + epi_cor(cases_daily, deaths_daily) |
| 165 | +``` |