Commit 115fad6

Merge branch 'cmu-delphi/main' into km/growth_rate
2 parents 7864cdd + b8d5ac6 commit 115fad6

14 files changed: +267 -110 lines changed

.github/CODEOWNERS

+1-1
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
@dajmcdon @brookslogan
1+
* @dajmcdon @brookslogan

DESCRIPTION

+8-4
@@ -6,15 +6,19 @@ Authors@R: c(
66
person("Jacob", "Bien", role = "ctb"),
77
person("Logan", "Brooks", role = "aut"),
88
person("Rafael", "Catoia", role = "ctb"),
9-
person("Daniel", "McDonald", role = "ctb"),
9+
person("Daniel", "McDonald", role = "aut"),
10+
person("Rachel", "Lobay", role = "ctb"),
11+
person("Ken", "Mawer", role = "ctb"),
12+
person("Chloe", "You", role = "ctb"),
1013
person("Quang", "Nguyen", role = "ctb"),
1114
person("Evan", "Ray", role = "aut"),
1215
person("Dmitry", "Shemetov", role = "ctb"),
1316
person("Ryan", "Tibshirani", , "[email protected]", role = c("aut", "cre"))
1417
)
15-
Description: This package introduces a common data structure for
16-
epidemiological data sets measured over space and time, and offers
17-
associated utilities to perform basic signal processing tasks.
18+
Description: This package introduces a common data structure for epidemiological
19+
data reported by location and time, provides another data structure to
20+
work with revisions to these data sets over time, and offers associated
21+
utilities to perform basic signal processing tasks.
1822
License: MIT + file LICENSE
1923
Imports:
2024
data.table,

NAMESPACE

+1
@@ -40,6 +40,7 @@ export(growth_rate)
4040
export(is_epi_archive)
4141
export(is_epi_df)
4242
export(mutate)
43+
export(new_epi_df)
4344
export(relocate)
4445
export(rename)
4546
export(slice)

R/data.R

+59-41
@@ -1,37 +1,53 @@
1-
#' Subset of JHU daily cases and deaths from California, Florida, Texas, New York, Georgia, and Pennsylvania
1+
#' Subset of JHU daily state cases and deaths
22
#'
3-
#' This data source of confirmed COVID-19 cases and deaths
4-
#' is based on reports made available by the Center for
5-
#' Systems Science and Engineering at Johns Hopkins University.
6-
#' This example data ranges from Mar 1, 2020 to Dec 31, 2021, and is limited to California, Florida, Texas, New York, Georgia, and Pennsylvania.
3+
#' This data source of confirmed COVID-19 cases and deaths
4+
#' is based on reports made available by the Center for
5+
#' Systems Science and Engineering at Johns Hopkins University.
6+
#' This example data ranges from Mar 1, 2020 to Dec 31, 2021, and is limited to
7+
#' California, Florida, Texas, New York, Georgia, and Pennsylvania.
78
#'
89
#' @format A tibble with 4026 rows and 6 variables:
910
#' \describe{
10-
#' \item{geo_value}{the geographic value associated with each row of measurements.}
11+
#' \item{geo_value}{the geographic value associated with each row
12+
#' of measurements.}
1113
#' \item{time_value}{the time value associated with each row of measurements.}
12-
#' \item{case_rate_7d_av}{7-day average signal of number of new confirmed COVID-19 cases per 100,000 population, daily}
13-
#' \item{death_rate_7d_av}{7-day average signal of number of new confirmed deaths due to COVID-19 per 100,000 population, daily}
14+
#' \item{case_rate_7d_av}{7-day average signal of number of new
15+
#' confirmed COVID-19 cases per 100,000 population, daily}
16+
#' \item{death_rate_7d_av}{7-day average signal of number of new confirmed
17+
#' deaths due to COVID-19 per 100,000 population, daily}
1418
#' \item{cases}{Number of new confirmed COVID-19 cases, daily}
15-
#' \item{cases_7d_av}{7-day average signal of number of new confirmed COVID-19 cases, daily}
19+
#' \item{cases_7d_av}{7-day average signal of number of new confirmed
20+
#' COVID-19 cases, daily}
1621
#' }
17-
#' @source This object contains a modified part of the \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University} as \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{republished in the COVIDcast Epidata API}. This data set is licensed under the terms of the
18-
#' \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons Attribution 4.0 International license}
19-
#' by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering.
20-
#' Copyright Johns Hopkins University 2020.
21-
#'
22+
#' @source This object contains a modified part of the
23+
#' \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University}
24+
#' as \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{republished in the COVIDcast Epidata API}.
25+
#' This data set is licensed under the terms of the
26+
#' \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons Attribution 4.0 International license}
27+
#' by the Johns Hopkins University on behalf of its Center for Systems Science
28+
#' in Engineering. Copyright Johns Hopkins University 2020.
29+
#'
2230
#' Modifications:
23-
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{From the COVIDcast Epidata API}: These signals are taken directly from the JHU CSSE \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository} without changes. The 7-day average signals are computed by Delphi by calculating moving averages of the preceding 7 days, so the signal for June 7 is the average of the underlying data for June 1 through 7, inclusive.
24-
#' * Furthermore, the data has been limited to a very small number of rows, the signal names slightly altered, and formatted into a tibble.
31+
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{From the COVIDcast Epidata API}:
32+
#' These signals are taken directly from the JHU CSSE
33+
#' \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository}
34+
#' without changes. The 7-day average signals are computed by Delphi by
35+
#' calculating moving averages of the preceding 7 days, so the signal for
36+
#' June 7 is the average of the underlying data for June 1 through 7,
37+
#' inclusive.
38+
#' * Furthermore, the data has been limited to a very small number of rows,
39+
#' the signal names slightly altered, and formatted into a tibble.
2540
"jhu_csse_daily_subset"
2641

2742

28-
#' Subset of daily doctor visits and cases from California, Florida, Texas, and New York in archive format
43+
#' Subset of daily doctor visits and cases in archive format
2944
#'
30-
#' This data source is based on information about outpatient visits,
31-
#' provided to us by health system partners, and also contains confirmed
32-
#' COVID-19 cases based on reports made available by the Center for
33-
#' Systems Science and Engineering at Johns Hopkins University.
34-
#' This example data ranges from June 1, 2020 to Dec 1, 2021, and is also limited to California, Florida, Texas, and New York.
45+
#' This data source is based on information about outpatient visits,
46+
#' provided to us by health system partners, and also contains confirmed
47+
#' COVID-19 cases based on reports made available by the Center for
48+
#' Systems Science and Engineering at Johns Hopkins University.
49+
#' This example data ranges from June 1, 2020 to Dec 1, 2021, and
50+
#' is also limited to California, Florida, Texas, and New York.
3551
#'
3652
#' @format An `epi_archive` data format. The data table DT has 129,638 rows and 5 columns:
3753
#' \describe{
@@ -41,26 +57,28 @@
4157
#' \item{percent_cli}{percentage of doctor’s visits with CLI (COVID-like illness) computed from medical insurance claims}
4258
#' \item{case_rate_7d_av}{7-day average signal of number of new confirmed COVID-19 cases per 100,000 population, daily}
4359
#' }
44-
#' @source
60+
#' @source
4561
#' This object contains a modified part of the \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University} as \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{republished in the COVIDcast Epidata API}. This data set is licensed under the terms of the
4662
#' \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons Attribution 4.0 International license}
4763
#' by Johns Hopkins University on behalf of its Center for Systems Science in Engineering.
4864
#' Copyright Johns Hopkins University 2020.
49-
#'
65+
#'
5066
#' Modifications:
51-
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html}{From the COVIDcast Epidata Doctor Visits API}: These signals are taken directly from the JHU CSSE \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository} without changes. The 7-day average signals are computed by Delphi by calculating moving averages of the preceding 7 days, so the signal for June 7 is the average of the underlying data for June 1 through 7, inclusive.
52-
#' * Furthermore, the data has been limited to a very small number of rows, the signal names slightly altered, and formatted into a tibble.
67+
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html}{From the COVIDcast Doctor Visits API}: The signal `percent_cli` is taken directly from the API without changes.
68+
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{From the COVIDcast Epidata API}: `case_rate_7d_av` is taken directly from the JHU CSSE \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository} without changes. The 7-day average signals are computed by Delphi by calculating moving averages of the preceding 7 days, so the signal for June 7 is the average of the underlying data for June 1 through 7, inclusive.
69+
#' * Furthermore, the data is a subset of the full dataset, with the signal names slightly altered, and is formatted into a tibble.
5370
"archive_cases_dv_subset"
5471

5572

5673
#' Subset of JHU daily cases from California and Florida
57-
#'
58-
#' This data source of confirmed COVID-19 cases
59-
#' is based on reports made available by the Center for
60-
#' Systems Science and Engineering at Johns Hopkins University.
61-
#' This example data is a snapshot as of Oct 28, 2021 and captures the cases from June 1, 2020 to May 31, 2021
74+
#'
75+
#' This data source of confirmed COVID-19 cases
76+
#' is based on reports made available by the Center for
77+
#' Systems Science and Engineering at Johns Hopkins University.
78+
#' This example data is a snapshot as of Oct 28, 2021 and captures the cases
79+
#' from June 1, 2020 to May 31, 2021
6280
#' and is limited to California and Florida.
63-
#'
81+
#'
6482
#' @format A tibble with 730 rows and 3 variables:
6583
#' \describe{
6684
#' \item{geo_value}{the geographic value associated with each row of measurements.}
@@ -71,19 +89,20 @@
7189
#' \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons Attribution 4.0 International license}
7290
#' by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering.
7391
#' Copyright Johns Hopkins University 2020.
74-
#'
92+
#'
7593
#' Modifications:
76-
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{From the COVIDcast Epidata API}:
77-
#' These signals are taken directly from the JHU CSSE \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository} without changes.
94+
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{From the COVIDcast Epidata API}:
95+
#' These signals are taken directly from the JHU CSSE \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository} without changes.
7896
#' * Furthermore, the data has been limited to a very small number of rows, the signal names slightly altered, and formatted into a tibble.
7997
"incidence_num_outlier_example"
8098

8199
#' Subset of JHU daily cases from counties in Massachusetts and Vermont
82100
#'
83-
#' This data source of confirmed COVID-19 cases and deaths
84-
#' is based on reports made available by the Center for
85-
#' Systems Science and Engineering at Johns Hopkins University.
86-
#' This example data ranges from Mar 1, 2020 to Dec 31, 2021, and is limited to Massachusetts and Vermont.
101+
#' This data source of confirmed COVID-19 cases and deaths
102+
#' is based on reports made available by the Center for
103+
#' Systems Science and Engineering at Johns Hopkins University.
104+
#' This example data ranges from Mar 1, 2020 to Dec 31, 2021,
105+
#' and is limited to Massachusetts and Vermont.
87106
#'
88107
#' @format A tibble with 16,212 rows and 5 variables:
89108
#' \describe{
@@ -97,9 +116,8 @@
97116
#' \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons Attribution 4.0 International license}
98117
#' by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering.
99118
#' Copyright Johns Hopkins University 2020.
100-
#'
119+
#'
101120
#' Modifications:
102121
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{From the COVIDcast Epidata API}: These signals are taken directly from the JHU CSSE \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository} without changes. The 7-day average signals are computed by Delphi by calculating moving averages of the preceding 7 days, so the signal for June 7 is the average of the underlying data for June 1 through 7, inclusive.
103122
#' * Furthermore, the data has been limited to a very small number of rows, the signal names slightly altered, and formatted into a tibble.
104-
105123
"jhu_csse_county_level_subset"

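The `@source` notes above describe the 7-day average signals as trailing averages: the value for June 7 is the mean of June 1 through 7, inclusive. A minimal base-R sketch of that windowing follows. `trailing_mean()` and the case counts are hypothetical illustrations, not Delphi's implementation, and leaving the first six days as `NA` is an assumption, since the notes do not say how incomplete windows are handled.

```r
# Hypothetical helper: trailing (right-aligned) moving average.
# The value at position t is the mean of positions t - window + 1 through t.
trailing_mean <- function(x, window = 7) {
  out <- rep(NA_real_, length(x))
  for (t in seq_along(x)) {
    # Assumption: incomplete windows are left as NA
    if (t >= window) {
      out[t] <- mean(x[(t - window + 1):t])
    }
  }
  out
}

# Made-up daily counts for June 1 through June 10 (illustration only)
dates <- seq(as.Date("2021-06-01"), as.Date("2021-06-10"), by = "day")
cases <- c(10, 12, 14, 16, 18, 20, 22, 24, 26, 28)

avg <- trailing_mean(cases)
june7 <- avg[dates == as.Date("2021-06-07")]
print(june7) # mean of the June 1 through June 7 counts: 16
```

The window is right-aligned, so each value depends only on data from that day and the six days before it.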
R/epi_df.R

+81-41
@@ -84,6 +84,85 @@
8484
#' @name epi_df
8585
NULL
8686

87+
88+
#' Creates an `epi_df` object
89+
#'
90+
#' Creates a new `epi_df` object. By default, builds an empty tibble with the
91+
#' correct metadata for an `epi_df` object (i.e., `geo_type`, `time_type`, and `as_of`).
92+
#' Refer to the argument descriptions below for more details.
93+
#'
94+
#' @param x A data.frame, [tibble::tibble], or [tsibble::tsibble] to be converted
95+
#' @param geo_type Type for the geo values. If missing, then the function will
96+
#' attempt to infer it from the geo values present; if this fails, then it
97+
#' will be set to "custom".
98+
#' @param time_type Type for the time values. If missing, then the function will
99+
#' attempt to infer it from the time values present; if this fails, then it
100+
#' will be set to "custom".
101+
#' @param as_of Time value representing the time at which the given data were
102+
#' available. For example, if `as_of` is January 31, 2022, then the `epi_df`
103+
#' object that is created would represent the most up-to-date version of the
104+
#' data available as of January 31, 2022. If the `as_of` argument is missing,
105+
#' then the current date and time will be used.
106+
#' @param additional_metadata List of additional metadata to attach to the
107+
#' `epi_df` object. The metadata will have `geo_type`, `time_type`, and
108+
#' `as_of` fields; named entries from the passed list will be included as
109+
#' well.
110+
#' @param ... Additional arguments passed to methods.
111+
#' @return An `epi_df` object.
112+
#'
113+
#' @export
114+
new_epi_df = function(x = tibble::tibble(), geo_type, time_type, as_of,
115+
additional_metadata = list(), ...) {
116+
# Check that we have a data frame
117+
if (!is.data.frame(x)) {
118+
Abort("`x` must be a data frame.")
119+
}
120+
121+
# If geo type is missing, then try to guess it
122+
if (missing(geo_type)) {
123+
geo_type = guess_geo_type(x$geo_value)
124+
}
125+
126+
# If time type is missing, then try to guess it
127+
if (missing(time_type)) {
128+
time_type = guess_time_type(x$time_value)
129+
}
130+
131+
# If as_of is missing, then try to guess it
132+
if (missing(as_of)) {
133+
# First check the metadata for an as_of field
134+
if ("metadata" %in% names(attributes(x)) &&
135+
"as_of" %in% names(attributes(x)$metadata)) {
136+
as_of = attributes(x)$metadata$as_of
137+
}
138+
139+
# Next check for as_of, issue, or version columns
140+
else if ("as_of" %in% names(x)) as_of = max(x$as_of)
141+
else if ("issue" %in% names(x)) as_of = max(x$issue)
142+
else if ("version" %in% names(x)) as_of = max(x$version)
143+
144+
# If we got here then we failed
145+
else as_of = Sys.time() # Use the current day-time
146+
}
147+
148+
# Define metadata fields
149+
metadata = list()
150+
metadata$geo_type = geo_type
151+
metadata$time_type = time_type
152+
metadata$as_of = as_of
153+
metadata = c(metadata, additional_metadata)
154+
155+
# Reorder columns (geo_value, time_value, ...)
156+
if(sum(dim(x)) != 0){
157+
x = dplyr::relocate(x, .data$geo_value, .data$time_value)
158+
}
159+
160+
# Apply epi_df class, attach metadata, and return
161+
class(x) = c("epi_df", class(x))
162+
attributes(x)$metadata = metadata
163+
return(x)
164+
}
165+
87166
#' Convert to `epi_df` format
88167
#'
89168
#' Converts a data frame or tibble into an `epi_df` object. See the [getting
@@ -142,47 +221,8 @@ as_epi_df.tbl_df = function(x, geo_type, time_type, as_of,
142221
Abort("`x` must contain a `time_value` column.")
143222
}
144223

145-
# If geo type is missing, then try to guess it
146-
if (missing(geo_type)) {
147-
geo_type = guess_geo_type(x$geo_value)
148-
}
149-
150-
# If time type is missing, then try to guess it
151-
if (missing(time_type)) {
152-
time_type = guess_time_type(x$time_value)
153-
}
154-
155-
# If as_of is missing, then try to guess it
156-
if (missing(as_of)) {
157-
# First check the metadata for an as_of field
158-
if ("metadata" %in% names(attributes(x)) &&
159-
"as_of" %in% names(attributes(x)$metadata)) {
160-
as_of = attributes(x)$metadata$as_of
161-
}
162-
163-
# Next check for as_of, issue, or version columns
164-
else if ("as_of" %in% names(x)) as_of = max(x$as_of)
165-
else if ("issue" %in% names(x)) as_of = max(x$issue)
166-
else if ("version" %in% names(x)) as_of = max(x$version)
167-
168-
# If we got here then we failed
169-
else as_of = Sys.time() # Use the current day-time
170-
}
171-
172-
# Define metadata fields
173-
metadata = list()
174-
metadata$geo_type = geo_type
175-
metadata$time_type = time_type
176-
metadata$as_of = as_of
177-
metadata = c(metadata, additional_metadata)
178-
179-
# Reorder columns (geo_value, time_value, ...)
180-
x = dplyr::relocate(x, .data$geo_value, .data$time_value)
181-
182-
# Apply epi_df class, attach metadata, and return
183-
class(x) = c("epi_df", class(x))
184-
attributes(x)$metadata = metadata
185-
return(x)
224+
new_epi_df(x, geo_type, time_type, as_of,
225+
additional_metadata = additional_metadata, ...)
186226
}
187227

188228
#' @method as_epi_df data.frame
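
The refactor above moves the metadata inference into `new_epi_df()` and has `as_epi_df.tbl_df()` forward its arguments. Below is a rough, self-contained sketch of the `as_of` fallback order (explicit argument, then the `metadata` attribute, then an `as_of`, `issue`, or `version` column, then the current time). It also shows that a missing argument stays missing when a wrapper forwards it, which is what lets the callee still use `missing()`. `infer_as_of()` and `wrapper()` are hypothetical names for illustration and are not part of the package.

```r
# Hypothetical helper mirroring the as_of fallback order in new_epi_df()
infer_as_of <- function(x, as_of) {
  # 1. An explicitly supplied value wins
  if (!missing(as_of)) {
    return(as_of)
  }
  # 2. Otherwise look for an as_of field in the metadata attribute
  meta <- attr(x, "metadata", exact = TRUE)
  if (!is.null(meta) && "as_of" %in% names(meta)) {
    return(meta$as_of)
  }
  # 3. Otherwise take the latest value of an as_of, issue, or version column
  for (col in c("as_of", "issue", "version")) {
    if (col %in% names(x)) {
      return(max(x[[col]]))
    }
  }
  # 4. Fall back to the current date and time
  Sys.time()
}

# Hypothetical wrapper: forwarding a missing argument keeps it missing,
# so missing(as_of) is still TRUE inside infer_as_of()
wrapper <- function(x, as_of) infer_as_of(x, as_of)

# Made-up data for illustration
df <- data.frame(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2021-01-01", "2021-01-02")),
  issue = as.Date(c("2021-01-05", "2021-01-09"))
)

from_issue <- wrapper(df) # latest issue date
explicit <- wrapper(df, as_of = as.Date("2022-01-31")) # supplied value

attr(df, "metadata") <- list(as_of = as.Date("2021-06-01"))
from_meta <- wrapper(df) # metadata takes priority over columns
```

Because missingness propagates through the forwarded call, the wrapper does not need its own `missing()` checks, which is why the duplicated block could be deleted from `as_epi_df.tbl_df()`.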

_pkgdown.yml

+1-1
@@ -63,6 +63,6 @@ reference:
6363
- incidence_num_outlier_example
6464
- contains("jhu_csse")
6565
- title: internal
66-
- contents:
66+
contents:
6767
- epiprocess
6868
