Releases: cmu-delphi/epiprocess
epiprocess 0.7.0
Breaking changes:
- Changes to `epi_slide` and `epix_slide`:
  - If `f` is a function, it is now required to take at least three arguments.
    `f` must take an `epi_df` with the same column names as the archive's `DT`,
    minus the `version` column; followed by a one-row tibble containing the
    values of the grouping variables for the associated group; followed by a
    reference time value, usually as a `Date` object. Optionally, it can take
    any number of additional arguments after that, with values for those
    arguments forwarded through `epi[x]_slide`'s `...` args.
    - To make your existing slide computations work, add a third argument to
      your `f` function to accept this new input: e.g., change
      `f = function(x, g, <any other arguments>) { <body> }` to
      `f = function(x, g, rt, <any other arguments>) { <body> }`.
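As a minimal sketch of the three-argument form, the function below can be called directly, without the package. The function name `mean_cases`, the `na_rm` argument, the `cases` column, and the toy inputs are all hypothetical; a plain data frame stands in for the `epi_df` window and a one-row data frame for the group key.

```r
# Hypothetical slide computation in the new three-argument form:
#   x  = window data (an epi_df in real use; a plain data frame here)
#   g  = one-row data frame holding the grouping variable values
#   rt = reference time value
mean_cases <- function(x, g, rt, na_rm = TRUE) {
  data.frame(
    geo_value = g$geo_value,
    ref_time = rt,
    mean_cases = mean(x$cases, na.rm = na_rm)
  )
}

# Toy inputs standing in for what a slide call would pass to `f`.
window <- data.frame(
  time_value = as.Date("2021-01-01") + 0:2,
  cases = c(10, 20, 30)
)
group_key <- data.frame(geo_value = "ca")
ref_time_value <- as.Date("2021-01-03")

result <- mean_cases(window, group_key, ref_time_value)
print(result)
```

In a real slide call, a function like this would be supplied as `f`, and a value for an extra argument such as `na_rm` would be forwarded through `...`.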
New features:
- `epi_slide` and `epix_slide` also make the window data, group key and
  reference time value available to slide computations specified as formulas or
  tidy evaluation expressions, in additional or completely new ways.
  - If `f` is a formula, it can now access the reference time value via `.z` or
    `.ref_time_value`.
  - If `f` is missing, the tidy evaluation expression in `...` can now refer to
    the window data as an `epi_df` or `tibble` with `.x`, the group key with
    `.group_key`, and the reference time value with `.ref_time_value`. The usual
    `.data` and `.env` pronouns also work, but `pick()` and `cur_data()` do not;
    work off of `.x` instead.
- `epix_slide` has been made more like `dplyr::group_modify`. It no longer
  performs element/row recycling for size stability, accepts slide computation
  outputs containing any number of rows, and no longer supports `all_rows`.
  - To keep the old behavior, manually perform row recycling within `f`
    computations, and/or `left_join` a data frame representing the desired
    output structure with the current `epix_slide()` result to obtain the
    desired repetitions and completions expected with `all_rows = TRUE`.
- `epix_slide` will only output grouped or ungrouped tibbles. Previously, it
  would sometimes output `epi_df`s, but not consistently, and not always with
  the metadata desired. Future versions will revisit this design, and consider
  more closely whether/when/how to output an `epi_df`.
  - To keep the old behavior, convert the output of `epix_slide()` to `epi_df`
    when desired and set the metadata appropriately.
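The join-based way of restoring `all_rows = TRUE`-style completions can be sketched in base R. The data frames below are made up, `slide_value` is used as an illustrative column name, and `merge(..., all.x = TRUE)` stands in for `left_join`; this is an illustration of the idea, not the package's own code.

```r
# Hypothetical slide result: no row was produced for 2021-01-02.
slide_result <- data.frame(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2021-01-01", "2021-01-03")),
  slide_value = c(1.5, 2.5)
)

# Desired output structure: one row per geo_value and time_value.
skeleton <- data.frame(
  geo_value = "ca",
  time_value = as.Date("2021-01-01") + 0:2
)

# Left join of the skeleton with the result; rows without a matching
# computation get NA in slide_value.
completed <- merge(skeleton, slide_result,
                   by = c("geo_value", "time_value"), all.x = TRUE)
completed <- completed[order(completed$geo_value, completed$time_value), ]
print(completed)
```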
Improvements:
- `epi_slide` and `epix_slide` now support `as_list_col = TRUE` when the slide
  computations output atomic vectors, and output a list column in "chopped"
  format (see `tidyr::chop`).
- `epi_slide` now works properly with slide computations that output just a
  `Date` vector, rather than converting `slide_value` to a numeric column.
- Fix `?archive_cases_dv_subset` information regarding modifications of upstream
  data by @brookslogan in (#299).
- Update to use updated `epidatr` (`fetch_tbl` -> `fetch`) by @brookslogan in
  (#319).
Full Changelog: v0.6.0...v0.7.0
epiprocess 0.6.0
Breaking changes
- Changes to both `epi_slide` and `epix_slide`:
  - The `n`, `align`, and `before` arguments have been replaced by new
    `before` and `after` arguments. To migrate to the new version, replace
    these arguments in every `epi_slide` and `epix_slide` call. If you were
    only using the `n` argument, then this means replacing `n = <n value>`
    with `before = <n value> - 1`.
    - `epi_slide`'s time windows now extend `before` time steps before and
      `after` time steps after the corresponding `ref_time_values`. See
      `?epi_slide` for details on matching old alignments.
    - `epix_slide`'s time windows now extend `before` time steps before the
      corresponding `ref_time_values` all the way through the latest data
      available at the corresponding `ref_time_values`.
  - Slide functions now keep any grouping of `x` in their results, like
    `mutate` and `group_modify`.
    - To obtain the old behavior, `dplyr::ungroup` the slide results
      immediately.
- Additional `epi_slide` changes:
  - When using `as_list_col = TRUE` together with `ref_time_values` and
    `all_rows=TRUE`, the marker for excluded computations is now a `NULL` entry
    in the list column, rather than an `NA`; if you are using `tidyr::unnest()`
    afterward and want to keep these missing data markers, you will need to
    replace the `NULL` entries with `NA`s. Skipped computations are now more
    uniformly detectable using `vctrs` methods.
- Additional `epix_slide` changes:
  - `epix_slide`'s `group_by` argument has been replaced by `dplyr::group_by`
    and `dplyr::ungroup` S3 methods. The `group_by` method uses "data masking"
    (also referred to as "tidy evaluation") rather than "tidy selection".
    - Old syntax:
      - `x %>% epix_slide(<other args>, group_by=c(col1, col2))`
      - `x %>% epix_slide(<other args>, group_by=all_of(colname_vector))`
    - New syntax:
      - `x %>% group_by(col1, col2) %>% epix_slide(<other args>)`
      - `x %>% group_by(across(all_of(colname_vector))) %>% epix_slide(<other args>)`
  - `epix_slide` no longer defaults to grouping by non-`time_value`,
    non-`version` key columns, instead considering all data to be in one big
    group.
    - To obtain the old behavior, precede each `epix_slide` call lacking a
      `group_by` argument with an appropriate `group_by` call.
  - `epix_slide` now guesses `ref_time_values` to be a regularly spaced sequence
    covering all the `DT$version` values and the `versions_end`, rather than the
    distinct `DT$time_value`s. To obtain the old behavior, pass in
    `ref_time_values = unique(<ungrouped archive>$DT$time_value)`.
- `epi_archive`'s `clobberable_versions_start`'s default is now `NA`, so there
  will be no warnings by default about potential nonreproducibility. To obtain
  the old behavior, pass in
  `clobberable_versions_start = max_version_with_row_in(x)`.
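The `n` to `before`/`after` migration can be illustrated with plain date arithmetic. The helper `window_dates` is hypothetical and not part of the package; the sketch assumes daily data (one time step is one day) and that the old `n`-step window was a trailing one ending at the reference time value.

```r
# Hypothetical helper: the time values covered by a window around one
# reference time value, for daily data.
window_dates <- function(ref_time_value, before = 0, after = 0) {
  seq(ref_time_value - before, ref_time_value + after, by = "day")
}

n <- 7 # old trailing window length
ref <- as.Date("2021-03-10")

# New-style equivalent of the old `n = 7`: before = n - 1, after = 0.
trailing <- window_dates(ref, before = n - 1)

# A window extending 3 steps on each side is centered on the reference time.
centered <- window_dates(ref, before = 3, after = 3)

print(range(trailing))
print(range(centered))
```

Both windows contain seven time values; only their position relative to the reference time value differs.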
Potentially-breaking changes
- Fixed `[` on grouped `epi_df`s to maintain the grouping if possible when
  dropping the `epi_df` class (e.g., when removing the `time_value` column).
- Fixed `epi_df` operations to be more consistent about decaying into
  non-`epi_df`s when the result of the operation doesn't make sense as an
  `epi_df` (e.g., when removing the `time_value` column).
- Changed `bind_rows` on grouped `epi_df`s to not drop the `epi_df` class. Like
  with ungrouped `epi_df`s, the metadata of the result is still simply taken
  from the first result, and may be inappropriate (#242).
- `epi_slide` and `epix_slide` now raise an error rather than silently filtering
  out `ref_time_values` that don't meet their expectations.
New features
- `epix_slide`, `<epi_archive>$slide` have a new parameter `all_versions`. With
  `all_versions=TRUE`, `epix_slide` will pass a filtered `epi_archive` to each
  computation rather than an `epi_df` snapshot. This enables, e.g., performing
  pseudoprospective forecasts with a revision-aware forecaster using nested
  `epix_slide` operations.
Improvements
- Added `dplyr::group_by` and `dplyr::ungroup` S3 methods for `epi_archive`
  objects, plus corresponding `$group_by` and `$ungroup` R6 methods. The
  `group_by` implementation supports the `.add` and `.drop` arguments, and
  `ungroup` supports partial ungrouping with `...`.
- `as_epi_archive`, `epi_archive$new` now perform checks for the key uniqueness
  requirement (part of #154).
Cleanup
- Added a `NEWS.md` file to track changes to the package.
- Implemented `?dplyr::dplyr_extending` for `epi_df`s (#223).
- Fixed various small documentation issues (#217).
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.6.0
epiprocess 0.5.0
Potentially-breaking changes
- `epix_slide`, `<epi_archive>$slide` now feed `f` an `epi_df` rather than
  converting to a tibble/`tbl_df` first, allowing use of `epi_df` methods and
  metadata, and often yielding `epi_df`s out of the slide as a result. To obtain
  the old behavior, convert to a tibble within `f`.
Improvements
- Fixed `epix_merge`, `<epi_archive>$merge` always raising an error on
  `sync="truncate"`.
Cleanup
- Added `Remotes:` entry for `genlasso`, which was removed from CRAN.
- Added `as_epi_archive` tests.
- Added missing `epix_merge` test for `sync="truncate"`.
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.5.0
epiprocess 0.4.0
Potentially-breaking changes
- Fixed `[.epi_df` to not reorder columns, which was incompatible with
  downstream packages.
- Changed `[.epi_df` decay-to-tibble logic to be more coherent with `epi_df`'s
  current tolerance of nonunique keys: stopped decaying to a tibble in some
  cases where a unique key wouldn't have been preserved, since we don't
  enforce a unique key elsewhere.
- Fixed `[.epi_df` to adjust `"other_keys"` metadata when corresponding
  columns are selected out.
- Fixed `[.epi_df` to raise an error if resulting column names would be
  nonunique.
- Fixed `[.epi_df` to drop metadata if decaying to a tibble (due to removal
  of essential columns).
Improvements
- Added check that `epi_df` `additional_metadata` is a list.
- Fixed some incorrect `as_epi_df` examples.
Cleanup
- Applied rename of upstream package in examples: `delphi.epidata` ->
  `epidatr`.
- Rounded out `[.epi_df` tests.
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.4.0
epiprocess 0.3.0
Breaking changes
- `as_epi_archive`, `epi_archive$new`:
  - Compactification (see below) by default may change results if working
    directly with the `epi_archive`'s `DT` field; to disable, pass in
    `compactify=FALSE`.
- `epi_archive`'s wrappers and R6 methods have been updated to follow these
  rules regarding reference semantics:
  - `epix_<method>` will not mutate input `epi_archive`s, but may alias them
    or alias their fields (which should not be a worry if a user sticks to
    these `epix_*` functions and "regular" R functions with
    copy-on-write-like behavior, avoiding mutating functions like
    `[.data.table`).
  - `x$<method>` may mutate `x`; if it mutates `x`, it will return `x`
    invisibly (where this makes sense), and, for each of its fields, may
    either mutate the object to which it refers or reseat the reference (but
    not both); if `x$<method>` does not mutate `x`, its result may contain
    aliases to `x` or its fields.
- `epix_merge`, `<epi_archive>$merge`:
  - Removed `...`, `locf`, and `nan` parameters.
  - Changed the default behavior, which now corresponds to using
    `by=key(x$DT)` (but demanding that this is the same set of column names as
    `key(y$DT)`), `all=TRUE`, `locf=TRUE`, `nan=NaN` (but with the
    post-filling step fixed to only apply to gaps, and no longer fill over
    `NA`s originating from `x$DT` and `y$DT`).
  - `x` and `y` are no longer allowed to share names of non-`by` columns.
  - `epix_merge` no longer mutates its `x` argument (but `$merge` continues
    to do so).
  - Removed (undocumented) capability of passing a `data.table` as `y`.
- `epix_slide`:
  - Removed inappropriate/misleading `n=7` default argument (due to
    reporting latency, `n=7` will not yield 7 days of data in a typical
    daily-reporting surveillance data source, as one might have assumed).
New features
- `as_epi_archive`, `epi_archive$new`:
  - New `compactify` parameter allows removal of rows that are redundant for the
    purposes of `epi_archive`'s methods, which use the last version of each
    observation carried forward.
  - New `clobberable_versions_start` field allows marking a range of versions
    that could be "clobbered" (rewritten without assigning new version
    tags); previously, this was hard-coded as `max(<epi_archive>$DT$version)`.
  - New `versions_end` field allows marking a range of versions beyond
    `max(<epi_archive>$DT$version)` that were observed, but contained no
    changes.
- `epix_merge`, `$merge`:
  - New `sync` parameter controls what to do if `x` and `y` aren't equally
    up to date (i.e., if `x$versions_end` and `y$versions_end` are
    different).
- New function `epix_fill_through_version`, method
  `<epi_archive>$fill_through_version`: non-mutating & mutating way to
  ensure that an archive contains versions at least through some
  `fill_versions_end`, extrapolating according to `how` if necessary.
- Example archive data object is now constructed on demand from its
  underlying data, so it will be based on the user's version of
  `epi_archive` rather than an outdated R6 implementation from whenever the
  data object was generated.
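The idea behind compactification, dropping rows that add nothing once the last version of each observation is carried forward, can be sketched in base R. This is an illustration under simplifying assumptions (a single value column named `cases`, no `NA` values, made-up data), not the package's implementation.

```r
# Toy version history; column names follow the geo_value / time_value /
# version convention, values are made up.
dt <- data.frame(
  geo_value = c("ca", "ca", "ca", "ca"),
  time_value = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02")),
  version = as.Date(c("2021-01-02", "2021-01-03", "2021-01-04", "2021-01-03")),
  cases = c(10, 10, 12, 5)
)

# Sort so each observation's versions are adjacent and increasing.
dt <- dt[order(dt$geo_value, dt$time_value, dt$version), ]

# A row is redundant if it has the same key and the same value as the row
# immediately before it (the previous version of the same observation).
n <- nrow(dt)
same_key <- c(FALSE, dt$geo_value[-1] == dt$geo_value[-n] &
  dt$time_value[-1] == dt$time_value[-n])
same_value <- c(FALSE, dt$cases[-1] == dt$cases[-n])
compacted <- dt[!(same_key & same_value), ]
print(compacted)
```

Here the 2021-01-03 version of the 2021-01-01 observation repeats the value 10 from the previous version, so it is the one row removed.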
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.3.0
epiprocess 0.2.0
Breaking changes
- Removed default `n=7` argument to `epix_slide`.
Improvements
- Ignore `NA`s when printing `time_value` range for an `epi_archive`.
- Fixed misleading column naming in `epix_slide` example.
- Trimmed down `epi_slide` examples.
- Synced out-of-date docs.
Cleanup
- Removed dependency of some `epi_archive` tests on an example archive
  object, and made them more understandable by reading without running.
- Fixed `epi_df` tests relying on an S3 method for `epi_df` implemented
  externally to `epiprocess`.
- Added tests for `epi_archive` methods and wrapper functions.
- Removed some dead code.
- Made `.{Rbuild,git}ignore` files more comprehensive.
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.2.0
epiprocess 0.1.2
New features
- New `new_epi_df` function is similar to `as_epi_df`, but (i) recalculates,
  overwrites, and/or drops most metadata of `x` if it has any, (ii) may
  still reorder the columns of `x` even if it's already an `epi_df`, and
  (iii) treats `x` as optional, constructing an empty `epi_df` by default.
Improvements
- Fixed `geo_type` guessing on alphabetical strings with more than 2
  characters to yield `"custom"`, not US `"nation"`.
- Fixed `time_type` guessing to actually detect `Date`-class `time_value`s
  regularly spaced 7 days apart as `"week"`-type as intended.
- Improved printing of `epi_df`s, `epi_archive`s.
- Fixed `as_of` to not cut off any (forecast-like) data with
  `time_value > max_version`.
- Expanded `epi_df` docs to include conversion from `tsibble`/`tbl_ts` objects,
  usage of `other_keys`, and pre-processing objects not following the
  `geo_value`, `time_value` naming scheme.
- Expanded `epi_slide` examples to show how to use an `f` argument with
  named parameters.
- Updated examples to print relevant columns given a common 80-column
  terminal width.
- Added growth rate examples.
- Improved `as_epi_archive` and `epi_archive$new`/`$initialize`
  documentation, including constructing a toy archive.
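The weekly-spacing check mentioned in the `time_type` fix above can be illustrated with a small base-R sketch. The function `guess_time_type_sketch` and its `"day"` and `"custom"` fallbacks are assumptions made for illustration; the package's actual guessing rules may differ.

```r
# Hypothetical guesser: "week" if Date time_values are spaced exactly 7 days
# apart, "day" for other Date vectors, "custom" for anything else.
guess_time_type_sketch <- function(time_value) {
  if (!inherits(time_value, "Date")) {
    return("custom")
  }
  # Gaps, in days, between consecutive distinct time values.
  gaps <- as.numeric(diff(sort(unique(time_value))))
  if (length(gaps) > 0 && all(gaps == 7)) "week" else "day"
}

weekly <- as.Date("2021-01-03") + 7 * 0:4
daily <- as.Date("2021-01-03") + 0:4

print(guess_time_type_sketch(weekly))
print(guess_time_type_sketch(daily))
```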
Cleanup
- Added tests for `epi_slide`, `epi_cor`, and internal utility functions.
- Fixed currently-unused internal utility functions `MiddleL`, `MiddleR` to
  yield correct results on odd-length vectors.
New Contributors
- @rachlobay made their first contribution in #116
- @kenmawer made their first contributions
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.1.2
epiprocess 0.1.1
New features
- New example data objects allow one to quickly experiment with `epi_df`s
  and `epi_archive`s without relying/waiting on an API to fetch data.
Improvements
- Improved `epi_slide` error messaging.
- Fixed description of the appropriate parameters for an `f` argument to
  `epi_slide`; previous description would give incorrect behavior if `f` had
  named parameters that did not receive values from `epi_slide`'s `...`.
- Added some examples throughout the package.
- Using example data objects in vignettes also speeds up vignette compilation.
Cleanup
- Set up gh-actions CI.
- Added tests for `epi_df`s.
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.1.1
epiprocess 0.1.0
Implemented core functionality, vignettes
- Classes
  - `epi_df`: specialized `tbl_df` for geotemporal epidemiological time
    series data, with optional metadata recording other key columns (e.g.,
    demographic breakdowns) and `as_of` what time/version this data was
    current/published. Associated functions:
    - `as_epi_df` converts to an `epi_df`, guessing the `geo_type`,
      `time_type`, `other_keys`, and `as_of` if not specified.
    - `as_epi_df.tbl_ts` and `as_tsibble.epi_df` automatically set
      `other_keys` and `key` & `index`, respectively.
    - `epi_slide` applies a user-supplied computation to a sliding/rolling
      time window and user-specified groups, adding the results as new
      columns, and recycling/broadcasting results to keep the result size
      stable. Allows computation to be provided as a function, `purrr`-style
      formula, or tidyeval dots. Uses `slider` underneath for efficiency.
    - `epi_cor` calculates Pearson, Kendall, or Spearman correlations
      between two (optionally time-shifted) variables in an `epi_df` within
      user-specified groups.
    - Convenience function: `is_epi_df`.
  - `epi_archive`: R6 class for version (patch) data for geotemporal
    epidemiological time series data sets. Comes with S3 methods and regular
    functions that wrap around this functionality for those unfamiliar with R6
    methods. Associated functions:
    - `as_epi_archive`: prepares an `epi_archive` object from a data frame
      containing snapshots and/or patch data for every available version of
      the data set.
    - `as_of`: extracts a snapshot of the data set as of some requested
      version, in `epi_df` format.
    - `epix_slide`, `<epi_archive>$slide`: similar to `epi_slide`, but for
      `epi_archive`s; for each requested `ref_time_value` and group, applies
      a time window and user-specified computation to a snapshot of the data
      as of `ref_time_value`.
    - `epix_merge`, `<epi_archive>$merge`: like `merge` for `epi_archive`s,
      but allowing for the last version of each observation to be carried
      forward to fill in gaps in `x` or `y`.
    - Convenience function: `is_epi_archive`.
- Additional functions
  - `growth_rate`: estimates growth rate of a time series using one of a few
    built-in `method`s based on relative change, linear regression,
    smoothing splines, or trend filtering.
  - `detect_outlr`: applies one or more outlier detection methods to a given
    signal variable, and optionally aggregates the outputs to create a
    consensus result.
  - `detect_outlr_rm`: outlier detection function based on a rolling median;
    one of the methods included in `detect_outlr`.
  - `detect_outlr_stl`: outlier detection function based on a seasonal-trend
    decomposition using LOESS (STL); one of the methods included in
    `detect_outlr`.
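The snapshot extraction that `as_of` is described as performing can be sketched in base R: for each observation, keep the most recent version published on or before the requested version. The function `snapshot_as_of`, the `cases` column, and the data are hypothetical; this illustrates the concept and is not the package's implementation.

```r
# Toy version history (made-up values).
history <- data.frame(
  geo_value = c("ca", "ca", "ca"),
  time_value = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02")),
  version = as.Date(c("2021-01-02", "2021-01-04", "2021-01-03")),
  cases = c(10, 12, 5)
)

snapshot_as_of <- function(history, max_version) {
  # Keep only updates published on or before the requested version.
  seen <- history[history$version <= max_version, ]
  # Order so that the latest version of each observation comes last.
  seen <- seen[order(seen$geo_value, seen$time_value, seen$version), ]
  # Keep the last row per (geo_value, time_value) and drop the version column.
  is_latest <- !duplicated(seen[c("geo_value", "time_value")], fromLast = TRUE)
  out <- seen[is_latest, c("geo_value", "time_value", "cases")]
  rownames(out) <- NULL
  out
}

snap_early <- snapshot_as_of(history, as.Date("2021-01-03"))
snap_late <- snapshot_as_of(history, as.Date("2021-01-04"))
print(snap_early)
print(snap_late)
```

The two snapshots differ only in the 2021-01-01 observation, which was revised from 10 to 12 in the 2021-01-04 version.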
New Contributors
- @ryantibs made their first contribution in #13
- @elray1 made their first contribution in #19
- @qpmnguyen made their first contribution in #37
- @jacobbien made their first contribution in #44
- @rafaelcatoia made their first contribution in #43
- @dshemetov made their first contribution in #59
Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.1.0