Releases: cmu-delphi/epiprocess

epiprocess 0.7.0

30 Nov 20:54
e7a4dfa

Breaking changes:

  • Changes to epi_slide and epix_slide:
    • If f is a function, it is now required to take at least three arguments.
      f must take an epi_df with the same column names as the archive's DT,
      minus the version column; followed by a one-row tibble containing the
      values of the grouping variables for the associated group; followed by a
      reference time value, usually as a Date object. Optionally, it can take
      any number of additional arguments after that, and forward values for those
      arguments through epi[x]_slide's ... args.
      • To make your existing slide computations work, add a third argument to
        your f function to accept this new input: e.g., change f = function(x, g, <any other arguments>) { <body> } to f = function(x, g, rt, <any other arguments>) { <body> }.
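A minimal sketch of this migration in plain R, calling the computation by hand rather than through epi_slide or epix_slide. The names mean_cases_old and mean_cases_new, the cases column, and the toy inputs are hypothetical, and a plain data.frame stands in for the epi_df window:

```r
# Old style (before 0.7.0): window data and group key only.
mean_cases_old <- function(x, g, na.rm = FALSE) {
  mean(x$cases, na.rm = na.rm)
}

# New style (0.7.0 onward): a third argument receives the reference time value,
# even if the body does not use it.
mean_cases_new <- function(x, g, rt, na.rm = FALSE) {
  mean(x$cases, na.rm = na.rm)
}

# Toy stand-ins for what the slide function would pass in (assumed shapes).
window <- data.frame(
  time_value = as.Date("2020-06-01") + 0:2,
  cases = c(10, NA, 14)
)
group_key <- data.frame(geo_value = "ca", stringsAsFactors = FALSE)
ref_time_value <- as.Date("2020-06-03")

# Extra arguments such as na.rm still come after the three required ones.
result <- mean_cases_new(window, group_key, ref_time_value, na.rm = TRUE)
```

Only the signature changes; the body can ignore rt if the computation does not depend on the reference time.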

New features:

  • epi_slide and epix_slide also make the window data, group key and
    reference time value available to slide computations specified as formulas or
    tidy evaluation expressions, in additional or completely new ways.
    • If f is a formula, it can now access the reference time value via .z or
      .ref_time_value.
    • If f is missing, the tidy evaluation expression in ... can now refer to
      the window data as an epi_df or tibble with .x, the group key with
      .group_key, and the reference time value with .ref_time_value. The usual
      .data and .env pronouns also work, but pick() and cur_data() do not;
      work off of .x instead.
  • epix_slide has been made more like dplyr::group_modify. It will no longer
    perform element/row recycling for size stability, accepts slide computation
    outputs containing any number of rows, and no longer supports all_rows.
    • To keep the old behavior, manually perform row recycling within f
      computations, and/or left_join a data frame representing the desired
      output structure with the current epix_slide() result to obtain the
      desired repetitions and completions expected with all_rows = TRUE.
  • epix_slide will only output grouped or ungrouped tibbles. Previously, it
    would sometimes output epi_dfs, but not consistently, and not always with
    the metadata desired. Future versions will revisit this design, and consider
    more closely whether/when/how to output an epi_df.
    • To keep the old behavior, convert the output of epix_slide() to epi_df
      when desired and set the metadata appropriately.
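The left_join approach to recovering all_rows = TRUE-style output might look roughly like the sketch below. The slide_result data frame is a hypothetical stand-in for an epix_slide() result, and the set of desired rows is an assumption chosen for illustration:

```r
library(dplyr)

# Hypothetical slide output: rows exist only where the computation ran.
slide_result <- data.frame(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2020-06-02", "2020-06-04")),
  slide_value = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# Desired output structure: every geo_value/time_value pair that should appear.
desired <- data.frame(
  geo_value = "ca",
  time_value = as.Date("2020-06-01") + 0:3,
  stringsAsFactors = FALSE
)

# Rows of `desired` with no matching slide output get NA in slide_value.
completed <- left_join(desired, slide_result, by = c("geo_value", "time_value"))
```

Putting the desired structure on the left side of the join keeps its row order and fills unmatched rows with NA.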

Improvements:

  • epi_slide and epix_slide now support as_list_col = TRUE when the slide
    computations output atomic vectors, and output a list column in "chopped"
    format (see tidyr::chop).
  • epi_slide now works properly with slide computations that output just a
    Date vector, rather than converting slide_value to a numeric column.
  • Fix ?archive_cases_dv_subset information regarding modifications of upstream
    data by @brookslogan in (#299).
  • Update to use updated epidatr (fetch_tbl -> fetch) by @brookslogan in
    (#319).

Full Changelog: v0.6.0...v0.7.0

epiprocess 0.6.0

15 Nov 23:23

Breaking changes

  • Changes to both epi_slide and epix_slide
    • The n, align, and before arguments have been replaced by new
      before and after arguments. To migrate to the new version, replace
      these arguments in every epi_slide and epix_slide call. If you were
      only using the n argument, then this means replacing n = <n value>
      with before = <n value> - 1.
    • epi_slide's time windows now extend before time steps before and
      after time steps after the corresponding ref_time_values. See
      ?epi_slide for details on matching old alignments.
    • epix_slide's time windows now extend before time steps before the
      corresponding ref_time_values all the way through the latest data
      available at the corresponding ref_time_values.
    • Slide functions now keep any grouping of x in their results, like
      mutate and group_modify.
      • To obtain the old behavior, dplyr::ungroup the slide results immediately.
  • Additional epi_slide changes:
    • When using as_list_col = TRUE together with ref_time_values and
      all_rows=TRUE, the marker for excluded computations is now a NULL entry
      in the list column, rather than a NA; if you are using tidyr::unnest()
      afterward and want to keep these missing data markers, you will need to
      replace the NULL entries with NAs. Skipped computations are now more
      uniformly detectable using vctrs methods.
  • Additional epix_slide changes:
    • epix_slide's group_by argument has been replaced by dplyr::group_by and
      dplyr::ungroup S3 methods. The group_by method uses "data masking" (also
      referred to as "tidy evaluation") rather than "tidy selection".
      • Old syntax:
        • x %>% epix_slide(<other args>, group_by=c(col1, col2))
        • x %>% epix_slide(<other args>, group_by=all_of(colname_vector))
      • New syntax:
        • x %>% group_by(col1, col2) %>% epix_slide(<other args>)
        • x %>% group_by(across(all_of(colname_vector))) %>% epix_slide(<other args>)
    • epix_slide no longer defaults to grouping by non-time_value, non-version
      key columns, instead considering all data to be in one big group.
      • To obtain the old behavior, precede each epix_slide call lacking a
        group_by argument with an appropriate group_by call.
    • epix_slide now guesses ref_time_values to be a regularly spaced sequence
      covering all the DT$version values and the version_end, rather than the
      distinct DT$time_values. To obtain the old behavior, pass in
      ref_time_values = unique(<ungrouped archive>$DT$time_value).
  • epi_archive's clobberable_versions_start's default is now NA, so there
    will be no warnings by default about potential nonreproducibility. To obtain
    the old behavior, pass in clobberable_versions_start = max_version_with_row_in(x).
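The n to before conversion above can be checked with plain date arithmetic. This sketch assumes daily data and a window that ends at the reference time (after = 0); it does not call epi_slide itself:

```r
ref_time_value <- as.Date("2020-06-10")

# Old call style: n = 7 described a 7-step window ending at the reference time
# (assuming the trailing alignment).
n <- 7

# New call style: the same window reaches back n - 1 steps and forward 0 steps.
before <- n - 1
after <- 0

# The window covers ref_time_value - before through ref_time_value + after.
window_days <- seq(ref_time_value - before, ref_time_value + after, by = "day")
length(window_days)
```

The reference time itself counts as one of the n steps, which is why before is n - 1 rather than n.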

Potentially-breaking changes

  • Fixed [ on grouped epi_dfs to maintain the grouping if possible when
    dropping the epi_df class (e.g., when removing the time_value column).
  • Fixed epi_df operations to be more consistent about decaying into
    non-epi_dfs when the result of the operation doesn't make sense as an
    epi_df (e.g., when removing the time_value column).
  • Changed bind_rows on grouped epi_dfs to not drop the epi_df class. Like
    with ungrouped epi_dfs, the metadata of the result is still simply taken
    from the first result, and may be inappropriate
    (#242).
  • epi_slide and epix_slide now raise an error rather than silently filtering
    out ref_time_values that don't meet their expectations.

New features

  • epix_slide, <epi_archive>$slide have a new parameter all_versions. With
    all_versions=TRUE, epix_slide will pass a filtered epi_archive to each
    computation rather than an epi_df snapshot. This enables, e.g., performing
    pseudoprospective forecasts with a revision-aware forecaster using nested
    epix_slide operations.

Improvements

  • Added dplyr::group_by and dplyr::ungroup S3 methods for epi_archive
    objects, plus corresponding $group_by and $ungroup R6 methods. The
    group_by implementation supports the .add and .drop arguments, and
    ungroup supports partial ungrouping with ....
  • as_epi_archive, epi_archive$new now perform checks for the key uniqueness
    requirement (part of
    #154).

Cleanup

  • Added a NEWS.md file to track changes to the package.
  • Implemented ?dplyr::dplyr_extending for epi_dfs
    (#223).
  • Fixed various small documentation issues (#217).

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.6.0

epiprocess 0.5.0

15 Nov 23:05

Potentially-breaking changes

  • epix_slide, <epi_archive>$slide now feed f an epi_df rather than
    converting to a tibble/tbl_df first, allowing use of epi_df methods and
    metadata, and often yielding epi_dfs out of the slide as a result. To obtain
    the old behavior, convert to a tibble within f.

Improvements

  • Fixed epix_merge, <epi_archive>$merge always raising error on sync="truncate".

Cleanup

  • Added Remotes: entry for genlasso, which was removed from CRAN.
  • Added as_epi_archive tests.
  • Added missing epix_merge test for sync="truncate".

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.5.0

epiprocess 0.4.0

15 Nov 23:01

Potentially-breaking changes

  • Fixed [.epi_df to not reorder columns, which was incompatible with
    downstream packages.
  • Changed [.epi_df decay-to-tibble logic to be more coherent with epi_df's
    current tolerance of nonunique keys: stopped decaying to a tibble in some
    cases where a unique key wouldn't have been preserved, since we don't
    enforce a unique key elsewhere.
  • Fixed [.epi_df to adjust "other_keys" metadata when corresponding
    columns are selected out.
  • Fixed [.epi_df to raise an error if resulting column names would be
    nonunique.
  • Fixed [.epi_df to drop metadata if decaying to a tibble (due to removal
    of essential columns).

Improvements

  • Added check that epi_df additional_metadata is list.
  • Fixed some incorrect as_epi_df examples.

Cleanup

  • Applied rename of upstream package in examples: delphi.epidata ->
    epidatr.
  • Rounded out [.epi_df tests.

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.4.0

epiprocess 0.3.0

15 Nov 22:59

Breaking changes

  • as_epi_archive, epi_archive$new:
    • Compactification (see below) by default may change results if working
      directly with the epi_archive's DT field; to disable, pass in
      compactify=FALSE.
  • epi_archive's wrappers and R6 methods have been updated to follow these
    rules regarding reference semantics:
    • epix_<method> will not mutate input epi_archives, but may alias them
      or alias their fields (which should not be a worry if a user sticks to
      these epix_* functions and "regular" R functions with
      copy-on-write-like behavior, avoiding mutating functions such as [.data.table).
    • x$<method> may mutate x; if it mutates x, it will return x
      invisibly (where this makes sense), and, for each of its fields, may
      either mutate the object to which it refers or reseat the reference (but
      not both); if x$<method> does not mutate x, its result may contain
      aliases to x or its fields.
  • epix_merge, <epi_archive>$merge:
    • Removed ..., locf, and nan parameters.
    • Changed the default behavior, which now corresponds to using
      by=key(x$DT) (but demanding that this is the same set of column names as
      key(y$DT)), all=TRUE, locf=TRUE, nan=NaN (but with the
      post-filling step fixed to only apply to gaps, and no longer fill over
      NAs originating from x$DT and y$DT).
    • x and y are no longer allowed to share names of non-by columns.
    • epix_merge no longer mutates its x argument (but $merge continues
      to do so).
    • Removed (undocumented) capability of passing a data.table as y.
  • epix_slide:
    • Removed inappropriate/misleading n=7 default argument (due to
      reporting latency, n=7 will not yield 7 days of data in a typical
      daily-reporting surveillance data source, as one might have assumed).

New features

  • as_epi_archive, epi_archive$new:
    • New compactify parameter allows removal of rows that are redundant for the
      purposes of epi_archive's methods, which use the last version of each
      observation carried forward.
    • New clobberable_versions_start field allows marking a range of versions
      that could be "clobbered" (rewritten without assigning new version
      tags); previously, this was hard-coded as max(<epi_archive>$DT$version).
    • New versions_end field allows marking a range of versions beyond
      max(<epi_archive>$DT$version) that were observed, but contained no
      changes.
  • epix_merge, $merge:
    • New sync parameter controls what to do if x and y aren't equally
      up to date (i.e., if x$versions_end and y$versions_end are
      different).
  • New function epix_fill_through_version, method
    <epi_archive>$fill_through_version: non-mutating & mutating way to
    ensure that an archive contains versions at least through some
    fill_versions_end, extrapolating according to the how argument if necessary.
  • Example archive data object is now constructed on demand from its
    underlying data, so it will be based on the user's version of
    epi_archive rather than an outdated R6 implementation from whenever the
    data object was generated.
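To illustrate what "redundant" means for the compactify parameter above, here is a rough base R sketch that drops update rows repeating the previous version's value for the same observation. It is not the package's implementation; the cases column and the toy data are hypothetical, and the real handling of NAs and of multiple value columns may differ:

```r
# Toy update data: one row per (geo_value, time_value, version).
updates <- data.frame(
  geo_value = "ca",
  time_value = as.Date(c("2020-06-01", "2020-06-01", "2020-06-01", "2020-06-02")),
  version = as.Date(c("2020-06-02", "2020-06-03", "2020-06-04", "2020-06-03")),
  cases = c(10, 10, 12, 5),
  stringsAsFactors = FALSE
)

# Sort so that successive versions of the same observation are adjacent.
updates <- updates[order(updates$geo_value, updates$time_value, updates$version), ]

# Flag rows that describe the same observation as the row before them...
obs_id <- paste(updates$geo_value, updates$time_value)
same_obs_as_prev <- c(FALSE, head(obs_id, -1) == tail(obs_id, -1))
# ...and that report the same value as the row before them.
same_val_as_prev <- c(FALSE, head(updates$cases, -1) == tail(updates$cases, -1))

# A row is redundant under last-version-carried-forward if both flags hold.
compactified <- updates[!(same_obs_as_prev & same_val_as_prev), ]
```

Here the 2020-06-03 version of the 2020-06-01 observation is dropped, because carrying the 2020-06-02 version forward already yields the same value.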

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.3.0

epiprocess 0.2.0

15 Nov 22:59

Breaking changes

  • Removed default n=7 argument to epix_slide.

Improvements

  • Ignore NAs when printing time_value range for an epi_archive.
  • Fixed misleading column naming in epix_slide example.
  • Trimmed down epi_slide examples.
  • Synced out-of-date docs.

Cleanup

  • Removed dependency of some epi_archive tests on an example archive
    object, and made them more understandable by reading without running.
  • Fixed epi_df tests relying on an S3 method for epi_df implemented
    externally to epiprocess.
  • Added tests for epi_archive methods and wrapper functions.
  • Removed some dead code.
  • Made .{Rbuild,git}ignore files more comprehensive.

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.2.0

epiprocess 0.1.2

15 Nov 22:56

New features

  • New new_epi_df function is similar to as_epi_df, but (i) recalculates,
    overwrites, and/or drops most metadata of x if it has any, (ii) may
    still reorder the columns of x even if it's already an epi_df, and
    (iii) treats x as optional, constructing an empty epi_df by default.

Improvements

  • Fixed geo_type guessing on alphabetical strings with more than 2
    characters to yield "custom", not US "nation".
  • Fixed time_type guessing to actually detect Date-class time_values
    regularly spaced 7 days apart as "week"-type as intended.
  • Improved printing of epi_dfs, epi_archives.
  • Fixed as_of to not cut off any (forecast-like) data with time_value > max_version.
  • Expanded epi_df docs to include conversion from tsibble/tbl_ts objects,
    usage of other_keys, and pre-processing objects not following the
    geo_value, time_value naming scheme.
  • Expanded epi_slide examples to show how to use an f argument with
    named parameters.
  • Updated examples to print relevant columns given a common 80-column
    terminal width.
  • Added growth rate examples.
  • Improved as_epi_archive and epi_archive$new/$initialize
    documentation, including constructing a toy archive.

Cleanup

  • Added tests for epi_slide, epi_cor, and internal utility functions.
  • Fixed currently-unused internal utility functions MiddleL, MiddleR to
    yield correct results on odd-length vectors.

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.1.2

epiprocess 0.1.1

15 Nov 22:46

New features

  • New example data objects allow one to quickly experiment with epi_dfs
    and epi_archives without relying/waiting on an API to fetch data.

Improvements

  • Improved epi_slide error messaging.
  • Fixed description of the appropriate parameters for an f argument to
    epi_slide; previous description would give incorrect behavior if f had
    named parameters that did not receive values from epi_slide's ....
  • Added some examples throughout the package.
  • Using example data objects in vignettes also speeds up vignette compilation.

Cleanup

  • Set up gh-actions CI.
  • Added tests for epi_dfs.

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.1.1

epiprocess 0.1.0

15 Nov 22:21

Implemented core functionality, vignettes

  • Classes
    • epi_df: specialized tbl_df for geotemporal epidemiological time
      series data, with optional metadata recording other key columns (e.g.,
      demographic breakdowns) and as_of what time/version this data was
      current/published. Associated functions:
      • as_epi_df converts to an epi_df, guessing the geo_type,
        time_type, other_keys, and as_of if not specified.
      • as_epi_df.tbl_ts and as_tsibble.epi_df automatically set
        other_keys and key&index, respectively.
      • epi_slide applies a user-supplied computation to a sliding/rolling
        time window and user-specified groups, adding the results as new
        columns, and recycling/broadcasting results to keep the result size
        stable. Allows computation to be provided as a function, purrr-style
        formula, or tidyeval dots. Uses slider underneath for efficiency.
      • epi_cor calculates Pearson, Kendall, or Spearman correlations
        between two (optionally time-shifted) variables in an epi_df within
        user-specified groups.
      • Convenience function: is_epi_df.
    • epi_archive: R6 class for version (patch) data for geotemporal
      epidemiological time series data sets. Comes with S3 methods and regular
      functions that wrap around this functionality for those unfamiliar with R6
      methods. Associated functions:
      • as_epi_archive: prepares an epi_archive object from a data frame
        containing snapshots and/or patch data for every available version of
        the data set.
      • as_of: extracts a snapshot of the data set as of some requested
        version, in epi_df format.
      • epix_slide, <epi_archive>$slide: similar to epi_slide, but for
        epi_archives; for each requested ref_time_value and group, applies
        a time window and user-specified computation to a snapshot of the data
        as of ref_time_value.
      • epix_merge, <epi_archive>$merge: like merge for epi_archives,
        but allowing for the last version of each observation to be carried
        forward to fill in gaps in x or y.
      • Convenience function: is_epi_archive.
  • Additional functions
    • growth_rate: estimates growth rate of a time series using one of a few
      built-in methods based on relative change, linear regression,
      smoothing splines, or trend filtering.
    • detect_outlr: applies one or more outlier detection methods to a given
      signal variable, and optionally aggregates the outputs to create a
      consensus result.
    • detect_outlr_rm: outlier detection function based on a rolling median;
      one of the methods included in detect_outlr.
    • detect_outlr_stl: outlier detection function based on a seasonal-trend
      decomposition using LOESS (STL); one of the methods included in
      detect_outlr.
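The snapshot idea behind as_of in the list above can be sketched in base R: keep the updates published on or before the requested version, then take the latest one for each observation. The function snapshot_as_of, the cases column, and the toy data are hypothetical and only illustrate the concept, not the package's code:

```r
# Toy version data: one row per (geo_value, time_value, version).
archive_dt <- data.frame(
  geo_value = "ca",
  time_value = as.Date(c("2020-06-01", "2020-06-01", "2020-06-02")),
  version = as.Date(c("2020-06-02", "2020-06-04", "2020-06-03")),
  cases = c(10, 12, 5),
  stringsAsFactors = FALSE
)

snapshot_as_of <- function(dt, max_version) {
  # Keep only updates published on or before the requested version.
  dt <- dt[dt$version <= max_version, ]
  # Sort so the latest version of each observation comes last, then keep it.
  dt <- dt[order(dt$geo_value, dt$time_value, dt$version), ]
  obs_id <- paste(dt$geo_value, dt$time_value)
  latest <- !duplicated(obs_id, fromLast = TRUE)
  dt[latest, c("geo_value", "time_value", "cases")]
}

# As of 2020-06-03 the revision published on 2020-06-04 is not yet visible.
snap_early <- snapshot_as_of(archive_dt, as.Date("2020-06-03"))
snap_late <- snapshot_as_of(archive_dt, as.Date("2020-06-04"))
```

Comparing snap_early and snap_late shows how a later revision changes the value reported for 2020-06-01.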

Full Changelog: https://github.com/cmu-delphi/epiprocess/commits/v0.1.0