Skip to content

Periodic Time Changepoint Detection: Computational Speedups & Packaging

Haoting Tan edited this page Apr 1, 2024 · 3 revisions

Background

Changepoint detection is the simplest departure from the assumption that the statistical properties of a signal remain constant over time. This is important in many real-world problems as the assumptions of stationarity is unrealistic. Changepoints have been used in many fields including: to detect degradation and restoration of climate; segment genomes and identify disease markers; and improve business and economic forecasting.

This project will take an important recent extension of changepoint methods to consider periodic time which is particularly important in health, environment and business applications where changepoints are expected to reoccur each cycle. For example, identifying an individual's general wake/sleep pattern by considering the day as periodic.

Related work

Existing packages e.g., changepoint, mosum, gfpop, all consider time to be linear. The DNAcopy package, available on Bioconductor, implements the circular binary segmentation package which has a locally circular approach rather than a global model assumption of periodic time behaviour. This would be the first package to make periodic time changepoint detection methods available.

Details of your coding project

As happens with statistical methods research, an implementation of the original (Bayesian) approach is available on GitHub but this has not been made user-friendly nor submitted to CRAN. This (Bayesian) approach has been extended to have both periodic and global changepoints in this paper.

When working with an industrial collaborator we reconigsed some limitations in the original model forms and created a frequentist approach which uses the Segment Neighbourhood and PELT multiple changepoint search methodologies.

This project will seek to take an initial implementation of the frequentist periodic time changepoint search method and

  1. Build an R package from existing research level code, including appropriate tests and documentation,
  2. Seek to speed up the code with moving the core of the algorithm from R code to C code.

There is currently no implementation of the segment neighbourhood algorithm in C. The algorithm is then embedded within a PELT implementation (also currently in R in the prototype) to assess global changes. There is an implementation of PELT within C that can be interfaced with.

If no obstacles during implementation are encountered then there may be time within the project to also implement confidence intervals for PELT (in general) and also the periodic changepoint approach. This would involve contributing to the changepoint CRAN package.

Expected impact

The package will be a dependency of the widely used changepoint package thus providing visibility to the package from an early stage. Once completed the package will be taught both within the new undergraduate changepoint module at Lancaster Unviersity (first classes Nov 2024) and added to the popular changepoint course that Rebecca Killick runs for conference courses, summer schools and privately for businesses. This will increase visibility of the output.

Periodic methods occur within a wide range of data and having a reliable, fast implementation of the approach will be of wide use to the R community across all fields.

Mentors

EVALUATING MENTOR Rebecca Killick, is Professor of Statistics at Lancaster University, UK and has 15+ years experience in developing changepoint models and R packages. Rebecca is maintainer of seven packages on CRAN, Editor-in-chief of the Journal of Statistical Software, and has been participating in GSOC as mentor since 2018. [email protected]

Owen Li, is a PhD student at Lancaster University, UK and has created the methodology that will be implemented in the R package for this project. Owen has been developing R code for six years and created one R package. [email protected]

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: install the existing (Bayesian) periodic code from https://github.com/taylors2/PeriodCPT and run it on some binary data. Create fully reproducible code in Rmarkdown.

  • Medium: write a function to take the (Bayesian) implementation from the easy task and identify the best location for a global changepoint in the Bayesian periodic changepoint process. Hint: use the original implementation and check each possible location for a changepoint. Return the maximum difference in the fit before and after the change.

  • Hard: Take the C code for the PELT implementation from the changepoint package on CRAN. Implement a new cost function for PELT which is the binary cost function from PeriodCPT.

Solutions of tests

Contributors, please post a link to your test results here.