-
Notifications
You must be signed in to change notification settings - Fork 8
Optimal partitioning algorithm for changepoint detection
The Optimal partitioning algorithm, arXiv paper, computes the solution to the penalized changepoint problem (best model for a given non-negative real-valued penalty parameter) in quadratic time. It is relatively straightforward to code in C, but there is no R package that provides a reference implementation.
There are many other R packages that compute optimal changepoint models.
- changepoint: provides the cpt.mean, … functions which compute the optimal solution to the penalized problem via the PELT algorithm, which is log-linear time complexity.
- Fpop::fpop computes the optimal solution to the penalized problem via the FPOP algorithm, which is also log-linear time complexity.
Code a new R package opart which implements the optimal partitioning algorithm.
- opart::opart_gaussian(data.vec, penalty) should compute the optimal changepoint model for a vector of real-valued data and a non-negative real-valued penalty, given the square loss (to minimize) / gaussian likelihood (to maximize).
- it should call a C++ function int opart_gaussian(int n_data, double *data_ptr, double penalty, double *cost_ptr, int *end_ptr), where
- return value is an error status code (0 for success).
- n_data is the number of observations.
- data_ptr is a pointer to those data values.
- penalty is the non-negative penalty constant.
- cost_ptr is a pointer to the optimal cost values of the best models from 1 to n_data.
- end_ptr is a pointer to the optimal segment ends.
- there are maximimally n_data segments, in which case end_ptr -> [ 0 1 2 … n_data-1 ].
- there is minimally 1 segment, in which case end_ptr -> [ n_data-1 -2 -2 … -2 ], where -2 is a placeholder value indicating that segment is not used in the optimal model.
- other functions that compute other optimal models if there is time.
- documentation with informative examples.
- tests for (1) correct outputs given valid inputs, and (2) informative error messages given invalid inputs.
- vignette comparing speed of opart::opart_gaussian with changepoint::cpt.mean and Fpop::fpop – plot computation time as a function of data set size.
This R package will benefit the changepoint community by providing an efficient C++ reference implementation of the standard Optimal Partitioning algorithm that has been missing in R. Its simplicity makes it easy to modify (which is not true of other algorithms such as FPOP), which will be useful when developing other changepoint models/algorithms.
Students, please contact mentors below after completing at least one of the tests below. Toby Hocking <[email protected]> and Guillem Rigaill <[email protected]> are the authors of several R packages for changepoint detection including Fpop.
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: run either changepoint::cpt.mean or Fpop::fpop on one of the data sets (vector of logratio values for a given profile.id/chromosome combination) in neuroblatoma$profiles from data(neuroblastoma, package=”neuroblastoma”). For one penalty parameter, plot the data as black points and the optimal segment means as horizontal green line segments.
- Medium: for two data sets and three penalty parameters, plot the data and optimal models using a ggplot with facet_grid(segments ~ profile.id + chromosome)
- Hard: use microbenchmark to compute the timings of changepoint::cpt.mean and Fpop::fpop of several neuroblastoma data sets of different sizes (or all of them). Plot the computation time versus data set size.
Students, please post a link to your test results here.
Name : Divyansh Chawla
Email : [email protected], [email protected]
University : Indian Institute of Technology, Kanpur
Course : Integrated B.S. & M.S. in Economic Sciences (Majorly Econometrics)
Solution to Easy Test : Easy
__________________________________________________
Name: Garrett Tetrault
Email: [email protected], [email protected]
University: Cornell University, Ithaca NY
Program: Mathematics w/ concentration in Computer Science
Solution to Tests: Easy, Medium, Hard
__________________________________________________
Name: Anuraag Srivastava
Email: [email protected]
University: Northern Arizona University, Flagstaff AZ
Program: Masters of Science in Computer Science
Solution to Tests: Easy,Medium,Hard