Modern time series estimation

Andrea-S-8 edited this page Apr 7, 2025 · 2 revisions

Background

Time series are ubiquitous, and their importance is growing as more data are collected at higher time frequencies and over longer time periods. Yet the fundamental techniques that we use to analyse these time series have not been updated since their inception in the 1970s. Recent methodological and theoretical work has revisited these fundamental techniques and improved them in light of current understanding. These improved techniques need to be made available in R.

Related work

Many of the standard functions, such as acf, pacf, ar and arima, are available in base R. There are also packages such as forecast and feasts, along with others listed on the CRAN Task View for Time Series. The majority of these implement the traditional techniques from the 1970s, with forecast being a notable exception as it includes the banded and tapered approach from McMurry and Politis (2010).

Although the standard acf and pacf estimates are known to be biased, the implementations in R do not include bias corrections. This affects downstream analysis, as the acf and pacf are typically used for model order estimation. Furthermore, there are no functions that let time series researchers efficiently create simulation studies over a wide range, or a random set, of parameter configurations.
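The bias mentioned above can be seen in a small simulation. The sketch below uses only base R (arima.sim and acf); the values of phi, n and n_rep are illustrative assumptions, not settings taken from the project. For a short, strongly correlated AR(1) series, the average lag-1 sample autocorrelation typically falls below the true value.

```r
# Illustrative settings (assumptions, not taken from the project description)
set.seed(123)
phi   <- 0.9   # true lag-1 autocorrelation of the AR(1) process
n     <- 50    # length of each simulated series
n_rep <- 500   # number of simulated series

# Lag-1 sample autocorrelation for each simulated series
r1 <- replicate(n_rep, {
  x <- arima.sim(model = list(ar = phi), n = n)
  acf(x, lag.max = 1, plot = FALSE)$acf[2]  # element 1 is lag 0, element 2 is lag 1
})

# Compare the true value with the average estimate
mean_r1 <- mean(r1)
c(true = phi, mean_estimate = mean_r1)
```

Increasing n should shrink the gap between the two numbers, which is one way to see that the bias matters most for short series.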

Details of your coding project

This project will implement the recent techniques from Gallagher et al. (2024) and Gallagher et al. (2025). These enable improved time series simulation studies that cover the whole parameter space of time series models, rather than cherry-picking one or two parameter settings. The project will also bring modern estimates of the acf/pacf to R, together with consistent AIC selection of ar models, which is not currently available.

Specifically, the project will:

  • create functions for acf and pacf estimation based on a dual shrinkage approach
  • create functions for ar estimation based on the dual shrinkage pacf
  • interface the new ar estimation function with standard methods such as AIC for consistent estimation
  • create new graphics for displaying these estimates that make clear to the user the effect of the dual shrinkage approach (as an option)
  • create functions to generate ARMA parameters that satisfy a correlation constraint (high/med/low or a fixed correlation strength)
  • create functions to generate random ARMA parameter values that cover the whole parameter space regardless of the model order
  • create a wrapper for the above two functions to generate simulation datasets for researchers in one line of code
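As a rough illustration of what generating random parameter values over the stationary region can look like, the sketch below uses a standard reparametrisation: draw partial autocorrelations in (-1, 1) and map them to AR coefficients with the Durbin-Levinson recursion. This is an assumption made for illustration and is not claimed to be the method of Gallagher et al.; the function names pacf_to_ar, random_ar and is_stationary are hypothetical. Note that uniform draws on the partial autocorrelations do not imply a uniform distribution over the AR coefficients themselves.

```r
# Hypothetical helper: map partial autocorrelations r_1..r_p (each in (-1, 1))
# to AR coefficients using the Durbin-Levinson recursion.
pacf_to_ar <- function(r) {
  stopifnot(all(abs(r) < 1))
  phi <- numeric(0)
  for (k in seq_along(r)) {
    # Update the first k-1 coefficients, then append r_k as the k-th coefficient
    phi <- c(phi - r[k] * rev(phi), r[k])
  }
  phi
}

# Hypothetical helper: draw one random stationary AR(p) coefficient vector
random_ar <- function(p) {
  pacf_to_ar(runif(p, min = -1, max = 1))
}

# Stationarity check: all roots of 1 - phi_1 z - ... - phi_p z^p
# must lie outside the unit circle
is_stationary <- function(phi) {
  all(Mod(polyroot(c(1, -phi))) > 1)
}

set.seed(1)
phi <- random_ar(4)
phi
is_stationary(phi)
```

Because every partial autocorrelation lies strictly inside (-1, 1), the resulting coefficients are stationary for any order p, so no rejection step is needed.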

All this functionality will require appropriate documentation, testing and a vignette for users.

Expected impact

Time series are ubiquitous and R is a popular tool for their analyses. These functions are expected to be utilised by a wide range of time series researchers and practitioners alike.

Mentors

Contributors, please contact mentors below after completing at least one of the tests below.

EVALUATING MENTOR: Rebecca Killick [email protected], author of many R packages on CRAN and GitHub, and previous R-GSOC mentor for several years.

MENTOR: Colin Gallagher [email protected], main author of the techniques to be implemented, and author of R packages on CRAN and GitHub.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: install the forecast R package and compare the banded and tapered estimate (taperedacf()) with the standard acf() and pacf() estimates from base R for a range of AR and MA processes.
  • Medium: Create a function that, given a specified correlation range, e.g. c(0.6, 0.8) or c(-0.4, 0.3), generates M simulated AR(1) processes of length N whose parameters lie within that range. Note that for AR(1) processes, the parameter phi is equal to the lag-1 autocorrelation.
  • Hard: Create an R package from the function in your Medium test. The package should include tests and documentation, and should pass all CRAN checks (but do not submit to CRAN!). Use R CMD check --as-cran to run the CRAN checks.
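The note in the Medium test, that phi equals the correlation for an AR(1) process, can be checked against the theoretical autocorrelation function with ARMAacf from base R. This is only a quick sanity check with an illustrative value of phi, not a solution to any of the tests.

```r
phi <- 0.7                             # illustrative AR(1) parameter
rho <- ARMAacf(ar = phi, lag.max = 3)  # theoretical autocorrelations at lags 0 to 3
rho

# For an AR(1) process the lag-k autocorrelation is phi^k,
# so the lag-1 value is phi itself
stopifnot(isTRUE(all.equal(unname(rho), phi^(0:3))))
```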

Solutions of tests

Contributors, please post a link to your test results here.
