BestPractices

jslanini edited this page Jun 14, 2019 · 13 revisions

Forecasting Best Practices

Data

Which data should I include?

A primary goal of PyForecast development is to eliminate unnecessary subjectivity and to rely on reproducible, statistically defensible methods for forecast development. To this end, PyForecast uses a search algorithm to determine the "best" forecasting models. In addition, DelSole and Shukla (2009) showed that pre-screening variables based on their correlation with the variable being predicted can result in model bias and artificial skill. We therefore recommend erring toward inclusiveness rather than eliminating variables on subjective grounds.
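The artificial-skill effect can be illustrated with a small synthetic experiment. The sketch below is not PyForecast code: the function names (`top_correlated`, `loo_skill`), the sample sizes, and the choice of leave-one-out cross-validation with ordinary least squares are all assumptions made for illustration. Every predictor is pure random noise, so any apparent skill is spurious. Screening predictors once on all years lets the held-out year influence which predictors are chosen; repeating the screening inside each cross-validation fold removes that leak.

```python
import numpy as np


def top_correlated(X, y, n_keep):
    """Indices of the n_keep columns of X most correlated (in magnitude) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(r))[:n_keep]


def loo_skill(X, y, n_keep, screen_inside_cv):
    """Correlation between leave-one-out predictions and observations.

    screen_inside_cv=False screens predictors once using every year,
    including the year that is later held out (information leak).
    screen_inside_cv=True repeats the screening on the training years only.
    """
    n = len(y)
    preds = np.empty(n)
    keep_all = top_correlated(X, y, n_keep)
    for i in range(n):
        train = np.arange(n) != i
        if screen_inside_cv:
            keep = top_correlated(X[train], y[train], n_keep)
        else:
            keep = keep_all
        # Ordinary least squares with an intercept, fit on training years only.
        A = np.column_stack([np.ones(train.sum()), X[train][:, keep]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        preds[i] = np.concatenate([[1.0], X[i, keep]]) @ coef
    return np.corrcoef(preds, y)[0, 1]


# Synthetic data: 30 "years", 500 candidate predictors, all unrelated to y.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 500))
y = rng.standard_normal(30)

skill_leaky = loo_skill(X, y, n_keep=5, screen_inside_cv=False)
skill_honest = loo_skill(X, y, n_keep=5, screen_inside_cv=True)
print(f"screened on all years:   r = {skill_leaky:.2f}")
print(f"screened inside CV fold: r = {skill_honest:.2f}")
```

Because the predictors carry no real signal, the honest estimate should hover near or below zero, while the leaky estimate tends to look skillful. The same reasoning applies to any manual screening done before cross-validation.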

However, it may be useful to eliminate variables that are closely related but not identical (Helsel and Hirsch, 2002). For example, air temperature and dewpoint temperature differ slightly but may describe nearly the same relationship, so including both may add little independent information.
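One way to spot such near-duplicate predictors is to look at correlations among the predictors themselves, without reference to the forecast target. The following is a minimal sketch under stated assumptions: the helper `redundant_pairs`, the 0.95 threshold, and the synthetic station data are all hypothetical and are not part of PyForecast.

```python
import numpy as np


def redundant_pairs(X, names, threshold=0.95):
    """List predictor pairs whose absolute pairwise correlation meets the threshold.

    X has one row per year and one column per predictor. The forecast
    target is deliberately not used here.
    """
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic example: dewpoint tracks air temperature closely,
# while precipitation is generated independently.
rng = np.random.default_rng(42)
air_temp = rng.normal(10.0, 5.0, size=40)
dewpoint = air_temp - 3.0 + rng.normal(0.0, 0.5, size=40)
precip = rng.gamma(2.0, 10.0, size=40)

X = np.column_stack([air_temp, dewpoint, precip])
names = ["air_temp", "dewpoint", "precip"]

pairs = redundant_pairs(X, names)
for a, b, r in pairs:
    print(f"{a} and {b} are nearly redundant (r = {r:.3f})")
```

A flagged pair is only a prompt for judgment; which member of the pair to keep, and what threshold is appropriate, depends on the basin and the data.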

How many years of data should I use?

Stationarity?

Should I use future variables?

Model building

Should I create a new model every year?

How do I know if my assumption of linearity is true?

Which regression technique should I select?

Which model skill metric should I use?

What cross-validation method should I use?

Probability density functions

How many models should I include in my density analysis?

How should I select models for inclusion in my density analysis?
