The fforma package provides tools for forecasting using a model combination approach. It can be used for model averaging or model selection. It works by training a ‘classifier’ that learns to select/combine different forecast models.
For more information about metalearning for forecasting, read/cite the paper:
This package came out of the FFORMA method presented to the M4 forecasting competition, but it has since been improved and can no longer be used to reproduce those results. For exact reproducibility of the M4 results, check the M4metalearning GitHub repo. For empirical performance, the fforma package, not M4metalearning, should be used.
Temporarily, as a workaround, a custom version of the xgboost package is required. You may install it manually from:
# install.packages("devtools")
devtools::install_github("pmontman/customxgboost")
Please note that if you have the latest version of the official xgboost package installed, it will be overwritten. We will remove this workaround as soon as possible.
The package can then be installed:
#install.packages("devtools")
devtools::install_github("pmontman/fforma")
The package can be used easily with two main functions: train_metalearning and forecast_metalearning. Both functions work on a list of elements with a particular data structure: a time series and some meta-data. Basically, each element in this list has at least the component $x with a time series as a ts object, which is the series we want to forecast.
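As a rough illustration of that structure, the sketch below builds a tiny dataset by hand using only base R. The series are simulated and the name my_dataset is made up for this example; a real training set would contain many more series.

```r
set.seed(1)

# A dataset is a plain list; each element is itself a list that holds
# the series to forecast as a ts object in the component $x.
# $h (the forecast horizon, described below) is included here as well.
my_dataset <- list(
  list(x = ts(cumsum(rnorm(60)), start = c(2015, 1), frequency = 12), h = 12),
  list(x = ts(cumsum(rnorm(40)), start = c(2010, 1), frequency = 4), h = 8)
)

# check that every element carries a ts object in $x
all(sapply(my_dataset, function(elem) is.ts(elem$x)))
```

Additional meta-data can be stored as extra named components of each element.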
The train_metalearning function will look for the component $h in the elements of the input list, where $h represents the desired prediction horizon. If it is not found, h is set to one seasonal period of the series. Then it removes the last h observations from the time series $x and stores them as the true future values in the component $xx. This process is called temporal holdout. If the series in the training set already have the $xx component, FFORMA will use it instead of removing the last h observations of each series.
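The temporal holdout described above can be sketched in base R as follows. This is only an illustration of the idea, not the code used inside the package; temporal_holdout_sketch is a hypothetical helper written for this example.

```r
# Hypothetical helper (not part of fforma): apply temporal holdout
# to a single dataset element.
temporal_holdout_sketch <- function(elem) {
  x <- elem$x
  # fall back to one seasonal period when $h is missing
  h <- if (is.null(elem$h)) frequency(x) else elem$h
  n <- length(x)
  stopifnot(h < n)
  times <- time(x)
  elem$h <- h
  # the last h observations become the 'true' future values
  elem$xx <- ts(x[(n - h + 1):n], start = times[n - h + 1],
                frequency = frequency(x))
  # the remaining observations are kept as the series to forecast
  elem$x <- ts(x[1:(n - h)], start = times[1], frequency = frequency(x))
  elem
}

# three years of monthly data, no $h given
elem <- list(x = ts(1:36, start = c(2000, 1), frequency = 12))
held <- temporal_holdout_sketch(elem)
length(held$x)   # 24
length(held$xx)  # 12
```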
Then the metalearning model is trained (this takes a bit of time, see the Parallelism section below). The output of train_metalearning has three components: the metalearning model, the training dataset (after the temporal holdout) and information about the training process. This output can be used to forecast with the forecast_metalearning function.
In the example, we will use a dataset of time series from the Mcomp package as the training set, which already follows the required format (a list whose elements have the $x component; the $h component is also provided).
set.seed(1234)
library(fforma)
#The dataset of time series we want to forecast
ts_dataset <- Mcomp::M3[sample(length(Mcomp::M3), 30)]
#train!
fforma_fit <- train_metalearning(ts_dataset)
The forecast_metalearning function takes a metalearning model (the output of train_metalearning or equivalent) and a dataset of time series we want to forecast. This dataset is a list in the same format, though now the $h component is necessary, not optional. The dataset for forecasting can be the same as the one used for training (since training uses cross-validation by temporal holdout).
fforma_forec <- forecast_metalearning(fforma_fit, ts_dataset)
That's it, two lines of code! If the dataset we forecast has the $xx component in its elements, fforma will use it as the ‘true’ future values of each series $x and calculate the OWA, MASE and SMAPE forecast errors.
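For reference, the sMAPE and MASE measures as defined for the M4 competition can be sketched as below. This is an illustration on made-up numbers, not the package's internal code; smape_sketch and mase_sketch are hypothetical names. OWA additionally requires the errors of the Naive2 benchmark, which is not shown here.

```r
# sMAPE in percent: mean of 200 * |actual - forecast| / (|actual| + |forecast|)
smape_sketch <- function(actual, forecast) {
  mean(200 * abs(actual - forecast) / (abs(actual) + abs(forecast)))
}

# MASE: mean absolute forecast error, scaled by the in-sample mean absolute
# error of the seasonal naive method (lag = seasonal period of x)
mase_sketch <- function(x, actual, forecast) {
  scale <- mean(abs(diff(as.numeric(x), lag = frequency(x))))
  mean(abs(actual - forecast)) / scale
}

x  <- ts(c(10, 12, 14, 16, 18, 20), frequency = 1)  # observed series ($x)
xx <- c(22, 24)                                      # true future values ($xx)
ff <- c(20, 20)                                      # a flat forecast

smape_sketch(xx, ff)
mase_sketch(x, xx, ff)  # 1.5
```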
forecast_metalearning outputs a dataset of time series similar to its input, but with the added forecasts in the component $ff_meta_avg of each element of the list.
#get the forecasts of the first series
fforma_forec$dataset[[1]]$ff_meta_avg
FFORMA learns to combine individual forecast models. By default it uses a set of forecasting models implemented in the forecast R package, such as auto.arima, ets, thetaf, etc. The set of methods that FFORMA learns to combine is passed in the forec_methods argument to the train_metalearning function. This argument should be a list of strings. Each string in the list should coincide with the name of an existing forecasting function. This forecasting function is a simple function with two arguments: a ts object and the forecast horizon. When forecasting, FFORMA assumes that the custom functions used for training are available in the environment. To illustrate this process, we will fully customize fforma to learn to combine two forecast models of our own design, one forecasting the mean of the series and the other outputting zeroes.
#a function that takes x, a ts() and h an integer with the desired forecast horizon
my_mean_forec <- function(x, h) {
  rep(mean(x), h)
}
my_zero_forec <- function(x, h) {
  rep(0, h)
}
#a list of strings, each the name of the forecasting function
list_of_methods <- list("my_mean_forec", "my_zero_forec")
#call fforma with the customized forecast functions
custom_fforma_fit <- train_metalearning(ts_dataset, forec_methods=list_of_methods)
#the actual forecasting from the customized fforma
custom_fforma_forec <- forecast_metalearning(custom_fforma_fit, ts_dataset)
#errors should be quite high
custom_fforma_forec$owa_errors
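Before training with custom methods, it can help to confirm that every name in the list resolves to a function that returns h numeric values. The check below is a small sketch in base R and is not something the package requires or provides; test_series and checks are names made up for this example.

```r
my_mean_forec <- function(x, h) {
  rep(mean(x), h)
}
my_zero_forec <- function(x, h) {
  rep(0, h)
}
list_of_methods <- list("my_mean_forec", "my_zero_forec")

# a short toy series and horizon used only for the check
test_series <- ts(c(3, 5, 4, 6, 5, 7), frequency = 1)
h <- 4

# look up each function by name and verify the shape of its output
checks <- sapply(list_of_methods, function(name) {
  f <- get(name, mode = "function")
  out <- f(test_series, h)
  is.numeric(out) && length(out) == h
})
all(checks)  # TRUE
```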
Forecasting with FFORMA can take a bit of time, depending on the individual models that are going to be combined and the size of the dataset. Parallelism through the future package is provided, and the processing can be periodically saved to disk and resumed in case of failure (like a power outage, or an impending Windows update).
The user just needs to select the future::plan and then parallelism is handled transparently. More information about plans and their capabilities is available in the documentation of the future package.
#the user enables, in this case, basic parallelism through several background R processes
#(future::multiprocess is deprecated in recent versions of the future package)
future::plan(future::multisession)
#train with parallelism enabled, no changes to the code
fforma_fit <- train_metalearning(ts_dataset)
#forecast with parallelism enabled, no changes to the code
fforma_forec <- forecast_metalearning(fforma_fit, ts_dataset)
For saving intermediate results, train_metalearning and forecast_metalearning have the save_foldername parameter, which must be set to the name of the folder where the intermediate results are saved. If this parameter is set to NULL, no saving/resuming is used. If save_foldername is set to an existing folder, the functions will try to resume the processing from the state saved in the folder. So the basic use is to launch train_metalearning or forecast_metalearning with a specific save_foldername, and if the process is interrupted, we launch them again with the same save_foldername value and the process will resume.
An important additional parameter is chunk_size, which indicates how many time series are processed between saves. If we set chunk_size=1000, the training/forecast process will stop to save progress every 1000 series. Too large a value for chunk_size runs the risk of losing a lot of progress; too small a value wastes a lot of time saving to disk. An automatic guess of chunk_size is provided if chunk_size=NULL, but it is highly recommended that users set it manually according to their needs.
Saving can be combined with parallelism.
An example of saving to disk
#run with saving to disk (NOTE chunk_size=10 is too low!, just for example)
fforma_fit <- train_metalearning(ts_dataset, chunk_size = 10, save_foldername = "my_tmp_fforma")
#imagine that the power goes off when series 14 is being processed...
#...
#... BOOM!
#...
# Now we want to resume!
#We just use the same function call, now it will try to resume
#train_metalearning will start from series 11 if it finds
#the temp files in save_foldername
fforma_fit <- train_metalearning(ts_dataset, chunk_size = 10, save_foldername = "my_tmp_fforma")
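To see why the resumed run starts from series 11, the following sketch splits the series indices into chunks and finds the first series after the last fully completed chunk. It only mimics the bookkeeping under the assumption that progress is saved once per completed chunk; it is not the package's implementation, and all variable names are made up for this example.

```r
n_series   <- 30
chunk_size <- 10

# indices of the series grouped into chunks: 1-10, 11-20, 21-30
chunks <- split(seq_len(n_series), ceiling(seq_len(n_series) / chunk_size))

# the process dies while series 14 is being handled
interrupted_at <- 14

# only chunks that finished before the interruption were saved
completed <- Filter(function(idx) max(idx) < interrupted_at, chunks)

# resume at the first series that was not saved
resume_from <- if (length(completed) == 0) 1 else max(unlist(completed)) + 1
resume_from  # 11
```

With chunk_size = 10, the work done on series 11 to 13 is lost, which is the trade-off between losing progress and saving too often.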
Users can select which basic forecast methods are combined through fforma. The default set is based on the fforma submission to the M4 competition (see the reference).
The training can be fine-tuned towards either model selection or model averaging by setting the objective parameter in train_metalearning to either "averaging" (default) or "selection".
fforma_fit <- train_metalearning(ts_dataset, objective = "selection")
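The difference between averaging and selection can be made concrete with a toy example. The weights and forecasts below are made up, and the sketch only shows the two ways a set of per-model weights can be turned into a final forecast; it does not show how the objective parameter changes the training inside the package.

```r
# forecasts of two individual models for a horizon of 3 (one row per model)
forecasts <- rbind(
  my_mean_forec = c(10, 10, 10),
  my_zero_forec = c(0, 0, 0)
)

# hypothetical weights assigned to each model for this series
weights <- c(my_mean_forec = 0.8, my_zero_forec = 0.2)

# model averaging: weighted combination of all individual forecasts
averaged <- as.numeric(weights %*% forecasts)
averaged  # 8 8 8

# model selection: keep only the forecast of the highest-weighted model
selected <- forecasts[which.max(weights), ]
selected  # 10 10 10
```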
The package provides functions for manually tuning the training/forecasting processes. TO BE COMPLETED