ellisp / forecasthybrid

Convenient functions for ensemble forecasts in R combining approaches from the {forecast} package

License: GNU General Public License v3.0
Currently cvts() only saves residuals in the returned object. The actual and fitted values will need to be included, and the accuracy.cvts() method will need to be modified to implement this.
Consider the case where the individual component models perfectly fit the time series.

hm <- hybridModel(ts(1:100, f = 4), weights = "insample.errors")
hm            # weights of 0/NaN
accuracy(hm)  # NA/NaN
fitted(hm)    # NA/NaN
forecast(hm)  # NaN point forecasts

This is definitely a corner case and not likely to occur with real data; however, it is worth considering what the correct behavior should be (e.g. throw an error, revert to weights = "equal", etc.).
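One hedged option is to detect degenerate in-sample errors when weights = "insample.errors" and fall back to equal weights with a warning. The function name below is illustrative, not the package's actual internals:

```r
# Illustrative sketch only: fall back to equal weights when any component's
# in-sample error is (near) zero, which would otherwise produce 0/NaN weights.
safeErrorWeights <- function(errors, tol = .Machine$double.eps) {
  if (any(errors < tol)) {
    warning("A component model fit the series perfectly; reverting to equal weights.")
    return(rep(1 / length(errors), length(errors)))
  }
  invErrors <- 1 / errors
  invErrors / sum(invErrors)
}

safeErrorWeights(c(0.5, 0.25, 0.25))  # inverse-error weights
safeErrorWeights(c(0, 0.5, 0.5))      # warns and returns equal weights
```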
We should consider the option of separate weights for each forecast horizon (e.g. auto.arima performs very well for short horizons, nnetar for medium horizons, and ets for long horizons, so when forecasting a series the first through sixth predictions would receive higher weight from the auto.arima model, the seventh through tenth higher weight from the nnetar model, and later horizons heavier weight from the ets model).

This would ideally be calculated for each horizon, so the weights could be stored in an h * n matrix (n models and h periods). This would involve a minor reworking of the way the weights are stored and used for forecasts, but the bigger issue is computing each horizon: we set the weights during model selection in hybridModel(), yet we would not know how many periods ahead are needed until forecast.hybridModel() is called. We could fix this in several ways. Perhaps we could cap the horizon at some reasonably far distance (e.g. h no longer than twice the length of the input series) and compute these horizons during model selection. Or we could add an argument to hybridModel() that the user sets for the maximum horizon to compute if these separate weights are wanted. Finally, the separate weights could be exposed through an additional argument such as separate = TRUE. In any case we should consider what happens if an h greater than these precomputed horizon weights is requested (perhaps throw a warning and then fall back on equal weights or the weight values for the last available horizon).
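As a minimal sketch of how horizon-specific weights might be applied (the names `fc` and `W` are hypothetical: an h x n matrix of component point forecasts and an h x n weight matrix with rows summing to one):

```r
# Combine component forecasts with horizon-specific weights.
# fc: h x n matrix of point forecasts (one column per component model)
# W:  h x n matrix of weights, each row summing to 1
combineByHorizon <- function(fc, W) {
  stopifnot(all(dim(fc) == dim(W)))
  rowSums(fc * W)
}

fc <- cbind(arima = c(10, 11, 12), ets = c(9, 10, 14))
W  <- cbind(c(0.8, 0.5, 0.2), c(0.2, 0.5, 0.8))
combineByHorizon(fc, W)  # 9.8 10.5 13.6
```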
As a first step towards #5, allow cvts to work in parallel.
Very conservative prediction intervals are now working (i.e. taking the highest prediction interval value from an individual model for the upper limit and the lowest individual prediction interval value for the lower limit). We should consider how to enhance this with additional methods that we could expose through a pi.method argument. This could include taking the mean of the model intervals, applying the weights from the models to the intervals, taking the tightest intervals, or bootstrap simulations. There are lots of issues that could arise from the various combination methods (e.g. upper/lower limits that are lower/higher than the mean forecast, etc.), so this will need careful thought.
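A rough sketch of what such a dispatcher could look like (all names here are hypothetical; `lower` and `upper` are assumed to be h x n matrices of the n component models' interval limits, and `weights` a length-n vector summing to one):

```r
# Illustrative sketch of combining component model prediction intervals.
combineIntervals <- function(lower, upper, weights,
                             method = c("conservative", "mean", "weighted")) {
  method <- match.arg(method)
  switch(method,
         # widest band: min of lower limits, max of upper limits
         conservative = list(lower = apply(lower, 1, min),
                             upper = apply(upper, 1, max)),
         # simple average of the component limits
         mean = list(lower = rowMeans(lower), upper = rowMeans(upper)),
         # reuse the ensemble weights on the limits
         weighted = list(lower = as.numeric(lower %*% weights),
                         upper = as.numeric(upper %*% weights)))
}
```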
print(), summary(), etc
choose the best combination of contributing models
Create Rd files and possibly a vignette.
Add unit tests to enforce behavior between versions and add stringency during package building. Travis continuous integration would be nice.
Do we want accuracy (compared to a known forecast gold standard) to work on our forecast objects out of the box? At the moment you need to know to manually pass the mean of the forecast object as the value to be tested.
> series <- M3[[1]]
> x <- series$x # ie the data to be fitted
> xx <- series$xx # ie the true, actual values of the forecast period
> h <- length(xx) # ie the length of the forecast period
> tmp <- forecast(hybridModel(x), h = h)
Warning message:
In hybridModel(x) :
The stlm model requires that the input data be a seasonal ts object. The stlm model will not be used.
> accuracy(tmp, xx)
Error in NextMethod(.Generic) : cannot assign 'tsp' to zero-length vector
> accuracy(tmp$mean, xx)
                ME     RMSE      MAE      MPE     MAPE     ACF1 Theil's U
Test set  488.2159 635.8854 522.7854 5.853305 6.495892 0.489442 0.7779601
Improve documentation for the list arguments (a.args, e.args, n.args, s.args, t.args); make it clear they take lists. Also change the documentation so that xreg is a dataframe instead of a matrix.
When I run devtools::check("pkg") on Windows I get an error:

Package inputenc Error: inputenc is not designed for xetex or luatex.

But I don't have that problem on a Linux machine. I'm not sure what this is - it might be only my own LaTeX setup - but we should work it out before submitting to CRAN. Here's the full error message for posterity. I've assigned this to me to look into in the first instance, but @dashaub, if you have any ideas they'd be appreciated.
* checking PDF version of manual ... [37s]
WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! Package inputenc Error: inputenc is not designed for xetex or luatex.
(inputenc) only UTF-8 supported.
See the inputenc package documentation for explanation.
Type H <return> for immediate help.
* checking PDF version of manual without hyperrefs or index ...Warning: running command '"C:/PROGRA~1/R/R-32~1.4RE/bin/x64/Rcmd.exe" Rd2pdf --batch --no-preview --build-dir="C:/Users/PETERE~1/AppData/Local/Temp/Rtmp8qwnBt/Rd2pdf131c2e4535dd" --no-clean -o forecastHybrid-manual.pdf "C:/Users/PETERE~1/DOCUME~1/GitHub/FORECA~3/FORECA~1.RCH/forecastHybrid"' had status 1
ERROR
* DONE
Status: 1 ERROR, 1 WARNING, 1 NOTE
Warning: running command '"C:/PROGRA~1/R/R-32~1.4RE/bin/x64/Rcmd.exe" Rd2pdf --batch --no-preview --build-dir="C:/Users/PETERE~1/AppData/Local/Temp/Rtmp8qwnBt/Rd2pdf131c34547cb7" --no-clean --no-index -o forecastHybrid-manual.pdf C:/Users/PETERE~1/DOCUME~1/GitHub/FORECA~3/FORECA~1.RCH/forecastHybrid' had status 1
See
'C:/Users/Peter Ellis/Documents/GitHub/forecastHybrid/forecastHybrid.Rcheck/00check.log'
for details.
Error: Command failed (1)
Execution halted
Exited with status 1.
Related to issue #12, nnetar will not work when frequency(y) * 2L >= length(y). This can be easily handled by throwing a warning and removing the nnetar model from the ensemble.
The basic plots were closed in #1, but we should still consider adding ggplot2 support.
Add the ability to update the weights/errorMethod on an existing hybridModel object. For errorMethod this can easily be achieved without recalculating the component models, since the RMSE/MAE/MASE errors are all still extractable from the fit models. For weights, changes to any method except cv.errors (which is currently unimplemented anyway) can similarly be done without recalculating the component models.
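A hedged sketch of what such an update might look like. The function name and object fields below are assumptions, not the package's actual structure; it assumes the fitted component models are stored on the object under the names in object$models, and that accuracy() on each component exposes the chosen error measure:

```r
# Illustrative only: recompute ensemble weights from the stored component fits
# without refitting them.
updateErrorWeights <- function(object, errorMethod = c("RMSE", "MAE", "MASE")) {
  errorMethod <- match.arg(errorMethod)
  # extract the in-sample error of each stored component model
  errors <- sapply(object$models,
                   function(m) accuracy(object[[m]])[1, errorMethod])
  # inverse-error weighting, normalized to sum to one
  invErrors <- 1 / errors
  object$weights <- invErrors / sum(invErrors)
  object
}
```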
This of course uses STL to de-seasonalize, models the trend-cycle-remainder using a non-seasonal forecasting method (typically ARIMA or ETS), and then re-seasonalizes using the last annum of seasonal components. In my experience, this has worked as well as, and sometimes better than, ARIMA or ETS. Any thoughts on including this as well?
ets() ignores seasonality for long seasonal periods, e.g. plot(forecast(ets(taylor), h = 500)). These forecasts are really bad since we're not really estimating an ets model. stlm() will produce an error on a series that is not periodic, e.g. set.seed(123); stlm(rnorm(100)). We could accept this error and concede that the user should manually remove the s model, or we could drop the ets/stlm models in these cases (and throw a warning). In the case of ets, it is easy to check if frequency > 24. Similarly for stlm(), we can easily check if the input is a ts object with at least two periods. Both approaches have advantages/disadvantages.
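If we take the drop-and-warn route, a minimal sketch of the checks could look like the following (the function name is hypothetical; "e" and "s" are the models-string codes for ets and stlm):

```r
# Illustrative sketch: drop ets/stlm from the requested models when the input
# series cannot support them, warning as we go.
checkModels <- function(y, models) {
  if ("e" %in% models && frequency(y) > 24) {
    warning("ets does not model seasonality for frequency > 24; dropping ets.")
    models <- setdiff(models, "e")
  }
  if ("s" %in% models &&
      (!is.ts(y) || frequency(y) < 2L || length(y) < 2 * frequency(y))) {
    warning("stlm requires a seasonal ts with at least two periods; dropping stlm.")
    models <- setdiff(models, "s")
  }
  models
}
```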
There are plenty of datasets that can be used as examples without importing fpp. At a minimum the individual fpp datasets should be imported rather than clobbering the namespace with all fpp objects.
Thank you for developing this package; it is awesome. Can we add xgboost to the hybrid forecast? Most of the winning solutions in forecasting competitions include xgboost models. That way this package would cover all the time series models plus advanced machine learning algorithms (nnetar and xgboost).
probably by a.args = list(), e.args = list(), b.args = list(), t.args = list()
Implement parallelization between models. When training multiple models, parallelize between the models, but if additional cores are present, allow individual models (e.g. tbats, auto.arima) to run in parallel themselves.
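A minimal sketch of the between-model level using the parallel package (the function name and signature are assumptions; mclapply forks, so this assumes a non-Windows machine, and a parLapply cluster would be needed on Windows):

```r
library(parallel)

# Fit each component model in its own forked process. fitFuns is a list of
# model-fitting functions, each taking the series as its first argument.
fitComponents <- function(y, fitFuns,
                          numCores = min(length(fitFuns), detectCores())) {
  mclapply(fitFuns, function(f) f(y), mc.cores = numCores)
}

# e.g. fits <- fitComponents(wineind, list(auto.arima, ets, nnetar, tbats))
```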
Somewhere in preparing the S3 generic method accuracy() for CRAN, the method became broken. This fix should also remove the useless documentation for the generic method and redirect to accuracy() from the "forecast" package.
Cryptic warnings on CRAN winbuilder for LaTeX PDF files. Build on a local Linux machine to debug.
How do you feel about an OO approach to the API instead of using strings and if statements to specify the model? It might provide greater flexibility and cleaner code.
library(forecast)
library(forecastHybrid)
some_ts <- ts(1:144, frequency = 12)
model_1 <- forecast::stlf(some_ts)
model_2 <- forecast::ets(some_ts)
model_3 <- forecast::auto.arima(some_ts)
model_4 <- forecast::nnetar(some_ts)
ensembled_model <- forecastHybrid::hybridModel(model_1, model_2, model_3, model_4, ...) # other args
class(ensembled_model) # "forecast" "ensemble"
# other API calls consistent with forecast, e.g.
plot(forecast(ensembled_model, h = 14))
hybridModel <- function(models, ...) {
  for (model in models) {
    # code that fits and finds weights and ensembles
  }
  return(ensembled_model)
}
forecastHybrid::hybridModel auto-magically finds weights and returns an object of class c("forecast", "ensemble") that is consistent with all the other functions in the forecast package.
Our tests expect an error in this situation, but forecast::forecast does return an (uninteresting) forecast. I'm inclined to remove the test.
> expect_error(forecast(object = 1))
Error: forecast(object = 1) did not throw an error.
> forecast(object = 1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
1 1 NA NA NA NA
2 1 NA NA NA NA
3 1 NA NA NA NA
4 1 NA NA NA NA
5 1 NA NA NA NA
6 1 NA NA NA NA
7 1 NA NA NA NA
8 1 NA NA NA NA
9 1 NA NA NA NA
10 1 NA NA NA NA
The documentation needs to be made consistent with the fact that xreg can be used with stlm models, but only when stlm(..., method = "arima").
Note that this works

m <- matrix(rnorm(1 * length(wineind)), ncol = 1)
h1 <- hybridModel(wineind, a.args = list(xreg = m))

but this doesn't

mm <- matrix(rnorm(2 * length(wineind)), ncol = 2)
hh <- hybridModel(wineind, a.args = list(xreg = mm))

It appears the problem occurs in do.call(), so this might be related to and fixed with issue #27 anyway.
The function calls are not formatted the same way as the base models in the "forecast" package. For example, compare

a1 <- auto.arima(wineind)
a1$call

with

h1 <- hybridModel(wineind)
h1$auto.arima$call

This might be fixable by modifying the do.call() invocations that call the individual component models.
See below

hm2 <- hybridModel(y = gas, models = "aenst",
                   a.args = list(max.p = 12, max.q = 12, approximation = FALSE),
                   e.args = list(allow.multiplicative.trend = TRUE),
                   n.args = list(repeats = 50),
                   s.args = list(robust = TRUE),
                   t.args = list(use.armga.errors = FALSE))
Forecast 6.2 does not support accuracy(nnetar(rnorm(100))). This will be fixed in v6.3 (see robjhyndman/forecast#237). We can probably ignore this for now and wait for it to be resolved in the forecast 6.3 release.

Edit: the same issue exists with accuracy(stlm(AirPassengers)). This has also been fixed for forecast 6.3.
Hi. I'm getting this error trying to forecast with a hybridModel object.

forecast(x.ray.m1, h = 18)
Error in forecast.hybridModel(x.ray.m1, h = 18) :
My data looks like this:
x.ray.ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 332 346 368 357 269 321 321 330 302 390 319 336
2012 366 334 367 370 378 324 305 353 307 374 397 298
2013 333 362 351 364 440 319 345 409 369 328 317 346
2014 394 322 281 313 293 210 422 483 584 534 444 499
2015 501 445 550 503 494 483 482 478 468 473 350 425
2016 479 519 511 461
Could it be because I'm forecasting by month up to April 2016?
This should be pretty easy to implement along the lines of these sources:
http://robjhyndman.com/hyndsight/tscvexample/
http://www.r-bloggers.com/functional-and-parallel-time-series-cross-validation/
The biggest thought will need to go into the function header and how to handle parallelization. If we're not careful, parallelization could be a complete disaster given all the levels at which we can parallelize (i.e. within models, between models, for cross validation, and for the choice of aenst models).
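Along the lines of those posts, a bare-bones rolling-origin sketch follows. The name and signature are assumptions, not the eventual cvts() interface; FUN is any model-fitting function whose result forecast() accepts (e.g. auto.arima from the forecast package):

```r
library(forecast)

# Illustrative rolling-origin cross validation: fit on a sliding window,
# forecast maxHorizon steps ahead, and collect the errors by horizon.
cvSketch <- function(y, FUN, windowSize, maxHorizon) {
  numFolds <- length(y) - windowSize - maxHorizon + 1
  errors <- matrix(NA_real_, nrow = numFolds, ncol = maxHorizon)
  for (i in seq_len(numFolds)) {
    trainEnd <- windowSize + i - 1
    trainSeries <- ts(y[i:trainEnd], frequency = frequency(y))
    fc <- forecast(FUN(trainSeries), h = maxHorizon)
    errors[i, ] <- y[(trainEnd + 1):(trainEnd + maxHorizon)] - as.numeric(fc$mean)
  }
  errors
}

# e.g. per-horizon RMSE:
# sqrt(colMeans(cvSketch(wineind, auto.arima, windowSize = 84, maxHorizon = 6)^2))
```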
Was there a reason we didn't include thetaf in the ensemble, even as an option? If we just didn't get around to it, I'm happy to put in some belated work to add it.
It probably never was a good idea to include this. Equal weights almost certainly lead to models with less variance, and cross-validated error weights probably lead to lower model forecasting errors. It might be worth replacing this with the ability to set custom weights, e.g.

hybridModel(wineind, models = "aen", weights = c(0.2, 0.6, 0.2))
The current implementation only preserves the frequency component of the ts object, not the start and end attributes.
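A small sketch of carrying the full time attributes over (the function name is hypothetical; `fits` stands for the combined numeric fitted values and `y` for the original input series):

```r
# Re-attach start and frequency from the input series so fitted values and
# forecasts line up on the original time axis, not just the original frequency.
restoreTimeAttributes <- function(fits, y) {
  ts(fits, start = start(y), frequency = frequency(y))
}

restoreTimeAttributes(1:24, ts(rnorm(24), start = c(2011, 1), frequency = 12))
```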
Increase consistency of Rd files with R base man pages.
Currently the build checks report a problem with {fpp} being listed as a dependency but never imported. I presume we're likely to use it for example data, so I'm just logging an issue so we remember to explicitly import either the whole namespace or just specific datasets when we get to that point.
See below:
trainSet <- beaver1[1:100, ]
testSet <- beaver1[-(1:100), ]
trainXreg <- data.frame(trainSet$activ)
testXreg <- data.frame(testSet$activ)
beaverhm <- hybridModel(trainSet$temp,
                        models = "aest",
                        a.args = list(xreg = trainXreg),
                        n.args = list(xreg = trainXreg),
                        s.args = list(xreg = trainXreg))
forecast(beaverhm, xreg = testXreg)
Warning message:
In hybridModel(trainSet$temp, models = "aent", a.args = list(xreg = trainXreg), :
  stlm was not selected in the models argument, but s.args was passed. Ignoring a.args
Just putting this as an issue while cleaning the comment out of the code.
The print() and summary() methods no longer print nice output after the switch to do.call() for each model call.
Spotted this error in the README while working on the branch, and I don't see what I might have done to it.
test_that("forecast nnetar prediction intervals with different levels work", {
  hm2 <- hybridModel(wineind, models = "anst", weights = "equal")
  expect_error(forecast(hm2, h = 48), NA)
  expect_error(forecast(hm2, h = 48, level = c(70, 80, 90, 95)), NA)
})
The first forecast in the above works, the second doesn't:
Error: forecast(hm2, h = 48, level = c(70, 80, 90, 95)) threw an error.
subscript out of bounds
I think there's an easy fix but I wanted to merge in first.
We should consider handling additional models. If we implement issue #11 this will need to be considered. Furthermore, once this package is stable I'd like to work on and include arar/ararma models (see robjhyndman/forecast#8).

bats() models might be an addition. Since they are quite similar to tbats() it may not be worth inclusion, but they can provide better fits with default settings in some cases (e.g. compare bats(taylor) to tbats(taylor)).
Currently forecast.hybridModel() issues a warning if the xreg was only used for one component model and not the others. This behavior could be kept, but more flexibility would be gained by allowing forecast.hybridModel() to accept individual xreg arguments in a.xreg and n.xreg, the same way a.args and n.args can be used with hybridModel(). This leads to more verbose function calls, but it is certainly conceivable that a user would want to use different xregs (e.g. Fourier terms for seasonality) in different models. Scratchpad code below demonstrating current xreg usage:
library(forecastHybrid)
dat <- wineind
set.seed(12354)
# Test 1
m <- matrix(rnorm(length(dat)), ncol = 1)
mm <- matrix(rnorm(2 * length(dat)), ncol = 2)
a1 <- auto.arima(dat, xreg = m)
h1 <- hybridModel(dat, a.args = list(xreg = m))
a1
h1$auto.arima
forecast(a1, xreg = m)$mean == forecast(h1, xreg = m)$auto.arima$mean
# Test 2
a2 <- auto.arima(dat, xreg = mm)
h2 <- hybridModel(dat, a.args = list(xreg = data.frame(mm)))
a2
h2$auto.arima
forecast(a2, xreg = mm)$mean == forecast(h2, xreg = mm)$auto.arima$mean
Start with in-model accuracy but make it easy to integrate cross-validation later