ellisp / forecasthybrid

Convenient functions for ensemble forecasts in R combining approaches from the {forecast} package

License: GNU General Public License v3.0
Currently cvts() only saves residuals in the returned object. The actual and fitted values will need to be included, and the accuracy.cvts() method will need to be modified to implement this.
Consider the case where the individual component models perfectly fit the time series.

hm <- hybridModel(ts(1:100, f = 4), weights = "insample.errors")
hm            # weights of 0/NaN
accuracy(hm)  # NA/NaN
fitted(hm)    # NA/NaN
forecast(hm)  # NaN point forecasts

This is definitely a corner case and not likely to occur with real data; however, it is worth considering what the correct behavior should be (e.g. throw an error, revert to weights = "equal", etc.).
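One hedged option is to detect degenerate in-sample errors when weights = "insample.errors" and fall back to equal weights with a warning. The function name below is illustrative, not the package's actual internals:

```r
# Illustrative sketch only: fall back to equal weights when any component's
# in-sample error is (near) zero, which would otherwise produce 0/NaN weights.
safeErrorWeights <- function(errors, tol = .Machine$double.eps) {
  if (any(errors < tol)) {
    warning("A component model fit the series perfectly; reverting to equal weights.")
    return(rep(1 / length(errors), length(errors)))
  }
  invErrors <- 1 / errors
  invErrors / sum(invErrors)
}

safeErrorWeights(c(0.5, 0.25, 0.25))  # inverse-error weights
safeErrorWeights(c(0, 0.5, 0.5))      # warns and returns equal weights
```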
We should consider the option of separate weights for each forecast horizon (e.g. auto.arima performs very well for short horizons, nnetar for medium horizons, and ets for long horizons, so when forecasting a series the first through sixth predictions would receive higher weight from the auto.arima model, the seventh through tenth higher weight from the nnetar model, and later horizons heavier weight from the ets model).

This would ideally be calculated for each horizon, so the weights could be stored in an h * n matrix (n models and h periods). This would involve a minor reworking of the way the weights are stored and used for forecasts, but the bigger issue is computing each horizon: we set the weights during model selection in hybridModel(), yet we would not know how many periods ahead are needed until forecast.hybridModel() is called. We could fix this in several ways. Perhaps we could cap the horizon at some reasonably far distance (e.g. h no longer than twice the length of the input series) and compute these horizons during model selection. Or we could add an argument to hybridModel() that the user sets for the maximum horizon to compute if these separate weights are wanted. Finally, the separate weights could be exposed through an additional argument such as separate = TRUE. In any case we should consider what happens if an h greater than these precomputed horizon weights is requested (perhaps throw a warning and then fall back on equal weights or the weight values for the last available horizon).
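As a minimal sketch of how horizon-specific weights might be applied (the names `fc` and `W` are hypothetical: an h x n matrix of component point forecasts and an h x n weight matrix with rows summing to one):

```r
# Combine component forecasts with horizon-specific weights.
# fc: h x n matrix of point forecasts (one column per component model)
# W:  h x n matrix of weights, each row summing to 1
combineByHorizon <- function(fc, W) {
  stopifnot(all(dim(fc) == dim(W)))
  rowSums(fc * W)
}

fc <- cbind(arima = c(10, 11, 12), ets = c(9, 10, 14))
W  <- cbind(c(0.8, 0.5, 0.2), c(0.2, 0.5, 0.8))
combineByHorizon(fc, W)  # 9.8 10.5 13.6
```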
As a first step towards #5, allow cvts to work in parallel.
Very conservative prediction intervals are now working (i.e. taking the highest prediction interval value from an individual model for the upper limit and the lowest individual prediction interval value for the lower limit). We should consider how to enhance this with additional methods that we could expose through a pi.method argument. This could include taking the mean of the model intervals, applying the weights from the models to the intervals, taking the tightest intervals, or bootstrap simulations. There are lots of issues that could arise from the various combination methods (e.g. upper/lower limits that are lower/higher than the mean forecast, etc.), so this will need careful thought.
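A rough sketch of what such a dispatcher could look like (all names here are hypothetical; `lower` and `upper` are assumed to be h x n matrices of the n component models' interval limits, and `weights` a length-n vector summing to one):

```r
# Illustrative sketch of combining component model prediction intervals.
combineIntervals <- function(lower, upper, weights,
                             method = c("conservative", "mean", "weighted")) {
  method <- match.arg(method)
  switch(method,
         # widest band: min of lower limits, max of upper limits
         conservative = list(lower = apply(lower, 1, min),
                             upper = apply(upper, 1, max)),
         # simple average of the component limits
         mean = list(lower = rowMeans(lower), upper = rowMeans(upper)),
         # reuse the ensemble weights on the limits
         weighted = list(lower = as.numeric(lower %*% weights),
                         upper = as.numeric(upper %*% weights)))
}
```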
print(), summary(), etc
choose the best combination of contributing models
Create Rd files and possibly a vignette.
Add unit tests to enforce behavior between versions and add stringency during package building. Travis continuous integration would be nice.
Do we want accuracy (compared to a known forecast gold standard) to work on our forecast objects out of the box? At the moment you need to know to manually pass the mean of the forecast object as the value to be tested.
> series <- M3[[1]]
> x <- series$x # ie the data to be fitted
> xx <- series$xx # ie the true, actual values of the forecast period
> h <- length(xx) # ie the length of the forecast period
> tmp <- forecast(hybridModel(x), h = h)
Warning message:
In hybridModel(x) :
The stlm model requires that the input data be a seasonal ts object. The stlm model will not be used.
> accuracy(tmp, xx)
Error in NextMethod(.Generic) : cannot assign 'tsp' to zero-length vector
> accuracy(tmp$mean, xx)
                ME     RMSE      MAE      MPE     MAPE     ACF1 Theil's U
Test set  488.2159 635.8854 522.7854 5.853305 6.495892 0.489442 0.7779601
Improve documentation for the list arguments (a.args, e.args, n.args, s.args, t.args); make it clear they take lists. Also change the documentation so that xreg is a dataframe instead of a matrix.
When I run devtools::check("pkg") on Windows I get an error:

Package inputenc Error: inputenc is not designed for xetex or luatex.

But I don't have that problem on a Linux machine. I'm not sure what this is - it might be only my own LaTeX setup - but we should work it out before submitting to CRAN. Here's the full error message for posterity. I've assigned this to me to look into in the first instance, but @dashaub, if you have any ideas they'd be appreciated.
* checking PDF version of manual ... [37s]
WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! Package inputenc Error: inputenc is not designed for xetex or luatex.
(inputenc) only UTF-8 supported.
See the inputenc package documentation for explanation.
Type H <return> for immediate help.
* checking PDF version of manual without hyperrefs or index ...Warning: running command '"C:/PROGRA~1/R/R-32~1.4RE/bin/x64/Rcmd.exe" Rd2pdf --batch --no-preview --build-dir="C:/Users/PETERE~1/AppData/Local/Temp/Rtmp8qwnBt/Rd2pdf131c2e4535dd" --no-clean -o forecastHybrid-manual.pdf "C:/Users/PETERE~1/DOCUME~1/GitHub/FORECA~3/FORECA~1.RCH/forecastHybrid"' had status 1
ERROR
* DONE
Status: 1 ERROR, 1 WARNING, 1 NOTE
Warning: running command '"C:/PROGRA~1/R/R-32~1.4RE/bin/x64/Rcmd.exe" Rd2pdf --batch --no-preview --build-dir="C:/Users/PETERE~1/AppData/Local/Temp/Rtmp8qwnBt/Rd2pdf131c34547cb7" --no-clean --no-index -o forecastHybrid-manual.pdf C:/Users/PETERE~1/DOCUME~1/GitHub/FORECA~3/FORECA~1.RCH/forecastHybrid' had status 1
See
'C:/Users/Peter Ellis/Documents/GitHub/forecastHybrid/forecastHybrid.Rcheck/00check.log'
for details.
Error: Command failed (1)
Execution halted
Exited with status 1.
Related to issue #12, nnetar will not work when frequency(y) * 2L >= length(y). This can be easily handled by throwing a warning and removing the nnetar model from the ensemble.
The basic plots were closed in #1, but we should still consider adding ggplot2 support.
Add the ability to update the weights/errorMethod on an existing hybridModel object. For errorMethod this can easily be achieved without recalculating the component models, since the RMSE/MAE/MASE errors are all still extractable from the fit models. For weights, changes to any method except cv.errors (which is currently unimplemented anyway) can similarly be done without recalculating the component models.
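A hedged sketch of what such an update might look like. The function name and object fields below are assumptions, not the package's actual structure; it assumes the fitted component models are stored on the object under the names in object$models, and that accuracy() on each component exposes the chosen error measure:

```r
# Illustrative only: recompute ensemble weights from the stored component fits
# without refitting them.
updateErrorWeights <- function(object, errorMethod = c("RMSE", "MAE", "MASE")) {
  errorMethod <- match.arg(errorMethod)
  # extract the in-sample error of each stored component model
  errors <- sapply(object$models,
                   function(m) accuracy(object[[m]])[1, errorMethod])
  # inverse-error weighting, normalized to sum to one
  invErrors <- 1 / errors
  object$weights <- invErrors / sum(invErrors)
  object
}
```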
This of course uses STL to de-seasonalize, models the trend-cycle-remainder using a non-seasonal forecasting method (typically ARIMA or ETS), and then re-seasonalizes using the last annum of seasonal components. In my experience, this has worked as well as, and sometimes better than, ARIMA or ETS. Any thoughts on including this as well?
ets() ignores seasonality for long seasonal periods, e.g. plot(forecast(ets(taylor), h = 500)). These forecasts are really bad since we're not really estimating an ets model. stlm() will produce an error on a series that is not periodic, e.g. set.seed(123); stlm(rnorm(100)). We could accept this error and concede that the user should manually remove the s model, or we could drop the ets/stlm models in these cases (and throw a warning). In the case of ets, it is easy to check if frequency > 24. Similarly for stlm(), we can easily check if the input is a ts object with at least two periods. Both approaches have advantages/disadvantages.
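If we take the drop-and-warn route, a minimal sketch of the checks could look like the following (the function name is hypothetical; "e" and "s" are the models-string codes for ets and stlm):

```r
# Illustrative sketch: drop ets/stlm from the requested models when the input
# series cannot support them, warning as we go.
checkModels <- function(y, models) {
  if ("e" %in% models && frequency(y) > 24) {
    warning("ets does not model seasonality for frequency > 24; dropping ets.")
    models <- setdiff(models, "e")
  }
  if ("s" %in% models &&
      (!is.ts(y) || frequency(y) < 2L || length(y) < 2 * frequency(y))) {
    warning("stlm requires a seasonal ts with at least two periods; dropping stlm.")
    models <- setdiff(models, "s")
  }
  models
}
```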
There are plenty of datasets that can be used as examples without importing fpp. At a minimum the individual fpp datasets should be imported rather than clobbering the namespace with all fpp objects.
Thank you for developing this package; it is awesome. Can we add xgboost to the hybrid forecast? Most of the winning solutions in forecasting competitions include xgboost models. That way this package would cover all the time series models plus advanced machine learning algorithms (nnetar and xgboost).
probably by a.args = list(), e.args = list(), b.args = list(), t.args = list()
Implement parallelization between models. When training multiple models, parallelize between the models, but if additional cores are present, allow individual models (e.g. tbats, auto.arima) to run in parallel themselves.
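A minimal sketch of the between-model level using the parallel package (the function name and signature are assumptions; mclapply forks, so this assumes a non-Windows machine, and a parLapply cluster would be needed on Windows):

```r
library(parallel)

# Fit each component model in its own forked process. fitFuns is a list of
# model-fitting functions, each taking the series as its first argument.
fitComponents <- function(y, fitFuns,
                          numCores = min(length(fitFuns), detectCores())) {
  mclapply(fitFuns, function(f) f(y), mc.cores = numCores)
}

# e.g. fits <- fitComponents(wineind, list(auto.arima, ets, nnetar, tbats))
```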
Somewhere in preparing the S3 generic method accuracy() for CRAN, the method became broken. This fix should also remove the useless documentation for the generic method and redirect to accuracy() from the "forecast" package.
Cryptic warnings on CRAN winbuilder for LaTeX PDF files. Build on a local Linux machine to debug.
How do you feel about an OO approach to the API instead of using strings and if statements to specify the model? It might provide greater flexibility and cleaner code.
library(forecast)
library(forecastHybrid)
some_ts <- ts(1:144, frequency = 12)
model_1 <- forecast::stlf(some_ts)
model_2 <- forecast::ets(some_ts)
model_3 <- forecast::auto.arima(some_ts)
model_4 <- forecast::nnetar(some_ts)
ensembled_model <- forecastHybrid::hybridModel(model_1, model_2, model_3, model_4, ...) # other args
class(ensembled_model) # "forecast" "ensemble"
# other API calls consistent with forecast, e.g.
plot(forecast(ensembled_model, h = 14))
hybridModel <- function(models, ...) {
  for (model in models) {
    # code that fits and finds weights and ensembles
  }
  return(ensembled_model)
}
forecastHybrid::hybridModel auto-magically finds weights and returns an object of class c("forecast", "ensemble") that is consistent with all the other functions in the forecast package.
Our tests expect an error in this situation, but forecast::forecast does return an (uninteresting) forecast. I'm inclined to remove the test.
> expect_error(forecast(object = 1))
Error: forecast(object = 1) did not throw an error.
> forecast(object = 1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
1 1 NA NA NA NA
2 1 NA NA NA NA
3 1 NA NA NA NA
4 1 NA NA NA NA
5 1 NA NA NA NA
6 1 NA NA NA NA
7 1 NA NA NA NA
8 1 NA NA NA NA
9 1 NA NA NA NA
10 1 NA NA NA NA
The documentation needs to be made consistent with the fact that xreg can be used with stlm models, but only when stlm(..., method = "arima").
Note that this works

m <- matrix(rnorm(1 * length(wineind)), ncol = 1)
h1 <- hybridModel(wineind, a.args = list(xreg = m))

but this doesn't

mm <- matrix(rnorm(2 * length(wineind)), ncol = 2)
hh <- hybridModel(wineind, a.args = list(xreg = mm))

It appears the problem occurs in do.call(), so this might be related to and fixed with issue #27 anyway.
The function calls are not formatted the same way as the base models in the "forecast" package. For example, compare

a1 <- auto.arima(wineind)
a1$call

with

h1 <- hybridModel(wineind)
h1$auto.arima$call

This might be fixable by modifying the do.call() invocations that call the individual component models.
See below

hm2 <- hybridModel(y = gas, models = "aenst",
                   a.args = list(max.p = 12, max.q = 12, approximation = FALSE),
                   e.args = list(allow.multiplicative.trend = TRUE),
                   n.args = list(repeats = 50),
                   s.args = list(robust = TRUE),
                   t.args = list(use.armga.errors = FALSE))
Forecast 6.2 does not support accuracy(nnetar(rnorm(100))). This will be fixed in v6.3 (see robjhyndman/forecast#237). We can probably ignore this for now and wait for it to be resolved in the forecast 6.3 release.

Edit: the same issue exists with accuracy(stlm(AirPassengers)). This has also been fixed for forecast 6.3.
Hi. I'm getting this error trying to forecast with a hybridModel object.

forecast(x.ray.m1, h = 18)
Error in forecast.hybridModel(x.ray.m1, h = 18) :
My data looks like this:
x.ray.ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 332 346 368 357 269 321 321 330 302 390 319 336
2012 366 334 367 370 378 324 305 353 307 374 397 298
2013 333 362 351 364 440 319 345 409 369 328 317 346
2014 394 322 281 313 293 210 422 483 584 534 444 499
2015 501 445 550 503 494 483 482 478 468 473 350 425
2016 479 519 511 461
Could it be because I'm forecasting by month up to April 2016?
This should be pretty easy to implement along the lines of these sources:
http://robjhyndman.com/hyndsight/tscvexample/
http://www.r-bloggers.com/functional-and-parallel-time-series-cross-validation/
The biggest thought will need to go into the function header and how to handle parallelization. If we're not careful, parallelization could be a complete disaster given all the levels at which we can parallelize (i.e. within models, between models, for cross validation, and for the choice of aenst models).
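Along the lines of those posts, a bare-bones rolling-origin sketch follows. The name and signature are assumptions, not the eventual cvts() interface; FUN is any model-fitting function whose result forecast() accepts (e.g. auto.arima from the forecast package):

```r
library(forecast)

# Illustrative rolling-origin cross validation: fit on a sliding window,
# forecast maxHorizon steps ahead, and collect the errors by horizon.
cvSketch <- function(y, FUN, windowSize, maxHorizon) {
  numFolds <- length(y) - windowSize - maxHorizon + 1
  errors <- matrix(NA_real_, nrow = numFolds, ncol = maxHorizon)
  for (i in seq_len(numFolds)) {
    trainEnd <- windowSize + i - 1
    trainSeries <- ts(y[i:trainEnd], frequency = frequency(y))
    fc <- forecast(FUN(trainSeries), h = maxHorizon)
    errors[i, ] <- y[(trainEnd + 1):(trainEnd + maxHorizon)] - as.numeric(fc$mean)
  }
  errors
}

# e.g. per-horizon RMSE:
# sqrt(colMeans(cvSketch(wineind, auto.arima, windowSize = 84, maxHorizon = 6)^2))
```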
Was there a reason we didn't include thetaf in the ensemble, even as an option? If we just didn't get around to it, I'm happy to put in some belated work to add it.
It probably never was a good idea to include this. Equal weights almost certainly lead to models with less variance, and cross-validated error weights probably lead to lower model forecasting errors. It might be worth replacing this with the ability to set custom weights, e.g.

hybridModel(wineind, models = "aen", weights = c(0.2, 0.6, 0.2))
The current implementation only preserves the frequency component of the ts object, not the start and end attributes.
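A small sketch of carrying the full time attributes over (the function name is hypothetical; `fits` stands for the combined numeric fitted values and `y` for the original input series):

```r
# Re-attach start and frequency from the input series so fitted values and
# forecasts line up on the original time axis, not just the original frequency.
restoreTimeAttributes <- function(fits, y) {
  ts(fits, start = start(y), frequency = frequency(y))
}

restoreTimeAttributes(1:24, ts(rnorm(24), start = c(2011, 1), frequency = 12))
```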
Increase consistency of Rd files with R base man pages.
Currently the build checks report a problem with {fpp} being listed as a dependency but never imported. I presume we're likely to use it for example data, so I'm just logging an issue so we remember to explicitly import either the whole namespace or just specific datasets when we get to that point.
See below:
trainSet <- beaver1[1:100, ]
testSet <- beaver1[-(1:100), ]
trainXreg <- data.frame(trainSet$activ)
testXreg <- data.frame(testSet$activ)
beaverhm <- hybridModel(trainSet$temp,
                        models = "aest",
                        a.args = list(xreg = trainXreg),
                        n.args = list(xreg = trainXreg),
                        s.args = list(xreg = trainXreg))
forecast(beaverhm, xreg = testXreg)
Warning message:
In hybridModel(trainSet$temp, models = "aent", a.args = list(xreg = trainXreg), :
  stlm was not selected in the models argument, but s.args was passed. Ignoring a.args
Just putting this as an issue while cleaning the comment out of the code.
The print() and summary() methods no longer print nice output after the switch to do.call() for each model call.
Spotted this error in the README while working on the branch, and I don't see what I might have done to it.
test_that("forecast nnetar prediction intervals with different levels work", {
  hm2 <- hybridModel(wineind, models = "anst", weights = "equal")
  expect_error(forecast(hm2, h = 48), NA)
  expect_error(forecast(hm2, h = 48, level = c(70, 80, 90, 95)), NA)
})
The first forecast in the above works, the second doesn't:
Error: forecast(hm2, h = 48, level = c(70, 80, 90, 95)) threw an error.
subscript out of bounds
I think there's an easy fix but I wanted to merge in first.
We should consider handling additional models. If we implement issue #11 this will need to be considered. Furthermore, once this package is stable I'd like to work on and include arar/ararma models (see robjhyndman/forecast#8).

bats() models might be an addition. Since they are quite similar to tbats() it may not be worth inclusion, but they can provide better fits with default settings in some cases (e.g. compare bats(taylor) to tbats(taylor)).
Currently forecast.hybridModel() issues a warning if the xreg was only used for one component model and not the others. This behavior could be kept, but more flexibility would be gained by allowing forecast.hybridModel() to accept individual xreg arguments in a.xreg and n.xreg, the same way a.args and n.args can be used with hybridModel(). This leads to more verbose function calls, but it is certainly conceivable that a user would want to use different xregs (e.g. Fourier terms for seasonality) in different models. Scratchpad code below demonstrating current xreg usage:
library(forecastHybrid)
dat <- wineind
set.seed(12354)
# Test 1
m <- matrix(rnorm(length(dat)), ncol = 1)
mm <- matrix(rnorm(2 * length(dat)), ncol = 2)
a1 <- auto.arima(dat, xreg = m)
h1 <- hybridModel(dat, a.args = list(xreg = m))
a1
h1$auto.arima
forecast(a1, xreg = m)$mean == forecast(h1, xreg = m)$auto.arima$mean
# Test 2
a2 <- auto.arima(dat, xreg = mm)
h2 <- hybridModel(dat, a.args = list(xreg = data.frame(mm)))
a2
h2$auto.arima
forecast(a2, xreg = mm)$mean == forecast(h2, xreg = mm)$auto.arima$mean
Start with in-model accuracy but make it easy to integrate cross-validation later