business-science / modeltime.ensemble

Time Series Ensemble Forecasting

Home Page: https://business-science.github.io/modeltime.ensemble/

License: Other

R 94.58% CSS 5.21% Rez 0.21%

modeltime time time-series timeseries forecasting forecast ensemble ensemble-learning tidymodels stacking

modeltime.ensemble's Introduction

modeltime.ensemble

Ensemble Algorithms for Time Series Forecasting with Modeltime

A modeltime extension that implements ensemble forecasting methods including model averaging, weighted averaging, and stacking.

Installation

Install the CRAN version:

install.packages("modeltime.ensemble")

Or, install the development version:

remotes::install_github("business-science/modeltime.ensemble")

Getting Started

Getting Started with Modeltime: Learn the basics of forecasting with Modeltime.
Getting Started with Modeltime Ensemble: Learn the basics of forecasting with Modeltime ensemble models.

Make Your First Ensemble in Minutes

Load the following libraries.

library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)

Step 1 - Create a Modeltime Table

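Create a Modeltime Table using the modeltime package. This example uses m750_models, an example Modeltime Table of three fitted models that ships with modeltime.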

m750_models
#> # Modeltime Table
#> # A tibble: 3 × 3
#>   .model_id .model     .model_desc            
#>       <int> <list>     <chr>                  
#> 1         1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2         2 <workflow> PROPHET                
#> 3         3 <workflow> GLMNET

Step 2 - Make a Modeltime Ensemble

Then turn that Modeltime Table into a Modeltime Ensemble.

ensemble_fit <- m750_models %>%
    ensemble_average(type = "mean")

ensemble_fit
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (MEAN)
#> 
#> # Modeltime Table
#> # A tibble: 3 × 3
#>   .model_id .model     .model_desc            
#>       <int> <list>     <chr>                  
#> 1         1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2         2 <workflow> PROPHET                
#> 3         3 <workflow> GLMNET
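An average ensemble combines the submodels' forecasts point by point. The base-R sketch below shows the three combination rules this package names (mean, median and weighted mean) on a made-up matrix of forecasts. It is not modeltime.ensemble's internal code. It also assumes that weighted loadings are rescaled to sum to 1, which is our understanding of the default behaviour of `ensemble_weighted()`.

```r
# Hypothetical forecasts: one row per time point, one column per submodel
preds <- cbind(
  arima   = c(10, 20),
  prophet = c(12, 18),
  glmnet  = c(20, 40)
)

# Mean ensemble: average across submodels at each time point
ens_mean <- apply(preds, 1, mean)

# Median ensemble: less sensitive to one outlying submodel
ens_median <- apply(preds, 1, median)

# Weighted ensemble: loadings rescaled so the weights sum to 1
loadings     <- c(2, 1, 1)
weights      <- loadings / sum(loadings)
ens_weighted <- as.vector(preds %*% weights)

ens_mean      # 14 26
ens_median    # 12 20
ens_weighted  # 13.0 24.5
```

In this example the third submodel runs high. The mean is pulled toward it, but the median is not.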

Step 3 - Forecast!

To forecast, just follow the Modeltime Workflow.

# Calibration
calibration_tbl <- modeltime_table(
    ensemble_fit
) %>%
    modeltime_calibrate(testing(m750_splits), quiet = FALSE)

# Forecast vs Test Set
calibration_tbl %>%
    modeltime_forecast(
        new_data    = testing(m750_splits),
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(.interactive = FALSE)
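A weighted average follows the same workflow. The sketch below assumes the `m750_models` and `m750_splits` example data from modeltime. `ensemble_weighted()` takes one loading per submodel. The loadings 2, 1, 1 for ARIMA, PROPHET and GLMNET are purely illustrative and have not been tuned. `modeltime_accuracy()` then scores the calibrated ensemble on the test set.

```r
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)

# Weighted ensemble: illustrative loadings, one per submodel in m750_models
ensemble_fit_wt <- m750_models %>%
    ensemble_weighted(loadings = c(2, 1, 1))

# Calibrate on the test set, then compute accuracy metrics (mae, rmse, ...)
accuracy_tbl <- modeltime_table(ensemble_fit_wt) %>%
    modeltime_calibrate(testing(m750_splits)) %>%
    modeltime_accuracy()

accuracy_tbl
```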

Meet the modeltime ecosystem

Learn about a growing ecosystem of forecasting packages

Modeltime is part of a growing ecosystem of Modeltime forecasting packages.

Take the High-Performance Forecasting Course

Become the forecasting expert for your organization

Time Series is Changing

Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies by improving accuracy and scalability. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).

How to Learn High-Performance Time Series Forecasting

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. You will learn:

Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
Deep Learning with GluonTS (Competition Winners)
Time Series Preprocessing, Noise Reduction, & Anomaly Detection
Feature engineering using lagged variables & external regressors
Hyperparameter Tuning
Time series cross-validation
Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
Scalable Forecasting - Forecast 1000+ time series in parallel
and more.

Become the Time Series Expert for your organization.

Take the High-Performance Time Series Forecasting Course

modeltime.ensemble's People

stjordanis liuyicai bihailantian21 jingmouren tonyk7440 albertoalmuinha minghao2016 topepo shizelong1985 vishalbelsare khalil628 regisely andreschprr poluru z3br4p01nt olivroy dearborn-enterprises th2harold

modeltime.ensemble's Issues

ensemble_model_spec() %>% add_recipe(recipe_spec)

... Is this a feature request, or could we achieve something similar already?

Application:
My xgboost meta-learner fails if a prediction from one of the submodels fails. With a recipe, we could remove or impute NA and NaN entries in submodel_predictions.
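The cleanup such a recipe would perform can be sketched in base R. `clean_submodel_preds()` below is a hypothetical helper and is not part of modeltime.ensemble. It assumes a wide table with one prediction column per submodel, and it uses median imputation as one simple choice among several.

```r
# Hypothetical helper: make submodel predictions safe for a meta-learner by
# turning NaN/Inf into NA, then imputing each column with its finite median.
clean_submodel_preds <- function(df, pred_cols) {
  for (col in pred_cols) {
    x <- df[[col]]
    x[!is.finite(x)] <- NA           # NaN, Inf, -Inf (and NA) become NA
    fill <- median(x, na.rm = TRUE)  # median of the remaining finite values
    x[is.na(x)] <- fill
    df[[col]] <- x
  }
  df
}

preds <- data.frame(
  .row_id = 1:4,
  model_1 = c(10, NaN, 12, 14),
  model_2 = c(Inf, 20, 22, NA)
)
clean_submodel_preds(preds, c("model_1", "model_2"))
```

If a column has no finite values at all, the fill value is itself NA. A real implementation would need to drop that submodel instead.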

Feature request. Option to select tune_race_anova in ensemble_model_spec

Would it be possible to add tune_race_anova() to ensemble_model_spec()?

It would speed up tuning a lot.
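For reference, here is how racing works with an ordinary tidymodels workflow using the finetune package. This sketch tunes a glmnet model on `mtcars` and is not connected to `ensemble_model_spec()`, which this request implies offers only regular grid search. After a burn-in set of resamples, `tune_race_anova()` drops candidates that are statistically worse than the current best, so fewer models are fit overall.

```r
library(tidymodels)
library(finetune)

set.seed(123)
folds <- vfold_cv(mtcars, v = 10)

# A glmnet spec with two tuned parameters, similar to a stacking meta-learner
spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

wflw <- workflow() %>%
  add_model(spec) %>%
  add_formula(mpg ~ .)

# ANOVA-based racing: poor candidates are eliminated early
race_res <- tune_race_anova(
  wflw,
  resamples = folds,
  grid      = 10,
  metrics   = metric_set(rmse),
  control   = control_race(verbose_elim = TRUE)
)

show_best(race_res, metric = "rmse", n = 3)
```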

Information about xgboost engine specific parameters in metalearner output (ensemble_model_spec, param_info)

As I started tuning the engine-specific parameters in my models, I ran into an issue while tuning scale weights and L1 + L2 regularization.
When I define additional parameters via param_info, I get no errors or warnings while tuning with ensemble_model_spec, but I do not see any results for those parameters either.
Where can I find the tuning results for engine-specific parameters? Maybe I misspecified something. Here is a small code example:

# META XGB
# Assumes `splits` (an rsample split) and `submodel_predictions`
# (from modeltime.resample::modeltime_fit_resamples()) already exist.
library(tidymodels)          # parsnip, dials, tune, rsample
library(modeltime.ensemble)  # ensemble_model_spec()
library(parallel)            # makePSOCKcluster(), stopCluster()
library(doParallel)          # registerDoParallel()
library(scales)              # log10_trans()

# scaling
scale_pos_weight <- function(range = c(0.8, 1.2), trans = NULL) {
  new_quant_param(
    type      = "double",
    range     = range,
    inclusive = c(TRUE, TRUE),
    trans     = trans,
    default   = 1,
    label     = c(scale_pos_weight = "Balance of Events and Non-Events"),
    finalize  = NULL
  )
}
# L1 and L2 regularization
penalty_L2 <- function(range = c(-10, 1), trans = log10_trans()) {
  new_quant_param(
    type      = "double",
    range     = range,
    inclusive = c(TRUE, TRUE),
    trans     = trans,
    label     = c(penalty_L2 = "Amount of L2 Regularization"),
    finalize  = NULL
  )
}
penalty_L1 <- function(range = c(-10, 1), trans = log10_trans()) {
  new_quant_param(
    type      = "double",
    range     = range,
    inclusive = c(TRUE, TRUE),
    trans     = trans,
    label     = c(penalty_L1 = "Amount of L1 Regularization"),
    finalize  = NULL
  )
}
# hypercube grid
set.seed(23)
xgb_grid_stack <- grid_latin_hypercube(
  learn_rate(range = c(-6.0, -1.0)),
  finalize(mtry(range = c(3, ncol(training(splits)) - 4)), training(splits)),
  min_n(range = c(2, 25)),
  tree_depth(range = c(2, 17)),
  sample_prop(range = c(0.75, 0.95)),
  loss_reduction(),
  size = 30
)
param_set <- parameters(list(
  scale_pos_weight(),
  penalty_L1(),
  penalty_L2()
))
set.seed(123)
cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
ensemble_fit_xgboost <- submodel_predictions %>%
  ensemble_model_spec(
    model_spec = boost_tree(
      trees          = 1000,
      tree_depth     = tune(),
      learn_rate     = tune(),
      min_n          = tune(),
      mtry           = tune(),
      sample_size    = tune(),
      loss_reduction = tune()
    ) %>%
      set_engine("xgboost"),
    kfolds     = 10,
    grid       = xgb_grid_stack,
    param_info = param_set,
    control    = control_grid(verbose = TRUE, allow_par = TRUE)
  )
stopCluster(cl)
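One likely explanation, though the thread does not confirm it, is that `param_info` only affects parameters the model specification marks with `tune()`. When `grid` is a precomputed data frame, only the columns in that grid are evaluated. The `boost_tree()` call above never tags scale_pos_weight, L1 or L2 regularization, and `xgb_grid_stack` has no columns for them. In tidymodels, engine-specific arguments are tagged inside `set_engine()`. The sketch below assumes a recent parsnip/tune and uses xgboost's own argument names `alpha` (L1) and `lambda` (L2). It lists the parameters a spec will actually tune.

```r
library(tidymodels)

# Engine-specific xgboost arguments are marked for tuning inside set_engine()
xgb_spec <- boost_tree(
  trees      = 1000,
  learn_rate = tune()
) %>%
  set_engine("xgboost", alpha = tune(), lambda = tune()) %>%
  set_mode("regression")

# The parameter ids that tuning will see; a custom param_info can only
# refine ids that appear in this set
xgb_params <- extract_parameter_set_dials(xgb_spec)
xgb_params$id
```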

Vignette - Recursive Forecast Ensembles

Create a vignette that uses the following examples.

Single Time Series
Multiple Time Series

Example 1 - Single Time Series

library(modeltime.ensemble)
library(modeltime)
library(tidymodels)
library(tidyverse)
library(lubridate)
library(timetk)

FORECAST_HORIZON <- 24

m750_extended <- m750 %>%
    group_by(id) %>%
    future_frame(
        .length_out = FORECAST_HORIZON,
        .bind_data  = TRUE
    ) %>%
    ungroup()
#> .date_var is missing. Using: date

recipe_lag <- recipe(value ~ date, m750_extended) %>%
    step_lag(value, lag = 1:FORECAST_HORIZON)

# Data Preparation
m750_lagged <- recipe_lag %>% prep() %>% juice()
m750_lagged
#> # A tibble: 330 x 26
#>    date       value lag_1_value lag_2_value lag_3_value lag_4_value lag_5_value
#>    <date>     <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1 1990-01-01  6370          NA          NA          NA          NA          NA
#>  2 1990-02-01  6430        6370          NA          NA          NA          NA
#>  3 1990-03-01  6520        6430        6370          NA          NA          NA
#>  4 1990-04-01  6580        6520        6430        6370          NA          NA
#>  5 1990-05-01  6620        6580        6520        6430        6370          NA
#>  6 1990-06-01  6690        6620        6580        6520        6430        6370
#>  7 1990-07-01  6000        6690        6620        6580        6520        6430
#>  8 1990-08-01  5450        6000        6690        6620        6580        6520
#>  9 1990-09-01  6480        5450        6000        6690        6620        6580
#> 10 1990-10-01  6820        6480        5450        6000        6690        6620
#> # … with 320 more rows, and 19 more variables: lag_6_value <dbl>,
#> #   lag_7_value <dbl>, lag_8_value <dbl>, lag_9_value <dbl>,
#> #   lag_10_value <dbl>, lag_11_value <dbl>, lag_12_value <dbl>,
#> #   lag_13_value <dbl>, lag_14_value <dbl>, lag_15_value <dbl>,
#> #   lag_16_value <dbl>, lag_17_value <dbl>, lag_18_value <dbl>,
#> #   lag_19_value <dbl>, lag_20_value <dbl>, lag_21_value <dbl>,
#> #   lag_22_value <dbl>, lag_23_value <dbl>, lag_24_value <dbl>

train_data <- m750_lagged %>%
    filter(!is.na(value)) %>%
    drop_na()

future_data <- m750_lagged %>%
    filter(is.na(value))

### Fitting models

model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ ., data = train_data)

model_fit_mars <- mars("regression") %>%
    set_engine("earth", endspan = 24) %>%
    fit(value ~ ., data = train_data)

recursive_ensemble <- modeltime_table(
    model_fit_lm,
    model_fit_mars
) %>%
    ensemble_average(type = "mean") %>%
    recursive(
        transform  = recipe_lag,
        train_tail = tail(train_data, FORECAST_HORIZON)
    )

fcast <- modeltime_table(
    recursive_ensemble
) %>%
    modeltime_forecast(
        new_data = future_data,
        actual_data = m750
    )

fcast %>%
    plot_modeltime_forecast(
        .interactive = FALSE,
        .conf_interval_show = FALSE
    )

Created on 2021-04-02 by the reprex package (v1.0.0)
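The core idea behind a recursive forecast is that the lagged predictors for step h + 1 are filled in with the prediction from step h. This is a stripped-down base-R sketch of that loop, using a single lag and `lm()` on a toy linear series. It is not the modeltime implementation, which, judging by its arguments, re-applies `transform` to `train_tail` plus the new data at each step. `recursive_forecast()` is a hypothetical helper.

```r
# Toy series with a perfectly linear trend
y <- as.numeric(1:20)

# Training table: value and its first lag (first observation dropped)
train <- data.frame(value = y[-1], lag_1 = y[-length(y)])
fit   <- lm(value ~ lag_1, data = train)

# Recursive forecast: each prediction becomes the lag for the next step
recursive_forecast <- function(fit, last_value, h) {
  out <- numeric(h)
  lag <- last_value
  for (i in seq_len(h)) {
    out[i] <- predict(fit, newdata = data.frame(lag_1 = lag))
    lag    <- out[i]
  }
  out
}

recursive_forecast(fit, last_value = tail(y, 1), h = 3)  # about 21, 22, 23
```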

Example 2 - Multiple Time Series Panel

library(modeltime.ensemble)
library(modeltime)
library(tidymodels)
library(tidyverse)
library(lubridate)
library(timetk)

m4_monthly
#> # A tibble: 1,574 x 3
#>    id    date       value
#>    <fct> <date>     <dbl>
#>  1 M1    1976-06-01  8000
#>  2 M1    1976-07-01  8350
#>  3 M1    1976-08-01  8570
#>  4 M1    1976-09-01  7700
#>  5 M1    1976-10-01  7080
#>  6 M1    1976-11-01  6520
#>  7 M1    1976-12-01  6070
#>  8 M1    1977-01-01  6650
#>  9 M1    1977-02-01  6830
#> 10 M1    1977-03-01  5710
#> # … with 1,564 more rows

FORECAST_HORIZON <- 24

m4_extended <- m4_monthly %>%
    group_by(id) %>%
    future_frame(
        .length_out = FORECAST_HORIZON,
        .bind_data  = TRUE
    ) %>%
    ungroup()
#> .date_var is missing. Using: date

# TRANSFORM FUNCTION ----
# - NOTE - We create lags by group
lag_transformer_grouped <- function(data){
    data %>%
        group_by(id) %>%
        tk_augment_lags(value, .lags = 1:FORECAST_HORIZON) %>%
        ungroup()
}

m4_lags <- m4_extended %>%
    lag_transformer_grouped()

train_data <- m4_lags %>%
    drop_na()

future_data <- m4_lags %>%
    filter(is.na(value))


model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ ., data = train_data)

model_fit_mars <- mars("regression") %>%
    set_engine("earth") %>%
    fit(value ~ ., data = train_data)

recursive_ensemble_p <- modeltime_table(
    model_fit_mars,
    model_fit_lm
) %>%
    ensemble_average(type = "median") %>%
    recursive(
        transform  = lag_transformer_grouped,
        train_tail = panel_tail(train_data, id, FORECAST_HORIZON),
        id = "id"
    )

fcast <- modeltime_table(
    recursive_ensemble_p
) %>%
    modeltime_forecast(
        new_data = future_data,
        actual_data = m4_lags,
        keep_data = TRUE
    )

fcast %>%
    group_by(id) %>%
    plot_modeltime_forecast(
        .interactive = FALSE,
        .conf_interval_show = FALSE
    )

Created on 2021-04-02 by the reprex package (v1.0.0)
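The NOTE in the transform function matters. Lags must be computed within each `id`, or the first rows of one series will pick up the last values of the previous series. Here is a minimal base-R illustration on a hypothetical two-series panel, where `ave()` stands in for `group_by(id)` plus `tk_augment_lags()`:

```r
panel <- data.frame(
  id    = c("A", "A", "A", "B", "B", "B"),
  value = c(1, 2, 3, 10, 20, 30)
)

# Shift a vector by one position, padding the start with NA
lag1 <- function(x) c(NA, x[-length(x)])

# Ungrouped lag: B's first row wrongly inherits A's last value (3)
panel$lag_ungrouped <- lag1(panel$value)

# Grouped lag: computed separately within each id (rows must be date-sorted)
panel$lag_grouped <- ave(panel$value, panel$id, FUN = lag1)

panel
```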

Ensembles for Nested Forecasting and at local IDs

Add method for working with Local ID and Nested Forecast Ensembles.

Issues with new parsnip

My local revdep checks did not see an issue but CRAN flagged an error from the new version of parsnip.

══ Testing test-panel-data.R ═══════════════════════════════════════════════════
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 33 ]

── Failure (test-panel-data.R:74:5): ensemble_average(): Forecast Jumbled ──────
accuracy_tbl$mae < 500 is not TRUE

`actual`:   FALSE
`expected`: TRUE 

── Failure (test-panel-data.R:136:5): ensemble_weighted(): Forecast Jumbled ────
accuracy_tbl$mae < 400 is not TRUE

`actual`:   FALSE
`expected`: TRUE 

[ FAIL 2 | WARN 0 | SKIP 0 | PASS 33 ]

The issue is related to the change in the xgboost parameter in tidymodels/parsnip#499

The test case uses prophet_boost and boost_tree models. The latter has the same results; I think that it doesn't tune over mtry. The prophet model is different, due to the mtry difference. Partial waldo diffs are:

old$fit$fit$models$model_2$call vs new$fit$fit$models$model_2$call
  `xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, `
- `    colsample_bytree = 1, min_child_weight = 1, subsample = 1, `
+ `    colsample_bytree = 0.0588235294117647, colsample_bynode = 1, `
- `    objective = "reg:squarederror"), data = x$data, nrounds = 15, `
+ `    min_child_weight = 1, subsample = 1, objective = "reg:squarederror"), `
- `    watchlist = x$watchlist, verbose = 0, nthread = 1)`
+ `    data = x$data, nrounds = 15, watchlist = x$watchlist, verbose = 0, `
+ `    nthread = 1)`

`old$fit$fit$models$model_2$params` is length 9
`new$fit$fit$models$model_2$params` is length 10

    names(old$fit$fit$models$model_2$params) | names(new$fit$fit$models$model_2$params)    
[2] "max_depth"                              | "max_depth"                              [2]
[3] "gamma"                                  | "gamma"                                  [3]
[4] "colsample_bytree"                       | "colsample_bytree"                       [4]
                                             - "colsample_bynode"                       [5]
[5] "min_child_weight"                       | "min_child_weight"                       [6]
[6] "subsample"                              | "subsample"                              [7]
[7] "objective"                              | "objective"                              [8]

`old$fit$fit$models$model_2$params$colsample_bytree`: 1.0
`new$fit$fit$models$model_2$params$colsample_bytree`: 0.1

`old$fit$fit$models$model_2$params$colsample_bynode` is absent
`new$fit$fit$models$model_2$params$colsample_bynode` is a double vector (1)

I'm not sure what to do about this since the new mapping of mtry to colsample_bynode is more appropriate (that xgboost argument didn't exist when parsnip was written).

We could change the call to prophet_boost and tune over mtry. I think that prophet_boost() needs to be updated anyway to be consistent with the new mapping and let users use

set_engine("prophet_xgboost", colsample_bytree = tune())

if they want.
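Some context for the numbers in the diff. By default, parsnip's xgboost wrapper converts `mtry` from a count of predictors into a proportion before passing it to xgboost's column-sampling arguments. The value 0.0588235294117647 in the new call equals 1/17. That is consistent with a count of 1 divided by 17 predictors, although the predictor count is an inference and is not stated in the thread. `mtry_to_prop()` is a hypothetical helper that only reproduces this arithmetic.

```r
# Hypothetical helper: convert an mtry count into a sampling proportion in (0, 1]
mtry_to_prop <- function(mtry, n_predictors) {
  min(mtry / n_predictors, 1)
}

mtry_to_prop(1, 17)   # 1/17, i.e. 0.0588235294117647
mtry_to_prop(17, 17)  # 1: every column is sampled
```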

modeltime.ensemble was removed from CRAN

Hi,

Thank you for your work on this package. It seems the package was removed from CRAN though. Is there an expected time at which it will be available there again?

Thanks.

Error in Tuning Ensembles

I am working on the ensembles portion of the Business Science Forecasting Course. Each time I try to tune parameters within ensemble_model_spec(), I get the error below. It works when I set parameters like penalty and mixture manually, but not when I use tune(). I am using the development version of the modeltime.ensemble package:

! ... must be empty.
✖️ Problematic argument:
• ..1 = metric
ℹ️ Did you forget to name an argument?

modeltime_stack: Integrate stacks into modeltime.ensemble package

This is a placeholder for integrating the stacks functionality into modeltime.ensemble.

References:

tidymodels/stacks#2

CC:
@simonpcouch @topepo

modeltime.ensemble removed from cran

Hi,

Thank you for your work on this package. The package has been removed from CRAN, and I also can't install the development version from GitHub. Is there an expected time at which it will be available again? Our forecasting pipeline depends on this package and runs weekly.

Thanks.

Fatal Error when using xgboost as subModel and metaLearner...

<simpleError in xgboost::xgb.DMatrix(x, label = y, missing = NA): [06:56:24] amalgamation/../src/data/data.cc:981: Check failed: valid: Input data contains `inf` or `nan`
Stack trace:
  [bt] (0) x86_64-pc-linux-gnu-library/4.0/xgboost/libs/xgboost.so(+0x59db5) [0x7f0ba7102db5]
  [bt] (1) x86_64-pc-linux-gnu-library/4.0/xgboost/libs/xgboost.so(+0x19a2c9) [0x7f0ba72432c9]
  [bt] (2) x86_64-pc-linux-gnu-library/4.0/xgboost/libs/xgboost.so(+0x19a4bd) [0x7f0ba72434bd]
  [bt] (3) x86_64-pc-linux-gnu-library/4.0/xgboost/libs/xgboost.so(+0x19a755) [0x7f0ba7243755]
  [bt] (4) x86_64-pc-linux-gnu-library/4.0/xgboost/libs/xgboost.so(XGDMatrixCreateFromMat_omp+0x82) [0x7f0ba7142da2]
  [bt] (5) x86_64-pc-linux-gnu-library/4.0/xgboost/libs/xgboost.so(XGDMatrixCreateFromMat_R+0x235) [0x7f0ba7100955]
  [bt] (6) lib/libR.so(+0xf69b0) [0x7f0bddef39b0]
  [bt] (7) lib/R/lib/libR.so(+0x134b0e) [0x7f0bddf31b0e]
  [bt] (8) lib/R/lib/libR.so(Rf_eval+0x1a0) [0x7f0bddf3b5>

Error with modeltime_fit_resamples

data_tbl.xlsx

I am getting the following error when using modeltime_fit_resamples

* Model ID: 3 SEASONAL DECOMP: ETS(A,AD,N)
i Slice1: preprocessor 1/1
v Slice1: preprocessor 1/1
i Slice1: preprocessor 1/1, model 1/1
frequency = 3 observations per 1 quarter
External regressors (xregs) detected. STLM + ETS is a univariate method. Ignoring xregs.
v Slice1: preprocessor 1/1, model 1/1
i Slice1: preprocessor 1/1, model 1/1 (predictions)
Error: Problem with `mutate()` column `.resample_results`.
i `.resample_results = purrr::pmap(...)`.
x <text>:1:2: unexpected ','
1: 0,
     ^

> rlang::last_error()
<error/dplyr:::mutate_error>
Problem with `mutate()` column `.resample_results`.
i `.resample_results = purrr::pmap(...)`.
x <text>:1:2: unexpected ','
1: 0,
     ^
Backtrace:
Run `rlang::last_trace()` to see the full context.

> rlang::last_trace()
<error/dplyr:::mutate_error>
Problem with `mutate()` column `.resample_results`.
i `.resample_results = purrr::pmap(...)`.
x <text>:1:2: unexpected ','
1: 0,
     ^
Backtrace:
     x
  1. +-`%>%`(...)
  2. +-modeltime.resample::modeltime_fit_resamples(...)
  3. +-modeltime.resample:::modeltime_fit_resamples.mdl_time_tbl(...)
  4. | \-modeltime.resample:::map_fit_resamples(data, resamples, control)
  5. |   \-`%>%`(...)
  6. +-dplyr::mutate(...)
  7. +-dplyr:::mutate.data.frame(...)
  8. | \-dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
  9. |   +-base::withCallingHandlers(...)
 10. |   \-mask$eval_all_mutate(quo)
 11. +-purrr::pmap(...)
 12. | \-modeltime.resample:::.f(...)
 13. |   \-cli::cli_li(stringr::str_glue("Model ID: {cli::col_blue(as.character(id))} {cli::col_blue(desc)}"))
 14. |     +-cli:::cli__message(...)
 15. |     | \-"id" %in% names(args)
 16. |     \-base::lapply(items, glue_cmd, .envir = .envir)
 17. |       \-cli:::FUN(X[[i]], ...)
 18. |         \-glue::glue(...)
 19. |           \-glue::glue_data(...)
 20. +-(function (expr) ...
 21. | \-cli:::.transformer(expr, env)
 22. |   \-base::stop(res)
 23. \-(function (e) ...
<error/simpleError>
<text>:1:2: unexpected ','
1: 0,
     ^

Here is the full script:

# Lib Load ----------------------------------------------------------------

if(!require(pacman)) install.packages("pacman")
pacman::p_load(
  "tidymodels",
  "modeltime",
  "tidyverse",
  "lubridate",
  "timetk",
  "odbc",
  "DBI",
  "janitor",
  "timetk",
  "tidyquant",
  "modeltime.ensemble",
  "modeltime.resample",
  "modeltime.h2o"
)

interactive <- TRUE

data_tbl <- xlsx::read.xlsx("data_tbl.xlsx",sheetIndex = 1)

# TS Plot -----------------------------------------------------------------

start_date <- min(data_tbl$date_col)
end_date   <- max(data_tbl$date_col)

plot_time_series(
  .data = data_tbl
  , .date_var = date_col
  , .value = excess_days
  , .title = paste0(
    "Excess Days for IP Discharges from: "
    , start_date
    , " to "
    , end_date
  )
  , .interactive = FALSE
)

plot_seasonal_diagnostics(
  .data = data_tbl
  , .date_var = date_col
  , .value = excess_days
)

plot_anomaly_diagnostics(
  .data = data_tbl
  , .date_var = date_col
  , .value = excess_days
)


# Data Split --------------------------------------------------------------
data_final_tbl <- data_tbl %>%
  select(date_col, excess_days)

splits <- initial_time_split(
  data_final_tbl
  , prop = 0.8
  , cumulative = TRUE
)

# Features ----------------------------------------------------------------

recipe_base <- recipe(excess_days ~ ., data = training(splits)) %>%
  step_timeseries_signature(date_col)

recipe_final <- recipe_base %>%
  step_rm(matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)")) %>%
  step_normalize(contains("index.num"), date_col_year) %>%
  step_dummy(contains("lbl"), one_hot = TRUE) %>%
  step_fourier(date_col, period = 365/12, K = 2) %>%
  step_holiday_signature(date_col) %>%
  step_YeoJohnson(excess_days)

# Models ------------------------------------------------------------------

# Auto ARIMA --------------------------------------------------------------

model_spec_arima_no_boost <- arima_reg() %>%
  set_engine(engine = "auto_arima")

wflw_fit_arima_no_boost <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_arima_no_boost) %>%
  fit(training(splits))

# Boosted Auto ARIMA ------------------------------------------------------

model_spec_arima_boosted <- arima_boost(
    min_n = 2
    , learn_rate = 0.015
  ) %>%
  set_engine(engine = "auto_arima_xgboost")

wflw_fit_arima_boosted <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_arima_boosted) %>%
  fit(training(splits))


# ETS ---------------------------------------------------------------------

model_spec_ets <- exp_smoothing() %>%
  set_engine(engine = "ets") 

wflw_fit_ets <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_ets) %>%
  fit(training(splits))

# model_spec_croston <- exp_smoothing() %>%
#   set_engine(engine = "croston")
# 
# wflw_fit_croston <- workflow() %>%
#   add_recipe(recipe = recipe_final) %>%
#   add_model(model_spec_croston) %>%
#   fit(training(splits))

# model_spec_theta <- exp_smoothing() %>%
#   set_engine(engine = "theta")
# 
# wflw_fit_theta <- workflow() %>%
#   add_recipe(recipe = recipe_final) %>%
#   add_model(model_spec_theta) %>%
#   fit(training(splits))


# STLM ETS ----------------------------------------------------------------

model_spec_stlm_ets <- seasonal_reg() %>%
  set_engine("stlm_ets")

wflw_fit_stlm_ets <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_stlm_ets) %>%
  fit(training(splits))

model_spec_stlm_tbats <- seasonal_reg() %>%
  set_engine("tbats")

wflw_fit_stlm_tbats <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_stlm_tbats) %>%
  fit(training(splits))

model_spec_stlm_arima <- seasonal_reg() %>%
  set_engine("stlm_arima")

wflw_fit_stlm_arima <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_stlm_arima) %>%
  fit(training(splits))

# NNETAR ------------------------------------------------------------------

model_spec_nnetar <- nnetar_reg() %>%
  set_engine("nnetar")

wflw_fit_nnetar <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_nnetar) %>%
  fit(training(splits))

# Prophet -----------------------------------------------------------------

model_spec_prophet <- prophet_reg() %>%
  set_engine(engine = "prophet")

wflw_fit_prophet <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_prophet) %>%
  fit(training(splits))

model_spec_prophet_boost <- prophet_boost(learn_rate = 0.1) %>% 
  set_engine("prophet_xgboost") 

wflw_fit_prophet_boost <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_prophet_boost) %>%
  fit(training(splits))

# TSLM --------------------------------------------------------------------

model_spec_lm <- linear_reg() %>%
  set_engine("lm")

wflw_fit_lm <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_lm) %>%
  fit(training(splits))


# MARS --------------------------------------------------------------------

model_spec_mars <- mars(mode = "regression") %>%
  set_engine("earth")

wflw_fit_mars <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_mars) %>%
  fit(training(splits))

# H2O AutoML --------------------------------------------------------------
# h2o.init(
#   nthreads = -1
#   , ip = 'localhost'
#   , port = 54321
# )
# 
# model_spec <- automl_reg(mode = 'regression') %>%
#   set_engine(
#     engine                     = 'h2o',
#     max_runtime_secs           = 5, 
#     max_runtime_secs_per_model = 3,
#     max_models                 = 3,
#     nfolds                     = 5,
#     exclude_algos              = c("DeepLearning"),
#     verbosity                  = NULL,
#     seed                       = 786
#   ) 
# 
# model_spec
# 
# model_fitted <- model_spec %>%
#   fit(excess_days ~ ., data = training(splits))
# 
# model_fitted
# 
# predict(model_fitted, testing(splits))

# Model Table -------------------------------------------------------------

models_tbl <- modeltime_table(
  #wflw_fit_arima_no_boost,
  wflw_fit_arima_boosted,
  wflw_fit_ets,
  wflw_fit_stlm_ets,
  wflw_fit_stlm_tbats,
  wflw_fit_nnetar,
  wflw_fit_prophet,
  wflw_fit_prophet_boost,
  wflw_fit_lm, 
  wflw_fit_mars
)

# Model Ensemble Table ----------------------------------------------------
resample_tscv <- training(splits) %>%
  time_series_cv(
    date_var      = date_col
    , assess      = "12 months"
    , initial     = "24 months"
    , skip        = "3 months"
    , slice_limit = 1
  )

submodel_predictions <- models_tbl %>% # Model Failure Here 
  modeltime_fit_resamples(
    resamples = resample_tscv
    , control = control_resamples(verbose = TRUE)
  )

ensemble_fit <- submodel_predictions %>%
  ensemble_model_spec(
    model_spec = linear_reg(
      penalty  = tune()
      , mixture = tune()
    ) %>%
      set_engine("glmnet")
    , kfolds   = 5
    , grid     = 6
    , control  = control_grid(verbose = TRUE)
  )

fit_mean_ensemble <- models_tbl %>%
  ensemble_average(type = "mean")

fit_median_ensemble <- models_tbl %>%
  ensemble_average(type = "median")

Add combining global and nested models in ensembles

When working with many time series, I'd like to be able to combine nested and global models into a top-performing ensemble (per id).

Tests - Recursive Ensembles

Add tests to cover fitting and refitting recursive() for both single and panel data.

Only a fraction of models show up in modeltime_resamples...

Ubuntu 16.x LTS, R latest, modeltime.ensemble latest

A submodels_tbl has 15 correctly fitted models.
When I try to use them with modeltime_fit_resamples(), only a fraction of them show up in the result of that function (only 4)
Is there an explanation available?

resamples_tscv <- df_train %>%
	time_series_cv(
		assess   = test_len
		,initial = train_len
		#skip    = "2 years",
		,slice_limit = dplyr::n()
	) 

submodel_predictions <- submodels_tbl %>%
	modeltime_fit_resamples(
	resamples = resamples_tscv,
	control   = control_resamples(verbose = TRUE)
)

debugAnalyse<-1
if(debugAnalyse>0){
	# Visualize the Resample Sets
	myPlot<-resamples_tscv %>%
		tk_time_series_cv_plan() %>%
		plot_time_series_cv_plan(
			date, value,
			.facet_ncol  = 2,
			.interactive = TRUE
		)

	print(myPlot)    

	myPlot<-submodel_predictions %>%
		plot_modeltime_resamples(
			.interactive = TRUE
		)

	print(myPlot)

	#View(submodel_predictions)
	
	predictions_tbl <- modeltime.resample::unnest_modeltime_resamples(submodel_predictions)
	
	View(predictions_tbl)

	predictions_tbl$editDate        <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
	predictions_tbl$pslMetaLearner  <- metaLearner
	fastInOut('predictions_tbl.Rds',predictions_tbl)

	predictions_by_rowid_tbl <- predictions_tbl %>%
		dplyr::select(.row_id, .model_id, .pred) %>%
		dplyr::mutate(.model_id = stringr::str_c(".model_id_", .model_id)) %>%
		tidyr::pivot_wider(names_from  = .model_id,values_from = .pred)

	View(predictions_by_rowid_tbl)                        
				
}

Side note: When I use fewer than the 15 models, the code breaks while fitting a glmnet meta-learner, with this message:

 x Slice1: preprocessor 1/1, model 1/1: Error: For the glmnet engine, `penalty` 
 must be a single number (or a value of `tune()`).

... even though the model is correctly tagged with 'penalty = tune::tune()'.
I noticed the same effect with lasso (mixture = 1).

My guess is that the tune() tag gets lost somewhere in modeltime.ensemble's internal code. I'm currently testing different meta-learners: the xgboost meta-learner seems to work only without xgboost submodels, while the others work fine so far.

It would be nice to have a fallback/try-catch option in modeltime.resample. Otherwise, in large projects the code breaks any time something fails at this point.
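The requested fallback can be sketched with base R's `tryCatch()`. Each fit runs independently, and a failure is recorded instead of stopping the loop. `fit_all_safely()` is hypothetical and is not an existing modeltime.resample option. The second fitter simulates a failing submodel.

```r
# Hypothetical fallback wrapper: run every fitting function, record failures
# instead of letting one error abort the whole loop.
fit_all_safely <- function(fitters) {
  lapply(fitters, function(f) {
    tryCatch(
      list(result = f(), error = NULL),
      error = function(e) list(result = NULL, error = conditionMessage(e))
    )
  })
}

fitters <- list(
  ok     = function() lm(mpg ~ wt, data = mtcars),
  broken = function() stop("Input data contains inf or nan")
)

results <- fit_all_safely(fitters)
sapply(results, function(r) is.null(r$error))  # ok: TRUE, broken: FALSE
```

The successful fits can then be kept, and the recorded messages can be logged or reported at the end.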

Relocate recursive ensemble from `modeltime` to `modeltime.ensemble`

modeltime_fit_resamples does not work with ensemble

This code will run but the ensemble is not present when I calculate accuracy:

# Create average ensemble and add to the modeltime table
ml_mtbl <- ml_mtbl %>% 
    combine_modeltime_tables(
        ml_mtbl %>% 
            ensemble_average() %>% 
            modeltime_table()
    )

# TS CV
resamples_tscv <- time_series_cv(
    data        = train_data,
    assess      = "11 days",
    initial     = "730 days",
    skip        = 11,
    slice_limit = 20,
    cumulative = TRUE
    )

resamples_fitted <- ml_mtbl %>% 
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = FALSE, allow_par = TRUE)
    )

resamples_fitted
# Modeltime Table
# A tibble: 6 x 4
  .model_id .model         .model_desc               .resample_results
      <int> <list>         <chr>                     <list>           
1         1 <workflow>     XGBOOST                   <rsmp[+]>        
2         2 <workflow>     RANGER                    <rsmp[+]>        
3         3 <workflow>     GLMNET                    <rsmp[+]>        
4         4 <workflow>     KERNLAB                   <rsmp[+]>        
5         5 <workflow>     KERNLAB                   <rsmp[+]>        
6         6 <ensemble [5]> ENSEMBLE (MEAN): 5 MODELS <lgl [1]>      

resamples_fitted %>%
    modeltime_resample_accuracy()

I.e. the output of modeltime_resample_accuracy() will only include the models, not the ensemble.

The same happens if I have a table with stacked ensembles and try to run this:

model_stack_level_2_mtbl
# Modeltime Table
# A tibble: 5 x 3
  .model_id .model         .model_desc                       
      <int> <list>         <chr>                             
1         1 <ensemble [9]> ENSEMBLE (GLMNET STACK): 9 MODELS 
2         2 <ensemble [9]> ENSEMBLE (RANGER STACK): 9 MODELS 
3         3 <ensemble [9]> ENSEMBLE (XGBOOST STACK): 9 MODELS
4         4 <ensemble [9]> ENSEMBLE (CUBIST STACK): 9 MODELS 
5         5 <ensemble [9]> ENSEMBLE (KERNLAB STACK): 9 MODELS

stacking_rsample_tscv <- model_stack_level_2_mtbl %>%
    modeltime_fit_resamples(
        resamples = test_data,
        control = control_resamples(
            verbose = TRUE,
            pkgs = c("catboost", "treesnip", "Cubist", "rules")
        )
    )

-- Fitting Resamples --------------------------------------------

* Model ID: 1 ENSEMBLE (GLMNET STACK): 9 MODELS
Error: no applicable method for 'mdl_time_fit_resamples' applied to an object of class "c('mdl_time_ensemble_model_spec', 'mdl_time_ensemble')"
* Model ID: 2 ENSEMBLE (RANGER STACK): 9 MODELS
Error: no applicable method for 'mdl_time_fit_resamples' applied to an object of class "c('mdl_time_ensemble_model_spec', 'mdl_time_ensemble')"
* Model ID: 3 ENSEMBLE (XGBOOST STACK): 9 MODELS
Error: no applicable method for 'mdl_time_fit_resamples' applied to an object of class "c('mdl_time_ensemble_model_spec', 'mdl_time_ensemble')"
* Model ID: 4 ENSEMBLE (CUBIST STACK): 9 MODELS
Error: no applicable method for 'mdl_time_fit_resamples' applied to an object of class "c('mdl_time_ensemble_model_spec', 'mdl_time_ensemble')"
* Model ID: 5 ENSEMBLE (KERNLAB STACK): 9 MODELS
Error: no applicable method for 'mdl_time_fit_resamples' applied to an object of class "c('mdl_time_ensemble_model_spec', 'mdl_time_ensemble')"
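The message is a standard S3 dispatch failure: the internal generic has no method for either class in the ensemble's class vector. The sketch below reproduces the same message with a hypothetical generic (fit_resamples_sketch is illustrative only, not a modeltime function). It is likely, though not confirmed here, that the first problem has the same cause: with verbose = FALSE the error is not printed, and the ensemble row ends up with <lgl [1]> instead of resample results, so modeltime_resample_accuracy() has nothing to report for it.

```r
# Hypothetical generic standing in for the internal one named in the error
fit_resamples_sketch <- function(object, ...) UseMethod("fit_resamples_sketch")

# Only plain workflows get a method
fit_resamples_sketch.workflow <- function(object, ...) "fitted workflow on resamples"

wflw_obj     <- structure(list(), class = "workflow")
ensemble_obj <- structure(
  list(),
  class = c("mdl_time_ensemble_model_spec", "mdl_time_ensemble")
)

fit_resamples_sketch(wflw_obj)  # dispatches to the workflow method

# No method for any class in the vector, so dispatch fails with:
# no applicable method for 'fit_resamples_sketch' applied to an object of
# class "c('mdl_time_ensemble_model_spec', 'mdl_time_ensemble')"
try(fit_resamples_sketch(ensemble_obj))

# Registering a method for one of the classes makes dispatch succeed
fit_resamples_sketch.mdl_time_ensemble <- function(object, ...) "fitted ensemble on resamples"
fit_resamples_sketch(ensemble_obj)
```

In other words, supporting ensembles in resampling would require a method for the ensemble classes; it is not something the calling code can work around by changing arguments.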

Session info:

- Session info ------------------------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  Icelandic_Iceland.1252      
 ctype    Icelandic_Iceland.1252      
 tz       Africa/Casablanca           
 date     2021-08-16                  

- Packages ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ! package            * version    date       lib source                                              
   askpass              1.1        2019-01-13 [1] CRAN (R 4.1.0)                                      
   assertthat           0.2.1      2019-03-21 [1] CRAN (R 4.1.0)                                      
   AzureAuth            1.3.2      2021-05-19 [1] CRAN (R 4.1.0)                                      
   AzureGraph           1.3.1      2021-06-04 [1] CRAN (R 4.1.0)                                      
   AzureKeyVault      * 1.0.4      2020-10-12 [1] CRAN (R 4.1.0)                                      
   AzureRMR             2.4.2      2021-06-03 [1] CRAN (R 4.1.0)                                      
   AzureStor          * 3.5.0      2021-08-03 [1] Github (Azure/AzureStor@19f879b)                    
   backports            1.2.1      2020-12-09 [1] CRAN (R 4.1.0)                                      
   broom              * 0.7.9      2021-07-27 [1] CRAN (R 4.1.0)                                      
   catboost             0.25.1     2021-05-28 [1] url                                                 
   cellranger           1.1.0      2016-07-27 [1] CRAN (R 4.1.0)                                      
   class                7.3-19     2021-05-03 [2] CRAN (R 4.1.0)                                      
   cli                  3.0.1      2021-07-17 [1] CRAN (R 4.1.0)                                      
   codetools            0.2-18     2020-11-04 [2] CRAN (R 4.1.0)                                      
   colorspace           2.0-2      2021-06-24 [1] CRAN (R 4.1.0)                                      
   crayon               1.4.1      2021-02-08 [1] CRAN (R 4.1.0)                                      
   crosstalk            1.1.1      2021-01-12 [1] CRAN (R 4.1.0)                                      
   Cubist               0.3.0      2021-05-28 [1] CRAN (R 4.1.0)                                      
   curl                 4.3.2      2021-06-23 [1] CRAN (R 4.1.0)                                      
   data.table           1.14.0     2021-02-21 [1] CRAN (R 4.1.0)                                      
   DBI                  1.1.1      2021-01-15 [1] CRAN (R 4.1.0)                                      
   dbplyr               2.1.1      2021-04-06 [1] CRAN (R 4.1.0)                                      
   dials              * 0.0.9.9000 2021-07-29 [1] Github (tidymodels/dials@dc9e020)                   
   DiceDesign           1.9        2021-02-13 [1] CRAN (R 4.1.0)                                      
   digest               0.6.27     2020-10-24 [1] CRAN (R 4.1.0)                                      
   doFuture           * 0.12.0     2021-01-04 [1] CRAN (R 4.1.0)                                      
   dplyr              * 1.0.7      2021-06-18 [1] CRAN (R 4.1.0)                                      
   ellipsis             0.3.2      2021-04-29 [1] CRAN (R 4.1.0)                                      
   fansi                0.5.0      2021-05-25 [1] CRAN (R 4.1.0)                                      
   finetune           * 0.1.0      2021-07-21 [1] CRAN (R 4.1.0)                                      
   forcats            * 0.5.1      2021-01-27 [1] CRAN (R 4.1.0)                                      
   foreach            * 1.5.1      2020-10-15 [1] CRAN (R 4.1.0)                                      
   fs                   1.5.0      2020-07-31 [1] CRAN (R 4.1.0)                                      
   furrr                0.2.3      2021-06-25 [1] CRAN (R 4.1.0)                                      
   future             * 1.21.0     2020-12-10 [1] CRAN (R 4.1.0)                                      
   generics             0.1.0      2020-10-31 [1] CRAN (R 4.1.0)                                      
   ggplot2            * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)                                      
   glmnet               4.1-2      2021-06-24 [1] CRAN (R 4.1.0)                                      
   globals              0.14.0     2020-11-22 [1] CRAN (R 4.1.0)                                      
   glue                 1.4.2      2020-08-27 [1] CRAN (R 4.1.0)                                      
   gower                0.2.2      2020-06-23 [1] CRAN (R 4.1.0)                                      
   GPfit                1.0-8      2019-02-08 [1] CRAN (R 4.1.0)                                      
   gridExtra            2.3        2017-09-09 [1] CRAN (R 4.1.0)                                      
   gtable               0.3.0      2019-03-25 [1] CRAN (R 4.1.0)                                      
   hardhat              0.1.6      2021-07-14 [1] CRAN (R 4.1.0)                                      
   haven                2.4.3      2021-08-04 [1] CRAN (R 4.1.0)                                      
   hms                  1.1.0      2021-05-17 [1] CRAN (R 4.1.0)                                      
   htmltools            0.5.1.1    2021-01-22 [1] CRAN (R 4.1.0)                                      
   htmlwidgets          1.5.3      2020-12-10 [1] CRAN (R 4.1.0)                                      
   httr                 1.4.2      2020-07-20 [1] CRAN (R 4.1.0)                                      
   infer              * 0.5.4      2021-01-13 [1] CRAN (R 4.1.0)                                      
   ipred                0.9-11     2021-03-12 [1] CRAN (R 4.1.0)                                      
   iterators            1.0.13     2020-10-15 [1] CRAN (R 4.1.0)                                      
   janitor              2.1.0      2021-01-05 [1] CRAN (R 4.1.0)                                      
   jsonlite             1.7.2      2020-12-09 [1] CRAN (R 4.1.0)                                      
   kernlab              0.9-29     2019-11-12 [1] CRAN (R 4.1.0)                                      
   labeling             0.4.2      2020-10-20 [1] CRAN (R 4.1.0)                                      
   lattice              0.20-44    2021-05-02 [2] CRAN (R 4.1.0)                                      
   lava                 1.6.9      2021-03-11 [1] CRAN (R 4.1.0)                                      
   lazyeval             0.2.2      2019-03-15 [1] CRAN (R 4.1.0)                                      
   lhs                  1.1.1      2020-10-05 [1] CRAN (R 4.1.0)                                      
   lifecycle            1.0.0      2021-02-15 [1] CRAN (R 4.1.0)                                      
   listenv              0.8.0      2019-12-05 [1] CRAN (R 4.1.0)                                      
   lubridate          * 1.7.10     2021-02-26 [1] CRAN (R 4.1.0)                                      
   magrittr             2.0.1      2020-11-17 [1] CRAN (R 4.1.0)                                      
   MASS                 7.3-54     2021-05-03 [2] CRAN (R 4.1.0)                                      
   Matrix               1.3-3      2021-05-04 [2] CRAN (R 4.1.0)                                      
   mime                 0.11       2021-06-23 [1] CRAN (R 4.1.0)                                      
   modeldata          * 0.1.1      2021-07-14 [1] CRAN (R 4.1.0)                                      
   modelr               0.1.8      2020-05-19 [1] CRAN (R 4.1.0)                                      
   modeltime          * 0.7.0      2021-07-16 [1] CRAN (R 4.1.0)                                      
   modeltime.ensemble * 0.4.2      2021-07-16 [1] CRAN (R 4.1.0)                                      
   modeltime.resample * 0.2.0      2021-05-27 [1] Github (business-science/modeltime.resample@abde5ac)
   munsell              0.5.0      2018-06-12 [1] CRAN (R 4.1.0)                                      
   nnet                 7.3-16     2021-05-03 [2] CRAN (R 4.1.0)                                      
   openssl              1.4.4      2021-04-30 [1] CRAN (R 4.1.0)                                      
   parallelly           1.27.0     2021-07-19 [1] CRAN (R 4.1.0)                                      
   parsnip            * 0.1.7      2021-07-21 [1] CRAN (R 4.1.0)                                      
   pillar               1.6.2      2021-07-29 [1] CRAN (R 4.1.0)                                      
   pkgconfig            2.0.3      2019-09-22 [1] CRAN (R 4.1.0)                                      
   plotly               4.9.4.1    2021-06-18 [1] CRAN (R 4.1.0)                                      
   plyr                 1.8.6      2020-03-03 [1] CRAN (R 4.1.0)                                      
   png                  0.1-7      2013-12-03 [1] CRAN (R 4.1.0)                                      
   pROC                 1.17.0.1   2021-01-13 [1] CRAN (R 4.1.0)                                      
   prodlim              2019.11.13 2019-11-17 [1] CRAN (R 4.1.0)                                      
   progressr            0.8.0      2021-06-10 [1] CRAN (R 4.1.0)                                      
   prophet              1.0        2021-03-30 [1] CRAN (R 4.1.0)                                      
   purrr              * 0.3.4      2020-04-17 [1] CRAN (R 4.1.0)                                      
   R6                   2.5.0      2020-10-28 [1] CRAN (R 4.1.0)                                      
   ranger               0.13.1     2021-07-14 [1] CRAN (R 4.1.0)                                      
   rappdirs             0.3.3      2021-01-31 [1] CRAN (R 4.1.0)                                      
   Rcpp                 1.0.7      2021-07-07 [1] CRAN (R 4.1.0)                                      
 D RcppParallel         5.1.4      2021-05-04 [1] CRAN (R 4.1.0)                                      
   readr              * 2.0.0      2021-07-20 [1] CRAN (R 4.1.0)                                      
   readxl               1.3.1      2019-03-13 [1] CRAN (R 4.1.0)                                      
   recipes            * 0.1.16     2021-04-16 [1] CRAN (R 4.1.0)                                      
   reprex               2.0.1      2021-08-05 [1] CRAN (R 4.1.0)                                      
   reshape2             1.4.4      2020-04-09 [1] CRAN (R 4.1.0)                                      
   reticulate           1.20       2021-05-03 [1] CRAN (R 4.1.0)                                      
   rlang              * 0.4.11     2021-04-30 [1] CRAN (R 4.1.0)                                      
   rpart                4.1-15     2019-04-12 [2] CRAN (R 4.1.0)                                      
   rsample            * 0.1.0      2021-05-08 [1] CRAN (R 4.1.0)                                      
   rstudioapi           0.13       2020-11-12 [1] CRAN (R 4.1.0)                                      
   rules              * 0.1.2      2021-08-07 [1] CRAN (R 4.1.0)                                      
   rvest                1.0.1      2021-07-26 [1] CRAN (R 4.1.0)                                      
   scales             * 1.1.1      2020-05-11 [1] CRAN (R 4.1.0)                                      
   sessioninfo          1.1.1      2018-11-05 [1] CRAN (R 4.1.0)                                      
   shape                1.4.6      2021-05-19 [1] CRAN (R 4.1.0)                                      
   snakecase            0.11.0     2019-05-25 [1] CRAN (R 4.1.0)                                      
   StanHeaders          2.21.0-7   2020-12-17 [1] CRAN (R 4.1.0)                                      
   stringi              1.7.3      2021-07-16 [1] CRAN (R 4.1.0)                                      
   stringr            * 1.4.0      2019-02-10 [1] CRAN (R 4.1.0)                                      
   sumots             * 0.1.0      2021-08-16 [1] local                                               
   survival             3.2-11     2021-04-26 [2] CRAN (R 4.1.0)                                      
   tibble             * 3.1.3      2021-07-23 [1] CRAN (R 4.1.0)                                      
   tictoc             * 1.0.1      2021-04-19 [1] CRAN (R 4.1.0)                                      
   tidymodels         * 0.1.3.9000 2021-05-27 [1] Github (tidymodels/tidymodels@69ebce8)              
   tidyr              * 1.1.3      2021-03-03 [1] CRAN (R 4.1.0)                                      
   tidyselect           1.1.1      2021-04-30 [1] CRAN (R 4.1.0)                                      
   tidyverse          * 1.3.1      2021-04-15 [1] CRAN (R 4.1.0)                                      
   timeDate             3043.102   2018-02-21 [1] CRAN (R 4.1.0)                                      
   timetk             * 2.6.1      2021-01-18 [1] CRAN (R 4.1.0)                                      
   treesnip           * 0.1.0.9000 2021-05-31 [1] Github (curso-r/treesnip@bf27cd8)                   
   tune               * 0.1.6      2021-07-21 [1] CRAN (R 4.1.0)                                      
   tzdb                 0.1.2      2021-07-20 [1] CRAN (R 4.1.0)                                      
   utf8                 1.2.2      2021-07-24 [1] CRAN (R 4.1.0)                                      
   vctrs              * 0.3.8      2021-04-29 [1] CRAN (R 4.1.0)                                      
   vip                * 0.3.2      2020-12-17 [1] CRAN (R 4.1.0)                                      
   viridisLite          0.4.0      2021-04-13 [1] CRAN (R 4.1.0)                                      
   withr                2.4.2      2021-04-18 [1] CRAN (R 4.1.0)                                      
   workflows          * 0.2.3      2021-07-16 [1] CRAN (R 4.1.0)                                      
   workflowsets       * 0.1.0      2021-07-22 [1] CRAN (R 4.1.0)                                      
   xgboost            * 1.4.1.1    2021-04-22 [1] CRAN (R 4.1.0)                                      
   xml2                 1.3.2      2020-04-23 [1] CRAN (R 4.1.0)                                      
   xts                  0.12.1     2020-09-09 [1] CRAN (R 4.1.0)                                      
   yaml                 2.2.1      2020-02-01 [1] CRAN (R 4.1.0)                                      
   yardstick          * 0.0.8      2021-03-28 [1] CRAN (R 4.1.0)                                      
   zoo                  1.8-9      2021-03-09 [1] CRAN (R 4.1.0)

Ensemble Model Spec example code causes an error

I have been running the code chunk from
https://business-science.github.io/modeltime.ensemble/reference/ensemble_model_spec.html and I get this error at:

ensemble_fit_glmnet <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg(
            penalty = tune(),
            mixture = tune()
        ) %>%
            set_engine("glmnet"),
        grid    = 2,
        control = control_grid(verbose = TRUE)
    )

── Tuning Model Specification ───────────────────────────────────
ℹ Performing 5-Fold Cross Validation.

Error in tune::show_best():
! ... must be empty.
✖ Problematic argument:
• ..1 = metric
ℹ Did you forget to name an argument?
Backtrace:

submodel_predictions %>% ...
tune:::show_best.tune_results(., metric, n = 1)

Presumably this is because of a deprecation in tune a while ago.
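The backtrace shows metric being passed positionally. When a function declares its optional arguments after `...`, a positional value is captured by `...` and then rejected, which matches the "..1 = metric" line. A hedged sketch of the likely fix inside ensemble_model_spec() is to name the argument: show_best(., metric = metric, n = 1). The base-R example below mimics that signature with a hypothetical show_best_sketch(); it is not tune's implementation, and it assumes a smaller-is-better metric such as RMSE.

```r
# Hypothetical stand-in with optional arguments placed after `...`
show_best_sketch <- function(x, ..., metric = NULL, n = 5) {
  # Reject anything captured by `...`, e.g. an unnamed `metric`
  if (...length() > 0) {
    stop("`...` must be empty. Did you forget to name an argument?", call. = FALSE)
  }
  if (is.null(metric)) metric <- x$.metric[1]
  rows <- x[x$.metric == metric, , drop = FALSE]
  # Assumes smaller is better for the chosen metric
  head(rows[order(rows$mean), , drop = FALSE], n)
}

results <- data.frame(
  .config = c("Model1", "Model2", "Model1", "Model2"),
  .metric = c("rmse", "rmse", "rsq", "rsq"),
  mean    = c(2.1, 1.7, 0.80, 0.85)
)
metric <- "rmse"

try(show_best_sketch(results, metric, n = 1))  # positional: errors
show_best_sketch(results, metric = metric, n = 1)  # named: returns the Model2 row
```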

predict.mdl_time_ensemble_avg

I was wondering whether there is interest in a pull request adding a predict.mdl_time_ensemble_avg() function that predicts on ensembles created with ensemble_average() and aggregates the predictions with rowMeans() or matrixStats::rowMedians(), depending on type.

It could also include an option to control how NA values are handled if an individual model produces NA values.
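A minimal sketch of the proposed aggregation step, assuming the submodel predictions have already been collected into a matrix with one row per time point and one column per submodel. The function name aggregate_ensemble_predictions() is hypothetical, and the median uses base apply() instead of matrixStats::rowMedians() only to keep the example dependency-free.

```r
# Hypothetical helper: combine submodel predictions row by row
aggregate_ensemble_predictions <- function(pred_matrix,
                                           type = c("mean", "median"),
                                           na.rm = FALSE) {
  type <- match.arg(type)
  if (type == "mean") {
    rowMeans(pred_matrix, na.rm = na.rm)
  } else {
    # matrixStats::rowMedians() would be the faster equivalent
    apply(pred_matrix, 1, stats::median, na.rm = na.rm)
  }
}

# One column per submodel; PROPHET produced an NA at the second time point
preds <- cbind(
  arima   = c(10, 20, 30),
  prophet = c(12, NA, 33),
  glmnet  = c(11, 22, 39)
)

aggregate_ensemble_predictions(preds, "mean",   na.rm = TRUE)  # 11 21 34
aggregate_ensemble_predictions(preds, "median", na.rm = TRUE)  # 11 21 33
aggregate_ensemble_predictions(preds, "mean")                  # 11 NA 34
```

With na.rm = TRUE a time point is aggregated over the remaining submodels; if every submodel is NA at a time point, rowMeans() returns NaN there, which a real implementation would probably want to handle explicitly.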

Error with modeltime_refit() using resample spec

data_tbl.xlsx

I am running a script in which I create a time series cross-validation plan that is passed to modeltime_refit(). I believe this may be an underlying issue with tune, but I am posting here because I am using the modeltime_refit() function. The data is attached.

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forecast_8.14              xgboost_1.4.1.1            vctrs_0.3.8               
 [4] rlang_0.4.11               modeltime.h2o_0.1.1        h2o_3.32.1.3              
 [7] modeltime.ensemble_0.4.1   modeltime.resample_0.2.0   tidyquant_1.0.3           
[10] quantmod_0.4.18            TTR_0.24.2                 PerformanceAnalytics_2.0.4
[13] xts_0.12.1                 zoo_1.8-9                  janitor_2.1.0             
[16] DBI_1.1.1                  odbc_1.3.2                 timetk_2.6.1              
[19] lubridate_1.7.10           forcats_0.5.1              stringr_1.4.0             
[22] readr_1.4.0                tidyverse_1.3.1            modeltime_0.6.0           
[25] yardstick_0.0.8            workflowsets_0.0.2         workflows_0.2.2           
[28] tune_0.1.5                 tidyr_1.1.3                tibble_3.1.2              
[31] rsample_0.1.0              recipes_0.1.16             purrr_0.3.4               
[34] parsnip_0.1.6              modeldata_0.1.0            infer_0.5.4               
[37] ggplot2_3.3.3              dplyr_1.0.6                dials_0.0.9               
[40] scales_1.1.1               broom_0.7.6                tidymodels_0.1.3          
[43] pacman_0.5.1              

loaded via a namespace (and not attached):
  [1] readxl_1.3.1         backports_1.2.1      plyr_1.8.6           lazyeval_0.2.2      
  [5] splines_4.0.3        crosstalk_1.1.1      listenv_0.8.0        inline_0.3.19       
  [9] digest_0.6.27        foreach_1.5.1        htmltools_0.5.1.1    earth_5.3.0         
 [13] fansi_0.5.0          magrittr_2.0.1       xlsx_0.6.5           globals_0.14.0      
 [17] modelr_0.1.8         gower_0.2.2          matrixStats_0.59.0   RcppParallel_5.1.4  
 [21] hardhat_0.1.5        prettyunits_1.1.1    tseries_0.10-48      colorspace_2.0-1    
 [25] blob_1.2.1           rvest_1.0.0          haven_2.4.1          callr_3.7.0         
 [29] crayon_1.4.1         RCurl_1.98-1.3       jsonlite_1.7.2       progressr_0.7.0     
 [33] survival_3.2-11      iterators_1.0.13     glue_1.4.2           gtable_0.3.0        
 [37] ipred_0.9-11         V8_3.4.2             pkgbuild_1.2.0       rstan_2.21.2        
 [41] Quandl_2.10.0        Rcpp_1.0.6           plotrix_3.8-1        viridisLite_0.4.0   
 [45] GPfit_1.0-8          bit_4.0.4            Formula_1.2-4        stats4_4.0.3        
 [49] lava_1.6.9           StanHeaders_2.21.0-7 prodlim_2019.11.13   htmlwidgets_1.5.3   
 [53] httr_1.4.2           ellipsis_0.3.2       rJava_1.0-4          loo_2.4.1           
 [57] pkgconfig_2.0.3      farver_2.1.0         nnet_7.3-16          dbplyr_2.1.1        
 [61] utf8_1.2.1           tidyselect_1.1.1     labeling_0.4.2       DiceDesign_1.9      
 [65] reactR_0.4.4         TeachingDemos_2.12   munsell_0.5.0        cellranger_1.1.0    
 [69] tools_4.0.3          cli_2.5.0            generics_0.1.0       yaml_2.2.1          
 [73] processx_3.5.2       bit64_4.0.5          fs_1.5.0             nlme_3.1-152        
 [77] future_1.21.0        reactable_0.2.3      tictoc_1.0.1         xml2_1.3.2          
 [81] LICHospitalR_0.2.0   compiler_4.0.3       rstudioapi_0.13      plotly_4.9.3        
 [85] curl_4.3.1           reprex_2.0.0         lhs_1.1.1            stringi_1.6.2       
 [89] plotmo_3.6.0         ps_1.6.0             lattice_0.20-44      Matrix_1.3-4        
 [93] urca_1.3-0           pillar_1.6.1         lifecycle_1.0.0      furrr_0.2.2         
 [97] lmtest_0.9-38        data.table_1.14.0    bitops_1.0-7         R6_2.5.0            
[101] gridExtra_2.3        parallelly_1.25.0    codetools_0.2-18     MASS_7.3-54         
[105] assertthat_0.2.1     xlsxjars_0.6.1       withr_2.4.2          fracdiff_1.5-1      
[109] parallel_4.0.3       hms_1.1.0            quadprog_1.5-8       grid_4.0.3          
[113] rpart_4.1-15         timeDate_3043.102    class_7.3-19         snakecase_0.11.0    
[117] prophet_1.0          pROC_1.17.0.1

The script fails at this step:

resample_tscv <- training(splits) %>%
  time_series_cv(
    date_var      = date_col
    , assess      = "12 months"
    , initial     = "24 months"
    , skip        = "3 months"
    , slice_limit = 1
  )

refit_tbl <- calibration_tbl %>%
  modeltime_refit(
    data        = data_tbl
    , resamples = resample_tscv
    , control   = control_resamples(verbose = TRUE)
  ) # Fail

The error message produced:

> refit_tbl <- calibration_tbl %>%
+   modeltime_refit(
+     data        = data_tbl
+     , resamples = resample_tscv
+     , control   = control_resamples(verbose = TRUE)
+   )
Error in if ((control$cores > 1) && control$allow_par) { : 
  missing value where TRUE/FALSE needed
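The message means control$cores is NULL: the object passed as control has no cores element, so control$cores > 1 is logical(0) and the if() condition cannot be evaluated. A plausible explanation (an assumption based on the error, not confirmed here) is that the object returned by tune's control_resamples() carries allow_par but not cores, whereas modeltime_refit() expects its own control object. If your installed modeltime exports control_refit(), passing control = control_refit(verbose = TRUE) instead of control_resamples() is worth trying. The base-R sketch below reproduces the failure with plain lists standing in for the control objects.

```r
# Stand-in for a control object with `allow_par` but no `cores` field
ctrl_missing <- list(verbose = TRUE, allow_par = TRUE)
is.null(ctrl_missing$cores)      # TRUE: the element does not exist
length(ctrl_missing$cores > 1)   # 0: comparing NULL yields logical(0)

# The condition from the error message cannot be evaluated
res <- tryCatch(
  if ((ctrl_missing$cores > 1) && ctrl_missing$allow_par) "parallel" else "sequential",
  error = function(e) conditionMessage(e)
)
res  # the error text; exact wording depends on the R version

# With both fields present, the same condition evaluates normally
ctrl_ok <- list(verbose = TRUE, allow_par = TRUE, cores = 1)
ok_res <- if ((ctrl_ok$cores > 1) && ctrl_ok$allow_par) "parallel" else "sequential"
ok_res  # "sequential"
```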

Full script:

# Lib Load ----------------------------------------------------------------

if(!require(pacman)) install.packages("pacman")
pacman::p_load(
  "tidymodels",
  "modeltime",
  "tidyverse",
  "lubridate",
  "timetk",
  "odbc",
  "DBI",
  "janitor",
  "tidyquant",
  "modeltime.ensemble",
  "modeltime.resample",
  "modeltime.h2o"
)

interactive <- TRUE

# Read Data ----
data_tbl <- readxl::read_excel("data_tbl.xlsx") # read in the attached Excel file

# Data Split --------------------------------------------------------------
data_final_tbl <- data_tbl %>%
  select(date_col, excess_days)

splits <- initial_time_split(
  data_final_tbl
  , prop = 0.8
  , cumulative = TRUE
)

# Features ----------------------------------------------------------------

recipe_base <- recipe(excess_days ~ ., data = training(splits)) %>%
  step_timeseries_signature(date_col)

recipe_final <- recipe_base %>%
  step_rm(matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)")) %>%
  step_normalize(contains("index.num"), date_col_year) %>%
  step_dummy(contains("lbl"), one_hot = TRUE) %>%
  step_fourier(date_col, period = 365/12, K = 2) %>%
  step_holiday_signature(date_col) %>%
  step_YeoJohnson(excess_days)

# Models ------------------------------------------------------------------

# Auto ARIMA --------------------------------------------------------------

model_spec_arima_no_boost <- arima_reg() %>%
  set_engine(engine = "auto_arima")

wflw_fit_arima_no_boost <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_arima_no_boost) %>%
  fit(training(splits))

# Boosted Auto ARIMA ------------------------------------------------------

model_spec_arima_boosted <- arima_boost(
    min_n = 2
    , learn_rate = 0.015
  ) %>%
  set_engine(engine = "auto_arima_xgboost")

wflw_fit_arima_boosted <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_arima_boosted) %>%
  fit(training(splits))


# ETS ---------------------------------------------------------------------

model_spec_ets <- exp_smoothing() %>%
  set_engine(engine = "ets") 

wflw_fit_ets <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_ets) %>%
  fit(training(splits))

model_spec_croston <- exp_smoothing() %>%
  set_engine(engine = "croston")

wflw_fit_croston <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_croston) %>%
  fit(training(splits))

model_spec_theta <- exp_smoothing() %>%
  set_engine(engine = "theta")

wflw_fit_theta <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_theta) %>%
  fit(training(splits))


# STLM ETS ----------------------------------------------------------------

model_spec_stlm_ets <- seasonal_reg() %>%
  set_engine("stlm_ets")

wflw_fit_stlm_ets <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_stlm_ets) %>%
  fit(training(splits))

model_spec_stlm_tbats <- seasonal_reg() %>%
  set_engine("tbats")

wflw_fit_stlm_tbats <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_stlm_tbats) %>%
  fit(training(splits))

model_spec_stlm_arima <- seasonal_reg() %>%
  set_engine("stlm_arima")

wflw_fit_stlm_arima <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_stlm_arima) %>%
  fit(training(splits))

# NNETAR ------------------------------------------------------------------

model_spec_nnetar <- nnetar_reg() %>%
  set_engine("nnetar")

wflw_fit_nnetar <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_nnetar) %>%
  fit(training(splits))

# Prophet -----------------------------------------------------------------

model_spec_prophet <- prophet_reg() %>%
  set_engine(engine = "prophet")

wflw_fit_prophet <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_prophet) %>%
  fit(training(splits))

model_spec_prophet_boost <- prophet_boost(learn_rate = 0.1) %>% 
  set_engine("prophet_xgboost") 

wflw_fit_prophet_boost <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_prophet_boost) %>%
  fit(training(splits))

# TSLM --------------------------------------------------------------------

model_spec_lm <- linear_reg() %>%
  set_engine("lm")

wflw_fit_lm <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_lm) %>%
  fit(training(splits))


# MARS --------------------------------------------------------------------

model_spec_mars <- mars(mode = "regression") %>%
  set_engine("earth")

wflw_fit_mars <- workflow() %>%
  add_recipe(recipe = recipe_final) %>%
  add_model(model_spec_mars) %>%
  fit(training(splits))

# Model Table -------------------------------------------------------------

models_tbl <- modeltime_table(
  #wflw_fit_arima_no_boost,
  wflw_fit_arima_boosted,
  wflw_fit_ets,
  wflw_fit_theta,
  wflw_fit_stlm_ets,
  wflw_fit_stlm_tbats,
  wflw_fit_nnetar,
  wflw_fit_prophet,
  wflw_fit_prophet_boost,
  wflw_fit_lm, 
  wflw_fit_mars
)

# Model Ensemble Table ----------------------------------------------------
resample_tscv <- training(splits) %>%
  time_series_cv(
    date_var      = date_col
    , assess      = "12 months"
    , initial     = "24 months"
    , skip        = "3 months"
    , slice_limit = 1
  )

# This fails from #
submodel_predictions <- models_tbl %>%
  modeltime_fit_resamples(
    resamples = resample_tscv
    , control = control_resamples(verbose = TRUE)
  )

ensemble_fit <- submodel_predictions %>%
  ensemble_model_spec(
    model_spec = linear_reg(
      penalty  = tune()
      , mixture = tune()
    ) %>%
      set_engine("glmnet")
    , kfold    = 5
    , grid     = 6
    , control  = control_grid(verbose = TRUE)
  )

fit_mean_ensemble <- models_tbl %>%
  ensemble_average(type = "mean")

fit_median_ensemble <- models_tbl %>%
  ensemble_average(type = "median")

# Model Table -------------------------------------------------------------

models_tbl <- modeltime_table(
  #wflw_fit_arima_no_boost,
  wflw_fit_arima_boosted,
  wflw_fit_ets,
  wflw_fit_theta,
  wflw_fit_stlm_ets,
  wflw_fit_stlm_tbats,
  wflw_fit_nnetar,
  wflw_fit_prophet,
  wflw_fit_prophet_boost,
  wflw_fit_lm, 
  wflw_fit_mars,
  fit_mean_ensemble,
  fit_median_ensemble
)

models_tbl

# Calibrate Model Testing -------------------------------------------------

calibration_tbl <- models_tbl %>%
  modeltime_calibrate(new_data = testing(splits))
calibration_tbl

# Testing Accuracy --------------------------------------------------------

calibration_tbl %>%
  modeltime_forecast(
    new_data    = testing(splits),
    actual_data = data_tbl
  ) %>%
  plot_modeltime_forecast(
    .legend_max_width   = 25,
    .interactive        = interactive,
    .conf_interval_show = FALSE
  )

calibration_tbl %>%
  modeltime_accuracy() %>%
  arrange(mae) %>%
  table_modeltime_accuracy(resizable = TRUE, bordered = TRUE)

# Refit to all Data -------------------------------------------------------
# **** Failure **** ----
refit_tbl <- calibration_tbl %>%
  modeltime_refit(
    data        = data_tbl
    , resamples = resample_tscv
    , control   = control_resamples(verbose = TRUE)
  )

top_two_models <- refit_tbl %>% 
  modeltime_accuracy() %>% 
  arrange(mae) %>% 
  head(2)

ensemble_models <- refit_tbl %>%
  filter(
    .model_desc %>% 
      str_to_lower() %>%
      str_detect("ensemble")
  ) %>%
  modeltime_accuracy()

model_choices <- rbind(top_two_models, ensemble_models)

refit_tbl %>%
  filter(.model_id %in% model_choices$.model_id) %>%
  modeltime_forecast(h = "1 year", actual_data = data_tbl) %>%
  plot_modeltime_forecast(
    .legend_max_width     = 25
    , .interactive        = FALSE
    , .conf_interval_show = FALSE
  )

# Misc --------------------------------------------------------------------
models_tbl %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_residuals() %>%
  plot_modeltime_residuals()

Recursive ensemble?

In fact, this feature would have to be implemented in the modeltime package, but I decided to post it here.
A recursive ensemble, instead of an ensemble of recursive models, may have two potential advantages:

We expect the ensemble's forecast to be more accurate in terms of RMSE, MAPE, etc., so the inputs for the subsequent steps should be better too.
We add lagged data once per time step (rather than recreating it in every submodel individually).

I could implement it if you're interested. I've already made a proof of concept of a recursive ensemble, but using stacks.
By the way, it's interesting that in stacks you cannot simply combine a set of models; you always have to wrap them in objects intended for parameter tuning.
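A minimal base-R sketch of the idea, under the assumption that each submodel is simply a function mapping the shared history to a one-step-ahead forecast (naive_model and ma3_model are hypothetical toy stand-ins, not modeltime objects). At every step all submodels see the same history, their forecasts are averaged, and only the ensemble value is appended before the next step.

```r
# Toy submodels: each maps the shared history to a one-step forecast
naive_model <- function(history) tail(history, 1)
ma3_model   <- function(history) mean(tail(history, 3))

recursive_ensemble_forecast <- function(history, models, h) {
  history   <- as.numeric(history)
  forecasts <- numeric(h)
  for (step in seq_len(h)) {
    # Every submodel predicts from the same, ensemble-updated history
    preds <- vapply(models, function(m) m(history), numeric(1))
    # Combine by simple mean, then feed the ensemble value back in
    forecasts[step] <- mean(preds)
    history <- c(history, forecasts[step])
  }
  forecasts
}

recursive_ensemble_forecast(c(1, 2, 3, 4, 5, 6), list(naive_model, ma3_model), h = 2)
# 5.5 5.5
```

Compared with averaging independently recursed submodels, each submodel's second-step input here is the ensemble's first-step forecast rather than its own.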

Add option to keep submodel forecasts after creating ensemble

Is there an option to also keep the submodel forecasts in the output when forecasting using an ensemble?

At the moment I'm doing something like this:

# Put both the submodels and ensemble into the table
ensemble_tbl <- submodel_table %>% 
  add_modeltime_model(ensemble_fit_glmnet)

It works, but it pretty much doubles the forecasting time, since the submodels are forecast both on their own and again inside the ensemble.

I had a look through this repo's modeltime_forecast functions and saw that the submodel forecasts are made but not kept. I was thinking of adding a keep_submodel_forecasts flag to those forecast functions. What do you think?
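One way to picture the proposed flag is a package-independent base R sketch: each submodel is forecast once, the ensemble is computed from those stored forecasts, and the submodel rows are returned only on request. ensemble_mean_forecast() and its keep_submodel_forecasts argument are hypothetical and not part of the modeltime.ensemble API; the simple mean stands in for whatever combination the real ensemble uses (e.g. the glmnet metalearner above), and the .model_desc/.index/.value columns only loosely mimic modeltime forecast output.

```r
# Hypothetical sketch (not the modeltime.ensemble API): forecast each submodel
# once, build the ensemble from those stored forecasts, and optionally keep them.
ensemble_mean_forecast <- function(submodel_preds, keep_submodel_forecasts = FALSE) {
  # submodel_preds: named list of numeric vectors, one per submodel, same horizon
  pred_mat <- do.call(cbind, submodel_preds)

  # The ensemble reuses the stored submodel forecasts (simple mean here),
  # so no submodel is forecast a second time.
  ensemble <- data.frame(
    .model_desc = "ENSEMBLE (MEAN)",
    .index      = seq_len(nrow(pred_mat)),
    .value      = rowMeans(pred_mat)
  )
  if (!keep_submodel_forecasts) {
    return(ensemble)
  }

  # Reshape the stored submodel forecasts into the same long format and stack them
  submodels <- do.call(rbind, lapply(names(submodel_preds), function(nm) {
    data.frame(
      .model_desc = nm,
      .index      = seq_along(submodel_preds[[nm]]),
      .value      = submodel_preds[[nm]]
    )
  }))
  rbind(submodels, ensemble)
}

# Usage with two made-up submodel forecasts over a 3-step horizon
preds <- list(ARIMA = c(1, 2, 3), PROPHET = c(3, 4, 5))
ensemble_mean_forecast(preds)                                  # 3 ensemble rows
ensemble_mean_forecast(preds, keep_submodel_forecasts = TRUE)  # 6 submodel rows + 3 ensemble rows
```

The design choice is that keeping the submodel forecasts costs only the extra rows, not an extra forecasting pass.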

Unable to set parameter ranges in metalearners (misleading error message in "ensemble_model_spec")

When using the standard modeling workflow without stacked ensembles, I have no trouble setting individual parameter ranges like this:

xgb_grid <- grid_latin_hypercube(
  learn_rate(range = c(-5.0, -0.1)),
  size = 30
)

I also know how to update parameters and how to pull them from workflow objects.

What I do not know is how this works with metalearner stacks. Unless I am fundamentally wrong, the argument param_info is used for this purpose.
As the documentation of ensemble_model_spec states, param_info can take a dials parameter object as input. However, either I am not getting the concept of dials parameter objects right, or there is some problem with my code, because my solution results in "all models failed, see .notes column".
This error message is not helpful, as I have not created a tuned object myself (for example, with tune_grid()). Where do I see a notes column here? From this point on I am stuck, because I have no clear indication of the source of the error.
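A possible explanation, not verified against this reprex: if ensemble_model_spec() hands param_info to tune::tune_grid(), the parameter set likely has to describe every tune() argument of the metalearner, with mtry given a finite upper bound, whereas the reprex supplies only learn_rate. A minimal sketch under that assumption builds the full set from the spec with extract_parameter_set_dials() and overrides two ranges with update(). The names xgb_spec and xgb_params are illustrative, and the mtry upper bound of 2 assumes the metalearner sees one predictor per submodel (two here).

```r
suppressPackageStartupMessages(library(tidymodels))

# Metalearner spec from the reprex (mode set explicitly here)
xgb_spec <- boost_tree(
  trees          = tune(),
  tree_depth     = tune(),
  learn_rate     = tune(),
  loss_reduction = tune(),
  min_n          = tune(),
  mtry           = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

# Start from a parameter set covering ALL tune() arguments, then override ranges.
# learn_rate ranges are on the log10 scale; the mtry upper bound of 2 is an
# assumption (one metalearner predictor per submodel).
xgb_params <- xgb_spec %>%
  extract_parameter_set_dials() %>%
  update(
    learn_rate = learn_rate(range = c(-0.5, -0.01)),
    mtry       = mtry(range = c(1L, 2L))
  )

xgb_params
```

Under that assumption, xgb_params would then be passed as param_info = xgb_params in the ensemble_model_spec() call.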

reprex:

# time series ML
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(modeltime.ensemble))
suppressPackageStartupMessages(library(modeltime.resample))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(doParallel))

data <- m4_monthly

H <- 6
# training + forecast
full_data_tbl <- data %>%
  group_by(id) %>%
  future_frame(
    .length_out = H,
    .bind_data  = TRUE
  ) %>%
  ungroup() %>%
  mutate(id = fct_drop(id))
# training and test
data_prepared_tbl <- full_data_tbl %>%
  filter(!is.na(value))
# forecast
future_tbl <- full_data_tbl %>%
  filter(is.na(value))
# splits
set.seed(544)
splits <- data_prepared_tbl %>%
  time_series_split(
    date_var    = date,
    assess      = H,
    cumulative  = TRUE
  )
resamples_tscv <- data_prepared_tbl %>%
  time_series_cv(
    date_var    = date,
    assess      = "6 months",
    skip        = "6 months",
    cumulative  = TRUE,
    slice_limit = 3
  )
# recipe
recipe_spec_mars <- recipe(value ~ ., data = training(splits)) %>%
  update_role(date, new_role = "ID") %>%
  step_dummy(all_nominal(), one_hot = TRUE)
set.seed(522)
wflw_fit_mars <- workflow() %>%
  add_model(
    mars(num_terms = 5, prod_degree = 2, mode = "regression") %>%
      set_engine("earth")
  ) %>%
  add_recipe(recipe_spec_mars) %>%
  fit(training(splits))
# lasso -----------------------
set.seed(522)
wflw_fit_lasso <- workflow() %>%
  add_model(
    linear_reg(penalty = 0.1, mixture = 1, mode = "regression") %>%
      set_engine("glmnet")
  ) %>%
  add_recipe(recipe_spec_mars) %>%
  fit(training(splits))

#### STACK ------------------------
submodels_stacks <- modeltime_table(
  wflw_fit_lasso,
  wflw_fit_mars
)
# fit resamples
cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
set.seed(234)
submodel_predictions <- submodels_stacks %>%
  modeltime_fit_resamples(
    resamples = resamples_tscv,
    control   = control_resamples(verbose = TRUE)
  )
stopCluster(cl)
# Metalearner XGBOOST
set.seed(123)
cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
ensemble_fit_xgboost <- submodel_predictions %>%
  ensemble_model_spec(
    model_spec = boost_tree(
      trees          = tune(),
      tree_depth     = tune(),
      learn_rate     = tune(),
      loss_reduction = tune(),
      min_n          = tune(), 
      mtry           = tune()
    ) %>%
      set_engine("xgboost"),
    kfolds = 10,
    grid   = 30,
    param_info = tune::parameters(learn_rate(range = c(-0.5, -0.01))),
    control    = control_grid(verbose = TRUE, allow_par = TRUE)
  )
stopCluster(cl)