load namespaces

Using foreach(.packages) fully attaches the packages. We should only load their namespaces inside the worker sessions. However, as we've learned from caret, this is fragile across parallel backends (topepo/caret#1017).
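A minimal base R sketch of the difference between loading and attaching. load_namespaces() is a hypothetical helper, not part of tune. requireNamespace() loads a package's namespace without adding it to search(), so a foreach loop body could call a helper like this first instead of relying on .packages. It only shows loading versus attaching and does not address the backend fragility mentioned above.

```r
# Hypothetical helper: load (but do not attach) the namespaces a worker needs.
load_namespaces <- function(pkgs) {
  # requireNamespace() returns FALSE instead of failing when a package is missing
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(ok)) {
    stop("These packages could not be loaded: ",
         paste(pkgs[!ok], collapse = ", "), call. = FALSE)
  }
  invisible(pkgs)
}

# 'tools' ships with R: its namespace gets loaded, but "package:tools"
# is not put on the search path.
load_namespaces(c("stats", "tools"))
"package:tools" %in% search()
```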

Split from #6

acquisition functions

confidence bounds, probability of improvement, etc.

These should be functions that create a classed object and have an argument for the metric. They are meant to be passed to the objective function.
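A base R sketch of the classed-object idea. The constructor names (exp_improve_sketch() and friends), the acquire() generic, and the trade_off and kappa arguments are hypothetical illustrations, not tune's API. It assumes the surrogate model supplies a predicted mean and standard deviation for each candidate. Expected improvement and probability of improvement use the standard Gaussian formulas. Scores are oriented so that larger is always better, whether the metric is maximized or minimized.

```r
# Hypothetical shared constructor: stores the metric name, its direction,
# and any tuning constants in a classed list.
new_acquisition <- function(cls, metric, maximize, ...) {
  structure(list(metric = metric, maximize = maximize, ...),
            class = c(cls, "acquisition_function"))
}
exp_improve_sketch <- function(metric, maximize = TRUE, trade_off = 0) {
  new_acquisition("exp_improve", metric, maximize, trade_off = trade_off)
}
prob_improve_sketch <- function(metric, maximize = TRUE, trade_off = 0) {
  new_acquisition("prob_improve", metric, maximize, trade_off = trade_off)
}
conf_bound_sketch <- function(metric, maximize = TRUE, kappa = 0.1) {
  new_acquisition("conf_bound", metric, maximize, kappa = kappa)
}

# Generic: score candidates from the predicted mean and sd, given the
# best metric value seen so far.
acquire <- function(object, mean, sd, best) UseMethod("acquire")

# Improvement over the current best, flipped when the metric is minimized
improvement <- function(object, mean, best) {
  gain <- if (object$maximize) mean - best else best - mean
  gain - object$trade_off
}

acquire.exp_improve <- function(object, mean, sd, best) {
  gain <- improvement(object, mean, best)
  safe_sd <- ifelse(sd > 0, sd, 1)  # avoid dividing by zero
  z <- gain / safe_sd
  ifelse(sd > 0, gain * pnorm(z) + sd * dnorm(z), pmax(gain, 0))
}

acquire.prob_improve <- function(object, mean, sd, best) {
  gain <- improvement(object, mean, best)
  safe_sd <- ifelse(sd > 0, sd, 1)
  ifelse(sd > 0, pnorm(gain / safe_sd), as.numeric(gain > 0))
}

acquire.conf_bound <- function(object, mean, sd, best) {
  # 'best' is unused but kept so every method has the same signature
  if (object$maximize) mean + object$kappa * sd else object$kappa * sd - mean
}

ei <- exp_improve_sketch("roc_auc")
acquire(ei, mean = c(0.80, 0.90), sd = c(0.05, 0.01), best = 0.85)
```

Keeping the metric name and direction inside the object means the objective function only needs to call acquire() and does not need to know which acquisition function it was given.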

more specific tunable methods

This would help with engine arguments in parsnip. For example, if we had

boost_tree(mode = "classification") %>% 
  set_engine("C5.0", rules = TRUE)

then the user would need to create their own grid. For common engine parameters, there should be a tunable.boost_tree() method that inserts a dials reference when someone chooses to optimize rules.

min_grid breaks

This is more of a parsnip issue but the problem shows up here.

If a model has a fixed parameter, min_grid() breaks:

library(parsnip)
library(tibble)
library(tune)

mod <- linear_reg(penalty = tune(), mixture = 1)

grd <- tibble(penalty = 1:5)

min_grid(mod, grd)
#> Error: Result must have length 4, not 0

Created on 2019-08-16 by the reprex package (v0.2.1)

issue template proposal

Making a new issue for tune

Please follow the template below.

If the question is related at all to a specific data analysis, please include a minimal reprex (reproducible example). If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.


  • Here is a good example issue: #46

  • Issues without a reprex will have a lower priority than the others.

  • We don't want you to use confidential data; you can blind the data or simulate other data to demonstrate the issue. The functions caret::twoClassSim() or caret::SLC14_1() might be good tools to simulate data for you.

  • Unless the problem is explicitly about parallel processing, please run sequentially.

    • Even if it is about parallel processing, please make sure that it runs sequentially first.

  • Make liberal use of set.seed() to help reproducibility.

  • Please check or to see if someone has already asked the same question (see: Yihui's Rule).

  • You might need to install these:

install.packages(c("reprex", "sessioninfo"), repos = "")

When you are ready to file the issue, please delete the parts above this line:
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

The problem

I'm having trouble with ... or

Have you considered ...

Reproducible example

Copy your code to the clipboard and run:

reprex::reprex(si = TRUE)

catch warnings from estimate_perf

so that we don't get a huge number of copies of this warning

A correlation computation is required, but estimate is constant and has 0 standard deviation, resulting in a divide by 0 error. NA will be returned.

at the end
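One way to do this, sketched in base R: withCallingHandlers() records each warning and muffles it with invokeRestart("muffleWarning"), and each distinct message is reported once, with a count, after all resamples finish. collect_warnings() is a hypothetical helper, and the simulated loop stands in for the per-resample calls to estimate_perf().

```r
# Hypothetical helper: evaluate 'expr', muffling warnings as they occur and
# returning them alongside the result instead of printing each one.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Simulated per-resample calls that all emit the same warning
res <- lapply(1:3, function(i) collect_warnings({
  warning("simulated: estimate is constant")
  i
}))

# Report each distinct warning once, with how often it occurred
all_msgs <- unlist(lapply(res, `[[`, "warnings"))
counts <- table(all_msgs)
for (m in names(counts)) {
  message(sprintf("%s (occurred %d times)", m, counts[[m]]))
}
```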

getting_started.Rmd won't build

The problem

getting_started.Rmd fails to build. It appears to be an issue in tune_Bayes, but I'm not yet able to understand what causes the issue.

Reproducible example

(run everything in the getting_started.Rmd before line 254)

ctrl <- Bayes_control(verbose = TRUE)
knn_search <- tune_Bayes(knn_wflow, rs = cv_splits, initial = 5, iter = 20,
                         param_info = knn_param, control = ctrl)

❯  Generating a set of 5 initial parameter results
Initialization complete

Error: All of the models failed.
In addition: Warning message:
All models failed in tune_grid().

getting installation error


trying to install tune


This then asks:

These packages have more recent versions available.
Which would you like to update?

1: All                                          
2: CRAN packages only                           
3: None                                         
4: parsnip ( -> ae42617a9...) [GitHub]
5: cli     (1.1.0      -> ad6410aee...) [GitHub]

Enter one or more numbers, or an empty line to skip updates:
parsnip   ( -> ae42617a9...) [GitHub]
cli       (1.1.0      -> ad6410aee...) [GitHub]
backports (NA         -> 1.1.5       ) [CRAN]

Getting the error:

Error: Failed to install 'tune' from GitHub:
  (converted from warning) cannot remove prior installation of package 'backports'

make grid and Bayes functions S3?

Right now they only accept workflows, but maybe they could accept recipes, formulas, and a model in case no post-processing is needed. We can still use a workflow under the hood.
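A base R sketch of how S3 dispatch could route every input type into a single workflow-based code path. tune_grid_sketch(), as_workflow_sketch() and the "workflow_sketch" class are hypothetical stand-ins, and the "recipe" object in the usage is a fake classed list rather than a real recipe.

```r
# Hypothetical generic that dispatches on the class of its first argument
tune_grid_sketch <- function(object, ...) UseMethod("tune_grid_sketch")

# Wrap a preprocessor and a model into a stand-in workflow object
as_workflow_sketch <- function(preprocessor, model) {
  structure(list(preprocessor = preprocessor, model = model),
            class = "workflow_sketch")
}

tune_grid_sketch.formula <- function(object, model, ...) {
  tune_grid_sketch(as_workflow_sketch(object, model), ...)
}

tune_grid_sketch.recipe <- function(object, model, ...) {
  tune_grid_sketch(as_workflow_sketch(object, model), ...)
}

tune_grid_sketch.workflow_sketch <- function(object, ...) {
  # The real tuning code would live here, written once for workflows;
  # the sketch only reports which kind of preprocessor it received.
  class(object$preprocessor)[1]
}

tune_grid_sketch(y ~ x, model = "model spec")
```

The formula and recipe methods are thin wrappers, so all of the resampling logic stays in the workflow method.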

All models failed in tune_grid()

The problem

I'm having trouble reproducing the examples in the grid search article (I think, similar to #59)

Running the code in that article from top to bottom, I get "Warning: All models failed in tune_grid()." when running the grid search, and I don't get the output presented in the article.

I should note that I've only used the wider set of tidymodels packages a few times, so I may be missing something more fundamental here, but at the moment I'm not clear why the code in the article doesn't produce the output shown.

Apologies in advance if this isn't enough to reproduce the issue on your end; please let me know what other details I can provide.

Reproducible example


library(tidymodels)
library(tune)
library(mlbench)
data(Ionosphere)

Ionosphere <- Ionosphere %>% select(-V2)

svm_mod <- svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

iono_rec <-
  recipe(Class ~ ., data = Ionosphere)  %>%
  # In case V1 has a single value sampled
  step_zv(all_predictors()) %>% 
  # convert it to a dummy variable
  step_dummy(V1) %>%
  # Scale it the same as the others
  step_range(matches("V1_"))

iono_rs <- bootstraps(Ionosphere, times = 30)

roc_vals <- metric_set(roc_auc)

ctrl <- grid_control(verbose = FALSE)

grid_form <- tune_grid(Class ~ ., model = svm_mod, rs = iono_rs, perf = roc_vals, control = ctrl)
#> Warning: All models failed in tune_grid().
#> # Bootstrap sampling 
#> # A tibble: 30 x 3
#>    splits            id          .metrics
#>  * <list>            <chr>       <list>  
#>  1 <split [351/120]> Bootstrap01 <NULL>  
#>  2 <split [351/130]> Bootstrap02 <NULL>  
#>  3 <split [351/137]> Bootstrap03 <NULL>  
#>  4 <split [351/141]> Bootstrap04 <NULL>  
#>  5 <split [351/131]> Bootstrap05 <NULL>  
#>  6 <split [351/131]> Bootstrap06 <NULL>  
#>  7 <split [351/127]> Bootstrap07 <NULL>  
#>  8 <split [351/123]> Bootstrap08 <NULL>  
#>  9 <split [351/131]> Bootstrap09 <NULL>  
#> 10 <split [351/117]> Bootstrap10 <NULL>  
#> # ... with 20 more rows

Created on 2019-10-09 by the reprex package (v0.3.0)

[1] C:/Users/jamesleach/Documents/R/R-3.6.1/library

(I'm on a corporate machine at the moment, hence the local installs for tune, cli, and parsnip as installing directly from GitHub is problematic).

average predictions

collect_predictions() needs an average argument that will return one prediction per sample (in cases of bootstrap and repeated CV).

This should work for all types of predictions. For probabilities, the columns should be averaged and the set of probability columns should sum to one. For classes, the mode should be used.
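A base R sketch of the averaging rules above. The column names (.row, .pred_class, .pred_A, .pred_B) and the helpers average_predictions() and mode_value() are hypothetical. Probabilities are averaged per row and then re-normalized so that they sum to one. Class predictions use a majority vote. Ties in the vote go to the first level in sorted order.

```r
# Hypothetical out-of-sample predictions: rows 1 and 2 were each predicted
# in more than one bootstrap resample.
preds <- data.frame(
  .row        = c(1, 1, 1, 2, 2),
  .pred_class = c("A", "B", "A", "B", "B"),
  .pred_A     = c(0.7, 0.4, 0.6, 0.2, 0.3),
  .pred_B     = c(0.3, 0.6, 0.4, 0.8, 0.7)
)

# Most frequent value; ties go to the first level in sorted order
mode_value <- function(x) {
  tab <- table(x)
  names(tab)[which.max(tab)]
}

average_predictions <- function(preds, prob_cols) {
  # Mean of each probability column within each row id
  probs <- aggregate(preds[prob_cols], by = list(.row = preds$.row), FUN = mean)
  # Re-normalize so that each row's probabilities sum to one
  probs[prob_cols] <- probs[prob_cols] / rowSums(probs[prob_cols])
  # Majority vote for the hard class predictions
  cls <- tapply(preds$.pred_class, preds$.row, mode_value)
  probs$.pred_class <- as.vector(cls[as.character(probs$.row)])
  probs
}

avg <- average_predictions(preds, c(".pred_A", ".pred_B"))
```

For regression, the same aggregate() call with FUN = mean on the numeric prediction column would be enough.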

collector functions

Use collect_pred() instead of get_predictions(), and create collect_metrics().

use tunable in tune_args.step

tune_args.step() has a long white-list. Instead of this, tunable() should be run on the step to get the list of arguments that should be evaluated.
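A base R sketch of that idea, assuming each step class gets its own method. tunable_sketch() and tune_args_sketch() are hypothetical stand-ins for tunable() and tune_args(), and the step is a plain classed list rather than a real recipes step.

```r
# Each step class declares its own tunable arguments; unknown steps have none
tunable_sketch <- function(x, ...) UseMethod("tunable_sketch")
tunable_sketch.default <- function(x, ...) character(0)
tunable_sketch.step_bs <- function(x, ...) c("deg_free", "degree")

# Report only the declared tunable arguments that were set to tune()
tune_args_sketch <- function(step) {
  candidates <- tunable_sketch(step)
  is_tune <- vapply(candidates, function(arg) {
    val <- step[[arg]]
    is.call(val) && identical(val[[1]], quote(tune))
  }, logical(1))
  candidates[is_tune]
}

step <- structure(list(deg_free = quote(tune()), degree = 3),
                  class = c("step_bs", "step"))
tune_args_sketch(step)
```

Adding a new step would then only require a method for that step, not an edit to a central list.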

by can't contain join column id

The problem

I'm having trouble running a simple random forest tuning.

Reproducible example


library(tidymodels)
library(tune)

df <- tibble(
  x1 = runif(1000, 5, 10),
  x2 = runif(1000, 20, 30),
  outcome = as.factor(ifelse(x1 + x2 > 32, 1, 0))
)

rec <- recipe(df, outcome ~ .)
rf_mod <- rand_forest(mode = "classification", trees = tune()) %>% 
  set_engine("ranger")
cv_splits <- vfold_cv(df, v = 4)
grid <- expand.grid(trees = c(100, 200))

tune_grid(rec, model = rf_mod, rs = cv_splits, grid = grid)

#> by can't contain join column id which is missing from RHS

there should be a method of getting best estimator and best score

When we do tuning, such as Bayesian hyperparameter optimization or a simple grid search, there should be a method to get the best tuned estimates, e.g.

# define model_spec
spec_xgboost <- boost_tree(mode = "classification", 
                           tree_depth = tune(),
                           trees = tune(),
                           learn_rate = tune(),
                           mtry = tune(),
                           min_n = tune()) %>% set_engine("xgboost")

## The workflow should have a recipe or a formula attached to it.
xgb_wf <- workflow() %>%
  add_model(spec_xgboost) %>%
  add_formula(Class ~ .)

# final grid
grid <- xgb_wf %>%
  param_set() %>%
  update(mtry = mtry(c(1,20)))%>%
  grid_max_entropy(size = 20)

grid_results <- tune_grid(xgb_wf, rs = folds, grid = grid,
                          perf = metric_set(roc_auc),
                          control = grid_control(verbose = TRUE))


tb_results <- tune_Bayes(ames_wflow, rs = cv_splits, initial = initial_grid,
                         perf = metric_set(rmse),
                         objective = exp_improve(foo),
                         iter = 20,
                         control = Bayes_control(verbose = TRUE, uncertain = 10,
                                                 extract = num_leaves))


I would call them get_best_params() and get_best_score(). The resulting set of best parameter estimates could then be passed directly to the model fit to build the final model.

Apologies if it's already being implemented.
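A base R sketch of what such a helper could compute from the resampled metrics. best_params_sketch() and the metrics table are hypothetical. The sketch averages the metric over resamples for each parameter combination and returns the best combination together with its mean score, so it covers both the "best params" and "best score" parts of the request.

```r
# Hypothetical resampled metrics: one row per resample x parameter combination
metrics <- data.frame(
  trees      = c(100, 100, 200, 200),
  tree_depth = c(3, 3, 6, 6),
  id         = c("Fold1", "Fold2", "Fold1", "Fold2"),
  roc_auc    = c(0.80, 0.84, 0.88, 0.86)
)

# Average the metric over resamples per combination, then keep the best row.
# Use maximize = FALSE for metrics such as rmse.
best_params_sketch <- function(metrics, params, metric, maximize = TRUE) {
  means <- aggregate(metrics[metric], by = metrics[params], FUN = mean)
  ord <- order(means[[metric]], decreasing = maximize)
  means[ord[1], , drop = FALSE]
}

best <- best_params_sketch(metrics, c("trees", "tree_depth"), "roc_auc")
```

The parameter columns of the result (best[c("trees", "tree_depth")]) are what would be plugged back into the model specification for the final fit.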


glmnet for fixed penalty

Since parsnip wants to fit the whole path (and ignores the given single penalty value), we need to find an approach for using linear_reg(penalty = 10^-5, mixture = tune()) or similar. Maybe a custom update() method for glmnet models?

parallel processing

For now, this should probably work for the outer resampling grid.

For tune_rec() and tune_mod(), we might be able to collapse the resampling and parameter loops but this should wait until post-processing is worked out.

We should also disable parallel processing if any engine = "keras".

Also, there should be a control option to opt out of parallel processing even if it is available.
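A base R sketch of the decision logic described above. control_sketch(), its allow_parallel flag and use_parallel() are hypothetical names. In practice the worker count might come from foreach::getDoParWorkers(), but the sketch takes it as an argument so it stays self-contained.

```r
# Hypothetical control object with an explicit opt-out switch
control_sketch <- function(allow_parallel = TRUE) {
  list(allow_parallel = allow_parallel)
}

# Decide whether the outer resampling loop should run in parallel:
#  * the user can opt out through the control object,
#  * any keras engine forces sequential processing,
#  * otherwise run in parallel only when more than one worker is available.
use_parallel <- function(engines, control, n_workers) {
  isTRUE(control$allow_parallel) && !any(engines == "keras") && n_workers > 1
}

use_parallel(c("xgboost", "lm"), control_sketch(), n_workers = 4)
```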

first ever try and getting an error (xgboost parameter tuning)


After installation, I tried building my first ever xgboost model with parsnip + tune.
A little guidance would be highly appreciated :D

Here is my workflow:

load("Data/okc.RData") # data comes from Max's aml-training repo (

folds <- vfold_cv(okc_train, strata = "Class")

spec_xgboost <- boost_tree(mode = "classification") %>% set_engine("xgboost")

#check which parameters are tunable
spec_xgboost %>% tunable()

# redefine the model_spec
spec_xgboost <- boost_tree(mode = "classification", 
                           tree_depth = tune(),
                           trees = tune(),
                           learn_rate = tune(),
                           mtry = tune(),
                           min_n = tune()) %>% set_engine("xgboost")

# check if these are really tunable (look at the col=tunable)
spec_xgboost %>% tune_args()

xgb_wf <- workflow() %>%

grid <- xgb_wf %>%
  param_set() %>%
  update(mtry = mtry(c(1,20)))%>%
  grid_max_entropy(size = 20)

grid_results <- tune_grid(xgb_wf, rs = folds, grid = grid,
                          control = grid_control(verbose = TRUE))

Here is the error:

Error: `tune_rec & !tune_model ~ rlang::call2("tune_rec", !!!args)`, `tune_rec & tune_model ~ rlang::call2("tune_rec_and_mod", !!!args)`, `has_form & tune_model ~ rlang::call2("tune_mod_with_formula", 
    !!!args)`, `!tune_rec & tune_model ~ rlang::call2("tune_mod_with_recipe", 
    !!!args)` must be length 0 or one, not 6

Here is the traceback:

> rlang::last_error()
message: `tune_rec & !tune_model ~ rlang::call2("tune_rec", !!!args)`, `tune_rec & tune_model ~ rlang::call2("tune_rec_and_mod", !!!args)`, `has_form & tune_model ~ rlang::call2("tune_mod_with_formula", 
    !!!args)`, `!tune_rec & tune_model ~ rlang::call2("tune_mod_with_recipe", 
    !!!args)` must be length 0 or one, not 6
class:   `rlang_error`
 1. tune::tune_grid(xgb_wf, rs = folds, grid = grid)
 2. tune:::tune_grid.workflow(xgb_wf, rs = folds, grid = grid)
 3. tune:::tune_grid_workflow(...)
 4. tune:::quarterback(object)
 5. dplyr::case_when(...)
 6. dplyr:::validate_case_when_length(query, value, fs)
 7. dplyr:::bad_calls(...)
 8. dplyr:::glubort(fmt_calls(calls), ..., .envir = .envir)
Call `rlang::last_trace()` to see the full backtrace
> rlang::last_trace()
 1. +-tune::tune_grid(xgb_wf, rs = folds, grid = grid)
 2. \-tune:::tune_grid.workflow(xgb_wf, rs = folds, grid = grid)
 3.   \-tune:::tune_grid_workflow(...)
 4.     \-tune:::quarterback(object)
 5.       \-dplyr::case_when(...)
 6.         \-dplyr:::validate_case_when_length(query, value, fs)
 7.           \-dplyr:::bad_calls(...)
 8.             \-dplyr:::glubort(fmt_calls(calls), ..., .envir = .envir)

optimize iterators

In cases where there are a lot of workers and a fast pre-processor, collapse the for loops and parallel process over all combinations. Also, if there is a single resample, we should iterate over the parameters.

tune_grid() should be able to have good heuristics on what the best approach would be given the tuning grid. In the future, we should have more workflows and have quarterback() be able to more dynamically determine the appropriate items to iterate over.
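A base R sketch of such a heuristic. choose_iteration() and make_tasks() are hypothetical, and the "more workers than resamples" cutoff for collapsing the loops is an illustrative assumption, not a measured rule. When the loops are collapsed, the task list is the cross join of resamples and grid rows, which merge() produces when by = NULL.

```r
# Hypothetical heuristic: decide what the (possibly parallel) loop iterates over.
choose_iteration <- function(n_resamples, n_workers, fast_preprocessor) {
  if (n_resamples == 1) {
    "parameters"   # a single resample: loop over the parameter grid instead
  } else if (fast_preprocessor && n_workers > n_resamples) {
    "everything"   # collapse both loops into one flat list of tasks
  } else {
    "resamples"    # default: outer loop over resamples
  }
}

# One task per resample x parameter combination (a cross join via merge())
make_tasks <- function(resample_ids, grid) {
  merge(data.frame(id = resample_ids), grid, by = NULL)
}

tasks <- make_tasks(paste0("Fold", 1:3), data.frame(trees = c(100, 200)))
```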

add a notes column

Catalog warnings and errors (and other information) here. When processing in parallel, no output comes back, and if something goes wrong, all you get is a message that "all models failed" (the details exist in the output, but they are not shown).
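A base R sketch of what a notes column could hold. fit_with_notes() is a hypothetical wrapper. Warnings are recorded and muffled. An error is recorded and turned into a NULL result. The notes are then stored as a list column with one entry per resample. Because the notes are returned with the results rather than printed, they should survive being sent back from parallel workers.

```r
# Hypothetical wrapper: run one model fit, keeping the result (or NULL on
# failure) plus every warning and error message as "notes".
fit_with_notes <- function(fit_fun) {
  notes <- character(0)
  result <- tryCatch(
    withCallingHandlers(
      fit_fun(),
      warning = function(w) {
        notes <<- c(notes, paste("warning:", conditionMessage(w)))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      notes <<- c(notes, paste("error:", conditionMessage(e)))
      NULL
    }
  )
  list(result = result, notes = notes)
}

# Three simulated resamples: one clean, one with a warning, one failure
fits <- list(
  fit_with_notes(function() 1),
  fit_with_notes(function() { warning("convergence issue"); 2 }),
  fit_with_notes(function() stop("model failed"))
)

# Results table with a list column holding the notes for each resample
results <- data.frame(id = paste0("Resample", 1:3))
results$.notes <- lapply(fits, `[[`, "notes")
```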

example issue

The problem

I'm having trouble running grid search to optimize the nonlinearity in two predictors. I'm using a recipe to specify the tuning parameters. The error below is confusing since I'm not using any ID annotations for the parameters:

Error: There are duplicate id values listed in tune(): 'deg_free', 'degree'.

Reproducible example

library(tidymodels)
library(tune)

options(width = 100)
data_folds <- vfold_cv(mtcars, repeats = 2)
spline_rec <-
  recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors()) %>% 
  step_bs(disp, degree = tune(), deg_free = tune()) %>%
  step_bs(wt, degree = tune(), deg_free = tune())

lm_model <-
  linear_reg(mode = "regression") %>%
  set_engine("lm")

cars_res <- tune_grid(spline_rec, lm_model, rs = data_folds)
#> Error: There are duplicate `id` values listed in `tune()`: 'deg_free', 'degree'.

Created on 2019-10-01 by the reprex package (v0.2.1)

glmnet fails with one penalty value

Created on 2019-09-04 by the reprex package (v0.2.1)

candidate sets and duplicate parameters

The grid that Bayesian optimization uses to evaluate candidate sets is semi-random, so it could end up choosing a set of existing parameters, especially if they are integers or qualitative.

We should do an anti_join() on them to filter existing solutions out and exit if there are no additional values to check.
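A base R sketch of that filtering step. remove_existing() is a hypothetical helper. It builds one string key per row from all parameter columns and drops candidates whose key is already in the evaluated set. This is roughly what dplyr::anti_join(candidates, existing, by = names(candidates)) does, written without the dependency.

```r
# Hypothetical helper: drop candidate parameter sets that were already
# evaluated (a base R anti-join on all parameter columns).
remove_existing <- function(candidates, existing) {
  params <- names(candidates)
  key <- function(df) do.call(paste, c(unname(as.list(df[params])), sep = "\t"))
  candidates[!key(candidates) %in% key(existing), , drop = FALSE]
}

evaluated  <- data.frame(min_n = c(2, 5), mtry = c(1, 3))
candidates <- data.frame(min_n = c(2, 5, 8), mtry = c(1, 4, 3))
new_cands  <- remove_existing(candidates, evaluated)

# Exit the search when nothing new is left to check
if (nrow(new_cands) == 0) message("No new candidates; stopping the search.")
```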

