tidymodels / finetune
Additional functions for model tuning
Home Page: https://finetune.tidymodels.org/
License: Other
Prepare for release:
git pull
urlchecker::url_check()
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)
See tidymodels/workshops#107 (comment)
The solution is to add a scale_x_continuous(breaks = pretty_breaks()) within the function.
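For illustration, a minimal sketch of that fix (p is a hypothetical ggplot object standing in for the workshop plot, not the actual workshop code):

library(ggplot2)
library(scales)

p <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point()

# Force integer-friendly axis breaks.
p + scale_x_continuous(breaks = pretty_breaks())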
The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.
That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.
The purpose of this issue is to:
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
Submit to CRAN:
usethis::use_version('major')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Running revdeps for workflows 0.2.3 revealed a failure with finetune. Specifically, this test now fails because it expects silence:
finetune/tests/testthat/test-anova-filter.R
Lines 131 to 141 in 67658bc
It is no longer silent with workflows 0.2.3 because somewhere in the tuning process (I think in {tune}) one of the pull_*() functions from workflows is called. Since those are soft-deprecated, a warning is thrown when tests are run (but not when used interactively), and that warning causes the test to fail.
I plan to submit to CRAN anyway and link to this issue, noting that we are aware of this and will update tune and finetune to use the new extract_*() functions instead.
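For reference, the substitution involved looks roughly like this (a sketch; wf_fit is a hypothetical fitted workflow, and the actual call sites in tune may differ):

library(workflows)

# Soft-deprecated accessor: warns when called during tests.
fit_old <- pull_workflow_fit(wf_fit)

# Preferred replacement from the extract_*() family.
fit_new <- extract_fit_parsnip(wf_fit)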
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
Following on from https://github.com/tidymodels/extratests/pull/156/files/c147241a882641e12e8c0b89cfdd7aa64817aed4#r1439642019, show_best.tune_race() should only error, not warn and then error, if a metric is used that is not included in the tune_results object.
This will also require updating the corresponding tests in extratests.
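A sketch of what the updated extratests expectation might look like (assuming snapshot tests are used there; race_stc_res comes from the reprex below):

# Hypothetical testthat expectation: the call should fail cleanly with a
# single classed error and no preceding warning.
testthat::expect_snapshot(
  error = TRUE,
  show_best(race_stc_res, metric = "brier_survival_integrated")
)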
library(tidymodels)
library(censored)
#> Loading required package: survival
library(finetune)
data("mlc_churn")
mlc_churn <-
mlc_churn %>%
mutate(
churned = ifelse(churn == "yes", 1, 0),
event_time = survival::Surv(account_length, churned)
) %>%
select(event_time, account_length, area_code, total_eve_calls)
set.seed(6941)
churn_rs <- vfold_cv(mlc_churn)
eval_times <- c(50, 100, 150)
churn_rec <-
recipe(event_time ~ ., data = mlc_churn) %>%
step_dummy(area_code) %>%
step_normalize(all_predictors())
tree_spec <-
decision_tree(cost_complexity = tune(), min_n = 2) %>%
set_mode("censored regression")
stc_met <- metric_set(concordance_survival)
set.seed(22)
race_stc_res <- tree_spec %>%
tune_race_anova(
event_time ~ .,
resamples = churn_rs,
grid = tibble(cost_complexity = 10^c(-1.4, -2.5, -3, -5)),
metrics = stc_met
)
show_best(race_stc_res, metric = "brier_survival_integrated")
#> Warning: Metric "concordance_survival" was used to evaluate model candidates in the race
#> but "brier_survival_integrated" has been chosen to rank the candidates. These
#> results may not agree with the race.
#> Error in `show_best()`:
#> ! "brier_survival_integrated" was not in the metric set. Please choose
#> from: "concordance_survival".
#> Backtrace:
#>     ▆
#>  1. ├─tune::show_best(race_stc_res, metric = "brier_survival_integrated")
#>  2. ├─finetune:::show_best.tune_race(race_stc_res, metric = "brier_survival_integrated")
#>  3. ├─base::NextMethod(...)
#>  4. ├─tune:::show_best.tune_results(...)
#>  5. ├─tune::choose_metric(x, metric)
#>  6. ├─tune:::check_metric_in_tune_results(mtr_info, metric, call = call)
#>  7. ├─cli::cli_abort(...)
#>  8. └─rlang::abort(...)
Created on 2024-01-05 with reprex v2.0.2
It doesn't appear to say what metric is used for optimization (context). We can use the same verbiage as tune_bayes().
The resamples in tune_race_anova() are not assessed in sequence; they are assessed in random order. For models that take a long time to tune, it is hard to know the current progress (how many resamples are done and how many are remaining). It would be nice to have a progress indicator showing the count of finalised resamples versus the count of remaining ones.
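A progress line of the kind requested might look something like this (a purely hypothetical format, not current finetune output):

#> ℹ Bootstrap4: 3 eliminated; 2 candidates remain. (1/5 resamples complete)

For comparison, here is the current verbose output from a small example: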
library(kernlab)
library(tidymodels)
library(finetune)
data(cells, package = "modeldata")
cells <- cells %>% select(-case) %>% slice_head(n = 1000)
set.seed(6376)
rs <- bootstraps(cells, times = 5)
svm_spec <-
svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
set_engine("kernlab") %>%
set_mode("classification")
svm_rec <-
recipe(class ~ ., data = cells) %>%
step_YeoJohnson(all_predictors()) %>%
step_normalize(all_predictors())
svm_wflow <-
workflow() %>%
add_model(svm_spec) %>%
add_recipe(svm_rec)
set.seed(1)
svm_grid <-
svm_spec %>%
parameters() %>%
grid_latin_hypercube(size = 5)
set.seed(2)
svm_wflow %>%
tune_race_anova(
resamples = rs,
grid = svm_grid,
control = control_race(
verbose = TRUE,
verbose_elim = TRUE))
#> i Bootstrap4: preprocessor 1/1
#> ✓ Bootstrap4: preprocessor 1/1
#> i Bootstrap4: preprocessor 1/1, model 1/5
#> ✓ Bootstrap4: preprocessor 1/1, model 1/5
#> i Bootstrap4: preprocessor 1/1, model 1/5 (predictions)
#> i Bootstrap4: preprocessor 1/1
#> ✓ Bootstrap4: preprocessor 1/1
#> i Bootstrap4: preprocessor 1/1, model 2/5
#> ✓ Bootstrap4: preprocessor 1/1, model 2/5
#> i Bootstrap4: preprocessor 1/1, model 2/5 (predictions)
#> i Bootstrap4: preprocessor 1/1
#> ✓ Bootstrap4: preprocessor 1/1
#> i Bootstrap4: preprocessor 1/1, model 3/5
#> ✓ Bootstrap4: preprocessor 1/1, model 3/5
#> i Bootstrap4: preprocessor 1/1, model 3/5 (predictions)
#> i Bootstrap4: preprocessor 1/1
#> ✓ Bootstrap4: preprocessor 1/1
#> i Bootstrap4: preprocessor 1/1, model 4/5
#> ✓ Bootstrap4: preprocessor 1/1, model 4/5
#> i Bootstrap4: preprocessor 1/1, model 4/5 (predictions)
#> i Bootstrap4: preprocessor 1/1
#> ✓ Bootstrap4: preprocessor 1/1
#> i Bootstrap4: preprocessor 1/1, model 5/5
#> ✓ Bootstrap4: preprocessor 1/1, model 5/5
#> i Bootstrap4: preprocessor 1/1, model 5/5 (predictions)
#> i Bootstrap1: preprocessor 1/1
#> ✓ Bootstrap1: preprocessor 1/1
#> i Bootstrap1: preprocessor 1/1, model 1/5
#> ✓ Bootstrap1: preprocessor 1/1, model 1/5
#> i Bootstrap1: preprocessor 1/1, model 1/5 (predictions)
#> i Bootstrap1: preprocessor 1/1
#> ✓ Bootstrap1: preprocessor 1/1
#> i Bootstrap1: preprocessor 1/1, model 2/5
#> ✓ Bootstrap1: preprocessor 1/1, model 2/5
#> i Bootstrap1: preprocessor 1/1, model 2/5 (predictions)
#> i Bootstrap1: preprocessor 1/1
#> ✓ Bootstrap1: preprocessor 1/1
#> i Bootstrap1: preprocessor 1/1, model 3/5
#> ✓ Bootstrap1: preprocessor 1/1, model 3/5
#> i Bootstrap1: preprocessor 1/1, model 3/5 (predictions)
#> i Bootstrap1: preprocessor 1/1
#> ✓ Bootstrap1: preprocessor 1/1
#> i Bootstrap1: preprocessor 1/1, model 4/5
#> ✓ Bootstrap1: preprocessor 1/1, model 4/5
#> i Bootstrap1: preprocessor 1/1, model 4/5 (predictions)
#> i Bootstrap1: preprocessor 1/1
#> ✓ Bootstrap1: preprocessor 1/1
#> i Bootstrap1: preprocessor 1/1, model 5/5
#> ✓ Bootstrap1: preprocessor 1/1, model 5/5
#> i Bootstrap1: preprocessor 1/1, model 5/5 (predictions)
#> i Bootstrap3: preprocessor 1/1
#> ✓ Bootstrap3: preprocessor 1/1
#> i Bootstrap3: preprocessor 1/1, model 1/5
#> ✓ Bootstrap3: preprocessor 1/1, model 1/5
#> i Bootstrap3: preprocessor 1/1, model 1/5 (predictions)
#> i Bootstrap3: preprocessor 1/1
#> ✓ Bootstrap3: preprocessor 1/1
#> i Bootstrap3: preprocessor 1/1, model 2/5
#> ✓ Bootstrap3: preprocessor 1/1, model 2/5
#> i Bootstrap3: preprocessor 1/1, model 2/5 (predictions)
#> i Bootstrap3: preprocessor 1/1
#> ✓ Bootstrap3: preprocessor 1/1
#> i Bootstrap3: preprocessor 1/1, model 3/5
#> ✓ Bootstrap3: preprocessor 1/1, model 3/5
#> i Bootstrap3: preprocessor 1/1, model 3/5 (predictions)
#> i Bootstrap3: preprocessor 1/1
#> ✓ Bootstrap3: preprocessor 1/1
#> i Bootstrap3: preprocessor 1/1, model 4/5
#> ✓ Bootstrap3: preprocessor 1/1, model 4/5
#> i Bootstrap3: preprocessor 1/1, model 4/5 (predictions)
#> i Bootstrap3: preprocessor 1/1
#> ✓ Bootstrap3: preprocessor 1/1
#> i Bootstrap3: preprocessor 1/1, model 5/5
#> ✓ Bootstrap3: preprocessor 1/1, model 5/5
#> i Bootstrap3: preprocessor 1/1, model 5/5 (predictions)
#> ℹ Racing will maximize the roc_auc metric.
#> ℹ Resamples are analyzed in a random order.
#> ℹ Bootstrap4: 3 eliminated; 2 candidates remain.
#> i Bootstrap2: preprocessor 1/1
#> ✓ Bootstrap2: preprocessor 1/1
#> i Bootstrap2: preprocessor 1/1, model 1/2
#> ✓ Bootstrap2: preprocessor 1/1, model 1/2
#> i Bootstrap2: preprocessor 1/1, model 1/2 (predictions)
#> i Bootstrap2: preprocessor 1/1
#> ✓ Bootstrap2: preprocessor 1/1
#> i Bootstrap2: preprocessor 1/1, model 2/2
#> ✓ Bootstrap2: preprocessor 1/1, model 2/2
#> i Bootstrap2: preprocessor 1/1, model 2/2 (predictions)
#> ℹ Bootstrap2: 0 eliminated; 2 candidates remain.
#> i Bootstrap5: preprocessor 1/1
#> ✓ Bootstrap5: preprocessor 1/1
#> i Bootstrap5: preprocessor 1/1, model 1/2
#> ✓ Bootstrap5: preprocessor 1/1, model 1/2
#> i Bootstrap5: preprocessor 1/1, model 1/2 (predictions)
#> i Bootstrap5: preprocessor 1/1
#> ✓ Bootstrap5: preprocessor 1/1
#> i Bootstrap5: preprocessor 1/1, model 2/2
#> ✓ Bootstrap5: preprocessor 1/1, model 2/2
#> i Bootstrap5: preprocessor 1/1, model 2/2 (predictions)
#> # Tuning results
#> # Bootstrap sampling
#> # A tibble: 5 x 5
#>   splits             id         .order .metrics          .notes
#>   <list>             <chr>       <int> <list>            <list>
#> 1 <split [1000/370]> Bootstrap1      2 <tibble [10 × 6]> <tibble [0 × 1]>
#> 2 <split [1000/358]> Bootstrap3      3 <tibble [10 × 6]> <tibble [0 × 1]>
#> 3 <split [1000/365]> Bootstrap4      1 <tibble [10 × 6]> <tibble [0 × 1]>
#> 4 <split [1000/365]> Bootstrap2      4 <tibble [4 × 6]>  <tibble [0 × 1]>
#> 5 <split [1000/360]> Bootstrap5      5 <tibble [4 × 6]>  <tibble [0 × 1]>
Error in if (sum(filters_results$pass) == 2) { :
missing value where TRUE/FALSE needed
Part of tidymodels/tune#704 and analogous to #91
This currently errors, but it would be nice if it only warned once about this, like in #91:
library(tidymodels)
library(censored)
#> Loading required package: survival
library(finetune)
lung_surv <- lung %>%
dplyr::mutate(surv = Surv(time, status), .keep = "unused")
# mode is not censored regression
set.seed(2193)
tune_res <-
linear_reg(penalty = tune(), engine = "glmnet") %>%
tune_race_win_loss(
mpg ~ .,
resamples = vfold_cv(mtcars, 5),
metrics = metric_set(rmse),
eval_time = 10
)
#> Warning in tune::tune_grid(., resamples = tmp_resamples, param_info = param_info, : Evaluation times are only required when the model mode is "censored regression"
#> (and will be ignored).
#> Error in caller_env(): could not find function "caller_env"
# static metric
set.seed(2193)
tune_res <-
proportional_hazards(penalty = tune(), engine = "glmnet") %>%
tune_race_win_loss(
surv ~ .,
resamples = vfold_cv(lung_surv, 5),
metrics = metric_set(concordance_survival),
eval_time = 10
)
#> Warning in tune::tune_grid(., resamples = tmp_resamples, param_info = param_info, : Evaluation times are only required when dynmanic or integrated metrics are used
#> (and will be ignored here).
#> Error in caller_env(): could not find function "caller_env"
Created on 2024-01-16 with reprex v2.0.2
For this example, the restart was set to 10 iterations but it restarts at iteration 8:
ctrl_sa <- control_sim_anneal(verbose = TRUE, no_improve = 10L)
set.seed(1234)
svm_sa <-
svm_wflow %>%
tune_sim_anneal(
resamples = penguins_folds,
metrics = roc_res,
initial = svm_initial,
param_info = svm_param,
iter = 50,
control = ctrl_sa
)
## Optimizing roc_auc
## Initial best: 0.84948
## 1 ◯ accept suboptimal roc_auc=0.57004 (+/-0.172)
## 2 ◯ accept suboptimal roc_auc=0.57004 (+/-0.172)
## 3 ◯ accept suboptimal roc_auc=0.56876 (+/-0.1715)
## 4 ◯ accept suboptimal roc_auc=0.56876 (+/-0.1715)
## 5 ◯ accept suboptimal roc_auc=0.56876 (+/-0.1715)
## 6 + better suboptimal roc_auc=0.56942 (+/-0.1715)
## 7 ◯ accept suboptimal roc_auc=0.56848 (+/-0.1718)
## 8 ✖ restart from best roc_auc=0.56876 (+/-0.1715)
## 9 ◯ accept suboptimal roc_auc=0.84948 (+/-0.01685)
## 10 ◯ accept suboptimal roc_auc=0.84948 (+/-0.01685)
Also, for some other reason, the initial grid gives different ROC results each run. This might be due to the Platt scaling used by kernlab; its CV doesn't use a controllable seed. However, we don't see this in the book example.
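A quick way to check the kernlab hypothesis (a sketch using kernlab's built-in spam data; the row subsets are arbitrary): fit the same model twice with prob.model = TRUE and compare the class probabilities. Because Platt scaling runs an internal CV with no user-facing seed, the probability models can differ between fits.

library(kernlab)
data(spam)

# Two identical fits; only the internal Platt-scaling CV differs.
fit_1 <- ksvm(type ~ ., data = spam[1:300, ], prob.model = TRUE)
fit_2 <- ksvm(type ~ ., data = spam[1:300, ], prob.model = TRUE)

identical(
  predict(fit_1, spam[301:310, ], type = "probabilities"),
  predict(fit_2, spam[301:310, ], type = "probabilities")
)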
This SO question seems to highlight a situation where finetune isn't handling an edge case very well:
library(tidymodels)
library(finetune)
#> Registered S3 method overwritten by 'finetune':
#>   method            from
#>   obj_sum.tune_race tune
data(cells, package = "modeldata")
set.seed(31)
split <- cells %>%
select(-case) %>%
initial_split(prop = 0.8)
set.seed(234)
folds <- training(split) %>% vfold_cv(v = 3)
folds
#> # 3-fold cross-validation
#> # A tibble: 3 × 2
#> splits id
#> <list> <chr>
#> 1 <split [1076/539]> Fold1
#> 2 <split [1077/538]> Fold2
#> 3 <split [1077/538]> Fold3
xgb_spec <- boost_tree(mode = "classification", trees = tune())
set.seed(234)
workflow(class ~ ., xgb_spec) %>%
tune_grid(
resamples = folds,
grid = 5
)
#> # Tuning results
#> # 3-fold cross-validation
#> # A tibble: 3 × 4
#>   splits             id    .metrics          .notes
#>   <list>             <chr> <list>            <list>
#> 1 <split [1076/539]> Fold1 <tibble [10 × 5]> <tibble [0 × 3]>
#> 2 <split [1077/538]> Fold2 <tibble [10 × 5]> <tibble [0 × 3]>
#> 3 <split [1077/538]> Fold3 <tibble [10 × 5]> <tibble [0 × 3]>
set.seed(345)
workflow(class ~ ., xgb_spec) %>%
tune_race_anova(
resamples = folds,
grid = 5
)
#> Error in `mutate()`:
#> ! Problem while computing `col = purrr::map(splits, ~NULL)`.
#> x `col` must be size 1, not 0.
Created on 2022-02-22 by the reprex package (v2.0.1)
The error is coming from tune:::pulley(), and I think maybe it is removing all the candidates at some step because they are too similar? It results in a pretty confusing error.
When trying to use the tune_race_anova(), tune_race_win_loss(), or tune_sim_anneal() functions, the following error messages pop up:
>at <- learner_xgboost %>% # finalize mtry doesn't work here.
+ tune_sim_anneal(
+ resamples = cv_folds,
+ iter = 2,
+ metrics = metric_set(roc_auc),
+ control = control_sim_anneal(verbose = F)
+ )
Error in `dials::grid_latin_hypercube()`:
! These arguments contains unknowns: `mtry`. See the `finalize()` function.
Run `rlang::last_error()` to see where the error occurred.
> at <- learner_xgboost %>% # finalize mtry doesn't work here.
+ tune_race_anova(
+ resamples = cv_folds,
+ grid = 2,
+ metrics = metric_set(roc_auc),
+ control = control_race(verbose = F)
+ )
i Creating pre-processing data to finalize unknown parameter: mtry
Error in `vec_slice()`:
! Column `splits` (size 1) must match the data frame (size 3).
ℹ In file slice.c at line 188.
ℹ Install the winch package to get additional debugging info the next time you
get this error.
ℹ This is an internal error in the rlang package, please report it to the package
authors.
Backtrace:
    ▆
 1. ├─learner_xgboost %>% ...
 2. ├─finetune::tune_race_anova(...)
 3. ├─finetune:::tune_race_anova.workflow(...)
 4. │ ├─finetune:::tune_race_anova_workflow(...)
 5. │ └─object %>% ...
 6. ├─tune::tune_grid(...)
 7. ├─tune:::tune_grid.workflow(...)
 8. │ ├─tune:::tune_grid_workflow(...)
 9. │ ├─tune:::tune_grid_loop(...)
10. │ ├─tune:::pull_metrics(resamples, results, control)
11. │ ├─tune:::pulley(resamples, res, ".metrics")
12. │ ├─dplyr::arrange(resamples, !!!syms(id_cols))
13. │ ├─dplyr:::arrange.data.frame(resamples, !!!syms(id_cols))
14. │ ├─dplyr::dplyr_row_slice(.data, loc)
15. │ ├─dplyr:::dplyr_row_slice.data.frame(.data, loc)
16. │ ├─dplyr::dplyr_reconstruct(vec_slice(data, i), data)
17. │ │ ├─dplyr:::dplyr_new_data_frame(data)
18. │ │ ├─row.names %||% .row_names_info(x, type = 0L)
19. │ │ └─base::.row_names_info(x, type = 0L)
20. │ └─vctrs::vec_slice(data, i)
21. ├─rlang:::stop_internal_c_lib(...)
22. └─rlang::abort(message, call = call, .internal = TRUE)
> at <- learner_xgboost %>% # finalize mtry doesn't work here.
+ tune_race_win_loss(
+ resamples = cv_folds,
+ grid = 2,
+ metrics = metric_set(roc_auc),
+ control = control_race(verbose = F)
+ )
i Creating pre-processing data to finalize unknown parameter: mtry
Error in `mutate()`:
! Problem while computing `col = purrr::map(splits, ~NULL)`.
✖ `col` must be size 1, not 0.
Run `rlang::last_error()` to see where the error occurred.
Everything works fine when using tune_grid(), though.
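A possible workaround (a sketch; learner_xgboost, cv_folds, and train are the reporter's objects): finalize the data-dependent mtry range up front, then pass the completed parameter set via param_info.

# Finalize the unknown `mtry` range against the training data first,
# then hand the completed parameter set to the racing/annealing function.
params <- learner_xgboost %>%
  extract_parameter_set_dials() %>%
  finalize(train)

at <- learner_xgboost %>%
  tune_sim_anneal(
    resamples = cv_folds,
    iter = 2,
    param_info = params,
    metrics = metric_set(roc_auc),
    control = control_sim_anneal(verbose = FALSE)
  )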
Pre-history
usethis::use_readme_rmd()
usethis::use_roxygen_md()
usethis::use_github_links()
usethis::use_pkgdown_github_pages()
usethis::use_tidy_github_labels()
usethis::use_tidy_style()
usethis::use_tidy_description()
urlchecker::url_check()
2020
usethis::use_package_doc()
Consider letting usethis manage your @importFrom directives here. usethis::use_import_from() is handy for this.
usethis::use_testthat(3) and upgrade to 3e, testthat 3e vignette
Align the names of R/ files and test/ files for workflow happiness. usethis::rename_files() can be helpful.
2021
usethis::use_tidy_dependencies()
usethis::use_tidy_github_actions() and update artisanal actions to use setup-r-dependencies
Remove check environments section from cran-comments.md
2022
usethis::use_tidy_coc()
Make sure development is mode: auto in pkgdown config
Handle and close any still-open master --> main issues
I'm having trouble using the tune_race_anova() function with a spatial_block_cv object from the spatialsample package whilst using parallel computing. I receive the error "There were no valid metrics for the ANOVA model." and show_notes(.Last.tune.result) gives:
Error in `FUN()`: ! `x` must be a vector, not a <sfc_POINT/sfc> object.
The error does not occur if I don't use parallel computing.
I get a similar error when using the tune_grid() function from the tune package. However, if I specify control = control_grid(pkgs = "sf"), the tune_grid() function will work with the spatial_block_cv object and parallel computing.
I think the issue is that specifying control = control_race(pkgs = "sf") in the tune_race_anova() function is not being passed to control$pkgs (line 232 in the function code). I can get the tune_race_anova() function to work with parallel computing and a spatial_block_cv object if I modify the function by including "sf" in the list of packages passed to control$pkgs.
The reprex shows tuning using the tune_grid() function, with and without the control = control_grid(pkgs = "sf") argument to show the error and how it is addressed; as well as tuning using the tune_race_anova() function with and without the control = control_race(pkgs = "sf") to show that the error remains.
I did have some success using the workaround suggested for #39 which I have included at the end of the reprex.
Thanks for your help!
# Load packages and prepare data ------------------------------------------
library(tidymodels)
library(spatialsample)
library(finetune)
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
tidymodels_prefer()
# Function to clean up parallel computing backends (if needed) from:
# paste0("https://stackoverflow.com/questions/64519640/",
# "error-in-summary-connectionconnection-invalid-connection")
unregister_dopar <- function() {
env <- foreach:::.foreachGlobals
rm(list=ls(name=env), pos=env)
}
# Data
data("ames", package = "modeldata")
# Convert to sf object for spatial resampling
ames_sf <- sf::st_as_sf(
x = ames[1:200, ],
coords = c("Longitude", "Latitude"),
crs = 4326
)
# Resampling --------------------------------------------------------------
# Spatial resampling using the spatialsample package
set.seed(123)
spatial_block_folds <- spatial_block_cv(ames_sf, v = 5)
# Model specification -----------------------------------------------------
bart_spec <-
parsnip::bart(trees = tune()) |>
set_mode("regression") |>
set_engine("dbarts")
bart_rec <-
recipe(Sale_Price ~ Year_Built + Bldg_Type + Gr_Liv_Area,
data = ames)
bart_wflow <-
workflow() |>
add_model(bart_spec) |>
add_recipe(bart_rec)
# Grid tuning using the tune package --------------------------------------
## Grid tuning without control - gives an error message
cores <- parallel::detectCores(logical = FALSE)
cl <- parallel::makePSOCKcluster(cores)
doParallel::registerDoParallel(cl)
tune_grid_no_control <-
bart_wflow |>
tune_grid(
resamples = spatial_block_folds
)
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
# Show what the error message was
show_notes(.Last.tune.result)
#> unique notes:
#> ─────────────────────────────────────────────────────
#> Error in `FUN()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.
parallel::stopCluster(cl)
unregister_dopar()
## Grid tuning specifying the sf package in control - no error message
cl <- parallel::makePSOCKcluster(cores)
doParallel::registerDoParallel(cl)
tune_grid_with_control <-
bart_wflow |>
tune_grid(
resamples = spatial_block_folds,
control = control_grid(pkgs = "sf")
)
parallel::stopCluster(cl)
unregister_dopar()
# Racing method using the finetune package --------------------------------
## Racing method without control - gives the same error message
cl <- parallel::makePSOCKcluster(cores)
doParallel::registerDoParallel(cl)
tune_race_no_control <-
bart_wflow |>
tune_race_anova(
resamples = spatial_block_folds
)
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Error in `test_parameters_gls()`:
#> ! There were no valid metrics for the ANOVA model.
#> Backtrace:
#>     ▆
#>  1. ├─finetune::tune_race_anova(bart_wflow, resamples = spatial_block_folds)
#>  2. ├─finetune:::tune_race_anova.workflow(bart_wflow, resamples = spatial_block_folds)
#>  3. ├─finetune:::tune_race_anova_workflow(...)
#>  4. ├─finetune:::test_parameters_gls(res, control$alpha)
#>  5. └─rlang::abort("There were no valid metrics for the ANOVA model.")
# Show what the error message was
show_notes(.Last.tune.result)
#> unique notes:
#> ─────────────────────────────────────────────────────
#> Error in `FUN()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.
parallel::stopCluster(cl)
unregister_dopar()
## Racing method specifying the sf package in control - error remains
cl <- parallel::makePSOCKcluster(cores)
doParallel::registerDoParallel(cl)
tune_race_no_control <-
bart_wflow |>
tune_race_anova(
resamples = spatial_block_folds,
control = control_race(pkgs = "sf")
)
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Error in `test_parameters_gls()`:
#> ! There were no valid metrics for the ANOVA model.
#> Backtrace:
#>     ▆
#>  1. ├─finetune::tune_race_anova(...)
#>  2. ├─finetune:::tune_race_anova.workflow(...)
#>  3. ├─finetune:::tune_race_anova_workflow(...)
#>  4. ├─finetune:::test_parameters_gls(res, control$alpha)
#>  5. └─rlang::abort("There were no valid metrics for the ANOVA model.")
# Show what the error message was
show_notes(.Last.tune.result)
#> unique notes:
#> ─────────────────────────────────────────────────────
#> Error in `FUN()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.
parallel::stopCluster(cl)
unregister_dopar()
## Racing method using result from tune_grid() as the initial argument
# This comes from the suggested workaround for another issue. See:
# https://github.com/tidymodels/finetune/issues/39#issuecomment-1132266958
# I think this works:
cl <- parallel::makePSOCKcluster(cores)
doParallel::registerDoParallel(cl)
bart_rs <-
bart_wflow |>
tune_grid(resamples = spatial_block_folds,
control = control_grid(pkgs = "sf"),
grid = 3)
tune_race_init <-
bart_wflow |>
tune_race_anova(
resamples = spatial_block_folds,
iter = 3,
initial = bart_rs
)
#> Warning: The `...` are not used in this function but one or more objects were
#> passed: 'iter', 'initial'
parallel::stopCluster(cl)
unregister_dopar()
Created on 2023-05-30 with reprex v2.0.2
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.3 (2023-03-15 ucrt)
#> os Windows 10 x64 (build 19042)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_Australia.utf8
#> ctype English_Australia.utf8
#> tz Australia/Perth
#> date 2023-05-30
#> pandoc 2.19.2 @ C:/program files/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [2] CRAN (R 4.2.0)
#> boot 1.3-28.1 2022-11-22 [2] CRAN (R 4.2.3)
#> broom * 1.0.4 2023-03-11 [2] CRAN (R 4.2.3)
#> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.2.3)
#> class 7.3-21 2023-01-23 [2] CRAN (R 4.2.3)
#> classInt 0.4-9 2023-02-28 [1] CRAN (R 4.2.2)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.3)
#> codetools 0.2-19 2023-02-01 [2] CRAN (R 4.2.3)
#> colorspace 2.1-0 2023-01-23 [2] CRAN (R 4.2.3)
#> conflicted 1.2.0 2023-02-01 [2] CRAN (R 4.2.3)
#> data.table 1.14.8 2023-02-17 [2] CRAN (R 4.2.3)
#> dbarts 0.9-23 2023-01-23 [1] CRAN (R 4.2.3)
#> DBI 1.1.3 2022-06-18 [2] CRAN (R 4.2.3)
#> dials * 1.2.0 2023-04-03 [2] CRAN (R 4.2.3)
#> DiceDesign 1.9 2021-02-13 [2] CRAN (R 4.2.3)
#> digest 0.6.31 2022-12-11 [2] CRAN (R 4.2.3)
#> doParallel 1.0.17 2022-02-07 [1] CRAN (R 4.2.3)
#> dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.2.3)
#> e1071 1.7-13 2023-02-01 [1] CRAN (R 4.2.2)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.2.3)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.2)
#> fastmap 1.1.1 2023-02-24 [2] CRAN (R 4.2.3)
#> finetune * 1.1.0 2023-04-19 [1] CRAN (R 4.2.3)
#> foreach 1.5.2 2022-02-02 [2] CRAN (R 4.2.3)
#> fs 1.6.2 2023-04-25 [1] CRAN (R 4.2.3)
#> furrr 0.3.1 2022-08-15 [2] CRAN (R 4.2.3)
#> future 1.32.0 2023-03-07 [2] CRAN (R 4.2.3)
#> future.apply 1.10.0 2022-11-05 [2] CRAN (R 4.2.3)
#> generics 0.1.3 2022-07-05 [2] CRAN (R 4.2.3)
#> ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.2.3)
#> globals 0.16.2 2022-11-21 [2] CRAN (R 4.2.2)
#> glue 1.6.2 2022-02-24 [2] CRAN (R 4.2.3)
#> gower 1.0.1 2022-12-22 [2] CRAN (R 4.2.2)
#> GPfit 1.0-8 2019-02-08 [2] CRAN (R 4.2.3)
#> gtable 0.3.3 2023-03-21 [2] CRAN (R 4.2.3)
#> hardhat 1.3.0 2023-03-30 [2] CRAN (R 4.2.3)
#> htmltools 0.5.5 2023-03-23 [2] CRAN (R 4.2.3)
#> infer * 1.0.4 2022-12-02 [2] CRAN (R 4.2.3)
#> ipred 0.9-14 2023-03-09 [2] CRAN (R 4.2.3)
#> iterators 1.0.14 2022-02-05 [2] CRAN (R 4.2.3)
#> KernSmooth 2.23-20 2021-05-03 [2] CRAN (R 4.2.3)
#> knitr 1.42 2023-01-25 [2] CRAN (R 4.2.3)
#> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.3)
#> lava 1.7.2.1 2023-02-27 [2] CRAN (R 4.2.3)
#> lhs 1.1.6 2022-12-17 [2] CRAN (R 4.2.3)
#> lifecycle 1.0.3 2022-10-07 [2] CRAN (R 4.2.3)
#> listenv 0.9.0 2022-12-16 [2] CRAN (R 4.2.3)
#> lme4 1.1-33 2023-04-25 [1] CRAN (R 4.2.3)
#> lubridate 1.9.2 2023-02-10 [1] CRAN (R 4.2.2)
#> magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.2.3)
#> MASS 7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
#> Matrix 1.5-3 2022-11-11 [1] CRAN (R 4.2.2)
#> memoise 2.0.1 2021-11-26 [2] CRAN (R 4.2.3)
#> minqa 1.2.5 2022-10-19 [2] CRAN (R 4.2.3)
#> modeldata * 1.1.0 2023-01-25 [2] CRAN (R 4.2.3)
#> munsell 0.5.0 2018-06-12 [2] CRAN (R 4.2.3)
#> nlme 3.1-162 2023-01-31 [2] CRAN (R 4.2.3)
#> nloptr 2.0.3 2022-05-26 [2] CRAN (R 4.2.3)
#> nnet 7.3-18 2022-09-28 [2] CRAN (R 4.2.3)
#> parallelly 1.35.0 2023-03-23 [2] CRAN (R 4.2.3)
#> parsnip * 1.1.0 2023-04-12 [2] CRAN (R 4.2.3)
#> pillar 1.9.0 2023-03-22 [2] CRAN (R 4.2.3)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.2.3)
#> prodlim 2023.03.31 2023-04-02 [2] CRAN (R 4.2.3)
#> proxy 0.4-27 2022-06-09 [2] CRAN (R 4.2.3)
#> purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.2.3)
#> Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.2.2)
#> recipes * 1.0.6 2023-04-25 [2] CRAN (R 4.2.3)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.3)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.2.3)
#> rmarkdown 2.21 2023-03-26 [2] CRAN (R 4.2.3)
#> rpart 4.1.19 2022-10-21 [2] CRAN (R 4.2.3)
#> rsample * 1.1.1 2022-12-07 [2] CRAN (R 4.2.3)
#> rstudioapi 0.14 2022-08-22 [2] CRAN (R 4.2.3)
#> s2 1.1.4 2023-05-17 [1] CRAN (R 4.2.3)
#> scales * 1.2.1 2022-08-20 [2] CRAN (R 4.2.3)
#> sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.2.3)
#> sf * 1.0-12 2023-03-19 [1] CRAN (R 4.2.3)
#> spatialsample * 0.4.0 2023-05-17 [1] CRAN (R 4.2.3)
#> survival 3.5-3 2023-02-12 [2] CRAN (R 4.2.3)
#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.2.3)
#> tidymodels * 1.1.0 2023-05-01 [1] CRAN (R 4.2.3)
#> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.2)
#> tidyselect 1.2.0 2022-10-10 [2] CRAN (R 4.2.3)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.2)
#> timeDate 4022.108 2023-01-07 [2] CRAN (R 4.2.3)
#> tune * 1.1.1 2023-04-11 [2] CRAN (R 4.2.3)
#> units 0.8-2 2023-04-27 [1] CRAN (R 4.2.3)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2)
#> vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.2.3)
#> withr 2.5.0 2022-03-03 [2] CRAN (R 4.2.3)
#> wk 0.7.3 2023-05-06 [1] CRAN (R 4.2.3)
#> workflows * 1.1.3 2023-02-22 [2] CRAN (R 4.2.3)
#> workflowsets * 1.0.1 2023-04-06 [2] CRAN (R 4.2.3)
#> xfun 0.39 2023-04-20 [1] CRAN (R 4.2.3)
#> yaml 2.3.7 2023-01-23 [2] CRAN (R 4.2.3)
#> yardstick * 1.2.0 2023-04-21 [2] CRAN (R 4.2.3)
#>
#> [1] C:/Users/00055815/AppData/Local/R/win-library/4.2
#> [2] C:/Program Files/R/R-4.2.3/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
Part of tidymodels/tune#704
It would be nice if this warning were only thrown once. (One of them comes from tune_grid() but is irrelevant to the user.)
library(tidymodels)
library(censored)
#> Loading required package: survival
library(finetune)
lung_surv <- lung %>%
dplyr::mutate(surv = Surv(time, status), .keep = "unused")
# mode is not censored regression
set.seed(2193)
tune_res <-
linear_reg(penalty = tune(), engine = "glmnet") %>%
tune_race_anova(
mpg ~ .,
resamples = vfold_cv(mtcars, 5),
metrics = metric_set(rmse),
eval_time = 10
)
#> Warning in tune::tune_grid(., tmp_resamples, param_info = param_info, grid = grid, : Evaluation times are only required when the model mode is "censored regression"
#> (and will be ignored).
#> Warning in tune_race_anova(., mpg ~ ., resamples = vfold_cv(mtcars, 5), : Evaluation times are only required when the model mode is "censored regression"
#> (and will be ignored).
# static metric
set.seed(2193)
tune_res <-
proportional_hazards(penalty = tune(), engine = "glmnet") %>%
tune_race_anova(
surv ~ .,
resamples = vfold_cv(lung_surv, 5),
metrics = metric_set(concordance_survival),
eval_time = 10
)
#> Warning in tune::tune_grid(., tmp_resamples, param_info = param_info, grid = grid, : Evaluation times are only required when dynmanic or integrated metrics are used
#> (and will be ignored here).
#> Warning in tune_race_anova(., surv ~ ., resamples = vfold_cv(lung_surv, : Evaluation times are only required when dynmanic or integrated metrics are used
#> (and will be ignored here).
Created on 2024-01-16 with reprex v2.0.2
The internals of autoplot() for racing functions use map_dfr(), a deprecated function from purrr:
Line 30 in 5e9d49e
We should likely use map() and list_rbind() instead. tidymodels/recipes#1204 is a good example of another PR that does the same.
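The general shape of the substitution (illustrative only, not the actual finetune source; x and f are placeholders for the mapped object and function):

library(purrr)

# Older, deprecated style: map and row-bind in one step.
res <- map_dfr(x, f)

# Preferred style: map first, then combine with list_rbind().
res <- map(x, f) |> list_rbind()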
.Last.tune.result gives incorrect results for racing and simulated annealing output, for the same reason as in tidymodels/tune#623.
library(tidymodels)
library(finetune)
tune_race_anova(
boost_tree(mtry = tune(), min_n = tune(), trees = 100, mode = "regression"),
log(Sale_Price) ~ .,
bootstraps(ames, 5)
)
#> # Tuning results
#> # Bootstrap sampling
#> # A tibble: 5 × 5
#>   splits              id         .order .metrics          .notes
#>   <list>              <chr>       <int> <list>            <list>
#> 1 <split [2930/1055]> Bootstrap2      2 <tibble [20 × 6]> <tibble [0 × 3]>
#> 2 <split [2930/1079]> Bootstrap3      3 <tibble [20 × 6]> <tibble [0 × 3]>
#> 3 <split [2930/1065]> Bootstrap4      1 <tibble [20 × 6]> <tibble [0 × 3]>
#> 4 <split [2930/1060]> Bootstrap5      4 <tibble [16 × 6]> <tibble [0 × 3]>
#> 5 <split [2930/1099]> Bootstrap1      5 <tibble [16 × 6]> <tibble [0 × 3]>
.Last.tune.result
#> # Tuning results
#> # Bootstrap sampling
#> # A tibble: 1 × 5
#>   splits              id         .order .metrics          .notes
#>   <list>              <chr>       <int> <list>            <list>
#> 1 <split [2930/1099]> Bootstrap1      5 <tibble [16 × 6]> <tibble [0 × 3]>
Created on 2023-02-27 with reprex v2.0.2
stacks relies on being able to create a map from predictions to metrics in order to keep track of candidate members. The .config
column allows us to get this essentially "for free," though I came across an exception with simulated annealing while working on tidymodels/extratests#59.
It'd certainly be possible to special-case this in stacks' machinery, though I wonder if this inconsistency may lead to issues in other applications of finetune.
A reprex, including an example with tune_bayes
to show that this isn't just an issue with iterative tuning:
# load libraries ---------------------------------------------------------------
library(tidymodels)
library(finetune)
# setup ------------------------------------------------------------------------
data(ames)
ames$Sale_Price <- log(ames$Sale_Price)
spec <- linear_reg(engine = "glmnet", penalty = tune(), mixture = tune())
form <- Sale_Price ~ .
set.seed(1)
boots <- bootstraps(ames, times = 5)
metrics <- yardstick::metric_set(yardstick::rmse)
First, demonstrating with simulated annealing:
# simulated annealing ----------------------------------------------------------
set.seed(1)
res_sim_anneal <-
tune_sim_anneal(
object = spec, preproc = form, resamples = boots, metrics = metrics,
control = control_sim_anneal(save_pred = TRUE, save_workflow = TRUE)
)
#> Optimizing rmse
#> Initial best: 0.18813
#> 1 ♥ new best rmse=0.16864 (+/-0.01079)
#> 2 ◯ accept suboptimal rmse=0.16981 (+/-0.01065)
#> 3 ◯ accept suboptimal rmse=0.17384 (+/-0.01191)
#> 4 ─ discard suboptimal rmse=0.18508 (+/-0.01052)
#> 5 + better suboptimal rmse=0.16915 (+/-0.01145)
#> 6 ◯ accept suboptimal rmse=0.16962 (+/-0.01227)
#> 7 ─ discard suboptimal rmse=0.17377 (+/-0.01203)
#> 8 ─ discard suboptimal rmse=0.18469 (+/-0.007478)
#> 9 ✖ restart from best rmse=0.17793 (+/-0.01141)
#> 10 ◯ accept suboptimal rmse=0.16928 (+/-0.01203)
preds_sim_anneal <- collect_predictions(res_sim_anneal)
metrics_sim_anneal <- collect_metrics(res_sim_anneal)
The entries in the .config column of the collect_predictions() output don't have matches in the collect_metrics() output:
preds_sim_anneal %>% pull(.config) %>% unique()
#> [1] "Preprocessor1_Model1"
metrics_sim_anneal %>% pull(.config) %>% unique()
#> [1] "initial_Preprocessor1_Model1" "Iter1"
#> [3] "Iter2" "Iter3"
#> [5] "Iter4" "Iter5"
#> [7] "Iter6" "Iter7"
#> [9] "Iter8" "Iter9"
#> [11] "Iter10"
There are indeed several configurations in the predictions, though:
metrics_sim_anneal %>%
count(penalty, mixture)
#> # A tibble: 11 × 3
#> penalty mixture n
#> <dbl> <dbl> <int>
#> 1 0.0000625 0.580 1
#> 2 0.000584 0.681 1
#> 3 0.00111 0.796 1
#> 4 0.00132 0.597 1
#> 5 0.00270 0.742 1
#> 6 0.00433 0.364 1
#> 7 0.00539 0.686 1
#> 8 0.00909 0.480 1
#> 9 0.00918 0.605 1
#> 10 0.0249 0.839 1
#> 11 0.0565 0.448 1
preds_sim_anneal %>%
count(penalty, mixture)
#> # A tibble: 11 × 3
#> penalty mixture n
#> <dbl> <dbl> <int>
#> 1 0.0000625 0.580 5333
#> 2 0.000584 0.681 5333
#> 3 0.00111 0.796 5333
#> 4 0.00132 0.597 5333
#> 5 0.00270 0.742 5333
#> 6 0.00433 0.364 5333
#> 7 0.00539 0.686 5333
#> 8 0.00909 0.480 5333
#> 9 0.00918 0.605 5333
#> 10 0.0249 0.839 5333
#> 11 0.0565 0.448 5333
For comparison with another method making use of iterative tuning, an analogous example with Bayesian tuning:
# bayes ------------------------------------------------------------------------
set.seed(1)
res_bayes <-
tune_bayes(
object = spec, preproc = form, resamples = boots, metrics = metrics,
control = control_bayes(save_pred = TRUE, save_workflow = TRUE)
)
preds_bayes <- collect_predictions(res_bayes)
metrics_bayes <- collect_metrics(res_bayes)
preds_bayes %>% pull(.config) %>% unique()
#> [1] "Preprocessor1_Model1" "Preprocessor1_Model2" "Preprocessor1_Model3"
#> [4] "Preprocessor1_Model4" "Preprocessor1_Model5" "Iter1"
#> [7] "Iter2" "Iter3" "Iter4"
#> [10] "Iter5" "Iter6" "Iter7"
#> [13] "Iter8" "Iter9" "Iter10"
metrics_bayes %>% pull(.config) %>% unique()
#> [1] "Preprocessor1_Model1" "Preprocessor1_Model2" "Preprocessor1_Model3"
#> [4] "Preprocessor1_Model4" "Preprocessor1_Model5" "Iter1"
#> [7] "Iter2" "Iter3" "Iter4"
#> [10] "Iter5" "Iter6" "Iter7"
#> [13] "Iter8" "Iter9" "Iter10"
Created on 2022-11-03 with reprex v2.0.2
Hi,
please close if you disagree, but I wonder if users would appreciate the suggested change... As-is, when you use tune_race_win_loss() but don't have that package installed, tuning will fail at the end, and the user will have lost their time...
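One way the suggested up-front check could look (a sketch; BradleyTerry2 is assumed here to be the package in question, since finetune's win/loss racing uses it to rank candidates):

# Hypothetical guard at the top of tune_race_win_loss(): error immediately
# if the dependency is missing, instead of after all the model fits.
rlang::check_installed(
  "BradleyTerry2",
  reason = "to rank candidates in `tune_race_win_loss()`."
)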
For tune_*() functions that call tune_grid(), the verbose option should be passed to control_grid(), and a different (perhaps new) option should be passed to the finetune function.
It seems that tune_race_anova() doesn't work as expected. The following code:
system.time({
  set.seed(2)
  svm_wflow %>% tune_race_anova(resamples = rs, grid = svm_grid)
})
generates the following error:
Error in `parsnip::condense_control()`:
! Object of class `control_race` cannot be coerced to object of class `control_grid`.
• The following arguments are missing:
• 'backend_options'
The previous code is copy-pasted from the package's documentation (https://cran.r-project.org/web/packages/finetune/finetune.pdf). Attached you can find the whole code and my sessionInfo.
Increase coverage of alt-text in pictures, plots, etc.; see https://posit.co/blog/knitr-fig-alt/ for examples.
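For example, alt-text can be supplied through knitr's fig.alt chunk option (a sketch; the chunk label, alt-text, and race_results object are made up):

```{r race-plot, fig.alt = "Scatterplot of mean RMSE versus tuning iteration for the racing results."}
autoplot(race_results)
```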
Reproducible Example:
library(tidymodels)
library(finetune)
library(palmerpenguins)
split <- penguins %>%
initial_split()
folds <- split %>%
training() %>%
vfold_cv(v=5)
rec_xgb <-
recipe(bill_depth_mm ~ ., data = penguins) %>%
step_unknown(all_nominal_predictors(), new_level = "missing")
spec_xgb <-
boost_tree(mode = "regression", engine = "xgboost") %>%
set_args(
trees = tune(),
min_n = tune()
)
grid_xgb <-
grid_regular(
trees() %>% range_set(c(1, 200)),
min_n(),
levels = 3
)
tune_race_anova(
workflow(rec_xgb, spec_xgb),
folds,
grid = grid_xgb,
metrics = metric_set(rmse),
control = control_race(
verbose = T, verbose_elim = T,
burn_in = 2,
alpha = 0.05,
randomize = T
)
)
Warning: All models failed. See the `.notes` column.
i Racing will minimize the mae metric.
i Resamples are analyzed in a random order.
Error: There were no valid metrics for the ANOVA model.
Run `rlang::last_error()` to see where the error occurred.
With my actual dataset I do not run into the error when using random forest or support vector regression, for example. It only occurs when trying to tune an XGB model.
@topepo Any ideas on where this error may come from?
Hi there,
Genetic algorithms have been shown to achieve better results than classic grid search for hyperparameter tuning. They find optimal solutions of the objective function by selecting the best or fittest solutions alongside rare and random mutation occurrences.
I wonder if it would be possible to include a genetic algorithm for hyperparameter tuning? I believe this would greatly enhance the performance of machine learning models in tidymodels. Thank you.
Regards,
Yang
revdeps are ready to go!
── CHECK ─────────────────────────────────────────────────────────── 2 packages ──
✔ shinymodels 0.1.1                          ── E: 0     | W: 0     | N: 0
✔ workflowsets 1.0.1                         ── E: 0     | W: 0     | N: 0
OK: 2
BROKEN: 0
I'm trying to run tune_sim_anneal() to search for better parameters for an xgboost model, after having run a normal tune first.
A snippet of what the code looked like is below:
folds <- vfold_cv(train, v = 5, strata = target)
rec <- recipe(target ~ ., data = train) %>%
update_role(customer_ID, new_role = "id") %>%
step_normalize(all_numeric_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_nzv() #%>%
xgb_model <- boost_tree(
trees = 2000,
learn_rate = tune(), # 0.0275,
min_n = tune(), #22L,
tree_depth = tune(), #11L,
mtry = tune(), #616L,
stop_iter = 20L
) %>%
set_engine('xgboost', event_level = "second",
tree_method = "gpu_hist", gpu_id = 0,
nthread = 10) %>%
set_mode('classification')
xgb_wf <- workflow() %>%
add_recipe(rec) %>%
add_model(xgb_model)
xgb_params <- xgb_wf %>%
extract_parameter_set_dials() %>%
update(mtry = mtry(range(600L, 1200L)),
tree_depth = tree_depth(range(4L, 8L)),
learn_rate = learn_rate(range(0.025, 0.035), trans = NULL),
min_n = min_n(range(15L, 30L))
)
prob_metrics <- metric_set(mean_coeff) # This is a custom metric I defined as instructed from the yardstick vignettes.
param_grid <- grid_latin_hypercube(xgb_params, size = 10)
xgb_tune <- xgb_wf %>%
tune_grid(
folds,
param_info = xgb_params,
grid = param_grid,
metrics = prob_metrics,
control = control_grid(
verbose = TRUE,
save_pred = TRUE,
save_workflow = TRUE,
event_level = "second"
)
)
xgb_race <- tune_sim_anneal(
object = xgb_wf,
resamples = folds,
iter = 20,
param_info = xgb_params,
metrics = prob_metrics,
initial = xgb_tune,
control = control_sim_anneal(
verbose = TRUE,
save_pred = TRUE,
save_workflow = TRUE,
event_level = "second"
)
)
The data and metric are coming from: https://www.kaggle.com/competitions/amex-default-prediction/overview/evaluation
The metric provides a different value depending on whether the event_level is first or second (in my case event_level is second).
The problem is that event_level = "second" does not seem to register with tune_sim_anneal().
This is the output I get after running collect_metrics():
While the parameters in the iteration process are not vastly different, the results are significantly and consistently worse.
As a test, I ran
xgb_race %>% collect_predictions() %>% group_by(.config) %>% mean_coeff(target, .pred_0)
to test whether the metric was applied incorrectly.
Indeed, for the initial set of tuning combinations (before the iterative process), the metric estimate was:
which leads me to believe the event_level is ignored in finetune.
At this point I don't think my custom metric has an issue, because I'd observe the same behaviour in tune_grid() and tune_sim_anneal(), but I could be wrong.
One additional point: even though I had the setup above (plus a couple of set.seed() calls that I removed), when running collect_predictions() on xgb_race, I don't get the predictions for any of the iterations. Is this a different issue, or is it because none of the solutions outperform the initial set of estimates provided by tune_grid()?
The following situation came up in tidymodels/dials#258.
We make a grid with a wider range for a parameter (here mtry), use tune_grid() to get some initial tuning results, and then use them as the initial results for tune_sim_anneal().
However, we also give tune_sim_anneal() a parameter set for param_info which has a smaller range for that parameter than went into the initial tuning results. Then the transformation of that parameter back and forth to [0, 1] breaks, and dials throws an internal error that doesn't point directly to the actual problem.
I'm wondering if this is expected behavior or if it generally should work. If the error is to be expected, maybe we can catch that error more elegantly?
library(tidymodels)
set.seed(1)
rf_spec <- rand_forest(mode = "regression", mtry = tune())
grid_with_bigger_range <- grid_latin_hypercube(mtry(range = c(1, 16)))
car_folds <- vfold_cv(car_prices, v = 2)
car_wflow <- workflow() %>%
add_formula(Price ~ .) %>%
add_model(rf_spec)
tune_res_with_bigger_range <- tune_grid(
car_wflow,
resamples = car_folds,
grid = grid_with_bigger_range
)
parameter_set_with_smaller_range <- parameters(mtry(range = c(1, 5)))
finetune::tune_sim_anneal(
car_wflow,
param_info = parameter_set_with_smaller_range,
resamples = car_folds,
initial = tune_res_with_bigger_range,
iter = 2
)
#> Optimizing rmse
#> Initial best: 2570.90000
#> Error in `.f()`:
#> ! Values should be on [0, 1].
#> βΉ This is an internal error that was detected in the dials package.
#> Please report it at <https://github.com/tidymodels/dials/issues> with a reprex (<https://https://tidyverse.org/help/>) and the full backtrace.
#> Backtrace:
#>     ▆
#>  1. ├─finetune::tune_sim_anneal(...)
#>  2. ├─finetune:::tune_sim_anneal.workflow(...)
#>  3. │ ├─finetune:::tune_sim_anneal_workflow(...)
#>  4. │ ├─... %>% ...
#>  5. │ ├─finetune:::new_in_neighborhood(...)
#>  6. │ ├─finetune:::random_integer_neighbor(...)
#>  7. │ ├─finetune:::sample_by_distance(...)
#>  8. │ ├─finetune:::encode_set_backwards(candidates, pset)
#>  9. │ ├─purrr::map2(pset$object, x, dials::encode_unit, direction = "backward")
#> 10. │ ├─dials (local) .f(.x[[1L]], .y[[1L]], ...)
#> 11. │ ├─dials:::encode_unit.quant_param(.x[[1L]], .y[[1L]], ...) at dials/R/encode_unit.R:23:2
#> 12. │ └─rlang::abort("Values should be on [0, 1].", .internal = TRUE) at dials/R/encode_unit.R:50:6
#> 13. └─dplyr::mutate(., .config = paste0("iter", i), .parent = current_parent)
#> ✖ Optimization stopped prematurely; returning current results.
#> # Tuning results
#> # 2-fold cross-validation
#> # A tibble: 2 × 5
#>   splits            id    .metrics         .notes           .iter
#>   <list>            <chr> <list>           <list>           <int>
#> 1 <split [402/402]> Fold1 <tibble [6 × 5]> <tibble [0 × 3]>     0
#> 2 <split [402/402]> Fold2 <tibble [6 × 5]> <tibble [0 × 3]>     0
Created on 2022-11-04 with reprex v2.0.2
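For what it's worth, the error goes away when param_info covers the range that produced the initial results (a sketch using the objects from the reprex above):

# A parameter set whose range matches the initial grid avoids the
# [0, 1] encoding failure.
parameter_set_matching_range <- parameters(mtry(range = c(1, 16)))
finetune::tune_sim_anneal(
  car_wflow,
  param_info = parameter_set_matching_range,
  resamples = car_folds,
  initial = tune_res_with_bigger_range,
  iter = 2
)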
use_standalone("r-lib/rlang", "types-check") in favor of home-grown argument checkers
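For context, the standalone file provides ready-made checkers along these lines (illustrative usage inside a function; verbose and iter are hypothetical arguments):

# Checkers imported via use_standalone("r-lib/rlang", "types-check"):
check_bool(verbose)                # replaces a hand-rolled is.logical() check
check_number_whole(iter, min = 1)  # replaces hand-rolled integer validation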
Hello,
I am learning to use the tidymodels package and I found this great documentation. I was learning this part, and when I ran this code:
library(finetune)
race_ctrl <-
control_race(
save_pred = TRUE,
parallel_over = "everything",
save_workflow = TRUE
)
race_results <-
all_workflows %>%
workflow_map(
"tune_race_anova",
seed = 1503,
resamples = concrete_folds,
grid = 25,
control = race_ctrl
)
I got the following messages:
boundary (singular) fit: see ?isSingular
i Creating pre-processing data to finalize unknown parameter: mtry
! Fold04, Repeat1: internal: A correlation computation is required, but `estimate` is constant and has 0 standard deviation, resulting in a divide by 0 error. `NA` wi...
! Fold04, Repeat1: internal: A correlation computation is required, but `estimate` is constant and has 0 standard deviation, resulting in a divide by 0 error. `NA` wi...
! Fold06, Repeat1: internal: A correlation computation is required, but `estimate` is constant and has 0 standard deviation, resulting in a divide by 0 error. `NA` wi...
! Fold06, Repeat1: internal: A correlation computation is required, but `estimate` is constant and has 0 standard deviation, resulting in a divide by 0 error. `NA` wi...
! Fold10, Repeat1: internal: A correlation computation is required, but `estimate` is constant and has 0 standard deviation, resulting in a divide by 0 error. `NA` wi...
! Fold10, Repeat1: internal: A correlation computation is required, but `estimate` is constant and has 0 standard deviation, resulting in a divide by 0 error. `NA` wi...
boundary (singular) fit: see ?isSingular
(the "boundary (singular) fit" message above repeats many more times)
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
5: The `...` are not used in this function but one or more objects were passed: 'param'
What does it mean? By the way, this piece of code took around an hour and a half. As explained in the documentation, I expected that it would be faster than the grid search, but it took the same amount of time. I have installed the necessary packages normally. What is the reason?
2021
usethis::use_tidy_description()
usethis::use_tidy_dependencies()
usethis::use_tidy_github_actions() and update artisanal actions to use setup-r-dependencies
Remove check environments section from cran-comments.md
Update the copyright holder in Authors@R of DESCRIPTION like so, if appropriate: person("RStudio", role = c("cph", "fnd"))
2022
usethis::use_tidy_coc()
Handle and close any still-open master --> main issues
Make sure development is mode: auto in pkgdown config
usethis::use_lifecycle()
2023
Necessary:
Update the copyright holder in DESCRIPTION: person(given = "Posit Software, PBC", role = c("cph", "fnd"))
use_mit_license()
use_tidy_logo()
usethis::use_tidy_coc()
usethis::use_tidy_github_actions()
Optional:
Suggest pak::pak("org/pkg") over devtools::install_github("org/pkg") in README
Consider running use_tidy_dependencies() and/or replacing compat files with use_standalone()
instead of home grown argument checkersRelated snapshots are in test-survival-tune-eval-time-attribute.R
, test-survival-tune-sa.R
, and test-survival-tune_race_win_loss.R
library(tidymodels)
library(censored)
#> Loading required package: survival
library(finetune)
set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
mutate(event_time = Surv(time, event)) %>%
select(event_time, X1, X2)
set.seed(2)
split <- initial_split(sim_dat)
sim_tr <- training(split)
sim_te <- testing(split)
sim_rs <- vfold_cv(sim_tr)
time_points <- c(10, 1, 5, 15)
mod_spec <-
decision_tree(cost_complexity = tune()) %>%
set_mode("censored regression")
grid <- tibble(cost_complexity = 10^c(-10, -2, -1))
srv_mtrc <- metric_set(brier_survival)
set.seed(2193)
sa_res <-
mod_spec %>%
tune_sim_anneal(
event_time ~ X1 + X2,
sim_rs,
#initial = grid_res,
iter = 2,
metrics = srv_mtrc,
eval_time = time_points,
control = control_sim_anneal(verbose_iter = FALSE)
)
#> Warning in tune_sim_anneal_workflow(wflow, resamples = resamples, iter = iter, : 4 evaluation times are available; the first will be used (i.e. `eval_time =
#> 10`).
set.seed(2193)
anova_res <-
mod_spec %>%
tune_race_anova(
event_time ~ X1 + X2,
sim_rs,
grid = grid,
metrics = srv_mtrc,
eval_time = time_points
)
#> Warning in tune_race_anova_workflow(wflow, resamples = resamples, grid = grid, : 4 evaluation times are available; the first will be used (i.e. `eval_time =
#> 10`).
set.seed(2193)
wl_res <-
mod_spec %>%
tune_race_win_loss(
event_time ~ X1 + X2,
sim_rs,
grid = grid,
metrics = srv_mtrc,
eval_time = time_points
)
#> Warning in tune_race_win_loss_workflow(wflow, resamples = resamples, grid = grid, : 4 evaluation times are available; the first will be used (i.e. `eval_time =
#> 10`).
Created on 2024-01-22 with reprex v2.0.2
With verbose_elim = TRUE in racing, nothing is printed until the candidate models are resampled 3 times. This can make it feel like the verbosity setting "didn't work." It may be worth printing something from the get-go affirming that eliminations will be logged once the models are resampled 3x. :)
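Something along these lines could work (a hypothetical message, not current finetune output):

#> ℹ Racing will minimize the rmse metric.
#> ℹ Resamples are analyzed in a random order.
#> ℹ Eliminations will begin after 3 resamples have been analyzed.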
In situations when one wants to analyze the intermediate results of a race, one shouldn't be required to know the internal data structure of the tune package; there should be some function like collect_race() or similar.
Here's an example of what we get with the current functions and what I expect to get.
project_name <- "sliced-s01e09-playoffs-1"
output_dir <- here::here(project_name, "data")
dir.create(file.path(output_dir), showWarnings = FALSE, recursive = TRUE)
kaggler::kgl_competitions_data_download_all(project_name, output_dir = output_dir)
library(tidyverse)
library(tidymodels)
library(finetune)
options(readr.show_col_types = FALSE)
theme_set(theme_light())
train_raw <- read_csv(here::here(output_dir, "train.csv"))
set.seed(123)
bb_split <- train_raw %>%
mutate(
is_home_run = if_else(as.logical(is_home_run), "HR", "no"),
is_home_run = factor(is_home_run)
) %>%
na.omit() %>%
sample_n(5000) %>%
initial_split(strata = is_home_run)
bb_train <- training(bb_split)
bb_test <- testing(bb_split)
set.seed(234)
bb_folds <- vfold_cv(bb_train, strata = is_home_run, v = 10)
bb_rec <-
recipe(is_home_run ~ launch_angle + launch_speed + plate_x + plate_z +
bb_type + bearing + pitch_mph +
is_pitcher_lefty + is_batter_lefty +
inning + balls + strikes + game_date,
data = bb_train
) %>%
step_date(game_date, features = c("week"), keep_original_cols = FALSE) %>%
step_unknown(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors())
xgb_wf <- bb_rec %>%
workflow(
boost_tree(
mode = "classification",
trees = tune(),
min_n = tune(),
mtry = tune(),
learn_rate = tune(),
tree_depth = tune(),
loss_reduction = tune(),
sample_size = tune()
) %>%
set_engine("xgboost", counts = FALSE)
)
set.seed(123)
xgb_grid <- xgb_wf %>%
extract_parameter_set_dials() %>%
update(
trees = trees(c(100, 100)),
min_n = min_n(c(1, 300)),
mtry = mtry_prop(c(0.1, 0.4)),
learn_rate = learn_rate(c(0.3, 0.3)),
tree_depth = tree_depth(c(2, 6)),
loss_reduction = loss_reduction(c(0, 0), trans = NULL),
sample_size = sample_prop(c(0.4, 0.8))
) %>%
grid_max_entropy(size = 10)
cores <- parallelly::availableCores(omit = 15)
if(cores > 1) {
print(paste("Using", cores, "cores"))
doParallel::registerDoParallel(cores)
}
#> [1] "Using 5 cores"
set.seed(345)
xgb_rs <- tune_race_anova(
xgb_wf,
resamples = bb_folds,
grid = xgb_grid,
metrics = metric_set(mn_log_loss),
control = control_race(verbose_elim = TRUE)
)
#> ℹ Racing will minimize the mn_log_loss metric.
#> ℹ Resamples are analyzed in a random order.
#> ℹ Fold10: 8 eliminated; 2 candidates remain.
#>
#> ℹ Fold07: All but one parameter combination were eliminated.
if(cores > 1) {
doParallel::stopImplicitCluster()
}
xgb_rs %>% show_best(metric = "mn_log_loss")
#> # A tibble: 1 × 13
#> mtry trees min_n tree_depth learn_rate loss_reduction sample_size .metric
#> <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 0.277 100 47 5 2.00 0 0.632 mn_log_loss
#> # ℹ 5 more variables: .estimator <chr>, mean <dbl>, n <int>, std_err <dbl>,
#> # .config <chr>
# # A tibble: 1 × 13
# mtry trees min_n tree_depth learn_rate loss_reduction sample_size .metric .estimator mean n std_err .config
# <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
# 1 0.277 100 47 5 2.00 0 0.632 mn_log_loss binary 0.107 10 0.00630 Preprocessor1_Model06
xgb_rs %>% collect_metrics(summarize = FALSE) %>% arrange(.estimate)
#> # A tibble: 10 × 12
#> id mtry trees min_n tree_depth learn_rate loss_reduction sample_size
#> <chr> <dbl> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 Fold07 0.277 100 47 5 2.00 0 0.632
#> 2 Fold04 0.277 100 47 5 2.00 0 0.632
#> 3 Fold01 0.277 100 47 5 2.00 0 0.632
#> 4 Fold08 0.277 100 47 5 2.00 0 0.632
#> 5 Fold10 0.277 100 47 5 2.00 0 0.632
#> 6 Fold09 0.277 100 47 5 2.00 0 0.632
#> 7 Fold05 0.277 100 47 5 2.00 0 0.632
#> 8 Fold02 0.277 100 47 5 2.00 0 0.632
#> 9 Fold06 0.277 100 47 5 2.00 0 0.632
#> 10 Fold03 0.277 100 47 5 2.00 0 0.632
#> # ℹ 4 more variables: .metric <chr>, .estimator <chr>, .estimate <dbl>,
#> # .config <chr>
xgb_rs %>%
dplyr::select(id, .order, .metrics) %>%
tidyr::unnest(cols = .metrics) %>%
dplyr::group_by(!!!rlang::syms(attributes(xgb_rs)$parameters$id), .metric, .estimator) %>%
dplyr::summarize(
mean = mean(.estimate, na.rm = TRUE),
n = sum(!is.na(.estimate)),
std_err = sd(.estimate, na.rm = TRUE) / sqrt(n),
.groups = "drop"
) %>%
arrange(mean) %>%
print(n = Inf)
#> # A tibble: 10 × 12
#> mtry trees min_n tree_depth learn_rate loss_reduction sample_size .metric
#> <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 0.277 100 47 5 2.00 0 0.632 mn_log_lo…
#> 2 0.310 100 68 4 2.00 0 0.754 mn_log_lo…
#> 3 0.256 100 107 2 2.00 0 0.527 mn_log_lo…
#> 4 0.353 100 63 3 2.00 0 0.540 mn_log_lo…
#> 5 0.343 100 67 5 2.00 0 0.615 mn_log_lo…
#> 6 0.304 100 264 4 2.00 0 0.751 mn_log_lo…
#> 7 0.207 100 158 4 2.00 0 0.418 mn_log_lo…
#> 8 0.115 100 120 4 2.00 0 0.629 mn_log_lo…
#> 9 0.198 100 98 2 2.00 0 0.534 mn_log_lo…
#> 10 0.137 100 209 4 2.00 0 0.725 mn_log_lo…
#> # ℹ 4 more variables: .estimator <chr>, mean <dbl>, n <int>, std_err <dbl>
Created on 2024-01-18 with reprex v2.1.0.9000
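A minimal sketch of the requested helper (collect_race() is a hypothetical name, not an existing function); it simply wraps the unnest/summarize steps shown above so users don't have to know the internals:
collect_race <- function(x, summarize = TRUE) {
  # the tuned parameter ids are stored on the result's attributes
  prm_ids <- attributes(x)$parameters$id
  out <- x %>%
    dplyr::select(id, .order, .metrics) %>%
    tidyr::unnest(cols = .metrics)
  if (summarize) {
    out <- out %>%
      dplyr::group_by(!!!rlang::syms(prm_ids), .metric, .estimator) %>%
      dplyr::summarize(
        mean = mean(.estimate, na.rm = TRUE),
        n = sum(!is.na(.estimate)),
        std_err = sd(.estimate, na.rm = TRUE) / sqrt(n),
        .groups = "drop"
      )
  }
  out
}
collect_race(xgb_rs) would then reproduce the summarized table above, and collect_race(xgb_rs, summarize = FALSE) the per-resample one.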
Hi Max et al.,
I'm getting a weird error in tune_race_win_loss(): when the race gets down to two sets of parameters, it errors out and I see this:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
I tried to reproduce it in a reprex
but couldn't (see below), so I know it's not much to go on. Here's the code I'm using:
tune_sandbox_model <- function(model_spec, model_params) {
rec <- recipe(status ~ ., data = train) %>%
step_dummy(all_nominal(), -all_outcomes()) %>%
step_knnimpute(-all_outcomes())
wf <- workflow() %>%
add_recipe(rec) %>%
add_model(model_spec)
doParallel::registerDoParallel(cores = 5)
tuning_results <- tune_race_win_loss(
wf,
resamples = train_folds,
param_info = model_params,
metrics = metric_set(roc_auc),
grid = 10,
control = control_race(verbose = TRUE, verbose_elim = TRUE, allow_par = TRUE,
save_pred = FALSE, burn_in = 2, alpha = .05, randomize = TRUE,
event_level = 'first', save_workflow = FALSE)
)
return(tuning_results)
}
I'm passing a model spec and a parameter grid into the function like this:
svm_spec <- svm_rbf(
cost = tune(),
rbf_sigma = tune()
) %>%
set_engine("kernlab") %>%
set_mode("classification")
svm_params <- parameters(
cost(range = c(0, 50), trans = NULL),
rbf_sigma(range = c(0, 2), trans = NULL)
)
And it's giving me that error. I'm not sure what might be causing this, or whether it's even actually a finetune issue vs. an issue somewhere else in tidymodels (or outside of tidymodels), but I figured I'd give it a shot.
It seems like I can fix this by just increasing grid = 2
to something much bigger so we get through the resamples before we hit that number, but should this error out more informatively? (A minimal illustration of how this class of error arises is below.)
Many thanks for any pointers! Love the package and the talk at rstudio::global, by the way!
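For context, a minimal base-R illustration of how this class of error arises (not the actual finetune call site, which is internal):
x <- matrix(1:4, nrow = 2)
dimnames(x) <- list(c("a", "b"), c("A", "B", "C"))  # 2 columns, 3 names
#> Error: length of 'dimnames' [2] not equal to array extent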
I'm having trouble with finetune generating out-of-range values in tune_sim_anneal().
This tune_sim_anneal() call errors, as it creates an out-of-range value for tree_depth:
mtcars
set.seed(123)
car_rec <-
recipe(mpg ~ ., data = mtcars) %>%
step_normalize(all_predictors())
folds <- vfold_cv(mtcars, v = 3)
model <-
boost_tree(tree_depth = tune()
) %>%
set_engine("xgboost") %>%
set_mode("regression")
parameters <- parameters(list(
tree_depth(range = c(2, 3))
)
)
res <- tune_sim_anneal(model, car_rec,
resamples = folds,
iter = 3,
param_info = parameters)
reprex::reprex(si = TRUE)
The real problem lies in random_integer_neighbor_calc
library(tidymodels)
library(finetune)
set.seed(123)
parameters <- parameters(list(tree_depth(range = c(2, 3))))
finetune:::random_integer_neighbor_calc(tibble(tree_depth = 3),
parameters, 0.75, FALSE)
reprex::reprex(si = TRUE)
as only one value remains in the pool there.
The fix would be:
random_integer_neighbor_calc <- function(current, pset, prob, change) {
  change_val <- runif(nrow(pset)) <= prob
  if (change & !any(change_val)) {
    change_val[sample(seq_along(change_val), 1)] <- TRUE
  }
  if (any(change_val)) {
    param_change <- pset$id[change_val]
    for (i in param_change) {
      prm <- pset$object[[which(pset$id == i)]]
      prm_rng <- prm$range$upper - prm$range$lower
      tries <- min(prm_rng + 1, 500)
      pool <- dials::value_seq(prm, n = tries)
      smol_range <- floor(prm_rng / 10) + 1
      val_diff <- abs(current[[i]] - pool)
      pool <- pool[val_diff <= smol_range & val_diff > 0]
      if (length(pool) > 1) {
        current[[i]] <- sample(pool, 1)
      } else if (length(pool) == 1) {
        # assign directly: sample(pool, 1) on a length-1 integer samples from 1:pool
        current[[i]] <- pool
      }
    }
  }
  current
}
The problem is the sample(pool, 1) call. If pool is a vector with more than one element, it randomly chooses one element from it. If pool is a single integer (length 1), sample() instead chooses a value between 1 and that integer, which creates the out-of-range values.
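A short demonstration of that sample() behavior, plus a safe idiom (the pool values here are illustrative):
pool <- 7L
sample(pool, 1)                    # samples from 1:7, not from c(7); can return e.g. 3
pool <- c(7L, 9L)
sample(pool, 1)                    # returns 7 or 9, as expected
pool[sample.int(length(pool), 1)]  # safe for any pool length: treats pool as a set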
As this is my first bug report and contribution to an open-source project, I'd like to fix it and create the pull request myself. Hopefully it works. Can I just add my own branch, push it to the remote, and create a pull request?
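That workflow is fine; since this repo already leans on usethis, one R-native way to do it (the branch name is made up):
usethis::create_from_github("tidymodels/finetune", fork = TRUE)
usethis::pr_init(branch = "fix-integer-neighbor-sampling")
# ...edit random_integer_neighbor_calc(), commit...
usethis::pr_push()  # pushes the branch and opens the pull request page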
Seeing a good few of these in the most recent CI runs:
── Warning ('test-win-loss-overall.R:80'): one player is really bad ──────────────
Each row in `x` is expected to match at most 1 row in `y`.
i Row 1 of `x` matches multiple rows.
i If multiple matches are expected, set `multiple = "all"` to silence this warning.
Backtrace:
1. finetune::tune_race_win_loss(...)
at test-win-loss-overall.R:80:2
23. dplyr (local) `<fn>`(`<vctrs___>`)
24. dplyr:::rethrow_warning_join_matches_multiple(cnd, error_call)
25. dplyr:::warn_join(...)
26. dplyr:::warn_dplyr(...)
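For reference, a self-contained sketch of the warning and the fix the warning itself suggests (the actual join lives inside finetune's win/loss internals; the tibbles here are made up):
library(dplyr)
x <- tibble(player = "a", wins = 1)
y <- tibble(player = c("a", "a"), opponent = c("b", "c"))
left_join(x, y, by = "player")                    # dplyr 1.1.0 warns: row 1 of `x` matches multiple rows
left_join(x, y, by = "player", multiple = "all")  # declares the expectation; no warning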
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Hi,
I noticed that this error appears only when one of the tuned parameters is mtry. The reprex below covers both models; a possible workaround is sketched after it.
suppressPackageStartupMessages({
library(tidyverse)
library(tidymodels)
library(finetune)
})
data(two_class_dat, package = "modeldata")
set.seed(5046)
bt <- bootstraps(two_class_dat, times = 5)
rec_example <- recipe(Class ~ ., data = two_class_dat)
# RF
model_rf <- rand_forest(mtry = tune()) %>%
set_mode("classification") %>%
set_engine("ranger")
wf_rf <- workflow() %>%
add_model(model_rf) %>%
add_recipe(rec_example)
set.seed(30)
rf_res <- wf_rf %>%
tune_grid(resamples = bt, grid = 4)
#> i Creating pre-processing data to finalize unknown parameter: mtry
set.seed(40)
rf_res_finetune <- wf_rf %>%
tune_sim_anneal(resamples = bt, initial = rf_res)
#> Optimizing roc_auc
#> Initial best: 0.84957
#> Error in prm$range$upper - prm$range$lower: non-numeric argument to binary operator
#> x Optimization stopped prematurely; returning current results.
# XGB
model_xgb <- boost_tree(mtry = tune()) %>%
set_mode("classification") %>%
set_engine("xgboost")
wf_xgb <- workflow() %>%
add_model(model_xgb) %>%
add_recipe(rec_example)
set.seed(30)
xgb_res <- wf_xgb %>%
tune_grid(resamples = bt, grid = 4)
#> i Creating pre-processing data to finalize unknown parameter: mtry
set.seed(40)
xgb_res_finetune <- wf_xgb %>%
tune_sim_anneal(resamples = bt, initial = xgb_res)
#> Optimizing roc_auc
#> Initial best: 0.85281
#> Error in prm$range$upper - prm$range$lower: non-numeric argument to binary operator
#> x Optimization stopped prematurely; returning current results.
Created on 2021-04-01 by the reprex package (v1.0.0)
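A possible workaround until this is fixed (an assumption, not a confirmed fix): finalize mtry's data-dependent upper bound against the predictors before handing the parameter set to tune_sim_anneal(), e.g. for the random forest workflow above:
rf_params <-
  wf_rf %>%
  extract_parameter_set_dials() %>%
  finalize(two_class_dat %>% select(-Class))
set.seed(40)
rf_res_finetune <- wf_rf %>%
  tune_sim_anneal(resamples = bt, initial = rf_res, param_info = rf_params)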
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
Hi, I am now getting the following error when using finetune, which I wasn't getting when I last used it a couple of months ago:
Error in `parsnip::condense_control()`:
! Object of class `control_race` cannot be coerced to object of class `control_grid`.
• The following arguments are missing:
• 'backend_options'
This occurs with my own data, or when copying the examples here or here.
I have defined a custom metric (partial ROC AUC) myself; the code is as follows:
# Load packages
library(tidyverse)
library(tidymodels)
library(modeldata)
library(finetune)
library(baguette)
library(doParallel)
ncores = round(parallel::detectCores()/3)
# Logic for `event_level`
event_col <- function(truth, event_level) {
if (identical(event_level, "first")) {
levels(truth)[1]
} else {
levels(truth)[2]
}
}
pauc_impl <- function(truth, estimate, estimator = 'binary', event_level) {
if(estimator == "binary") {
level_case = event_col(truth = truth, event_level = event_level)
level_control = setdiff(levels(truth), level_case)
result = pROC::roc(estimate,
response = truth,
levels = c(level_control, level_case),
partial.auc = c(0.8,1),
partial.auc.focus = "sensitivity")
pauc_value = as.numeric(result$auc)
}
return(pauc_value)
}
pauc_vec <- function(truth,
estimate,
estimator = NULL,
na_rm = TRUE,
case_weights = NULL,
event_level = "first",
...) {
# calls finalize_estimator_internal() internally
estimator <- finalize_estimator(truth, estimator, metric_class = "pauc")
check_prob_metric(truth, estimate, case_weights, estimator)
if (na_rm) {
result <- yardstick_remove_missing(truth, estimate, case_weights)
truth <- result$truth
estimate <- result$estimate
case_weights <- result$case_weights
} else if (yardstick_any_missing(truth, estimate, case_weights)) {
return(NA_real_)
}
pauc_impl(truth, estimate, estimator, event_level)
}
pauc <- function(data, ...) {
UseMethod("pauc")
}
pauc <- new_prob_metric(pauc, direction = "maximize")
pauc.data.frame <- function(data,
truth,
estimate,
estimator = NULL,
na_rm = TRUE,
case_weights = NULL,
event_level = "first",
options = list()) {
prob_metric_summarizer(
name = "pauc",
fn = pauc_vec,
data = data,
truth = !!enquo(truth),
!!enquo(estimate),
estimator = estimator,
na_rm = na_rm,
case_weights = !!enquo(case_weights),
event_level = event_level,
fn_options = list(options = options)
)
}
I can use the defined metric function pauc
on the example:
pauc(data = two_class_example,truth = truth,Class1)
The results are as follows:
> pauc(data = two_class_example,truth = truth,Class1)
Setting direction: controls < cases
# A tibble: 1 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 pauc binary 0.149
Then, I used tune_race_anova
to tune the bag_tree
model.
set.seed(123)
data("lending_club", package = "modeldata")
split <- initial_split(lending_club, strata = Class)
train <- training(split)
test <- testing(split)
fold = vfold_cv(data = train,v = 10,strata = Class)
rec <- recipe(Class ~ ., train) %>%
step_normalize(all_numeric())
mod <- bag_tree(tree_depth = tune()) %>%
set_engine("rpart") %>%
set_mode("classification")
wf_set <- workflow_set(
preproc = list(base = rec),
models = list(bag = mod),
cross = TRUE)
When not using parallel computation, using the defined pauc
metric works correctly:
race_result = workflow_map(wf_set,
fn = 'tune_race_anova',
resamples = fold,
grid = 5,
metrics = metric_set(pauc))
race_result %>%
extract_workflow_set_result(id = 'base_bag') %>%
show_best(metric = 'pauc')
> race_result %>%
+ extract_workflow_set_result(id = 'base_bag') %>%
+ show_best(metric = 'pauc')
# A tibble: 1 × 7
tree_depth .metric .estimator mean n std_err .config
<int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 6 pauc binary 0.0647 10 0.00621 Preprocessor1_Model2
However, when I use parallel computation, an error occurs:
cl = makePSOCKcluster(ncores)
registerDoParallel(cl)
race_result = workflow_map(wf_set,
fn = 'tune_race_anova',
resamples = fold,
grid = 5,
metrics = metric_set(pauc))
stopCluster(cl)
Warning message:
All models failed. Run `show_notes(.Last.tune.result)` for more information.
> show_notes(.Last.tune.result)
unique notes:
───────────────────────────────────────────────────────────────────────────────────
Error in `metric_set()`:
! Failed to compute `pauc()`.
Caused by error in `UseMethod()`:
! no applicable method for 'pauc' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"
When I use roc_auc
as the metric for parallel hyperparameter tuning, everything works fine. Therefore, I believe the source of the error is in the parallel computation; a workaround sketch follows.
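A workaround sketch under that assumption: PSOCK workers start with empty global environments, so the S3 method pauc.data.frame() defined above may never be visible to them. Exporting the metric helpers to the cluster before tuning is worth a try:
cl <- makePSOCKcluster(ncores)
parallel::clusterExport(
  cl,
  c("pauc", "pauc.data.frame", "pauc_vec", "pauc_impl", "event_col")
)
registerDoParallel(cl)
# ...workflow_map() as above...
stopCluster(cl)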
Use cli errors in favor of rlang / home-grown machinery.
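A representative before/after of the kind of change this implies (the message text is made up, not an actual finetune error):
# before: string assembly with rlang
rlang::abort(paste0("No parameter called `", param_name, "` was found."))
# after: cli's inline markup handles quoting consistently
cli::cli_abort("No parameter called {.code {param_name}} was found.")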
Hi. I encountered an error using finetune. It happens when I try to set options for certain metrics.
I used the same wrapper function shown in the metric_set()
documentation.
The metric function works fine in tune_grid()
, but it fails when I try to use tune_sim_anneal()
and tune_race_anova()
. (A quick check of the wrapped metric's attributes is sketched after the reprex.)
Thanks in advance!
library(tidymodels)
library(finetune)
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
ames_folds <- vfold_cv(ames_train, v = 10)
ames_rec <-
recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type +
Latitude + Longitude, data = ames_train) %>%
step_log(Gr_Liv_Area, base = 10) %>%
step_other(Neighborhood, threshold = 0.01) %>%
step_dummy(all_nominal_predictors()) %>%
step_interact( ~ Gr_Liv_Area:starts_with("Bldg_Type_") ) %>%
step_ns(Latitude, Longitude, deg_free = 20)
rf_model <-
rand_forest(trees = tune()) %>%
# rand_forest(trees = 1000) %>%
set_engine("ranger") %>%
set_mode("regression")
rf_wflow <-
workflow() %>%
add_formula(
Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type +
Latitude + Longitude) %>%
add_model(rf_model)
grid <- parameters(trees(c(10, 100))) %>%
grid_max_entropy(size = 10)
ccc_with_bias <- function(data, truth, estimate, na_rm = TRUE, ...) {
ccc(
data = data,
truth = !!rlang::enquo(truth),
estimate = !!rlang::enquo(estimate),
# set bias = TRUE
bias = TRUE,
na_rm = na_rm,
...
)
}
# Use `new_numeric_metric()` to formalize this new metric function
ccc_with_bias <- new_numeric_metric(ccc_with_bias, "maximize")
model_metric <- metric_set(ccc_with_bias)
tune_res <- tune_grid(
rf_wflow,
ames_folds,
grid = grid,
metrics = model_metric
)
tune_res_anova <- tune_race_anova(
rf_wflow,
ames_folds,
grid = grid,
metrics = model_metric
)
#> Warning in max(best_config$B, na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
tune_res_anneal <- tune_sim_anneal(
rf_wflow,
ames_folds,
metrics = model_metric
)
#> Optimizing ccc_with_bias
#> Warning in max(which(x$global_best)): no non-missing arguments to max;
#> returning -Inf
#> Warning in max(x$.iter): no non-missing arguments to max; returning -Inf
#> Warning in max(x$mean[x$.iter == 0], na.rm = TRUE): no non-missing arguments to
#> max; returning -Inf
#> Initial best: -Inf
#> Error in 1:prev_ind: argument of length 0
#> β Optimization stopped prematurely; returning current results.
Created on 2023-05-23 with reprex v2.0.2
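One quick thing to check (an assumption about what the iterative search relies on, not a confirmed diagnosis): whether the wrapped metric still carries yardstick's class and direction metadata after wrapping.
# The wrapper was re-blessed with new_numeric_metric(), so these should hold:
class(ccc_with_bias)
#> [1] "numeric_metric" "metric"         "function"
attr(ccc_with_bias, "direction")
#> [1] "maximize"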
Same as tidymodels/tune#912
library(tidymodels)
library(finetune)
set.seed(6735)
folds <- vfold_cv(mtcars, v = 5)
spline_rec <-
recipe(mpg ~ ., data = mtcars) %>%
step_spline_natural(disp, deg_free = tune("disp")) %>%
step_spline_natural(wt, deg_free = tune("wt"))
lin_mod <-
linear_reg() %>%
set_engine("lm")
spline_wflow <- workflow(spline_rec, lin_mod)
spline_grid <- expand.grid(disp = 2:5, wt = 2:5)
spline_res <-
spline_wflow %>%
tune_sim_anneal(spline_rec, resamples = folds)
#> Warning: The `...` are not used in this function but one or more objects were
#> passed: ''
#> Optimizing rmse
#> Initial best: 2.64170
#> 1 ◯ accept suboptimal rmse=3.5264 (+/-0.5527)
#> 2 ◯ accept suboptimal rmse=5.7856 (+/-0.8672)
#> 3 + better suboptimal rmse=4.1851 (+/-0.5124)
#> 4 ─ discard suboptimal rmse=5.5377 (+/-0.7899)
#> 5 + better suboptimal rmse=3.0792 (+/-0.4582)
#> 6 ─ discard suboptimal rmse=5.2613 (+/-0.4338)
#> 7 ─ discard suboptimal rmse=3.6311 (+/-0.5161)
#> 8 ✖ restart from best rmse=3.0093 (+/-0.4751)
#> 9 ─ discard suboptimal rmse=3.2367 (+/-0.3523)
#> 10 ─ discard suboptimal rmse=4.3919 (+/-0.6772)
Created on 2024-06-24 with reprex v2.1.0
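For what it's worth, the immediate trigger is spline_rec being passed positionally into the dots (the workflow already contains the recipe); the report here is that the warning names the offending object as '' instead of something useful. Dropping the stray argument avoids the warning entirely:
spline_res <-
  spline_wflow %>%
  tune_sim_anneal(resamples = folds)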
We pick the first metric and, if it is a dynamic metric, the first eval time. Using verbose_iter = TRUE
will report which metric was chosen, but not the time. (The selection is spelled out after the reprex below.)
library(tidymodels)
library(finetune)
library(censored)
#> Loading required package: survival
set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
mutate(event_time = Surv(time, event)) %>%
select(event_time, X1, X2)
set.seed(2)
split <- initial_split(sim_dat)
sim_tr <- training(split)
sim_te <- testing(split)
sim_rs <- bootstraps(sim_tr, times = 4)
time_points <- c(10, 1, 5, 15)
mod_spec <-
decision_tree(cost_complexity = tune()) %>%
set_mode("censored regression")
grid <- tibble(cost_complexity = 10^c(-10, -2, -1))
gctrl <- control_grid(save_pred = TRUE)
rctrl <- control_race(save_pred = TRUE, verbose_elim = TRUE, verbose = FALSE)
dyn_mtrc <- metric_set(brier_survival)
set.seed(2193)
aov_dyn_res <-
mod_spec %>%
tune_race_anova(
event_time ~ X1 + X2,
resamples = sim_rs,
grid = grid,
metrics = dyn_mtrc,
eval_time = time_points,
control = rctrl
)
#> ℹ Racing will minimize the brier_survival metric.
#> ℹ Resamples are analyzed in a random order.
#> ℹ Bootstrap4: 0 eliminated; 3 candidates remain.
Created on 2023-11-13 with reprex v2.0.2
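The rule above, spelled out against this reprex's objects (assuming metric_set() stores its metric list in a "metrics" attribute, as yardstick currently does):
chosen_metric <- names(attributes(dyn_mtrc)$metrics)[1]  # "brier_survival"
chosen_time <- time_points[1]                            # 10: the first element, not the smallest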
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()