Comments (5)
@magrenimish Thank you for creating this issue and bringing it to our attention. AutoML should have failed with a nicer message, e.g., "No model was trained". GBM requires more data in order to be trained, as mentioned in the warning: "The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 111.0."
from h2o-3.
@tomasfryda would it be possible to then skip or exclude the GBM algorithm with H2O AutoML without explicitly specifying it with the 'exclude_algos' parameter?
For example: With the following code:
import h2o
from h2o.automl import H2OAutoML
h2o.init()
fr = h2o.create_frame(rows=111, cols=29, real_fraction=1.0, categorical_fraction=0, has_response=True, response_factors=2, seed=12345, missing_fraction=0.0)
aml = H2OAutoML(max_runtime_secs=10000)
aml.train(x=fr.columns[:-1], y=fr.columns[-1], training_frame=fr)
h2o.shutdown()
The function fails with GBM, but would it be possible to skip GBM in this case?
@magrenimish that's basically what should happen. AutoML doesn't want to know about the underlying constraints of individual models, so each model first runs its own parameter and training-data validation logic, and if that fails, the model won't train. The validation logic is also responsible for emitting the warning that tells the user what went wrong (e.g. "The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 111.0.").
It's hard to automatically exclude a whole class of models, since each model in AutoML has different parameters and the failures often depend on those parameters.
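The parameter dependence can be illustrated with a small check. This is only a sketch: the rule that a split needs at least 2 x min_rows (weighted) rows is inferred from the numbers in the warning (min_rows=100.0 requires 200.0 rows), and the candidate min_rows values are hypothetical, not the grid AutoML actually searches.

```python
def required_rows(min_rows: float) -> float:
    """Rows needed to split, assuming the 2 x min_rows rule implied by the warning."""
    return 2 * min_rows


def can_train(n_rows: float, min_rows: float) -> bool:
    """True if a dataset of n_rows passes the assumed min_rows validation."""
    return n_rows >= required_rows(min_rows)


n_rows = 111  # size of the frame from the example above
candidate_min_rows = [1, 5, 10, 15, 30, 100]  # hypothetical values, for illustration only

trainable = [m for m in candidate_min_rows if can_train(n_rows, m)]
skipped = [m for m in candidate_min_rows if not can_train(n_rows, m)]

print("min_rows values that pass validation:", trainable)
print("min_rows values that fail validation:", skipped)
```

Under these assumptions only the min_rows=100 configuration is rejected, which is why excluding every GBM up front would throw away models that can still train.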
@tomasfryda thank you! So if I want the AutoML function to continue without the GBM algorithm, then I either have to explicitly exclude it with the 'exclude_algos' parameter or catch the specific error and skip the algorithm?
@magrenimish you can just ignore the warning.
When I run your code, AutoML still trains, and it looks like some GBMs have parameters that allow training on a small amount of data:
In [3]: fr = h2o.create_frame(rows=111, cols=29, real_fraction=1.0, categorical_fraction=0, has_response=True, response_factors=2, seed=12345, missing_fraction=0.0)
In [6]: from h2o.automl import H2OAutoML
In [7]: aml = H2OAutoML(max_runtime_secs=100)
In [8]: aml.train(x=fr.columns[:-1], y=fr.columns[-1], training_frame=fr)
AutoML progress: |▉ | 1%
16:41:20.27: _min_rows param, The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 111.0.
AutoML progress: |███████████████████████████████████████████████████████████████████████████████████ (done)| 100%
In [9]: aml.leaderboard
Out[9]:
model_id                                                                     rmse      mse      mae    rmsle    mean_residual_deviance
------------------------------------------------------------------------  -------  -------  -------  -------  ------------------------
GBM_grid_1_AutoML_1_20240318_164118_model_49                              56.1927  3157.62  49.7652      nan                   3157.62
GBM_grid_1_AutoML_1_20240318_164118_model_10                               56.364   3176.9  49.8405      nan                    3176.9
GBM_grid_1_AutoML_1_20240318_164118_model_8                               56.4429   3185.8  49.9569      nan                    3185.8
GBM_grid_1_AutoML_1_20240318_164118_model_17                               56.459  3187.62  50.0887      nan                   3187.62
GBM_grid_1_AutoML_1_20240318_164118_model_52                               56.472  3189.08  49.9436      nan                   3189.08
GBM_grid_1_AutoML_1_20240318_164118_model_21                              56.5403   3196.8  49.9926      nan                    3196.8
GBM_grid_1_AutoML_1_20240318_164118_model_46                              56.6061  3204.25  50.4635      nan                   3204.25
StackedEnsemble_BestOfFamily_5_AutoML_1_20240318_164118                   56.6463   3208.8   50.668      nan                    3208.8
GBM_grid_1_AutoML_1_20240318_164118_model_32                              56.8033  3226.62  50.2397      nan                   3226.62
XGBoost_lr_search_selection_AutoML_1_20240318_164118_select_grid_model_6  56.8386  3230.63  50.8662      nan                   3230.63
[166 rows x 6 columns]
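To check which algorithm families actually made it onto the leaderboard, the model IDs can be grouped by their prefix. This is a minimal sketch that assumes the family name is everything before the first underscore, as in the IDs printed above; the helper name algo_family is hypothetical, and only the ten rows shown are used, not all 166.

```python
from collections import Counter


def algo_family(model_id: str) -> str:
    """Return the text before the first underscore, assumed to be the algorithm family."""
    return model_id.split("_", 1)[0]


# The ten model IDs visible in the leaderboard output above.
model_ids = [
    "GBM_grid_1_AutoML_1_20240318_164118_model_49",
    "GBM_grid_1_AutoML_1_20240318_164118_model_10",
    "GBM_grid_1_AutoML_1_20240318_164118_model_8",
    "GBM_grid_1_AutoML_1_20240318_164118_model_17",
    "GBM_grid_1_AutoML_1_20240318_164118_model_52",
    "GBM_grid_1_AutoML_1_20240318_164118_model_21",
    "GBM_grid_1_AutoML_1_20240318_164118_model_46",
    "StackedEnsemble_BestOfFamily_5_AutoML_1_20240318_164118",
    "GBM_grid_1_AutoML_1_20240318_164118_model_32",
    "XGBoost_lr_search_selection_AutoML_1_20240318_164118_select_grid_model_6",
]

family_counts = Counter(algo_family(m) for m in model_ids)
for family, count in family_counts.most_common():
    print(f"{family}: {count}")
```

Eight of the ten visible rows are GBMs, so the warning did not stop GBM training as a whole. Against a live run, the same function could be applied to the model_id column of aml.leaderboard after converting it to a local list.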