
Comments (5)

tomasfryda avatar tomasfryda commented on June 12, 2024

@magrenimish Thank you for creating this issue and bringing it to our attention. AutoML should have failed with a nicer message, e.g., "No model was trained." GBM requires more data in order to be trained, as mentioned in the warning: "The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 111.0."

from h2o-3.

magrenimish avatar magrenimish commented on June 12, 2024

@tomasfryda would it be possible to then skip or exclude the GBM algorithm with H2O AutoML without explicitly specifying it with the 'exclude_algos' parameter?
For example, with the following code:

    fr = h2o.create_frame(rows=111, cols=29, real_fraction=1.0, categorical_fraction=0, has_response=True, response_factors=2, seed=12345, missing_fraction=0.0)
    aml = H2OAutoML(max_runtime_secs=10000)
    aml.train(x=fr.columns[:-1], y=fr.columns[-1], training_frame=fr)
    h2o.shutdown()

The function fails with GBM, but would it be possible to skip GBM in this case?


tomasfryda avatar tomasfryda commented on June 12, 2024

@magrenimish that's basically what should happen. AutoML doesn't want to know about the underlying constraints of individual models, so each model first runs its own parameter/training data validation logic, and if that fails, the model won't train. The validation logic is also responsible for emitting the warning that tells the user what went wrong (e.g. "The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 111.0.").

It's hard to automatically exclude a whole class of models, since each model in AutoML has different parameters and the failures often depend on those parameters.
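The validate-then-skip behaviour described above can be sketched in plain Python. This is only an illustration, not H2O's implementation: `Candidate`, `validate`, and `run_plan` are hypothetical names, and the "at least 2 x min_rows rows" rule is inferred from the warning text (min_rows=100.0 requires 200.0 rows).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for one model in an AutoML plan."""
    name: str
    min_rows: float


def validate(candidate, n_rows):
    """Return a warning string if the candidate cannot train, else None.

    Assumption: a split needs min_rows rows on each side, so at least
    2 * min_rows rows are required overall.
    """
    required = 2 * candidate.min_rows
    if n_rows < required:
        return (
            f"The dataset size is too small to split for min_rows={candidate.min_rows}: "
            f"must have at least {required} (weighted) rows, but have only {n_rows}."
        )
    return None


def run_plan(candidates, n_rows):
    """Validate each candidate on its own; skip failures, keep the rest."""
    trained, warnings = [], []
    for candidate in candidates:
        problem = validate(candidate, n_rows)
        if problem is not None:
            warnings.append(f"{candidate.name}: {problem}")
            continue  # this model is skipped, the run itself goes on
        trained.append(candidate.name)
    return trained, warnings


# Three candidates of the same algorithm, differing only in min_rows.
candidates = [
    Candidate("gbm_a", 100.0),
    Candidate("gbm_b", 10.0),
    Candidate("gbm_c", 1.0),
]
trained, warnings = run_plan(candidates, 111.0)
```

With 111 rows, only the candidate with min_rows=100.0 is rejected. The other two pass validation, which is why excluding the whole algorithm up front would throw away models that can still train.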


magrenimish avatar magrenimish commented on June 12, 2024

@tomasfryda thank you! So if I want the AutoML function to continue without the GBM algorithm, then I either have to explicitly exclude it with the 'exclude_algos' parameter or catch the specific error and skip the algorithm?
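For reference, explicitly excluding an algorithm might look like the sketch below. `exclude_algos` is the `H2OAutoML` parameter named in the question. The helpers `automl_kwargs_without` and `train_without_gbm` are hypothetical names introduced only for this example, and the sketch assumes an H2O cluster is already running and that the response is the last column, as in the code earlier in the thread.

```python
def automl_kwargs_without(algos, max_runtime_secs=100):
    """Build keyword arguments for H2OAutoML that exclude the given algorithms."""
    return {"max_runtime_secs": max_runtime_secs, "exclude_algos": list(algos)}


def train_without_gbm(frame):
    """Train AutoML on an H2OFrame with every GBM model excluded."""
    # Imported here so that building the kwargs does not require h2o.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(**automl_kwargs_without(["GBM"]))
    aml.train(x=frame.columns[:-1], y=frame.columns[-1], training_frame=frame)
    return aml
```

Note that this drops every GBM from the run, including any whose parameters would have passed validation on a small dataset.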


tomasfryda avatar tomasfryda commented on June 12, 2024

@magrenimish you can just ignore the warning.

When I run your code, AutoML still trains, and it looks like some GBMs have parameters that allow training with a small amount of data:

In [3]: fr = h2o.create_frame(rows=111, cols=29, real_fraction=1.0, categorical_fraction=0, has_response=True, response_factors=2, seed=12345, missing_fraction=0.0)

In [6]: from h2o.automl import H2OAutoML

In [7]: aml = H2OAutoML(max_runtime_secs=100)

In [8]: aml.train(x=fr.columns[:-1], y=fr.columns[-1], training_frame=fr)
AutoML progress: ||   1%
16:41:20.27: _min_rows param, The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 111.0.

AutoML progress: |███████████████████████████████████████████████████████████████████████████████████ (done)| 100%

In [9]: aml.leaderboard
Out[9]:
model_id                                                                     rmse      mse      mae    rmsle    mean_residual_deviance
------------------------------------------------------------------------  -------  -------  -------  -------  ------------------------
GBM_grid_1_AutoML_1_20240318_164118_model_49                              56.1927  3157.62  49.7652      nan                   3157.62
GBM_grid_1_AutoML_1_20240318_164118_model_10                              56.364   3176.9   49.8405      nan                   3176.9
GBM_grid_1_AutoML_1_20240318_164118_model_8                               56.4429  3185.8   49.9569      nan                   3185.8
GBM_grid_1_AutoML_1_20240318_164118_model_17                              56.459   3187.62  50.0887      nan                   3187.62
GBM_grid_1_AutoML_1_20240318_164118_model_52                              56.472   3189.08  49.9436      nan                   3189.08
GBM_grid_1_AutoML_1_20240318_164118_model_21                              56.5403  3196.8   49.9926      nan                   3196.8
GBM_grid_1_AutoML_1_20240318_164118_model_46                              56.6061  3204.25  50.4635      nan                   3204.25
StackedEnsemble_BestOfFamily_5_AutoML_1_20240318_164118                   56.6463  3208.8   50.668       nan                   3208.8
GBM_grid_1_AutoML_1_20240318_164118_model_32                              56.8033  3226.62  50.2397      nan                   3226.62
XGBoost_lr_search_selection_AutoML_1_20240318_164118_select_grid_model_6  56.8386  3230.63  50.8662      nan                   3230.63
[166 rows x 6 columns]
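To see at a glance which algorithm families made it onto the leaderboard, the model IDs can be grouped by their prefix. The sketch below uses only the ten IDs visible in the output above (the full leaderboard has 166 rows), and the `family` helper is a hypothetical name that simply takes the text before the first underscore.

```python
from collections import Counter

# The ten model IDs shown in the leaderboard output above.
model_ids = [
    "GBM_grid_1_AutoML_1_20240318_164118_model_49",
    "GBM_grid_1_AutoML_1_20240318_164118_model_10",
    "GBM_grid_1_AutoML_1_20240318_164118_model_8",
    "GBM_grid_1_AutoML_1_20240318_164118_model_17",
    "GBM_grid_1_AutoML_1_20240318_164118_model_52",
    "GBM_grid_1_AutoML_1_20240318_164118_model_21",
    "GBM_grid_1_AutoML_1_20240318_164118_model_46",
    "StackedEnsemble_BestOfFamily_5_AutoML_1_20240318_164118",
    "GBM_grid_1_AutoML_1_20240318_164118_model_32",
    "XGBoost_lr_search_selection_AutoML_1_20240318_164118_select_grid_model_6",
]


def family(model_id):
    """Return the algorithm family, i.e. the text before the first underscore."""
    return model_id.split("_", 1)[0]


counts = Counter(family(model_id) for model_id in model_ids)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

Eight of the ten rows shown are GBMs, so the warning only affected the GBM configurations whose min_rows was too large for this dataset.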
