automl-gs's People

Contributors

a-ozbek, drien, evan-burke, krzynio, mikeshatch, minimaxir, xorb0ss

automl-gs's Issues

can't predict using test data

Hello - I have successfully trained my model using my training dataset. Now, when I go to predict, using this command:

python model.py -d ../testing_imputed.csv -m predict

I get this error:

ValueError: Usecols do not match columns, columns expected but not found: ['accepted']

but this is the column I'm trying to predict! Am I supposed to create the target/prediction column in the test data and it will be populated with the predictions? This is a logistic regression problem, where I am trying to predict whether or not a loan will be approved. If I need to add a column, is it:

test_data['accepted'] = ""

Or do I zero it out and the prediction will update the value with what the model should predict?
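In full, the workaround being asked about would look something like this (a sketch only; the feature column names are invented, and whether predict actually ignores the placeholder values is exactly the open question):

```python
import pandas as pd

# Stand-in frame; the real file is testing_imputed.csv with the real features.
test_data = pd.DataFrame({"loan_amount": [1000, 2500], "term": [36, 60]})

# Add a placeholder target column so the script's usecols check passes.
test_data["accepted"] = ""
test_data.to_csv("testing_imputed.csv", index=False)
```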

Thanks in advance to all who respond.

NotFoundError and ValueError on titanic dataset

Trying out automl_gs in a new conda env using the titanic dataset. After each iteration I get the error:

ValueError: Parent directory of model_weights.hdf5 doesn't exist, can't save.

Same behavior running from the command line or within IPython following the example notebook. To clarify, it finds titanic.csv fine; the error seems to occur when saving the intermediate results. Full traceback below.

Traceback
$ automl_gs titanic.csv Survived
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)
Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
Pclass: categorical
Name: ignore
Sex: categorical
Age: numeric
Siblings/Spouses Aboard: categorical
Parents/Children Aboard: categorical
Fare: numeric
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  hps = yaml.load(f)
  0%|                                                                                        | 0/100 [00:00<?, ?trial/s]
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:199: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)[problem_type]
Traceback (most recent call last):
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
	 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_STRING, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_22, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, _arg_Const_1_0_10, _arg_Const_22_0_13, _arg_Const_2_0_14, _arg_Const_3_0_15, _arg_Const_14_0_4, _arg_Const_17_0_7, _arg_Const_20_0_11, _arg_Const_4_0_16, _arg_Const_5_0_17, _arg_Const_6_0_18, _arg_Const_7_0_19, _arg_Const_8_0_20, _arg_Const_11_0_1, _arg_Const_9_0_21, hidden_1/bias/Read/ReadVariableOp, hidden_1/bias/AdamW/Read/ReadVariableOp, hidden_1/bias/AdamW_1/Read/ReadVariableOp, hidden_1/kernel/Read/ReadVariableOp, hidden_1/kernel/AdamW/Read/ReadVariableOp, hidden_1/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_10_0_0, bn_1/beta/Read/ReadVariableOp, bn_1/beta/AdamW/Read/ReadVariableOp, bn_1/beta/AdamW_1/Read/ReadVariableOp, bn_1/gamma/Read/ReadVariableOp, bn_1/gamma/AdamW/Read/ReadVariableOp, bn_1/gamma/AdamW_1/Read/ReadVariableOp, bn_1/moving_mean/Read/ReadVariableOp, bn_1/moving_variance/Read/ReadVariableOp, _arg_Const_12_0_2, hidden_2/bias/Read/ReadVariableOp, hidden_2/bias/AdamW/Read/ReadVariableOp, hidden_2/bias/AdamW_1/Read/ReadVariableOp, hidden_2/kernel/Read/ReadVariableOp, hidden_2/kernel/AdamW/Read/ReadVariableOp, hidden_2/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_13_0_3, bn_2/beta/Read/ReadVariableOp, bn_2/beta/AdamW/Read/ReadVariableOp, bn_2/beta/AdamW_1/Read/ReadVariableOp, bn_2/gamma/Read/ReadVariableOp, bn_2/gamma/AdamW/Read/ReadVariableOp, bn_2/gamma/AdamW_1/Read/ReadVariableOp, bn_2/moving_mean/Read/ReadVariableOp, bn_2/moving_variance/Read/ReadVariableOp, _arg_Const_15_0_5, hidden_3/bias/Read/ReadVariableOp, hidden_3/bias/AdamW/Read/ReadVariableOp, hidden_3/bias/AdamW_1/Read/ReadVariableOp, hidden_3/kernel/Read/ReadVariableOp, hidden_3/kernel/AdamW/Read/ReadVariableOp, hidden_3/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_16_0_6, 
bn_3/beta/Read/ReadVariableOp, bn_3/beta/AdamW/Read/ReadVariableOp, bn_3/beta/AdamW_1/Read/ReadVariableOp, bn_3/gamma/Read/ReadVariableOp, bn_3/gamma/AdamW/Read/ReadVariableOp, bn_3/gamma/AdamW_1/Read/ReadVariableOp, bn_3/moving_mean/Read/ReadVariableOp, bn_3/moving_variance/Read/ReadVariableOp, _arg_Const_18_0_8, hidden_4/bias/Read/ReadVariableOp, hidden_4/bias/AdamW/Read/ReadVariableOp, hidden_4/bias/AdamW_1/Read/ReadVariableOp, hidden_4/kernel/Read/ReadVariableOp, hidden_4/kernel/AdamW/Read/ReadVariableOp, hidden_4/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_19_0_9, bn_4/beta/Read/ReadVariableOp, bn_4/beta/AdamW/Read/ReadVariableOp, bn_4/beta/AdamW_1/Read/ReadVariableOp, bn_4/gamma/Read/ReadVariableOp, bn_4/gamma/AdamW/Read/ReadVariableOp, bn_4/gamma/AdamW_1/Read/ReadVariableOp, bn_4/moving_mean/Read/ReadVariableOp, bn_4/moving_variance/Read/ReadVariableOp, _arg_Const_21_0_12, output/bias/Read/ReadVariableOp, output/bias/AdamW/Read/ReadVariableOp, output/bias/AdamW_1/Read/ReadVariableOp, output/kernel/Read/ReadVariableOp, output/kernel/AdamW/Read/ReadVariableOp, output/kernel/AdamW_1/Read/ReadVariableOp, training/TFOptimizer/beta1_power, training/TFOptimizer/beta2_power)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1620, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1047, in run
    fetches=fetches, feed_dict=feed_dict, **kwargs)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
	 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_STRING, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_22, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, _arg_Const_1_0_10, _arg_Const_22_0_13, _arg_Const_2_0_14, _arg_Const_3_0_15, _arg_Const_14_0_4, _arg_Const_17_0_7, _arg_Const_20_0_11, _arg_Const_4_0_16, _arg_Const_5_0_17, _arg_Const_6_0_18, _arg_Const_7_0_19, _arg_Const_8_0_20, _arg_Const_11_0_1, _arg_Const_9_0_21, hidden_1/bias/Read/ReadVariableOp, hidden_1/bias/AdamW/Read/ReadVariableOp, hidden_1/bias/AdamW_1/Read/ReadVariableOp, hidden_1/kernel/Read/ReadVariableOp, hidden_1/kernel/AdamW/Read/ReadVariableOp, hidden_1/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_10_0_0, bn_1/beta/Read/ReadVariableOp, bn_1/beta/AdamW/Read/ReadVariableOp, bn_1/beta/AdamW_1/Read/ReadVariableOp, bn_1/gamma/Read/ReadVariableOp, bn_1/gamma/AdamW/Read/ReadVariableOp, bn_1/gamma/AdamW_1/Read/ReadVariableOp, bn_1/moving_mean/Read/ReadVariableOp, bn_1/moving_variance/Read/ReadVariableOp, _arg_Const_12_0_2, hidden_2/bias/Read/ReadVariableOp, hidden_2/bias/AdamW/Read/ReadVariableOp, hidden_2/bias/AdamW_1/Read/ReadVariableOp, hidden_2/kernel/Read/ReadVariableOp, hidden_2/kernel/AdamW/Read/ReadVariableOp, hidden_2/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_13_0_3, bn_2/beta/Read/ReadVariableOp, bn_2/beta/AdamW/Read/ReadVariableOp, bn_2/beta/AdamW_1/Read/ReadVariableOp, bn_2/gamma/Read/ReadVariableOp, bn_2/gamma/AdamW/Read/ReadVariableOp, bn_2/gamma/AdamW_1/Read/ReadVariableOp, bn_2/moving_mean/Read/ReadVariableOp, bn_2/moving_variance/Read/ReadVariableOp, _arg_Const_15_0_5, hidden_3/bias/Read/ReadVariableOp, hidden_3/bias/AdamW/Read/ReadVariableOp, hidden_3/bias/AdamW_1/Read/ReadVariableOp, hidden_3/kernel/Read/ReadVariableOp, hidden_3/kernel/AdamW/Read/ReadVariableOp, hidden_3/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_16_0_6, 
bn_3/beta/Read/ReadVariableOp, bn_3/beta/AdamW/Read/ReadVariableOp, bn_3/beta/AdamW_1/Read/ReadVariableOp, bn_3/gamma/Read/ReadVariableOp, bn_3/gamma/AdamW/Read/ReadVariableOp, bn_3/gamma/AdamW_1/Read/ReadVariableOp, bn_3/moving_mean/Read/ReadVariableOp, bn_3/moving_variance/Read/ReadVariableOp, _arg_Const_18_0_8, hidden_4/bias/Read/ReadVariableOp, hidden_4/bias/AdamW/Read/ReadVariableOp, hidden_4/bias/AdamW_1/Read/ReadVariableOp, hidden_4/kernel/Read/ReadVariableOp, hidden_4/kernel/AdamW/Read/ReadVariableOp, hidden_4/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_19_0_9, bn_4/beta/Read/ReadVariableOp, bn_4/beta/AdamW/Read/ReadVariableOp, bn_4/beta/AdamW_1/Read/ReadVariableOp, bn_4/gamma/Read/ReadVariableOp, bn_4/gamma/AdamW/Read/ReadVariableOp, bn_4/gamma/AdamW_1/Read/ReadVariableOp, bn_4/moving_mean/Read/ReadVariableOp, bn_4/moving_variance/Read/ReadVariableOp, _arg_Const_21_0_12, output/bias/Read/ReadVariableOp, output/bias/AdamW/Read/ReadVariableOp, output/bias/AdamW_1/Read/ReadVariableOp, output/kernel/Read/ReadVariableOp, output/kernel/AdamW/Read/ReadVariableOp, output/kernel/AdamW_1/Read/ReadVariableOp, training/TFOptimizer/beta1_power, training/TFOptimizer/beta2_power)]]

Caused by op 'save/SaveV2', defined at:
  File "model.py", line 46, in <module>
    model_train(df, encoders, args, model)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 377, in model_train
    batch_size=256)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1363, in fit
    validation_steps=validation_steps)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 291, in fit_loop
    callbacks.on_train_end()
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 158, in on_train_end
    callback.on_train_end(logs)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 398, in on_train_end
    self.model.save_weights('model_weights.hdf5')
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1365, in save_weights
    self._checkpointable_saver.save(filepath, session=session)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1178, in save
    self._last_save_saver = saver_lib.Saver(var_list=named_variables)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1281, in __init__
    self.build()
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1293, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1330, in _build
    build_save=build_save, build_restore=build_restore)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 775, in _build_internal
    save_tensor = self._AddSaveOps(filename_tensor, saveables)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 275, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 193, in save_op
    tensors)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1687, in save_v2
    shape_and_slices=shape_and_slices, tensors=tensors, name=name)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): ; No such file or directory
	 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_STRING, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_22, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, _arg_Const_1_0_10, _arg_Const_22_0_13, _arg_Const_2_0_14, _arg_Const_3_0_15, _arg_Const_14_0_4, _arg_Const_17_0_7, _arg_Const_20_0_11, _arg_Const_4_0_16, _arg_Const_5_0_17, _arg_Const_6_0_18, _arg_Const_7_0_19, _arg_Const_8_0_20, _arg_Const_11_0_1, _arg_Const_9_0_21, hidden_1/bias/Read/ReadVariableOp, hidden_1/bias/AdamW/Read/ReadVariableOp, hidden_1/bias/AdamW_1/Read/ReadVariableOp, hidden_1/kernel/Read/ReadVariableOp, hidden_1/kernel/AdamW/Read/ReadVariableOp, hidden_1/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_10_0_0, bn_1/beta/Read/ReadVariableOp, bn_1/beta/AdamW/Read/ReadVariableOp, bn_1/beta/AdamW_1/Read/ReadVariableOp, bn_1/gamma/Read/ReadVariableOp, bn_1/gamma/AdamW/Read/ReadVariableOp, bn_1/gamma/AdamW_1/Read/ReadVariableOp, bn_1/moving_mean/Read/ReadVariableOp, bn_1/moving_variance/Read/ReadVariableOp, _arg_Const_12_0_2, hidden_2/bias/Read/ReadVariableOp, hidden_2/bias/AdamW/Read/ReadVariableOp, hidden_2/bias/AdamW_1/Read/ReadVariableOp, hidden_2/kernel/Read/ReadVariableOp, hidden_2/kernel/AdamW/Read/ReadVariableOp, hidden_2/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_13_0_3, bn_2/beta/Read/ReadVariableOp, bn_2/beta/AdamW/Read/ReadVariableOp, bn_2/beta/AdamW_1/Read/ReadVariableOp, bn_2/gamma/Read/ReadVariableOp, bn_2/gamma/AdamW/Read/ReadVariableOp, bn_2/gamma/AdamW_1/Read/ReadVariableOp, bn_2/moving_mean/Read/ReadVariableOp, bn_2/moving_variance/Read/ReadVariableOp, _arg_Const_15_0_5, hidden_3/bias/Read/ReadVariableOp, hidden_3/bias/AdamW/Read/ReadVariableOp, hidden_3/bias/AdamW_1/Read/ReadVariableOp, hidden_3/kernel/Read/ReadVariableOp, hidden_3/kernel/AdamW/Read/ReadVariableOp, hidden_3/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_16_0_6, 
bn_3/beta/Read/ReadVariableOp, bn_3/beta/AdamW/Read/ReadVariableOp, bn_3/beta/AdamW_1/Read/ReadVariableOp, bn_3/gamma/Read/ReadVariableOp, bn_3/gamma/AdamW/Read/ReadVariableOp, bn_3/gamma/AdamW_1/Read/ReadVariableOp, bn_3/moving_mean/Read/ReadVariableOp, bn_3/moving_variance/Read/ReadVariableOp, _arg_Const_18_0_8, hidden_4/bias/Read/ReadVariableOp, hidden_4/bias/AdamW/Read/ReadVariableOp, hidden_4/bias/AdamW_1/Read/ReadVariableOp, hidden_4/kernel/Read/ReadVariableOp, hidden_4/kernel/AdamW/Read/ReadVariableOp, hidden_4/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_19_0_9, bn_4/beta/Read/ReadVariableOp, bn_4/beta/AdamW/Read/ReadVariableOp, bn_4/beta/AdamW_1/Read/ReadVariableOp, bn_4/gamma/Read/ReadVariableOp, bn_4/gamma/AdamW/Read/ReadVariableOp, bn_4/gamma/AdamW_1/Read/ReadVariableOp, bn_4/moving_mean/Read/ReadVariableOp, bn_4/moving_variance/Read/ReadVariableOp, _arg_Const_21_0_12, output/bias/Read/ReadVariableOp, output/bias/AdamW/Read/ReadVariableOp, output/bias/AdamW_1/Read/ReadVariableOp, output/kernel/Read/ReadVariableOp, output/kernel/AdamW/Read/ReadVariableOp, output/kernel/AdamW_1/Read/ReadVariableOp, training/TFOptimizer/beta1_power, training/TFOptimizer/beta2_power)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model.py", line 46, in <module>
    model_train(df, encoders, args, model)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 377, in model_train
    batch_size=256)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1363, in fit
    validation_steps=validation_steps)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 291, in fit_loop
    callbacks.on_train_end()
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 158, in on_train_end
    callback.on_train_end(logs)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 398, in on_train_end
    self.model.save_weights('model_weights.hdf5')
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1365, in save_weights
    self._checkpointable_saver.save(filepath, session=session)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1186, in save
    global_step=checkpoint_number)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1637, in save
    raise exc
ValueError: Parent directory of model_weights.hdf5 doesn't exist, can't save.
                                                                                                                        
Metrics:                                                                                                                
trial_id: 3e5c75e7-53be-4e75-8558-17b511440ba9
epoch: 20
time_completed: 2019-03-26 22:03:19
log_loss: 0.6697867036089022
accuracy: 0.6142322097378277
auc: 0.8345666587733839
precision: 0.30711610486891383
recall: 0.5
f1: 0.3805104408352668
  1%|| 1/100 [00:07<12:16,  7.44s/trial]
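A guard against this failure mode might look like the following sketch, assuming (a guess, not confirmed in the issue) that the ValueError comes from Keras resolving the bare filename `model_weights.hdf5` against a working directory that no longer exists or isn't writable:

```python
import os

# Make the save target explicit instead of relying on the process's current
# working directory, and ensure the parent directory exists before saving.
weights_path = os.path.abspath(os.path.join("automl_train", "model_weights.hdf5"))
os.makedirs(os.path.dirname(weights_path), exist_ok=True)
# model.save_weights(weights_path)  # call site in pipeline.py's on_train_end
```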

Console printing issues after an experiment improvement

tqdm.write() apparently gets buggy when the terminal height is less than the text being replaced, and it leaves text artifacts when the replacement text has a different line length.

As a result, the script cannot print hyperparameters: the hps plus metrics exceed typical terminal height, and the number of hyperparameter lines varies from trial to trial, unlike the metrics.

I removed hps printing for the time being (it's not strictly necessary anyway), but I would like a better solution.
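For reference, the intended pattern is tqdm.write(), which prints status lines above the bar rather than through it; a minimal self-contained sketch (streams are redirected to StringIO here only so the example produces nothing on a real terminal):

```python
from io import StringIO
from tqdm import tqdm

messages = StringIO()
for i in tqdm(range(3), unit="trial", file=StringIO()):  # bar to a dummy stream
    tqdm.write(f"trial {i} done", file=messages)         # text printed above the bar
lines = messages.getvalue().splitlines()
```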

results missing, ValueError: Input

This can be hard to figure out since I can't share the data. I am running it in Google Colab.

automl_grid_search(csv_path='/content/CLT_all_tasks_trial_level.csv', target_field='correctResp', model_name='tpu', tpu_address=tpu_address)

Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
Subject: categorical
Finished: categorical
TrainingDay: categorical
Condition: categorical
CondPrev: categorical
TaskNumber: categorical
TaskId: categorical
TrialNumber: numeric
PresentationStimulus: numeric
StimTime: numeric
RespToTime: numeric
RT: numeric
SubjResp: categorical
OutcomeInt: categorical
TaskOutcomeInt: categorical
StimDim1: categorical
StimDim2: categorical
StimDim3: categorical
StimDim4: categorical
IntendedRule: categorical
Background: categorical
StimDimWord1: categorical
StimDimWord2: categorical
StimDimWord3: categorical
StimDimWord4: categorical
ExpResp: categorical
DistinctDays: categorical
out: categorical
StimType: categorical
0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-16-ca69e1157d4e> in <module>()
      2                    target_field='correctResp',
      3                    model_name='tpu',
----> 4                    tpu_address = tpu_address)

/usr/local/lib/python3.6/dist-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
     92                     header=(best_result is None))
     93 
---> 94         train_results = results.tail(1).to_dict('records')[0]
     95 
     96         # If the target metric improves, save the new hps/files,

IndexError: list index out of range

Here is the log.


Apr 3, 2019, 5:31:44 PM | WARNING | ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
-- | -- | --
Apr 3, 2019, 5:31:44 PM | WARNING | raise ValueError(msg_err.format(type_err, X.dtype))
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
Apr 3, 2019, 5:31:44 PM | WARNING | allow_nan=force_all_finite == 'allow-nan')
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 573, in check_array
Apr 3, 2019, 5:31:44 PM | WARNING | y_pred = check_array(y_pred, ensure_2d=False)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py", line 1763, in log_loss
Apr 3, 2019, 5:31:44 PM | WARNING | logloss = log_loss(y_true, y_pred)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/content/tpu_train/pipeline.py", line 1126, in on_epoch_end
Apr 3, 2019, 5:31:44 PM | WARNING | callback.on_epoch_end(epoch, logs)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py", line 251, in on_epoch_end
Apr 3, 2019, 5:31:44 PM | WARNING | callbacks.on_epoch_end(epoch, epoch_logs)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1734, in _pipeline_fit_loop
Apr 3, 2019, 5:31:44 PM | WARNING | validation_steps=validation_steps)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1633, in _pipeline_fit
Apr 3, 2019, 5:31:44 PM | WARNING | steps_per_epoch, validation_steps, **kwargs)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1532, in fit
Apr 3, 2019, 5:31:44 PM | WARNING | batch_size=64 * 8)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/content/tpu_train/pipeline.py", line 1095, in model_train
Apr 3, 2019, 5:31:44 PM | WARNING | model_train(df, encoders, args, model)
Apr 3, 2019, 5:31:44 PM | WARNING | File "model.py", line 69, in <module>
Apr 3, 2019, 5:31:44 PM | WARNING | Traceback (most recent call last):

AFAIK, the largest number in the dataset is 12007245.

Thanks for the help!
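Since the traceback ends in sklearn's finite-value check, the first thing to rule out is a non-finite value in the input itself (the NaN could also arise in the model's predictions, e.g. from overflow on very large numerics). A pre-flight scan along these lines, with made-up columns since the real data can't be shared, locates any offending field:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for CLT_all_tasks_trial_level.csv.
df = pd.DataFrame({"RT": [0.5, np.inf, np.nan], "TrialNumber": [1, 2, 3]})

# log_loss rejects both NaN and inf, so count both per numeric column.
numeric = df.select_dtypes(include=[np.number])
bad = numeric.replace([np.inf, -np.inf], np.nan).isna().sum()
bad_cols = bad[bad > 0].index.tolist()
print(bad_cols)
```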

FileNotFoundError

from automl_gs import automl_grid_search
automl_grid_search('Housing.csv', 'price')

Solving a regression problem, minimizing mse using tensorflow.
Modeling with field specifications:
area: numeric
bedrooms: numeric
bathrooms: categorical
stories: categorical
mainroad: categorical
guestroom: categorical
basement: categorical
hotwaterheating: categorical
airconditioning: categorical
parking: categorical
prefarea: categorical
furnishingstatus: categorical

0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]

FileNotFoundError Traceback (most recent call last)
in
1 from automl_gs import automl_grid_search
----> 2 automl_grid_search('Housing.csv','price')

~/.local/lib/python3.6/site-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
85 # and append to the metrics CSV.
86 results = pd.read_csv(os.path.join(train_folder,
---> 87 "metadata", "results.csv"))
88 results = results.assign(**params)
89 results.insert(0, 'trial_id', uuid.uuid4())

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
700 skip_blank_lines=skip_blank_lines)
701
--> 702 return _read(filepath_or_buffer, kwds)
703
704 parser_f.__name__ = name

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
427
428 # Create the parser.
--> 429 parser = TextFileReader(filepath_or_buffer, **kwds)
430
431 if chunksize or iterator:

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
893 self.options['has_index_names'] = kwds['has_index_names']
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1120 def _make_engine(self, engine='c'):
1121 if engine == 'c':
-> 1122 self._engine = CParserWrapper(self.f, **self.options)
1123 else:
1124 if engine == 'python':

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1851 kwds['usecols'] = self.usecols
1852
-> 1853 self._reader = parsers.TextReader(src, **kwds)
1854 self.unnamed_cols = self._reader.unnamed_cols
1855

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
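This FileNotFoundError is downstream of the real failure: results.csv is written by the training subprocess, so when that subprocess dies the parent's read fails instead of surfacing the child's error. A sketch of how to see the underlying traceback, where the failing `-c` import is a stand-in for running the generated model.py by hand:

```python
import subprocess
import sys

# Run a child process the way automl-gs runs model.py, but capture stderr so
# the underlying traceback is visible, not just the secondary read failure.
proc = subprocess.run(
    [sys.executable, "-c", "import definitely_not_installed_pkg"],
    capture_output=True, text=True,
)
print(proc.returncode, proc.stderr.strip().splitlines()[-1])
```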

Bug: incorrect shape size set for categorical field in classifier model.

While trying out automl_gs on this uci dataset, I got this error:

Traceback (most recent call last):
  File "model.py", line 59, in <module>
    model_train(df, encoders, args, model)
  File "C:\Users\josep\automl_train\pipeline.py", line 835, in model_train
    batch_size=256)
  File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 776, in fit
    shuffle=shuffle)
  File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2382, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training_utils.py", line 362, in standardize_input_data
    ' but got array with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_son to have shape (1,) but got array with shape (2,)

After some sleuthing, I eventually figured out that the shape for the offending column is set incorrectly in build_model():

    input_son_size = len(encoders['son_encoder'].classes_)
    input_son = Input(
        shape=(input_son_size if input_son_size != 2 else 1,), name="input_son")

I don't understand the purpose of that `if-else` clause. It looks like this change was introduced in 1dcb9e2; reverting that commit allows my model to work.
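One possible reason the special case exists (an assumption, not taken from the commit): scikit-learn's LabelBinarizer emits a single 0/1 column for a two-class field but a full one-hot matrix for three or more classes, so an input layer sized from `len(classes_)` would be wrong in the binary case unless it is forced to `(1,)`:

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
two_class = lb.fit_transform(["a", "b", "a"])    # single column for 2 classes
three_class = lb.fit_transform(["a", "b", "c"])  # one-hot for 3+ classes
print(two_class.shape, three_class.shape)
```

The bug report then suggests the shapes disagree because the field is encoded as two columns elsewhere in the pipeline.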

automl-gs assumes the Python that's running is the global `python`

I still need to fill this in with a full reproducer, but as a placeholder:

I just ran automl-gs for the first time, and it errored out with:

/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__
  return f(*args, **kwds)
/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/jinja2/utils.py:485: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/jinja2/runtime.py:318: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping

  0%|          | 0/100 [00:00<?, ?trial/s]

  0%|          | 0/20 [00:00<?, ?epoch/s]
Traceback (most recent call last):
  File "model.py", line 2, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'


  0%|          | 0/20 [00:00<?, ?epoch/s]
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py:858: ResourceWarning: subprocess 12646 is still running
  ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Solving a classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
message.received: numeric
message.sender: categorical
message.subject: text
message.size: numeric
message.recipients: categorical
response.sent: numeric
response.sender: categorical
response.subject: text
response.size: numeric
response.recipients: categorical
lag_readable: text
Traceback (most recent call last):
  File "/Users/jberman/.local/bin/automl_gs", line 10, in <module>
    sys.exit(cmd())
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 175, in cmd
    tpu_address=args.tpu_address)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='automl_results.csv' mode='w' encoding='UTF-8'>
ResourceWarning: Enable tracemalloc to get the object allocation traceback

Where at least that first error there is because the auto-generated .py file doesn't contain a shebang:

⊙  head automl_train/model.py                                                                                                jberman@USNYHJBERMANMB2 ●
import argparse
import pandas as pd
from pipeline import *

and it seems to be executed by calling the bare `python`, whereas it needs to use `sys.executable` from the Python interpreter that originally ran automl-gs (which is where pandas is installed).

(To be clear, the blow-up happens because automl-gs is installed in a virtualenv, and whatever other Python it runs model.py with does not have pandas installed.)
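A minimal sketch of the fix, assuming automl-gs launches the generated script via `subprocess` (the exact call site inside automl-gs may differ):

```python
import subprocess
import sys

# Use the interpreter that is running this process (i.e. the virtualenv's
# Python) instead of whatever the bare `python` resolves to on PATH.
cmd = [sys.executable, "model.py", "-d", "data.csv", "-m", "train"]
# subprocess.run(cmd, check=True)  # uncomment to actually launch training
print(cmd[0])
```

This guarantees model.py runs under the same interpreter (and site-packages) as automl-gs itself.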

SyntaxError: invalid syntax when fields start with a number.

Hi, and thanks for your work.

I tried to run your project on a dataset that has some fields starting with numbers, and this throws a SyntaxError.
For example, with a field named '1stFlrSF', I got the following error:

Traceback (most recent call last):
  File "model.py", line 3, in <module>
    from pipeline import *
  File "[MY_PATH]/automl_train/pipeline.py", line 1090
    1stflrsf_enc = df['1stFlrSF']
               ^
SyntaxError: invalid syntax

  0%|          | 0/20 [00:00<?, ?epoch/s]Traceback (most recent call last):
  File "[MY_PATH]/test_auto_ml/Test.py", line 8, in <module>
    do_the_thing("[MY_DATASET_PATH]/train.csv","SalePrice")
  File "[MY_PATH]/test_auto_ml/Test.py", line 5, in do_the_thing
    automl_grid_search(path,label)
  File "[MY_PYTHON_PATH]/site-packages/automl_gs/automl_gs.py", line 94, in automl_grid_search
    train_results = results.tail(1).to_dict('records')[0]
IndexError: list index out of range
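The generated pipeline uses raw column names as Python variable names, so any name that isn't a valid identifier breaks the template. A hedged sketch of a sanitizer (the real fix would live in automl-gs's template rendering; `sanitize` here is a hypothetical helper name):

```python
import re

def sanitize(col):
    """Turn an arbitrary column name into a valid Python identifier."""
    # Replace anything that isn't alphanumeric/underscore, then make sure
    # the result doesn't start with a digit.
    name = re.sub(r"\W", "_", col).lower()
    if name and name[0].isdigit():
        name = "f_" + name
    return name

print(sanitize("1stFlrSF"))  # -> f_1stflrsf
```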

How to predict a single Titanic data?

I was able to train using the Titanic Dataset. In the docs it says to train use the following command:
python3 model.py -d data.csv -m predict

Does this mean prediction features have to be in .csv file? Is it possible to predict a single row in a CSV file from python without using the terminal?
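The generated model.py reads its input from a CSV, so the simplest workaround is to write the single row to a temporary CSV and invoke the script. A hedged sketch (the `-d`/`-m` flags match the documented CLI; the column names here are placeholders for your own features):

```python
import csv
import subprocess
import sys
import tempfile

row = {"Pclass": 3, "Sex": "male", "Age": 22}  # your feature values

# Write the single row to a throwaway CSV with a header line.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)
    tmp_path = f.name

# Then hand that file to the generated script:
# subprocess.run([sys.executable, "model.py", "-d", tmp_path, "-m", "predict"])
print(tmp_path)
```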

data

Hello, what is the format of the prediction dataset data.csv? Is it tensorflow by default?

Float conversion issue screwing with numeric encoders.

I almost feel bad for reporting this one.

Using the yacht hydrodynamics UIC dataset, I got this error:

(env) (base) C:\Users\josep\Jeenee\AutoML\automl_train>python model.py -d ..\automl-testbench\yacht-hydrodynamics\data.csv -m train
Traceback (most recent call last):
  File "model.py", line 46, in <module>
    model_train(df, encoders, args, model)
  File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 347, in model_train
    X, y = process_data(df, encoders)
  File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 296, in process_data
    df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')
  File "C:\Users\josep\Jeenee\AutoML\venv\lib\site-packages\pandas\core\reshape\tile.py", line 235, in cut
    raise ValueError('bins must increase monotonically.')
ValueError: bins must increase monotonically.

Hmmm, odd. Let's take a look at pipeline.py...

    # Length-beam ratio
    length_beam_ratio_enc = df['Length-beam ratio']
    length_beam_ratio_bins = length_beam_ratio_enc.quantile(
        np.linspace(0, 1, 10+1))
    encoders['length_beam_ratio_bins'] = length_beam_ratio_bins
    
    # ....

    # Length-beam ratio
    length_beam_ratio_enc = pd.cut(
        df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')

The error is referring to the .cut line, which I had previously patched to include the duplicates='drop' bit. But the current error isn't related to that, but complaining about the encoder. Hmmm, nothing looks odd in the data about that column. Let's open up pdb and take a look...

>>> encoders['length_beam_ratio_bins']
[2.73, 2.76, 3.15, 3.15, 3.1499999999999995, 3.15, 3.17, 3.32, 3.51, 3.51, 3.64]

facepalm

Well now! I suppose I'll concede that's technically not monotonically increasing!

I appended a .round(4) to the two .quantile lines of encoders/numeric (lines 12 and 15), which worked for this test case. This is certainly not an adequate general solution, however, as e.g. it'll break on data that needs precision at the 5th decimal place...
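A more robust approach than blanket rounding is to de-duplicate the quantile edges themselves, so pd.cut always sees a strictly increasing sequence. A sketch using the bin values from this report (`dedupe_bins` is a hypothetical helper, not automl-gs code):

```python
import numpy as np

def dedupe_bins(bins, decimals=10):
    """Round away float noise, then drop duplicate bin edges.
    np.unique also sorts, so the result is strictly increasing."""
    return np.unique(np.round(np.asarray(bins, dtype=float), decimals))

bins = [2.73, 2.76, 3.15, 3.15, 3.1499999999999995, 3.15, 3.17, 3.32, 3.51, 3.51, 3.64]
print(dedupe_bins(bins, decimals=6))
```

The `decimals` knob controls how aggressively near-duplicates are merged; 6 is enough here without destroying useful precision.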

Using automl-gs for bin packing

Is there any way to use a tool like automl-gs for bin-packing problems? I've seen a way to model it as linear optimization where you do a cross join of all objects and all potential bins, set a constraint of each object being selected once, and then optimizing the bins however desired. Cross joins can end up being needlessly heavy though, so I'm wondering if there is a way to model that sort of problem such that you could use pre-optimized and continually developed tools like this one.

Image input fields

Allow the ability to use an image as an input, in conjunction with other fields.

  • TensorFlow only
  • The input column data is text indicating the file name.
  • The images are stored in a folder; this folder must be specified as an input parameter.

The problem is that the pretrained models are too heavy, and training a CNN from scratch is too time consuming.

The solution is to use a fast image-encoding approach, which I'll start working on after automl-gs.

Data set requires multiple

I tried a simple dataset to play around with this, and I am running into

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

I believe this is because scikit wants there to be multiple iterations of the variable you're trying to predict. Might want to add it to the docs~

Input:
square.txt

Full log if needed:

>$ automl_gs square.csv square
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)
Solving a classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
real: numeric
fake1: numeric
fake2: numeric
fake3: numeric
fake4: numeric
text: categorical
bool: categorical
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  hps = yaml.load(f)
  0%|                                                                                                                                                                         | 0/100 [00:00<?, ?trial/s]
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:199: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)[problem_type]
Traceback (most recent call last):
  File "model.py", line 47, in <module>
    model_train(df, encoders, args, model)
  File "../automodel/automl_train/pipeline.py", line 408, in model_train
    for train_indices, val_indices in split.split(np.zeros(y.shape[0]), y):
  File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1315, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1695, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
Traceback (most recent call last):
  File "/usr/local/bin/automl_gs", line 10, in <module>
    sys.exit(cmd())
  File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 175, in cmd
    tpu_address=args.tpu_address)
  File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
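The stratified split does indeed require at least two rows per class. A hedged preprocessing sketch that drops singleton classes before handing the data to automl-gs (`target` is a placeholder for your label column):

```python
import pandas as pd

def drop_singleton_classes(df, target):
    """Remove rows whose target class appears only once, since
    StratifiedShuffleSplit needs at least 2 members per class."""
    counts = df[target].value_counts()
    keep = counts[counts >= 2].index
    return df[df[target].isin(keep)]

df = pd.DataFrame({"x": [1, 2, 3, 4], "square": ["a", "a", "b", "c"]})
print(drop_singleton_classes(df, "square")["square"].tolist())  # -> ['a', 'a']
```

Note this silently discards data; for a regression-like numeric target the real fix is to treat it as regression rather than classification.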

YAMLLoadWarning disrupting progress bar

Trying out the example titanic dataset in a conda environment, I encountered the following warning so frequently that it disrupts the tqdm progress bar.

/anaconda3/envs/automl-gs/lib/python3.6/site-packages/automl_gs/utils_automl.py:270:
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default 
Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)
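The warning comes from calling yaml.load without an explicit loader; the fix in automl-gs's utils_automl.py would be to pass one. A minimal demonstration with PyYAML:

```python
import yaml

doc = "accuracy:\n  objective: max"

# Passing an explicit Loader silences the YAMLLoadWarning and avoids
# the unsafe default loader.
metrics = yaml.load(doc, Loader=yaml.SafeLoader)
# Equivalent shorthand: yaml.safe_load(doc)
print(metrics)
```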

FileNotFoundError

Hi,

Just trying to work through your example colab notebook. I work through the cells, upload the titanic.csv, and get

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

<ipython-input-3-9f452c025bdd> in <module>()
      2                    target_field='origin',
      3                    model_name='tpu',
----> 4                    tpu_address = tpu_address)

5 frames

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File tpu_train/metadata/results.csv does not exist: 'tpu_train/metadata/results.csv'

Google Colab - automl_train/metadata/results.csv does not exist

Input:

from automl_gs import automl_grid_search

automl_grid_search("data.csv", "diagnosis")

Output:

Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
id: ignore
radius_mean: numeric
texture_mean: numeric
perimeter_mean: numeric
area_mean: numeric
smoothness_mean: numeric
compactness_mean: numeric
concavity_mean: numeric
concave points_mean: numeric
symmetry_mean: numeric
fractal_dimension_mean: numeric
radius_se: numeric
texture_se: numeric
perimeter_se: numeric
area_se: numeric
smoothness_se: numeric
compactness_se: numeric
concavity_se: numeric
concave points_se: numeric
symmetry_se: numeric
fractal_dimension_se: numeric
radius_worst: numeric
texture_worst: numeric
perimeter_worst: numeric
area_worst: numeric
smoothness_worst: numeric
compactness_worst: numeric
concavity_worst: numeric
concave points_worst: numeric
symmetry_worst: numeric
fractal_dimension_worst: numeric
Unnamed: 32: numeric
0%
0/100 [00:04<?, ?trial/s]
0%
0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-37-308e97508c91> in <module>()
      1 from automl_gs import automl_grid_search
      2 
----> 3 automl_grid_search("data.csv", "diagnosis")

5 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File automl_train/metadata/results.csv does not exist: 'automl_train/metadata/results.csv'
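In all of these reports, results.csv is only missing because the spawned training subprocess died earlier, so the FileNotFoundError masks the real error. A hedged sketch of how the grid-search loop could surface the underlying failure instead (hypothetical; assumes automl-gs launches model.py via subprocess):

```python
import os
import subprocess
import sys

def run_trial(train_folder):
    """Run one training trial and fail loudly instead of letting a
    missing results.csv hide the real error from the subprocess."""
    proc = subprocess.run(
        [sys.executable, "model.py", "-d", "data.csv", "-m", "train"],
        cwd=train_folder, capture_output=True, text=True)
    results_path = os.path.join(train_folder, "metadata", "results.csv")
    if proc.returncode != 0 or not os.path.exists(results_path):
        raise RuntimeError(
            "trial failed before writing results.csv:\n" + proc.stderr)
    return results_path
```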

SyntaxError: invalid decimal literal produced by automl

Wanted to try automl_gs, but I get this error and can't figure out why.

File "C:\Users\XXX\automl_train\model.py", line 3, in
from pipeline import *
File "C:\Users\XXX\automl_train\pipeline.py", line 29
0_enc = df['0']
^
SyntaxError: invalid decimal literal

Any ideas about that?

bin edges must be unique

Hello - I am trying to use this package to provide predictions for my Data Science Capstone project. When I run against my training data, I get the following exception/error:

Traceback (most recent call last):
  File "model.py", line 63, in <module>
    model_train(df, encoders, args, model)
  File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 903, in model_train
    X, y = process_data(df, encoders)
  File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 758, in process_data
    df['msa_md'].values, encoders['msa_md_bins'], labels=False, include_lowest=True)
  File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 234, in cut
    duplicates=duplicates)
  File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 332, in _bins_to_cuts
    "the 'duplicates' kwarg".format(bins=bins))
ValueError: Bin edges must be unique: array([ -1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Traceback (most recent call last):
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\Scripts\automl_gs.exe\__main__.py", line 9, in <module>
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 175, in cmd
    tpu_address=args.tpu_address)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'automl_train\metadata\results.csv' does not exist
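The pipeline here calls pd.cut without `duplicates='drop'`, so the repeated quantile edges (the two -1. values) raise. A minimal reproduction of the error and its fix with pandas, using the bin edges from this report:

```python
import pandas as pd

values = [-1, 5, 20, 100, 400]
bins = [-1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.]

# Without duplicates='drop' this raises:
#   ValueError: Bin edges must be unique
binned = pd.cut(values, bins, labels=False,
                include_lowest=True, duplicates="drop")
print(list(binned))
```

Note that dropping duplicate edges changes the number of bins, so downstream code that assumes a fixed bin count still needs care.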

Colab: FileNotFoundError: File b'tpu_train/metadata/results.csv' does not exist

The stock Google Colab link in the README.md isn't working correctly. I added a line to download the titanic.csv, then hit run all. Full stack trace below:



Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
PassengerId: numeric
Pclass: categorical
Name: ignore
Sex: categorical
Age: numeric
SibSp: categorical
Parch: categorical
Ticket: ignore
Fare: numeric
Cabin: categorical
Embarked: categorical

0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

<ipython-input-5-17dc9e2d602c> in <module>()
      2                    target_field='Survived',
      3                    model_name='tpu',
----> 4                    tpu_address = tpu_address)

/usr/local/lib/python3.6/dist-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
     85         # and append to the metrics CSV.
     86         results = pd.read_csv(os.path.join(train_folder, 
---> 87                                         "metadata", "results.csv"))
     88         results = results.assign(**params)
     89         results.insert(0, 'trial_id', uuid.uuid4())

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    707                     skip_blank_lines=skip_blank_lines)
    708 
--> 709         return _read(filepath_or_buffer, kwds)
    710 
    711     parser_f.__name__ = name

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    447 
    448     # Create the parser.
--> 449     parser = TextFileReader(filepath_or_buffer, **kwds)
    450 
    451     if chunksize or iterator:

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    816             self.options['has_index_names'] = kwds['has_index_names']
    817 
--> 818         self._make_engine(self.engine)
    819 
    820     def close(self):

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1047     def _make_engine(self, engine='c'):
   1048         if engine == 'c':
-> 1049             self._engine = CParserWrapper(self.f, **self.options)
   1050         else:
   1051             if engine == 'python':

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1693         kwds['allow_leading_cols'] = self.index_col is not False
   1694 
-> 1695         self._reader = parsers.TextReader(src, **kwds)
   1696 
   1697         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'tpu_train/metadata/results.csv' does not exist


AttributeError: 'float' object has no attribute 'lower'

While trying out automl_gs with Jupyter, I got a file-not-found error:

FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'

Trying do it with terminal before the file missing error it returns:

AttributeError: 'float' object has no attribute 'lower'

Searching in StackOverflow, I found that the problem is how pandas converts inputs to python datatypes.

https://stackoverflow.com/questions/34724246/attributeerror-float-object-has-no-attribute-lower/34724771

Is it possible to prevent this behaviour using automl_gs ?
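The error usually means a text column contains NaN (which pandas stores as a float), so .lower() fails during text encoding. A hedged preprocessing sketch to run before handing the CSV to automl_gs:

```python
import pandas as pd

df = pd.DataFrame({"subject": ["Hello", None, "World"]})

# NaN is a float; coerce the whole column to strings so .str.lower()
# (and the generated text encoders) never see a float.
df["subject"] = df["subject"].fillna("").astype(str)
print(df["subject"].str.lower().tolist())  # -> ['hello', '', 'world']
```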

xgboost: GPU support

xgboost supports GPUs by setting tree_method to gpu_hist instead of hist, and the code is prepared for that. Two problems:

  1. Unlike TensorFlow, xgboost does not have a way to automatically determine if a GPU is present.
  2. GPU hist training requires a Pascal-generation card at minimum; the GPUs in Colaboratory notebooks are K80s (Kepler), which do not qualify.

Will keep at CPU support for now but there has to be a better solution.
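One pragmatic interim option is to probe for an NVIDIA driver before choosing the tree method. A hedged sketch (`pick_tree_method` is a hypothetical helper; note that driver presence alone does not guarantee a Pascal-or-newer card, which gpu_hist requires):

```python
import shutil

def pick_tree_method():
    """Return 'gpu_hist' if an NVIDIA driver appears to be present,
    otherwise fall back to the CPU 'hist' method."""
    return "gpu_hist" if shutil.which("nvidia-smi") else "hist"

params = {"tree_method": pick_tree_method()}
print(params)
```

A fuller check would parse `nvidia-smi --query-gpu=compute_cap` output, but even this crude probe avoids hardcoding hist everywhere.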
