automl-gs's People

Contributors

a-ozbek, drien, evan-burke, krzynio, mikeshatch, minimaxir, xorb0ss

automl-gs's Issues

can't predict using test data

Hello - I have successfully trained my model using my training dataset. Now, when I go to predict, using this command:

python model.py -d ../testing_imputed.csv -m predict

I get this error:

ValueError: Usecols do not match columns, columns expected but not found: ['accepted']

but this is the column I'm trying to predict! Am I supposed to create the target/prediction column in the test data and it will be populated with the predictions? This is a logistic regression problem, where I am trying to predict whether or not a loan will be approved. If I need to add a column, is it:

test_data['accepted'] = ""

Or do I zero it out and the prediction will update the value with what the model should predict?
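In full, the workaround being asked about would look something like this (a sketch only; the feature column names are invented, and whether predict actually ignores the placeholder values is exactly the open question):

```python
import pandas as pd

# Stand-in frame; the real file is testing_imputed.csv with the real features.
test_data = pd.DataFrame({"loan_amount": [1000, 2500], "term": [36, 60]})

# Add a placeholder target column so the script's usecols check passes.
test_data["accepted"] = ""
test_data.to_csv("testing_imputed.csv", index=False)
```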

Thanks in advance to all who respond.

NotFoundError and ValueError on titanic dataset

Trying out automl_gs in a new conda env using the titanic dataset. After each iteration I get the error:

ValueError: Parent directory of model_weights.hdf5 doesn't exist, can't save.

Same behavior running from the command line or within IPython following the example notebook. To clarify, it finds titanic.csv fine; the error seems to occur when saving the intermediate results. Full traceback below.

Traceback
$ automl_gs titanic.csv Survived
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)
Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
Pclass: categorical
Name: ignore
Sex: categorical
Age: numeric
Siblings/Spouses Aboard: categorical
Parents/Children Aboard: categorical
Fare: numeric
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  hps = yaml.load(f)
  0%|                                                                                        | 0/100 [00:00<?, ?trial/s]
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:199: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)[problem_type]
Traceback (most recent call last):
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
	 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_STRING, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_22, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, _arg_Const_1_0_10, _arg_Const_22_0_13, _arg_Const_2_0_14, _arg_Const_3_0_15, _arg_Const_14_0_4, _arg_Const_17_0_7, _arg_Const_20_0_11, _arg_Const_4_0_16, _arg_Const_5_0_17, _arg_Const_6_0_18, _arg_Const_7_0_19, _arg_Const_8_0_20, _arg_Const_11_0_1, _arg_Const_9_0_21, hidden_1/bias/Read/ReadVariableOp, hidden_1/bias/AdamW/Read/ReadVariableOp, hidden_1/bias/AdamW_1/Read/ReadVariableOp, hidden_1/kernel/Read/ReadVariableOp, hidden_1/kernel/AdamW/Read/ReadVariableOp, hidden_1/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_10_0_0, bn_1/beta/Read/ReadVariableOp, bn_1/beta/AdamW/Read/ReadVariableOp, bn_1/beta/AdamW_1/Read/ReadVariableOp, bn_1/gamma/Read/ReadVariableOp, bn_1/gamma/AdamW/Read/ReadVariableOp, bn_1/gamma/AdamW_1/Read/ReadVariableOp, bn_1/moving_mean/Read/ReadVariableOp, bn_1/moving_variance/Read/ReadVariableOp, _arg_Const_12_0_2, hidden_2/bias/Read/ReadVariableOp, hidden_2/bias/AdamW/Read/ReadVariableOp, hidden_2/bias/AdamW_1/Read/ReadVariableOp, hidden_2/kernel/Read/ReadVariableOp, hidden_2/kernel/AdamW/Read/ReadVariableOp, hidden_2/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_13_0_3, bn_2/beta/Read/ReadVariableOp, bn_2/beta/AdamW/Read/ReadVariableOp, bn_2/beta/AdamW_1/Read/ReadVariableOp, bn_2/gamma/Read/ReadVariableOp, bn_2/gamma/AdamW/Read/ReadVariableOp, bn_2/gamma/AdamW_1/Read/ReadVariableOp, bn_2/moving_mean/Read/ReadVariableOp, bn_2/moving_variance/Read/ReadVariableOp, _arg_Const_15_0_5, hidden_3/bias/Read/ReadVariableOp, hidden_3/bias/AdamW/Read/ReadVariableOp, hidden_3/bias/AdamW_1/Read/ReadVariableOp, hidden_3/kernel/Read/ReadVariableOp, hidden_3/kernel/AdamW/Read/ReadVariableOp, hidden_3/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_16_0_6, 
bn_3/beta/Read/ReadVariableOp, bn_3/beta/AdamW/Read/ReadVariableOp, bn_3/beta/AdamW_1/Read/ReadVariableOp, bn_3/gamma/Read/ReadVariableOp, bn_3/gamma/AdamW/Read/ReadVariableOp, bn_3/gamma/AdamW_1/Read/ReadVariableOp, bn_3/moving_mean/Read/ReadVariableOp, bn_3/moving_variance/Read/ReadVariableOp, _arg_Const_18_0_8, hidden_4/bias/Read/ReadVariableOp, hidden_4/bias/AdamW/Read/ReadVariableOp, hidden_4/bias/AdamW_1/Read/ReadVariableOp, hidden_4/kernel/Read/ReadVariableOp, hidden_4/kernel/AdamW/Read/ReadVariableOp, hidden_4/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_19_0_9, bn_4/beta/Read/ReadVariableOp, bn_4/beta/AdamW/Read/ReadVariableOp, bn_4/beta/AdamW_1/Read/ReadVariableOp, bn_4/gamma/Read/ReadVariableOp, bn_4/gamma/AdamW/Read/ReadVariableOp, bn_4/gamma/AdamW_1/Read/ReadVariableOp, bn_4/moving_mean/Read/ReadVariableOp, bn_4/moving_variance/Read/ReadVariableOp, _arg_Const_21_0_12, output/bias/Read/ReadVariableOp, output/bias/AdamW/Read/ReadVariableOp, output/bias/AdamW_1/Read/ReadVariableOp, output/kernel/Read/ReadVariableOp, output/kernel/AdamW/Read/ReadVariableOp, output/kernel/AdamW_1/Read/ReadVariableOp, training/TFOptimizer/beta1_power, training/TFOptimizer/beta2_power)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1620, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1047, in run
    fetches=fetches, feed_dict=feed_dict, **kwargs)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
	 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_STRING, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_22, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, _arg_Const_1_0_10, _arg_Const_22_0_13, _arg_Const_2_0_14, _arg_Const_3_0_15, _arg_Const_14_0_4, _arg_Const_17_0_7, _arg_Const_20_0_11, _arg_Const_4_0_16, _arg_Const_5_0_17, _arg_Const_6_0_18, _arg_Const_7_0_19, _arg_Const_8_0_20, _arg_Const_11_0_1, _arg_Const_9_0_21, hidden_1/bias/Read/ReadVariableOp, hidden_1/bias/AdamW/Read/ReadVariableOp, hidden_1/bias/AdamW_1/Read/ReadVariableOp, hidden_1/kernel/Read/ReadVariableOp, hidden_1/kernel/AdamW/Read/ReadVariableOp, hidden_1/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_10_0_0, bn_1/beta/Read/ReadVariableOp, bn_1/beta/AdamW/Read/ReadVariableOp, bn_1/beta/AdamW_1/Read/ReadVariableOp, bn_1/gamma/Read/ReadVariableOp, bn_1/gamma/AdamW/Read/ReadVariableOp, bn_1/gamma/AdamW_1/Read/ReadVariableOp, bn_1/moving_mean/Read/ReadVariableOp, bn_1/moving_variance/Read/ReadVariableOp, _arg_Const_12_0_2, hidden_2/bias/Read/ReadVariableOp, hidden_2/bias/AdamW/Read/ReadVariableOp, hidden_2/bias/AdamW_1/Read/ReadVariableOp, hidden_2/kernel/Read/ReadVariableOp, hidden_2/kernel/AdamW/Read/ReadVariableOp, hidden_2/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_13_0_3, bn_2/beta/Read/ReadVariableOp, bn_2/beta/AdamW/Read/ReadVariableOp, bn_2/beta/AdamW_1/Read/ReadVariableOp, bn_2/gamma/Read/ReadVariableOp, bn_2/gamma/AdamW/Read/ReadVariableOp, bn_2/gamma/AdamW_1/Read/ReadVariableOp, bn_2/moving_mean/Read/ReadVariableOp, bn_2/moving_variance/Read/ReadVariableOp, _arg_Const_15_0_5, hidden_3/bias/Read/ReadVariableOp, hidden_3/bias/AdamW/Read/ReadVariableOp, hidden_3/bias/AdamW_1/Read/ReadVariableOp, hidden_3/kernel/Read/ReadVariableOp, hidden_3/kernel/AdamW/Read/ReadVariableOp, hidden_3/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_16_0_6, 
bn_3/beta/Read/ReadVariableOp, bn_3/beta/AdamW/Read/ReadVariableOp, bn_3/beta/AdamW_1/Read/ReadVariableOp, bn_3/gamma/Read/ReadVariableOp, bn_3/gamma/AdamW/Read/ReadVariableOp, bn_3/gamma/AdamW_1/Read/ReadVariableOp, bn_3/moving_mean/Read/ReadVariableOp, bn_3/moving_variance/Read/ReadVariableOp, _arg_Const_18_0_8, hidden_4/bias/Read/ReadVariableOp, hidden_4/bias/AdamW/Read/ReadVariableOp, hidden_4/bias/AdamW_1/Read/ReadVariableOp, hidden_4/kernel/Read/ReadVariableOp, hidden_4/kernel/AdamW/Read/ReadVariableOp, hidden_4/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_19_0_9, bn_4/beta/Read/ReadVariableOp, bn_4/beta/AdamW/Read/ReadVariableOp, bn_4/beta/AdamW_1/Read/ReadVariableOp, bn_4/gamma/Read/ReadVariableOp, bn_4/gamma/AdamW/Read/ReadVariableOp, bn_4/gamma/AdamW_1/Read/ReadVariableOp, bn_4/moving_mean/Read/ReadVariableOp, bn_4/moving_variance/Read/ReadVariableOp, _arg_Const_21_0_12, output/bias/Read/ReadVariableOp, output/bias/AdamW/Read/ReadVariableOp, output/bias/AdamW_1/Read/ReadVariableOp, output/kernel/Read/ReadVariableOp, output/kernel/AdamW/Read/ReadVariableOp, output/kernel/AdamW_1/Read/ReadVariableOp, training/TFOptimizer/beta1_power, training/TFOptimizer/beta2_power)]]

Caused by op 'save/SaveV2', defined at:
  File "model.py", line 46, in <module>
    model_train(df, encoders, args, model)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 377, in model_train
    batch_size=256)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1363, in fit
    validation_steps=validation_steps)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 291, in fit_loop
    callbacks.on_train_end()
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 158, in on_train_end
    callback.on_train_end(logs)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 398, in on_train_end
    self.model.save_weights('model_weights.hdf5')
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1365, in save_weights
    self._checkpointable_saver.save(filepath, session=session)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1178, in save
    self._last_save_saver = saver_lib.Saver(var_list=named_variables)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1281, in __init__
    self.build()
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1293, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1330, in _build
    build_save=build_save, build_restore=build_restore)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 775, in _build_internal
    save_tensor = self._AddSaveOps(filename_tensor, saveables)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 275, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 193, in save_op
    tensors)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1687, in save_v2
    shape_and_slices=shape_and_slices, tensors=tensors, name=name)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): ; No such file or directory
	 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_STRING, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_22, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, _arg_Const_1_0_10, _arg_Const_22_0_13, _arg_Const_2_0_14, _arg_Const_3_0_15, _arg_Const_14_0_4, _arg_Const_17_0_7, _arg_Const_20_0_11, _arg_Const_4_0_16, _arg_Const_5_0_17, _arg_Const_6_0_18, _arg_Const_7_0_19, _arg_Const_8_0_20, _arg_Const_11_0_1, _arg_Const_9_0_21, hidden_1/bias/Read/ReadVariableOp, hidden_1/bias/AdamW/Read/ReadVariableOp, hidden_1/bias/AdamW_1/Read/ReadVariableOp, hidden_1/kernel/Read/ReadVariableOp, hidden_1/kernel/AdamW/Read/ReadVariableOp, hidden_1/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_10_0_0, bn_1/beta/Read/ReadVariableOp, bn_1/beta/AdamW/Read/ReadVariableOp, bn_1/beta/AdamW_1/Read/ReadVariableOp, bn_1/gamma/Read/ReadVariableOp, bn_1/gamma/AdamW/Read/ReadVariableOp, bn_1/gamma/AdamW_1/Read/ReadVariableOp, bn_1/moving_mean/Read/ReadVariableOp, bn_1/moving_variance/Read/ReadVariableOp, _arg_Const_12_0_2, hidden_2/bias/Read/ReadVariableOp, hidden_2/bias/AdamW/Read/ReadVariableOp, hidden_2/bias/AdamW_1/Read/ReadVariableOp, hidden_2/kernel/Read/ReadVariableOp, hidden_2/kernel/AdamW/Read/ReadVariableOp, hidden_2/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_13_0_3, bn_2/beta/Read/ReadVariableOp, bn_2/beta/AdamW/Read/ReadVariableOp, bn_2/beta/AdamW_1/Read/ReadVariableOp, bn_2/gamma/Read/ReadVariableOp, bn_2/gamma/AdamW/Read/ReadVariableOp, bn_2/gamma/AdamW_1/Read/ReadVariableOp, bn_2/moving_mean/Read/ReadVariableOp, bn_2/moving_variance/Read/ReadVariableOp, _arg_Const_15_0_5, hidden_3/bias/Read/ReadVariableOp, hidden_3/bias/AdamW/Read/ReadVariableOp, hidden_3/bias/AdamW_1/Read/ReadVariableOp, hidden_3/kernel/Read/ReadVariableOp, hidden_3/kernel/AdamW/Read/ReadVariableOp, hidden_3/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_16_0_6, 
bn_3/beta/Read/ReadVariableOp, bn_3/beta/AdamW/Read/ReadVariableOp, bn_3/beta/AdamW_1/Read/ReadVariableOp, bn_3/gamma/Read/ReadVariableOp, bn_3/gamma/AdamW/Read/ReadVariableOp, bn_3/gamma/AdamW_1/Read/ReadVariableOp, bn_3/moving_mean/Read/ReadVariableOp, bn_3/moving_variance/Read/ReadVariableOp, _arg_Const_18_0_8, hidden_4/bias/Read/ReadVariableOp, hidden_4/bias/AdamW/Read/ReadVariableOp, hidden_4/bias/AdamW_1/Read/ReadVariableOp, hidden_4/kernel/Read/ReadVariableOp, hidden_4/kernel/AdamW/Read/ReadVariableOp, hidden_4/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_19_0_9, bn_4/beta/Read/ReadVariableOp, bn_4/beta/AdamW/Read/ReadVariableOp, bn_4/beta/AdamW_1/Read/ReadVariableOp, bn_4/gamma/Read/ReadVariableOp, bn_4/gamma/AdamW/Read/ReadVariableOp, bn_4/gamma/AdamW_1/Read/ReadVariableOp, bn_4/moving_mean/Read/ReadVariableOp, bn_4/moving_variance/Read/ReadVariableOp, _arg_Const_21_0_12, output/bias/Read/ReadVariableOp, output/bias/AdamW/Read/ReadVariableOp, output/bias/AdamW_1/Read/ReadVariableOp, output/kernel/Read/ReadVariableOp, output/kernel/AdamW/Read/ReadVariableOp, output/kernel/AdamW_1/Read/ReadVariableOp, training/TFOptimizer/beta1_power, training/TFOptimizer/beta2_power)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model.py", line 46, in <module>
    model_train(df, encoders, args, model)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 377, in model_train
    batch_size=256)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1363, in fit
    validation_steps=validation_steps)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 291, in fit_loop
    callbacks.on_train_end()
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 158, in on_train_end
    callback.on_train_end(logs)
  File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 398, in on_train_end
    self.model.save_weights('model_weights.hdf5')
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1365, in save_weights
    self._checkpointable_saver.save(filepath, session=session)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1186, in save
    global_step=checkpoint_number)
  File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1637, in save
    raise exc
ValueError: Parent directory of model_weights.hdf5 doesn't exist, can't save.
                                                                                                                        
Metrics:                                                                                                                
trial_id: 3e5c75e7-53be-4e75-8558-17b511440ba9
epoch: 20
time_completed: 2019-03-26 22:03:19
log_loss: 0.6697867036089022
accuracy: 0.6142322097378277
auc: 0.8345666587733839
precision: 0.30711610486891383
recall: 0.5
f1: 0.3805104408352668
  1%|| 1/100 [00:07<12:16,  7.44s/trial]
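A guard against this failure mode might look like the following sketch, assuming (a guess, not confirmed in the issue) that the ValueError comes from Keras resolving the bare filename `model_weights.hdf5` against a working directory that no longer exists or isn't writable:

```python
import os

# Make the save target explicit instead of relying on the process's current
# working directory, and ensure the parent directory exists before saving.
weights_path = os.path.abspath(os.path.join("automl_train", "model_weights.hdf5"))
os.makedirs(os.path.dirname(weights_path), exist_ok=True)
# model.save_weights(weights_path)  # call site in pipeline.py's on_train_end
```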

Console printing issues after an experiment improvement

tqdm.write() apparently gets buggy when the terminal height is less than the text being replaced, and it leaves text artifacts when the replacement text has a different line length.

As a result, the script cannot print hyperparameters: the hps plus metrics exceed typical terminal height, and the number of hyperparameter lines varies from trial to trial, unlike the metrics.

I removed hps printing for the time being (it's not strictly necessary anyway), but I would like a better solution.
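For reference, the intended pattern is tqdm.write(), which prints status lines above the bar rather than through it; a minimal self-contained sketch (streams are redirected to StringIO here only so the example produces nothing on a real terminal):

```python
from io import StringIO
from tqdm import tqdm

messages = StringIO()
for i in tqdm(range(3), unit="trial", file=StringIO()):  # bar to a dummy stream
    tqdm.write(f"trial {i} done", file=messages)         # text printed above the bar
lines = messages.getvalue().splitlines()
```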

results missing, ValueError: Input

This can be hard to figure out since I can't share the data. I am running it in Google Colab.

automl_grid_search(csv_path='/content/CLT_all_tasks_trial_level.csv', target_field='correctResp', model_name='tpu', tpu_address=tpu_address)

Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
Subject: categorical
Finished: categorical
TrainingDay: categorical
Condition: categorical
CondPrev: categorical
TaskNumber: categorical
TaskId: categorical
TrialNumber: numeric
PresentationStimulus: numeric
StimTime: numeric
RespToTime: numeric
RT: numeric
SubjResp: categorical
OutcomeInt: categorical
TaskOutcomeInt: categorical
StimDim1: categorical
StimDim2: categorical
StimDim3: categorical
StimDim4: categorical
IntendedRule: categorical
Background: categorical
StimDimWord1: categorical
StimDimWord2: categorical
StimDimWord3: categorical
StimDimWord4: categorical
ExpResp: categorical
DistinctDays: categorical
out: categorical
StimType: categorical
0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-16-ca69e1157d4e> in <module>()
      2                    target_field='correctResp',
      3                    model_name='tpu',
----> 4                    tpu_address = tpu_address)

/usr/local/lib/python3.6/dist-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
     92                     header=(best_result is None))
     93 
---> 94         train_results = results.tail(1).to_dict('records')[0]
     95 
     96         # If the target metric improves, save the new hps/files,

IndexError: list index out of range

Here is the log.


Apr 3, 2019, 5:31:44 PM | WARNING | ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
-- | -- | --
Apr 3, 2019, 5:31:44 PM | WARNING | raise ValueError(msg_err.format(type_err, X.dtype))
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
Apr 3, 2019, 5:31:44 PM | WARNING | allow_nan=force_all_finite == 'allow-nan')
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 573, in check_array
Apr 3, 2019, 5:31:44 PM | WARNING | y_pred = check_array(y_pred, ensure_2d=False)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py", line 1763, in log_loss
Apr 3, 2019, 5:31:44 PM | WARNING | logloss = log_loss(y_true, y_pred)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/content/tpu_train/pipeline.py", line 1126, in on_epoch_end
Apr 3, 2019, 5:31:44 PM | WARNING | callback.on_epoch_end(epoch, logs)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py", line 251, in on_epoch_end
Apr 3, 2019, 5:31:44 PM | WARNING | callbacks.on_epoch_end(epoch, epoch_logs)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1734, in _pipeline_fit_loop
Apr 3, 2019, 5:31:44 PM | WARNING | validation_steps=validation_steps)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1633, in _pipeline_fit
Apr 3, 2019, 5:31:44 PM | WARNING | steps_per_epoch, validation_steps, **kwargs)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1532, in fit
Apr 3, 2019, 5:31:44 PM | WARNING | batch_size=64 * 8)
Apr 3, 2019, 5:31:44 PM | WARNING | File "/content/tpu_train/pipeline.py", line 1095, in model_train
Apr 3, 2019, 5:31:44 PM | WARNING | model_train(df, encoders, args, model)
Apr 3, 2019, 5:31:44 PM | WARNING | File "model.py", line 69, in <module>
Apr 3, 2019, 5:31:44 PM | WARNING | Traceback (most recent call last):

AFAIK, the largest number in the dataset is 12007245.

Thanks for the help!
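Since the traceback ends in sklearn's finite-value check, the first thing to rule out is a non-finite value in the input itself (the NaN could also arise in the model's predictions, e.g. from overflow on very large numerics). A pre-flight scan along these lines, with made-up columns since the real data can't be shared, locates any offending field:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for CLT_all_tasks_trial_level.csv.
df = pd.DataFrame({"RT": [0.5, np.inf, np.nan], "TrialNumber": [1, 2, 3]})

# log_loss rejects both NaN and inf, so count both per numeric column.
numeric = df.select_dtypes(include=[np.number])
bad = numeric.replace([np.inf, -np.inf], np.nan).isna().sum()
bad_cols = bad[bad > 0].index.tolist()
print(bad_cols)
```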

FileNotFoundError

from automl_gs import automl_grid_search
automl_grid_search('Housing.csv', 'price')

Solving a regression problem, minimizing mse using tensorflow.
Modeling with field specifications:
area: numeric
bedrooms: numeric
bathrooms: categorical
stories: categorical
mainroad: categorical
guestroom: categorical
basement: categorical
hotwaterheating: categorical
airconditioning: categorical
parking: categorical
prefarea: categorical
furnishingstatus: categorical

0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]

FileNotFoundError Traceback (most recent call last)
in
1 from automl_gs import automl_grid_search
----> 2 automl_grid_search('Housing.csv','price')

~/.local/lib/python3.6/site-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
85 # and append to the metrics CSV.
86 results = pd.read_csv(os.path.join(train_folder,
---> 87 "metadata", "results.csv"))
88 results = results.assign(**params)
89 results.insert(0, 'trial_id', uuid.uuid4())

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
700 skip_blank_lines=skip_blank_lines)
701
--> 702 return _read(filepath_or_buffer, kwds)
703
704 parser_f.__name__ = name

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
427
428 # Create the parser.
--> 429 parser = TextFileReader(filepath_or_buffer, **kwds)
430
431 if chunksize or iterator:

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
893 self.options['has_index_names'] = kwds['has_index_names']
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1120 def _make_engine(self, engine='c'):
1121 if engine == 'c':
-> 1122 self._engine = CParserWrapper(self.f, **self.options)
1123 else:
1124 if engine == 'python':

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1851 kwds['usecols'] = self.usecols
1852
-> 1853 self._reader = parsers.TextReader(src, **kwds)
1854 self.unnamed_cols = self._reader.unnamed_cols
1855

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
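This FileNotFoundError is downstream of the real failure: results.csv is written by the training subprocess, so when that subprocess dies the parent's read fails instead of surfacing the child's error. A sketch of how to see the underlying traceback, where the failing `-c` import is a stand-in for running the generated model.py by hand:

```python
import subprocess
import sys

# Run a child process the way automl-gs runs model.py, but capture stderr so
# the underlying traceback is visible, not just the secondary read failure.
proc = subprocess.run(
    [sys.executable, "-c", "import definitely_not_installed_pkg"],
    capture_output=True, text=True,
)
print(proc.returncode, proc.stderr.strip().splitlines()[-1])
```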

Bug: incorrect shape size set for categorical field in classifier model.

While trying out automl_gs on this uci dataset, I got this error:

Traceback (most recent call last):
  File "model.py", line 59, in <module>
    model_train(df, encoders, args, model)
  File "C:\Users\josep\automl_train\pipeline.py", line 835, in model_train
    batch_size=256)
  File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 776, in fit
    shuffle=shuffle)
  File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2382, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training_utils.py", line 362, in standardize_input_data
    ' but got array with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_son to have shape (1,) but got array with shape (2,)

After some sleuthing, I eventually figured out that the shape for the offending column is set incorrectly in build_model():

    input_son_size = len(encoders['son_encoder'].classes_)
    input_son = Input(
        shape=(input_son_size if input_son_size != 2 else 1,), name="input_son")

I don't understand the purpose of that `if-else` clause. It looks like this change was introduced in 1dcb9e2; reverting that commit allows my model to work.
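One possible reason the special case exists (an assumption, not taken from the commit): scikit-learn's LabelBinarizer emits a single 0/1 column for a two-class field but a full one-hot matrix for three or more classes, so an input layer sized from `len(classes_)` would be wrong in the binary case unless it is forced to `(1,)`:

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
two_class = lb.fit_transform(["a", "b", "a"])    # single column for 2 classes
three_class = lb.fit_transform(["a", "b", "c"])  # one-hot for 3+ classes
print(two_class.shape, three_class.shape)
```

The bug report then suggests the shapes disagree because the field is encoded as two columns elsewhere in the pipeline.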

automl-gs assumes the Python that's running is the global `python`

I still need to fill this in with a full reproducer, but as a placeholder:

I just ran automl-gs for the first time, and it errored out with:

/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__
  return f(*args, **kwds)
/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/jinja2/utils.py:485: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/jinja2/runtime.py:318: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping

  0%|          | 0/100 [00:00<?, ?trial/s]

  0%|          | 0/20 [00:00<?, ?epoch/s]
Traceback (most recent call last):
  File "model.py", line 2, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'


  0%|          | 0/20 [00:00<?, ?epoch/s]
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py:858: ResourceWarning: subprocess 12646 is still running
  ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Solving a classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
message.received: numeric
message.sender: categorical
message.subject: text
message.size: numeric
message.recipients: categorical
response.sent: numeric
response.sender: categorical
response.subject: text
response.size: numeric
response.recipients: categorical
lag_readable: text
Traceback (most recent call last):
  File "/Users/jberman/.local/bin/automl_gs", line 10, in <module>
    sys.exit(cmd())
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 175, in cmd
    tpu_address=args.tpu_address)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='automl_results.csv' mode='w' encoding='UTF-8'>
ResourceWarning: Enable tracemalloc to get the object allocation traceback

Where at least that first error there is because the auto-generated .py file doesn't contain a shebang:

⊙  head automl_train/model.py                                                                                                jberman@USNYHJBERMANMB2 ●
import argparse
import pandas as pd
from pipeline import *

and it seems to be executed by calling the bare `python`, whereas it needs to use `sys.executable` from the Python interpreter that originally ran automl-gs (which is where pandas is installed).

(To be clear, the blow-up happens because automl-gs is installed in a virtualenv, and whatever other Python it runs model.py with does not have pandas installed.)
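A minimal sketch of the fix, assuming automl-gs launches the generated script via `subprocess` (the exact call site inside automl-gs may differ):

```python
import subprocess
import sys

# Use the interpreter that is running this process (i.e. the virtualenv's
# Python) instead of whatever the bare `python` resolves to on PATH.
cmd = [sys.executable, "model.py", "-d", "data.csv", "-m", "train"]
# subprocess.run(cmd, check=True)  # uncomment to actually launch training
print(cmd[0])
```

This guarantees model.py runs under the same interpreter (and site-packages) as automl-gs itself.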

SyntaxError: invalid syntax when fields start with a number.

Hi, and thanks for your work.

I tried to run your project on a dataset that has some fields starting with numbers, and this throws a SyntaxError.
For example, with a field named '1stFlrSF', I got the following error:

Traceback (most recent call last):
  File "model.py", line 3, in <module>
    from pipeline import *
  File "[MY_PATH]/automl_train/pipeline.py", line 1090
    1stflrsf_enc = df['1stFlrSF']
               ^
SyntaxError: invalid syntax

  0%|          | 0/20 [00:00<?, ?epoch/s]Traceback (most recent call last):
  File "[MY_PATH]/test_auto_ml/Test.py", line 8, in <module>
    do_the_thing("[MY_DATASET_PATH]/train.csv","SalePrice")
  File "[MY_PATH]/test_auto_ml/Test.py", line 5, in do_the_thing
    automl_grid_search(path,label)
  File "[MY_PYTHON_PATH]/site-packages/automl_gs/automl_gs.py", line 94, in automl_grid_search
    train_results = results.tail(1).to_dict('records')[0]
IndexError: list index out of range
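The generated pipeline uses raw column names as Python variable names, so any name that isn't a valid identifier breaks the template. A hedged sketch of a sanitizer (the real fix would live in automl-gs's template rendering; `sanitize` here is a hypothetical helper name):

```python
import re

def sanitize(col):
    """Turn an arbitrary column name into a valid Python identifier."""
    # Replace anything that isn't alphanumeric/underscore, then make sure
    # the result doesn't start with a digit.
    name = re.sub(r"\W", "_", col).lower()
    if name and name[0].isdigit():
        name = "f_" + name
    return name

print(sanitize("1stFlrSF"))  # -> f_1stflrsf
```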

How to predict a single Titanic data?

I was able to train using the Titanic Dataset. In the docs it says to train use the following command:
python3 model.py -d data.csv -m predict

Does this mean prediction features have to be in .csv file? Is it possible to predict a single row in a CSV file from python without using the terminal?
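The generated model.py reads its input from a CSV, so the simplest workaround is to write the single row to a temporary CSV and invoke the script. A hedged sketch (the `-d`/`-m` flags match the documented CLI; the column names here are placeholders for your own features):

```python
import csv
import subprocess
import sys
import tempfile

row = {"Pclass": 3, "Sex": "male", "Age": 22}  # your feature values

# Write the single row to a throwaway CSV with a header line.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)
    tmp_path = f.name

# Then hand that file to the generated script:
# subprocess.run([sys.executable, "model.py", "-d", tmp_path, "-m", "predict"])
print(tmp_path)
```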

data

Hello, what is the format of the prediction dataset data.csv? Is it tensorflow by default?

Float conversion issue screwing with numeric encoders.

I almost feel bad for reporting this one.

Using the yacht hydrodynamics UIC dataset, I got this error:

(env) (base) C:\Users\josep\Jeenee\AutoML\automl_train>python model.py -d ..\automl-testbench\yacht-hydrodynamics\data.csv -m train
Traceback (most recent call last):
  File "model.py", line 46, in <module>
    model_train(df, encoders, args, model)
  File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 347, in model_train
    X, y = process_data(df, encoders)
  File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 296, in process_data
    df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')
  File "C:\Users\josep\Jeenee\AutoML\venv\lib\site-packages\pandas\core\reshape\tile.py", line 235, in cut
    raise ValueError('bins must increase monotonically.')
ValueError: bins must increase monotonically.

Hmmm, odd. Let's take a look at pipeline.py...

    # Length-beam ratio
    length_beam_ratio_enc = df['Length-beam ratio']
    length_beam_ratio_bins = length_beam_ratio_enc.quantile(
        np.linspace(0, 1, 10+1))
    encoders['length_beam_ratio_bins'] = length_beam_ratio_bins
    
    # ....

    # Length-beam ratio
    length_beam_ratio_enc = pd.cut(
        df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')

The error is referring to the .cut line, which I had previously patched to include the duplicates='drop' bit. But the current error isn't related to that, but complaining about the encoder. Hmmm, nothing looks odd in the data about that column. Let's open up pdb and take a look...

>>> encoders['length_beam_ratio_bins']
[2.73, 2.76, 3.15, 3.15, 3.1499999999999995, 3.15, 3.17, 3.32, 3.51, 3.51, 3.64]

facepalm

Well now! I suppose I'll concede that's technically not monotonically increasing!

I appended a .round(4) to the two .quantile lines of encoders/numeric (lines 12 and 15), which worked for this test case. This is certainly not an adequate general solution, however, as e.g. it'll break on data that needs precision at the 5th decimal place...
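A more robust approach than blanket rounding is to de-duplicate the quantile edges themselves, so pd.cut always sees a strictly increasing sequence. A sketch using the bin values from this report (`dedupe_bins` is a hypothetical helper, not automl-gs code):

```python
import numpy as np

def dedupe_bins(bins, decimals=10):
    """Round away float noise, then drop duplicate bin edges.
    np.unique also sorts, so the result is strictly increasing."""
    return np.unique(np.round(np.asarray(bins, dtype=float), decimals))

bins = [2.73, 2.76, 3.15, 3.15, 3.1499999999999995, 3.15, 3.17, 3.32, 3.51, 3.51, 3.64]
print(dedupe_bins(bins, decimals=6))
```

The `decimals` knob controls how aggressively near-duplicates are merged; 6 is enough here without destroying useful precision.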

Using automl-gs for bin packing

Is there any way to use a tool like automl-gs for bin-packing problems? I've seen a way to model it as linear optimization where you do a cross join of all objects and all potential bins, set a constraint of each object being selected once, and then optimizing the bins however desired. Cross joins can end up being needlessly heavy though, so I'm wondering if there is a way to model that sort of problem such that you could use pre-optimized and continually developed tools like this one.

Image input fields

Allow the ability to use an image as an input, in conjunction with other fields.

  • TensorFlow only
  • The input column data is text indicating the file name.
  • The images are stored in a folder; this folder must be specified as an input parameter.

The problem is that the pretrained models are too heavy, and training a CNN from scratch is too time consuming.

The solution is to use a fast image-encoding approach, which I'll start working on after automl-gs.

Data set requires multiple

I tried a simple dataset to play around with this, and I am running into

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

I believe this is because scikit wants there to be multiple iterations of the variable you're trying to predict. Might want to add it to the docs~

Input:
square.txt

Full log if needed:

>$ automl_gs square.csv square
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)
Solving a classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
real: numeric
fake1: numeric
fake2: numeric
fake3: numeric
fake4: numeric
text: categorical
bool: categorical
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  hps = yaml.load(f)
  0%|                                                                                                                                                                         | 0/100 [00:00<?, ?trial/s]
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:199: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)[problem_type]
Traceback (most recent call last):
  File "model.py", line 47, in <module>
    model_train(df, encoders, args, model)
  File "../automodel/automl_train/pipeline.py", line 408, in model_train
    for train_indices, val_indices in split.split(np.zeros(y.shape[0]), y):
  File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1315, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1695, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
Traceback (most recent call last):
  File "/usr/local/bin/automl_gs", line 10, in <module>
    sys.exit(cmd())
  File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 175, in cmd
    tpu_address=args.tpu_address)
  File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
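The stratified split does indeed require at least two rows per class. A hedged preprocessing sketch that drops singleton classes before handing the data to automl-gs (`target` is a placeholder for your label column):

```python
import pandas as pd

def drop_singleton_classes(df, target):
    """Remove rows whose target class appears only once, since
    StratifiedShuffleSplit needs at least 2 members per class."""
    counts = df[target].value_counts()
    keep = counts[counts >= 2].index
    return df[df[target].isin(keep)]

df = pd.DataFrame({"x": [1, 2, 3, 4], "square": ["a", "a", "b", "c"]})
print(drop_singleton_classes(df, "square")["square"].tolist())  # -> ['a', 'a']
```

Note this silently discards data; for a regression-like numeric target the real fix is to treat it as regression rather than classification.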

YAMLLoadWarning disrupting progress bar

Trying out the example titanic dataset in a conda environment, I encountered the following warning so frequently that it disrupts the tqdm progress bar.

/anaconda3/envs/automl-gs/lib/python3.6/site-packages/automl_gs/utils_automl.py:270:
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default 
Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)
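The warning comes from calling yaml.load without an explicit loader; the fix in automl-gs's utils_automl.py would be to pass one. A minimal demonstration with PyYAML:

```python
import yaml

doc = "accuracy:\n  objective: max"

# Passing an explicit Loader silences the YAMLLoadWarning and avoids
# the unsafe default loader.
metrics = yaml.load(doc, Loader=yaml.SafeLoader)
# Equivalent shorthand: yaml.safe_load(doc)
print(metrics)
```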

FileNotFoundError

Hi,

Just trying to work through your example colab notebook. I work through the cells, upload the titanic.csv, and get

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

<ipython-input-3-9f452c025bdd> in <module>()
      2                    target_field='origin',
      3                    model_name='tpu',
----> 4                    tpu_address = tpu_address)

5 frames

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File tpu_train/metadata/results.csv does not exist: 'tpu_train/metadata/results.csv'

Google Colab - automl_train/metadata/results.csv does not exist

Input:

from automl_gs import automl_grid_search

automl_grid_search("data.csv", "diagnosis")

Output:

Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
id: ignore
radius_mean: numeric
texture_mean: numeric
perimeter_mean: numeric
area_mean: numeric
smoothness_mean: numeric
compactness_mean: numeric
concavity_mean: numeric
concave points_mean: numeric
symmetry_mean: numeric
fractal_dimension_mean: numeric
radius_se: numeric
texture_se: numeric
perimeter_se: numeric
area_se: numeric
smoothness_se: numeric
compactness_se: numeric
concavity_se: numeric
concave points_se: numeric
symmetry_se: numeric
fractal_dimension_se: numeric
radius_worst: numeric
texture_worst: numeric
perimeter_worst: numeric
area_worst: numeric
smoothness_worst: numeric
compactness_worst: numeric
concavity_worst: numeric
concave points_worst: numeric
symmetry_worst: numeric
fractal_dimension_worst: numeric
Unnamed: 32: numeric
0%
0/100 [00:04<?, ?trial/s]
0%
0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-37-308e97508c91> in <module>()
      1 from automl_gs import automl_grid_search
      2 
----> 3 automl_grid_search("data.csv", "diagnosis")

5 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File automl_train/metadata/results.csv does not exist: 'automl_train/metadata/results.csv'
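In all of these reports, results.csv is only missing because the spawned training subprocess died earlier, so the FileNotFoundError masks the real error. A hedged sketch of how the grid-search loop could surface the underlying failure instead (hypothetical; assumes automl-gs launches model.py via subprocess):

```python
import os
import subprocess
import sys

def run_trial(train_folder):
    """Run one training trial and fail loudly instead of letting a
    missing results.csv hide the real error from the subprocess."""
    proc = subprocess.run(
        [sys.executable, "model.py", "-d", "data.csv", "-m", "train"],
        cwd=train_folder, capture_output=True, text=True)
    results_path = os.path.join(train_folder, "metadata", "results.csv")
    if proc.returncode != 0 or not os.path.exists(results_path):
        raise RuntimeError(
            "trial failed before writing results.csv:\n" + proc.stderr)
    return results_path
```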

SyntaxError: invalid decimal literal produced by automl

Wanted to try automl_gs, but I get this error and can't figure out why.

File "C:\Users\XXX\automl_train\model.py", line 3, in
from pipeline import *
File "C:\Users\XXX\automl_train\pipeline.py", line 29
0_enc = df['0']
^
SyntaxError: invalid decimal literal

Any ideas about that?

bin edges must be unique

Hello - I am trying to use this package to provide predictions for my Data Science Capstone project. When I run against my training data, I get the following exception/error:

Traceback (most recent call last):
  File "model.py", line 63, in <module>
    model_train(df, encoders, args, model)
  File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 903, in model_train
    X, y = process_data(df, encoders)
  File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 758, in process_data
    df['msa_md'].values, encoders['msa_md_bins'], labels=False, include_lowest=True)
  File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 234, in cut
    duplicates=duplicates)
  File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 332, in _bins_to_cuts
    "the 'duplicates' kwarg".format(bins=bins))
ValueError: Bin edges must be unique: array([ -1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Traceback (most recent call last):
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\Scripts\automl_gs.exe\__main__.py", line 9, in <module>
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 175, in cmd
    tpu_address=args.tpu_address)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'automl_train\metadata\results.csv' does not exist
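The pipeline here calls pd.cut without `duplicates='drop'`, so the repeated quantile edges (the two -1. values) raise. A minimal reproduction of the error and its fix with pandas, using the bin edges from this report:

```python
import pandas as pd

values = [-1, 5, 20, 100, 400]
bins = [-1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.]

# Without duplicates='drop' this raises:
#   ValueError: Bin edges must be unique
binned = pd.cut(values, bins, labels=False,
                include_lowest=True, duplicates="drop")
print(list(binned))
```

Note that dropping duplicate edges changes the number of bins, so downstream code that assumes a fixed bin count still needs care.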

Colab: FileNotFoundError: File b'tpu_train/metadata/results.csv' does not exist

The stock Google Colab link in the README.md isn't working correctly. I added a line to download the titanic.csv, then hit run all. Full stack trace below:



Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
PassengerId: numeric
Pclass: categorical
Name: ignore
Sex: categorical
Age: numeric
SibSp: categorical
Parch: categorical
Ticket: ignore
Fare: numeric
Cabin: categorical
Embarked: categorical

0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

<ipython-input-5-17dc9e2d602c> in <module>()
      2                    target_field='Survived',
      3                    model_name='tpu',
----> 4                    tpu_address = tpu_address)

/usr/local/lib/python3.6/dist-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
     85         # and append to the metrics CSV.
     86         results = pd.read_csv(os.path.join(train_folder, 
---> 87                                         "metadata", "results.csv"))
     88         results = results.assign(**params)
     89         results.insert(0, 'trial_id', uuid.uuid4())

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    707                     skip_blank_lines=skip_blank_lines)
    708 
--> 709         return _read(filepath_or_buffer, kwds)
    710 
    711     parser_f.__name__ = name

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    447 
    448     # Create the parser.
--> 449     parser = TextFileReader(filepath_or_buffer, **kwds)
    450 
    451     if chunksize or iterator:

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    816             self.options['has_index_names'] = kwds['has_index_names']
    817 
--> 818         self._make_engine(self.engine)
    819 
    820     def close(self):

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1047     def _make_engine(self, engine='c'):
   1048         if engine == 'c':
-> 1049             self._engine = CParserWrapper(self.f, **self.options)
   1050         else:
   1051             if engine == 'python':

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1693         kwds['allow_leading_cols'] = self.index_col is not False
   1694 
-> 1695         self._reader = parsers.TextReader(src, **kwds)
   1696 
   1697         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'tpu_train/metadata/results.csv' does not exist


AttributeError: 'float' object has no attribute 'lower'

While trying out automl_gs with Jupyter, I got a file-not-found error:

FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'

Trying do it with terminal before the file missing error it returns:

AttributeError: 'float' object has no attribute 'lower'

Searching in StackOverflow, I found that the problem is how pandas converts inputs to python datatypes.

https://stackoverflow.com/questions/34724246/attributeerror-float-object-has-no-attribute-lower/34724771

Is it possible to prevent this behaviour using automl_gs ?
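The error usually means a text column contains NaN (which pandas stores as a float), so .lower() fails during text encoding. A hedged preprocessing sketch to run before handing the CSV to automl_gs:

```python
import pandas as pd

df = pd.DataFrame({"subject": ["Hello", None, "World"]})

# NaN is a float; coerce the whole column to strings so .str.lower()
# (and the generated text encoders) never see a float.
df["subject"] = df["subject"].fillna("").astype(str)
print(df["subject"].str.lower().tolist())  # -> ['hello', '', 'world']
```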

xgboost: GPU support

xgboost supports GPUs by setting tree_method to gpu_hist instead of hist, and the code is prepared for that. Two problems:

  1. Unlike TensorFlow, xgboost does not have a way to automatically determine if a GPU is present.
  2. GPU hist training requires a Pascal-generation card at minimum; the GPUs in Colaboratory notebooks are K80s (Kepler), which do not qualify.

Will keep at CPU support for now but there has to be a better solution.
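One pragmatic interim option is to probe for an NVIDIA driver before choosing the tree method. A hedged sketch (`pick_tree_method` is a hypothetical helper; note that driver presence alone does not guarantee a Pascal-or-newer card, which gpu_hist requires):

```python
import shutil

def pick_tree_method():
    """Return 'gpu_hist' if an NVIDIA driver appears to be present,
    otherwise fall back to the CPU 'hist' method."""
    return "gpu_hist" if shutil.which("nvidia-smi") else "hist"

params = {"tree_method": pick_tree_method()}
print(params)
```

A fuller check would parse `nvidia-smi --query-gpu=compute_cap` output, but even this crude probe avoids hardcoding hist everywhere.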
