minimaxir / automl-gs
Provide an input CSV and a target field to predict, generate a model + code to run it.
License: MIT License
Hello - I have successfully trained my model using my training dataset. Now, when I go to predict, using this command:
python model.py -d ../testing_imputed.csv -m predict
I get this error:
ValueError: Usecols do not match columns, columns expected but not found: ['accepted']
But this is the column I'm trying to predict! Am I supposed to create the target/prediction column in the test data so that it gets populated with the predictions? This is a logistic regression problem: I am trying to predict whether or not a loan will be approved. If I need to add a column, is it:
test_data['accepted'] = ""
Or do I zero it out, and the prediction step will overwrite the values with what the model predicts?
Thanks in advance to all who respond.
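The error comes from pandas' usecols check, which suggests the generated pipeline reads the target column even in predict mode. A possible workaround is to add a placeholder target column to the test CSV before predicting. This has not been confirmed by the maintainer. It is also not established whether predict mode uses those values, so the sketch fills a neutral "0". The helper `add_placeholder_column` is hypothetical and uses only the standard library:

```python
import csv
import io
from typing import TextIO


def add_placeholder_column(src: TextIO, dst: TextIO, column: str, fill: str = "0") -> None:
    """Copy CSV rows from src to dst, appending `column` (set to `fill`) if it is absent."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    missing = column not in fieldnames
    if missing:
        fieldnames.append(column)
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        if missing:
            row[column] = fill  # placeholder only, not a real label
        writer.writerow(row)


# Demo on in-memory text; in practice pass open file handles instead.
demo_out = io.StringIO()
add_placeholder_column(io.StringIO("income,term\n50000,36\n"), demo_out, "accepted")
```

For the command in the question, you would read ../testing_imputed.csv, write the result to a new file of your choosing, and pass that file to -d. A file that already has the column is copied unchanged.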
Trying out automl_gs in a new conda env using the Titanic dataset. After each iteration I get the error:
ValueError: Parent directory of model_weights.hdf5 doesn't exist, can't save.
Same behavior running from the command line or within IPython following the example notebook. To clarify, it's finding titanic.csv fine; the error seems to occur when saving the intermediate results. Full traceback below.
$ automl_gs titanic.csv Survived
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
metrics = yaml.load(f)
Solving a binary_classification problem, maximizing accuracy using tensorflow.
Modeling with field specifications:
Pclass: categorical
Name: ignore
Sex: categorical
Age: numeric
Siblings/Spouses Aboard: categorical
Parents/Children Aboard: categorical
Fare: numeric
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
hps = yaml.load(f)
0%| | 0/100 [00:00<?, ?trial/s]
/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/automl_gs/utils_automl.py:199: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
metrics = yaml.load(f)[problem_type]
Traceback (most recent call last):
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_STRING, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_22, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, _arg_Const_1_0_10, _arg_Const_22_0_13, _arg_Const_2_0_14, _arg_Const_3_0_15, _arg_Const_14_0_4, _arg_Const_17_0_7, _arg_Const_20_0_11, _arg_Const_4_0_16, _arg_Const_5_0_17, _arg_Const_6_0_18, _arg_Const_7_0_19, _arg_Const_8_0_20, _arg_Const_11_0_1, _arg_Const_9_0_21, hidden_1/bias/Read/ReadVariableOp, hidden_1/bias/AdamW/Read/ReadVariableOp, hidden_1/bias/AdamW_1/Read/ReadVariableOp, hidden_1/kernel/Read/ReadVariableOp, hidden_1/kernel/AdamW/Read/ReadVariableOp, hidden_1/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_10_0_0, bn_1/beta/Read/ReadVariableOp, bn_1/beta/AdamW/Read/ReadVariableOp, bn_1/beta/AdamW_1/Read/ReadVariableOp, bn_1/gamma/Read/ReadVariableOp, bn_1/gamma/AdamW/Read/ReadVariableOp, bn_1/gamma/AdamW_1/Read/ReadVariableOp, bn_1/moving_mean/Read/ReadVariableOp, bn_1/moving_variance/Read/ReadVariableOp, _arg_Const_12_0_2, hidden_2/bias/Read/ReadVariableOp, hidden_2/bias/AdamW/Read/ReadVariableOp, hidden_2/bias/AdamW_1/Read/ReadVariableOp, hidden_2/kernel/Read/ReadVariableOp, hidden_2/kernel/AdamW/Read/ReadVariableOp, hidden_2/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_13_0_3, bn_2/beta/Read/ReadVariableOp, bn_2/beta/AdamW/Read/ReadVariableOp, bn_2/beta/AdamW_1/Read/ReadVariableOp, bn_2/gamma/Read/ReadVariableOp, bn_2/gamma/AdamW/Read/ReadVariableOp, bn_2/gamma/AdamW_1/Read/ReadVariableOp, bn_2/moving_mean/Read/ReadVariableOp, bn_2/moving_variance/Read/ReadVariableOp, _arg_Const_15_0_5, hidden_3/bias/Read/ReadVariableOp, hidden_3/bias/AdamW/Read/ReadVariableOp, hidden_3/bias/AdamW_1/Read/ReadVariableOp, hidden_3/kernel/Read/ReadVariableOp, hidden_3/kernel/AdamW/Read/ReadVariableOp, hidden_3/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_16_0_6, 
bn_3/beta/Read/ReadVariableOp, bn_3/beta/AdamW/Read/ReadVariableOp, bn_3/beta/AdamW_1/Read/ReadVariableOp, bn_3/gamma/Read/ReadVariableOp, bn_3/gamma/AdamW/Read/ReadVariableOp, bn_3/gamma/AdamW_1/Read/ReadVariableOp, bn_3/moving_mean/Read/ReadVariableOp, bn_3/moving_variance/Read/ReadVariableOp, _arg_Const_18_0_8, hidden_4/bias/Read/ReadVariableOp, hidden_4/bias/AdamW/Read/ReadVariableOp, hidden_4/bias/AdamW_1/Read/ReadVariableOp, hidden_4/kernel/Read/ReadVariableOp, hidden_4/kernel/AdamW/Read/ReadVariableOp, hidden_4/kernel/AdamW_1/Read/ReadVariableOp, _arg_Const_19_0_9, bn_4/beta/Read/ReadVariableOp, bn_4/beta/AdamW/Read/ReadVariableOp, bn_4/beta/AdamW_1/Read/ReadVariableOp, bn_4/gamma/Read/ReadVariableOp, bn_4/gamma/AdamW/Read/ReadVariableOp, bn_4/gamma/AdamW_1/Read/ReadVariableOp, bn_4/moving_mean/Read/ReadVariableOp, bn_4/moving_variance/Read/ReadVariableOp, _arg_Const_21_0_12, output/bias/Read/ReadVariableOp, output/bias/AdamW/Read/ReadVariableOp, output/bias/AdamW_1/Read/ReadVariableOp, output/kernel/Read/ReadVariableOp, output/kernel/AdamW/Read/ReadVariableOp, output/kernel/AdamW_1/Read/ReadVariableOp, training/TFOptimizer/beta1_power, training/TFOptimizer/beta2_power)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1620, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1047, in run
fetches=fetches, feed_dict=feed_dict, **kwargs)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
Caused by op 'save/SaveV2', defined at:
File "model.py", line 46, in <module>
model_train(df, encoders, args, model)
File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 377, in model_train
batch_size=256)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1363, in fit
validation_steps=validation_steps)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 291, in fit_loop
callbacks.on_train_end()
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 158, in on_train_end
callback.on_train_end(logs)
File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 398, in on_train_end
self.model.save_weights('model_weights.hdf5')
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1365, in save_weights
self._checkpointable_saver.save(filepath, session=session)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1178, in save
self._last_save_saver = saver_lib.Saver(var_list=named_variables)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1281, in __init__
self.build()
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1293, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1330, in _build
build_save=build_save, build_restore=build_restore)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 775, in _build_internal
save_tensor = self._AddSaveOps(filename_tensor, saveables)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 275, in _AddSaveOps
save = self.save_op(filename_tensor, saveables)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 193, in save_op
tensors)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1687, in save_v2
shape_and_slices=shape_and_slices, tensors=tensors, name=name)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
NotFoundError (see above for traceback): ; No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "model.py", line 46, in <module>
model_train(df, encoders, args, model)
File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 377, in model_train
batch_size=256)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1363, in fit
validation_steps=validation_steps)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 291, in fit_loop
callbacks.on_train_end()
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 158, in on_train_end
callback.on_train_end(logs)
File "/Volumes/Backstaff/scratch/automl/automl_train/pipeline.py", line 398, in on_train_end
self.model.save_weights('model_weights.hdf5')
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1365, in save_weights
self._checkpointable_saver.save(filepath, session=session)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1186, in save
global_step=checkpoint_number)
File "/Users/dnowacki/miniconda3/envs/automl/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1637, in save
raise exc
ValueError: Parent directory of model_weights.hdf5 doesn't exist, can't save.
Metrics:
trial_id: 3e5c75e7-53be-4e75-8558-17b511440ba9
epoch: 20
time_completed: 2019-03-26 22:03:19
log_loss: 0.6697867036089022
accuracy: 0.6142322097378277
auc: 0.8345666587733839
precision: 0.30711610486891383
recall: 0.5
f1: 0.3805104408352668
1%|▊ | 1/100 [00:07<12:16, 7.44s/trial]
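The final ValueError says the parent directory of model_weights.hdf5 doesn't exist. For a bare filename, `os.path.dirname` returns an empty string. That is plausibly what the saver is checking, but this is inferred from the message and not verified against this TensorFlow version. One workaround is to resolve the path to an absolute one with an existing parent before calling `save_weights`. The helper `resolve_save_path` below is hypothetical and not part of automl-gs or TensorFlow. A second thing worth checking: the traceback shows save_weights being handled by TensorFlow's checkpoint saver (SaveV2), not an HDF5 writer. The installed version may therefore not treat the .hdf5 suffix as HDF5.

```python
import os


def resolve_save_path(filename: str) -> str:
    """Return an absolute path for `filename` whose parent directory exists."""
    path = os.path.abspath(filename)  # "model_weights.hdf5" -> "<cwd>/model_weights.hdf5"
    os.makedirs(os.path.dirname(path), exist_ok=True)  # no-op if it already exists
    return path


# A bare filename has no directory component at all:
bare_parent = os.path.dirname("model_weights.hdf5")  # ''
resolved = resolve_save_path("model_weights.hdf5")
# In the training callback this would become, e.g.:
# self.model.save_weights(resolve_save_path('model_weights.hdf5'))
```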
tqdm.write() apparently gets buggy when the terminal height is less than the height of the text being replaced, and it leaves text artifacts if the replaced text has a different line length.
As a result, the script cannot print hyperparameters: the hyperparameters plus the metrics exceed a typical terminal height, and, unlike the metrics, the hyperparameter lines vary from trial to trial.
I removed hyperparameter printing for the time being (it's not strictly necessary anyway), but I would like a better solution.
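One possible direction is to make every redrawn block the same shape. Pad or truncate each line to the terminal width and cap the line count below the terminal height. Then the text being replaced never changes length or overflows the screen. This is not what automl-gs currently does. The sketch below uses `shutil.get_terminal_size`, and the helper `fit_block` is hypothetical:

```python
import shutil
from typing import List, Optional


def fit_block(lines: List[str], cols: Optional[int] = None, rows: Optional[int] = None) -> List[str]:
    """Pad/truncate lines to a fixed width and cap them below the terminal height."""
    size = shutil.get_terminal_size(fallback=(80, 24))
    cols = cols or size.columns
    rows = rows or size.lines
    max_lines = max(rows - 2, 1)  # leave room for the progress bar itself
    clipped = lines[:max_lines]  # a copy, so the caller's list is untouched
    if len(lines) > max_lines:
        clipped[-1] = "..."  # signal that some entries were hidden
    # Every line is exactly `cols` characters, so a redraw overwrites it fully.
    return [line[:cols].ljust(cols) for line in clipped]


block = fit_block([f"hp_{i}: {i}" for i in range(10)], cols=12, rows=6)
```

Each returned line could then be passed to tqdm.write(). Whether this fully removes the artifacts depends on the terminal, so treat it as something to try rather than a fix.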
This can be hard to figure out since I can't share the data. I am running it in Google Colab.
automl_grid_search(csv_path='/content/CLT_all_tasks_trial_level.csv', target_field='correctResp', model_name='tpu', tpu_address = tpu_address)
Solving a binary_classification problem, maximizing accuracy using tensorflow.
Modeling with field specifications:
Subject: categorical
Finished: categorical
TrainingDay: categorical
Condition: categorical
CondPrev: categorical
TaskNumber: categorical
TaskId: categorical
TrialNumber: numeric
PresentationStimulus: numeric
StimTime: numeric
RespToTime: numeric
RT: numeric
SubjResp: categorical
OutcomeInt: categorical
TaskOutcomeInt: categorical
StimDim1: categorical
StimDim2: categorical
StimDim3: categorical
StimDim4: categorical
IntendedRule: categorical
Background: categorical
StimDimWord1: categorical
StimDimWord2: categorical
StimDimWord3: categorical
StimDimWord4: categorical
ExpResp: categorical
DistinctDays: categorical
out: categorical
StimType: categorical
0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-16-ca69e1157d4e> in <module>()
2 target_field='correctResp',
3 model_name='tpu',
----> 4 tpu_address = tpu_address)
/usr/local/lib/python3.6/dist-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
92 header=(best_result is None))
93
---> 94 train_results = results.tail(1).to_dict('records')[0]
95
96 # If the target metric improves, save the new hps/files,
IndexError: list index out of range
Here is the log.
Apr 3, 2019, 5:31:44 PM | WARNING
Traceback (most recent call last):
File "model.py", line 69, in <module>
model_train(df, encoders, args, model)
File "/content/tpu_train/pipeline.py", line 1095, in model_train
batch_size=64 * 8)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1532, in fit
steps_per_epoch, validation_steps, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1633, in _pipeline_fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1734, in _pipeline_fit_loop
callbacks.on_epoch_end(epoch, epoch_logs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py", line 251, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/content/tpu_train/pipeline.py", line 1126, in on_epoch_end
logloss = log_loss(y_true, y_pred)
File "/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py", line 1763, in log_loss
y_pred = check_array(y_pred, ensure_2d=False)
File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 573, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
AFAIK, the largest number in the dataset is 12007245.
Thanks for the help!
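The log shows scikit-learn's log_loss rejecting y_pred, the model's predictions, because they contain NaN or infinity. That suggests the network's outputs diverged, rather than the input CSV containing NaN. Very large raw values, such as the 12007245 mentioned above, are one plausible cause, though that is not confirmed here. To narrow it down, here is a small pure-Python sketch of binary log loss. It reports where non-finite predictions appear and clips finite ones away from 0 and 1. `binary_log_loss` is a hypothetical stand-in, not the callback automl-gs generates:

```python
import math
from typing import Sequence


def binary_log_loss(y_true: Sequence[int], y_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy that names the first non-finite prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    bad = [i for i, p in enumerate(y_pred) if not math.isfinite(p)]
    if bad:
        # Say where NaN/inf appeared instead of a generic dtype error.
        raise ValueError(f"{len(bad)} non-finite prediction(s), first at index {bad[0]}")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so both logs are defined
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


loss = binary_log_loss([1, 0], [0.9, 0.1])
try:
    binary_log_loss([1, 0], [0.7, float("nan")])
except ValueError as err:
    message = str(err)
```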
from automl_gs import automl_grid_search
automl_grid_search('Housing.csv','price')
Solving a regression problem, minimizing mse using tensorflow.
Modeling with field specifications:
area: numeric
bedrooms: numeric
bathrooms: categorical
stories: categorical
mainroad: categorical
guestroom: categorical
basement: categorical
hotwaterheating: categorical
airconditioning: categorical
parking: categorical
prefarea: categorical
furnishingstatus: categorical
FileNotFoundError Traceback (most recent call last)
in
1 from automl_gs import automl_grid_search
----> 2 automl_grid_search('Housing.csv','price')
~/.local/lib/python3.6/site-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
85 # and append to the metrics CSV.
86 results = pd.read_csv(os.path.join(train_folder,
---> 87 "metadata", "results.csv"))
88 results = results.assign(**params)
89 results.insert(0, 'trial_id', uuid.uuid4())
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
700 skip_blank_lines=skip_blank_lines)
701
--> 702 return _read(filepath_or_buffer, kwds)
703
704 parser_f.__name__ = name
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
427
428 # Create the parser.
--> 429 parser = TextFileReader(filepath_or_buffer, **kwds)
430
431 if chunksize or iterator:
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
893 self.options['has_index_names'] = kwds['has_index_names']
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1120 def _make_engine(self, engine='c'):
1121 if engine == 'c':
-> 1122 self._engine = CParserWrapper(self.f, **self.options)
1123 else:
1124 if engine == 'python':
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1851 kwds['usecols'] = self.usecols
1852
-> 1853 self._reader = parsers.TextReader(src, **kwds)
1854 self.unnamed_cols = self._reader.unnamed_cols
1855
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
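This FileNotFoundError and the IndexError on results.tail(1) seen in other reports here may share a cause. automl_grid_search reads automl_train/metadata/results.csv after each trial. When the generated model.py fails, that file ends up missing or empty. The sketch below is a guard that turns both cases into one descriptive error. `read_last_result` is hypothetical and uses the standard csv module rather than pandas:

```python
import csv
import os
import tempfile
from typing import Dict


def read_last_result(path: str) -> Dict[str, str]:
    """Return the last row of a trial's results CSV, or explain why there is none."""
    if not os.path.exists(path):
        raise RuntimeError(f"{path} was not written; the training script likely "
                           "failed before finishing (check its own traceback)")
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise RuntimeError(f"{path} contains no result rows; no epoch completed")
    return rows[-1]


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "results.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("epoch,mse\n1,0.5\n2,0.4\n")
    last = read_last_result(demo_path)
```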
While trying out automl_gs on this UCI dataset, I got this error:
Traceback (most recent call last):
File "model.py", line 59, in <module>
model_train(df, encoders, args, model)
File "C:\Users\josep\automl_train\pipeline.py", line 835, in model_train
batch_size=256)
File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 776, in fit
shuffle=shuffle)
File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2382, in _standardize_user_data
exception_prefix='input')
File "C:\Users\josep\venv\lib\site-packages\tensorflow\python\keras\engine\training_utils.py", line 362, in standardize_input_data
' but got array with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_son to have shape (1,) but got array with shape (2,)
After some sleuthing, I eventually figured out that the error occurs because the shape size for the offending column is set incorrectly in build_model():
input_son_size = len(encoders['son_encoder'].classes_)
input_son = Input(
shape=(input_son_size if input_son_size != 2 else 1,), name="input_son")
I don't understand the purpose of that `if-else` clause. It looks like this change was introduced in 1dcb9e2; reverting that commit allows my model to work.
😅
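A likely rationale for the special case is scikit-learn's LabelBinarizer. It encodes a two-class column as a single 0/1 column rather than two one-hot columns, so a two-class input would need shape (1,). This is an assumption, since the commit message isn't quoted here. The reported (2,) array suggests the encoder actually used for son emits two columns, so the special case and the encoder disagree. The sketch below avoids this class of bug by deriving the Input width from the encoded data itself. `binarize` mimics that documented shape rule in pure Python. It and `input_width` are hypothetical:

```python
from typing import List, Sequence


def binarize(values: Sequence[str], classes: Sequence[str]) -> List[List[int]]:
    """One-hot encode values; two classes collapse to a single 0/1 column."""
    if len(classes) == 2:
        # Mirrors LabelBinarizer's binary rule: 1 means classes[1], 0 means classes[0].
        return [[int(v == classes[1])] for v in values]
    return [[int(v == c) for c in classes] for v in values]


def input_width(encoded: List[List[int]]) -> int:
    """Width for the Keras Input shape, read off the encoded rows themselves."""
    if not encoded:
        raise ValueError("no encoded rows to inspect")
    widths = {len(row) for row in encoded}
    if len(widths) != 1:
        raise ValueError(f"inconsistent row widths: {sorted(widths)}")
    return widths.pop()


two = input_width(binarize(["yes", "no", "yes"], ["no", "yes"]))
three = input_width(binarize(["a", "b", "c"], ["a", "b", "c"]))
```

In generated code, the same idea would mean taking the shape from the transformed training array's second dimension instead of from len(encoder.classes_), so the encoder and the Input layer cannot drift apart.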
(I need to fill this in with a full reproducer) but as a placeholder:
I just ran automl-gs for the first time, and it errored out with:
/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__
return f(*args, **kwds)
/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/jinja2/utils.py:485: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import MutableMapping
/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/jinja2/runtime.py:318: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Mapping
0%| | 0/100 [00:00<?, ?trial/s]
0%| | 0/20 [00:00<?, ?epoch/s]
Traceback (most recent call last):
File "model.py", line 2, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
0%| | 0/20 [00:00<?, ?epoch/s]
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py:858: ResourceWarning: subprocess 12646 is still running
ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Solving a classification problem, maximizing accuracy using tensorflow.
Modeling with field specifications:
message.received: numeric
message.sender: categorical
message.subject: text
message.size: numeric
message.recipients: categorical
response.sent: numeric
response.sender: categorical
response.subject: text
response.size: numeric
response.recipients: categorical
lag_readable: text
Traceback (most recent call last):
File "/Users/jberman/.local/bin/automl_gs", line 10, in <module>
sys.exit(cmd())
File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 175, in cmd
tpu_address=args.tpu_address)
File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
"metadata", "results.csv"))
File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/jberman/.local/share/virtualenvs/automl/lib/python3.7/site-packages/pandas/io/parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='automl_results.csv' mode='w' encoding='UTF-8'>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
At least that first error seems to be because the auto-generated .py file doesn't contain a shebang:
$ head automl_train/model.py
import argparse
import pandas as pd
from pipeline import *
It also seems to be executed by just calling python, whereas it needs to use sys.executable from the original Python that was used to run automl-gs (which is where pandas is installed).
To be clear, the blow-up happens because automl-gs is installed into a virtualenv, and whatever other Python is running model.py does not have pandas installed.
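A minimal sketch of the fix described here, assuming the generated script is launched through subprocess: build the command from sys.executable so model.py runs under the same interpreter (and virtualenv) as automl-gs. The helper names build_command and run_script are hypothetical, not automl-gs functions.

```python
import subprocess
import sys


def build_command(script_path, args=()):
    # sys.executable is the interpreter running this process, so the child
    # sees the same site-packages (pandas etc.) instead of whatever
    # "python" happens to resolve to on PATH.
    return [sys.executable, script_path, *args]


def run_script(script_path, args=(), cwd=None):
    # capture_output/text need Python 3.7+; check=True raises
    # CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(build_command(script_path, args), cwd=cwd,
                          capture_output=True, text=True, check=True)
```

For example, run_script("model.py", ["-d", "data.csv", "-m", "train"], cwd="automl_train") would train with the virtualenv's own Python, with no shebang needed.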
Hi, and thanks for your work.
I tried to run your project using a dataset that has some fields whose names start with numbers, and this throws a SyntaxError.
For example, with a field named '1stFlrSF', I got the following error:
Traceback (most recent call last):
File "model.py", line 3, in <module>
from pipeline import *
File "[MY_PATH]/automl_train/pipeline.py", line 1090
1stflrsf_enc = df['1stFlrSF']
^
SyntaxError: invalid syntax
0%| | 0/20 [00:00<?, ?epoch/s]Traceback (most recent call last):
File "[MY_PATH]/test_auto_ml/Test.py", line 8, in <module>
do_the_thing("[MY_DATASET_PATH]/train.csv","SalePrice")
File "[MY_PATH]/test_auto_ml/Test.py", line 5, in do_the_thing
automl_grid_search(path,label)
File "[MY_PYTHON_PATH]/site-packages/automl_gs/automl_gs.py", line 94, in automl_grid_search
train_results = results.tail(1).to_dict('records')[0]
IndexError: list index out of range
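A sketch of how column names could be turned into valid Python identifiers before being written into pipeline.py. to_identifier is a hypothetical helper, and the lowercase/underscore scheme is an assumption modeled on names like length_beam_ratio_enc in the generated code; automl-gs's own naming code may differ. A leading underscore handles names such as '1stFlrSF' or '0'.

```python
import keyword
import re


def to_identifier(name):
    # Lowercase and collapse every run of non-word characters (spaces,
    # hyphens, slashes, colons...) into one underscore, trimming the ends.
    ident = re.sub(r"\W+", "_", name.strip().lower()).strip("_")
    # Python identifiers cannot be empty or start with a digit,
    # so "1stFlrSF" becomes "_1stflrsf" and "0" becomes "_0".
    if not ident or ident[0].isdigit():
        ident = "_" + ident
    # Reserved words such as "class" or "lambda" are not usable either.
    if keyword.iskeyword(ident):
        ident += "_"
    return ident
```

Distinct columns can still collide (for example 'A B' and 'a-b' both map to a_b), so a complete fix would also need to de-duplicate the generated names.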
I was able to train using the Titanic dataset. The docs say to predict using the following command:
python3 model.py -d data.csv -m predict
Does this mean the prediction features have to be in a .csv file? Is it possible to predict a single row from Python without using the terminal?
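One way to predict a single row from Python while sticking to the documented command: write the row to a one-line CSV and call model.py in predict mode through subprocess. Assumptions: the row's keys match the column names the model was trained on, and write_single_row_csv / predict_command are hypothetical helper names. Where the predictions end up is decided by the generated model.py and isn't shown here.

```python
import csv
import sys


def write_single_row_csv(row, path):
    # One header line plus one data line, using the row's keys as columns.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    return path


def predict_command(csv_path, model_script="model.py"):
    # Mirrors the documented "python3 model.py -d data.csv -m predict" call,
    # using the current interpreter instead of a bare "python3".
    return [sys.executable, model_script, "-d", csv_path, "-m", "predict"]
```

Usage would then be something like subprocess.run(predict_command("row.csv"), cwd="automl_train", check=True) after writing row.csv inside that folder.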
Hello, what is the format of the prediction dataset data.csv? Is it TensorFlow by default?
I almost feel bad for reporting this one.
Using the yacht hydrodynamics UCI dataset, I got this error:
(env) (base) C:\Users\josep\Jeenee\AutoML\automl_train>python model.py -d ..\automl-testbench\yacht-hydrodynamics\data.csv -m train
Traceback (most recent call last):
File "model.py", line 46, in <module>
model_train(df, encoders, args, model)
File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 347, in model_train
X, y = process_data(df, encoders)
File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 296, in process_data
df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')
File "C:\Users\josep\Jeenee\AutoML\venv\lib\site-packages\pandas\core\reshape\tile.py", line 235, in cut
raise ValueError('bins must increase monotonically.')
ValueError: bins must increase monotonically.
Hmmm, odd. Let's take a look at pipeline.py...
# Length-beam ratio
length_beam_ratio_enc = df['Length-beam ratio']
length_beam_ratio_bins = length_beam_ratio_enc.quantile(
np.linspace(0, 1, 10+1))
encoders['length_beam_ratio_bins'] = length_beam_ratio_bins
# ....
# Length-beam ratio
length_beam_ratio_enc = pd.cut(
df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')
The error refers to the .cut line, which I had previously patched to include the duplicates='drop' bit. But the current error isn't related to that; it's complaining about the encoder. Hmmm, nothing looks odd in the data for that column. Let's open up pdb and take a look...
>>> encoders['length_beam_ratio_bins']
[2.73, 2.76, 3.15, 3.15, 3.1499999999999995, 3.15, 3.17, 3.32, 3.51, 3.51, 3.64]
facepalm
Well now! I suppose I'll concede that's technically not monotonically increasing!
I appended a .round(4) to the two .quantile lines of encoders/numeric (lines 12 and 15), which worked for this test case. This is certainly not an adequate general solution, however, as it will break on data that needs precision at the 5th decimal place...
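A possibly more general alternative to rounding, sketched as a hypothetical helper (clean_bin_edges is not part of automl-gs): force the edges to be non-decreasing with a running maximum, then drop the repeats. Unlike .round(4) it keeps full precision, but, like duplicates='drop', it can leave fewer than the 10 bins requested.

```python
import numpy as np


def clean_bin_edges(edges):
    # Floating-point error in quantile() can put 3.1499999999999995 after
    # 3.15, and ties in the data can repeat an edge; pd.cut rejects both.
    edges = np.asarray(edges, dtype=float)
    # Running maximum: any edge that dips below an earlier one is lifted
    # back up to it, so the sequence becomes non-decreasing.
    edges = np.maximum.accumulate(edges)
    # np.unique drops the now-equal neighbours and returns them sorted,
    # leaving strictly increasing edges.
    return np.unique(edges)
```

In the pipeline shown above, this would wrap the .quantile(...) result before it is stored in encoders['length_beam_ratio_bins'].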
Is there any way to use a tool like automl-gs for bin-packing problems? I've seen a way to model it as linear optimization where you do a cross join of all objects and all potential bins, set a constraint that each object is selected exactly once, and then optimize the bins however desired. Cross joins can end up being needlessly heavy, though, so I'm wondering if there is a way to model that sort of problem so that you could use pre-optimized and continually developed tools like this one.
Allow an image to be used as an input, in conjunction with other fields.
The problem is that pretrained models are too heavy, and training a CNN from scratch is too time-consuming.
The solution is to use a fast image-encoding approach, which I'll start working on after automl-gs.
When will compatibility with TensorFlow 2 be added, if ever?
I tried a simple dataset to play around with this, and I am running into:
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
I believe this is because scikit-learn's stratified split needs every value of the variable you're trying to predict to appear at least twice. Might be worth adding to the docs.
Input:
square.txt
Full log if needed:
$ automl_gs square.csv square
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
metrics = yaml.load(f)
Solving a classification problem, maximizing accuracy using tensorflow.
Modeling with field specifications:
real: numeric
fake1: numeric
fake2: numeric
fake3: numeric
fake4: numeric
text: categorical
bool: categorical
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
hps = yaml.load(f)
0%| | 0/100 [00:00<?, ?trial/s/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:199: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
metrics = yaml.load(f)[problem_type]
Traceback (most recent call last):
File "model.py", line 47, in <module>
model_train(df, encoders, args, model)
File "../automodel/automl_train/pipeline.py", line 408, in model_train
for train_indices, val_indices in split.split(np.zeros(y.shape[0]), y):
File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1315, in split
for train, test in self._iter_indices(X, y, groups):
File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1695, in _iter_indices
raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
Traceback (most recent call last): | 0/20 [00:00<?, ?epoch/s]
File "/usr/local/bin/automl_gs", line 10, in <module>
sys.exit(cmd())
File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 175, in cmd
tpu_address=args.tpu_address)
File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
"metadata", "results.csv"))
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
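A quick pre-check, sketched with the standard library only (rare_classes is a hypothetical helper, not part of automl-gs): count the rows for each target value and flag any class with fewer than two rows, since the stratified split in the generated pipeline.py raises this ValueError for them.

```python
import csv
from collections import Counter


def rare_classes(csv_path, target_field, min_count=2):
    # Count how many rows carry each target value (read as strings).
    with open(csv_path, newline="") as f:
        counts = Counter(row[target_field] for row in csv.DictReader(f))
    # Classes below min_count cannot be split into both train and
    # validation sets, which is what the stratified split requires.
    return {label: n for label, n in counts.items() if n < min_count}
```

Running rare_classes("square.csv", "square") before automl_gs would list the offending values, which could then be dropped or merged. Note that automl-gs treated this target as a classification problem, so every distinct value counts as a class.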
Thank you so much for your library. Is it possible to use it for regression problems?
I tried out the example Titanic dataset in a conda environment and encountered the following warning so frequently that it disrupts the tqdm progress bar:
/anaconda3/envs/automl-gs/lib/python3.6/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
metrics = yaml.load(f)
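The root fix belongs in utils_automl.py, where yaml.load(f) could become yaml.safe_load(f) (or pass a Loader explicitly, as the warning asks). Until then, one caller-side workaround when using automl_grid_search from Python is to filter just this message with the standard warnings module; silence_yaml_load_warning is a hypothetical helper name.

```python
import warnings


def silence_yaml_load_warning():
    # Ignore only warnings whose text starts with the YAMLLoadWarning
    # notice; the message argument is a regex matched at the start.
    # All other warnings stay visible.
    warnings.filterwarnings(
        "ignore",
        message=r"calling yaml\.load\(\) without Loader=",
    )
```

Call it once before automl_grid_search(...); it only affects warnings raised in the current Python process.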
Hi,
I'm just trying to work through your example Colab notebook. I run through the cells, upload titanic.csv, and get:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-3-9f452c025bdd> in <module>()
2 target_field='origin',
3 model_name='tpu',
----> 4 tpu_address = tpu_address)
5 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1889 kwds["usecols"] = self.usecols
1890
-> 1891 self._reader = parsers.TextReader(src, **kwds)
1892 self.unnamed_cols = self._reader.unnamed_cols
1893
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
FileNotFoundError: [Errno 2] File tpu_train/metadata/results.csv does not exist: 'tpu_train/metadata/results.csv'
When I try to run the Jupyter notebook you provided at this link:
https://github.com/minimaxir/automl-gs/blob/master/docs/automl_gs_tutorial.ipynb
I'm getting a FileNotFoundError. I got the same error when I tried with a local CSV file.
Btw, my OS is Windows 10.
Input:
from automl_gs import automl_grid_search
automl_grid_search("data.csv", "diagnosis")
Output:
Solving a binary_classification problem, maximizing accuracy using tensorflow.
Modeling with field specifications:
id: ignore
radius_mean: numeric
texture_mean: numeric
perimeter_mean: numeric
area_mean: numeric
smoothness_mean: numeric
compactness_mean: numeric
concavity_mean: numeric
concave points_mean: numeric
symmetry_mean: numeric
fractal_dimension_mean: numeric
radius_se: numeric
texture_se: numeric
perimeter_se: numeric
area_se: numeric
smoothness_se: numeric
compactness_se: numeric
concavity_se: numeric
concave points_se: numeric
symmetry_se: numeric
fractal_dimension_se: numeric
radius_worst: numeric
texture_worst: numeric
perimeter_worst: numeric
area_worst: numeric
smoothness_worst: numeric
compactness_worst: numeric
concavity_worst: numeric
concave points_worst: numeric
symmetry_worst: numeric
fractal_dimension_worst: numeric
Unnamed: 32: numeric
0% 0/100 [00:04<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-37-308e97508c91> in <module>()
1 from automl_gs import automl_grid_search
2
----> 3 automl_grid_search("data.csv", "diagnosis")
5 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1889 kwds["usecols"] = self.usecols
1890
-> 1891 self._reader = parsers.TextReader(src, **kwds)
1892 self.unnamed_cols = self._reader.unnamed_cols
1893
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
FileNotFoundError: [Errno 2] File automl_train/metadata/results.csv does not exist: 'automl_train/metadata/results.csv'
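In the terminal logs on this page, this FileNotFoundError comes right after a traceback from the generated model.py (for example the stratified-split ValueError and the pd.cut ValueError), which suggests the trial crashed before it could write its metrics. A hypothetical guard could surface that more clearly; require_results is an assumed name, since automl-gs itself simply calls pd.read_csv on this path.

```python
import os


def require_results(train_folder):
    # Same path the grid search reads after each trial.
    path = os.path.join(train_folder, "metadata", "results.csv")
    if not os.path.isfile(path):
        # The trial's model.py run never wrote its metrics, so point the
        # user at the earlier traceback instead of a bare FileNotFoundError.
        raise RuntimeError(
            f"{path} was not written; the training script for this trial "
            "probably failed. Check the traceback printed before this one."
        )
    return path
```

In a notebook, the earlier traceback may not be displayed at all, so running the generated model.py directly from a terminal is a reasonable way to see the underlying error.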
Wanted to try automl_gs, but I get this error and can't figure out why.
File "C:\Users\XXX\automl_train\model.py", line 3, in <module>
from pipeline import *
File "C:\Users\XXX\automl_train\pipeline.py", line 29
0_enc = df['0']
^
SyntaxError: invalid decimal literal
Any ideas about that?
https://twitter.com/amuellerml/status/1129443826945396737
If this is using the scikit-learn API it might be more straightforward.
Hello - I am trying to use this package to provide predictions for my Data Science Capstone project. When I run against my training data, I get the following exception/error:
Traceback (most recent call last): | 0/20 [00:00<?, ?epoch/s]
File "model.py", line 63, in <module>
model_train(df, encoders, args, model)
File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 903, in model_train
X, y = process_data(df, encoders)
File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 758, in process_data
df['msa_md'].values, encoders['msa_md_bins'], labels=False, include_lowest=True)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 234, in cut
duplicates=duplicates)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 332, in _bins_to_cuts
"the 'duplicates' kwarg".format(bins=bins))
ValueError: Bin edges must be unique: array([ -1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Traceback (most recent call last): | 0/20 [00:00<?, ?epoch/s]
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\Scripts\automl_gs.exe\__main__.py", line 9, in <module>
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 175, in cmd
tpu_address=args.tpu_address)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 87, in automl_grid_search
"metadata", "results.csv"))
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'automl_train\metadata\results.csv' does not exist
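A likely source of the repeated -1 edges, reproduced on a made-up column (the real msa_md distribution isn't shown in the report): when one value fills more than a decile of the rows, several quantile edges land on it. The generated pipeline builds its bins the same way, with quantile over np.linspace(0, 1, 10+1).

```python
import numpy as np

# Hypothetical column: 30% of rows share the sentinel value -1.
values = np.array([-1] * 30 + list(range(70)), dtype=float)

# Ten equal-frequency bins need 11 quantile edges...
edges = np.quantile(values, np.linspace(0, 1, 10 + 1))

# ...but every quantile that falls inside the block of -1s returns -1,
# so the first edges repeat and pd.cut rejects them as non-unique.
print(edges[:4])
print(len(np.unique(edges)), "distinct edges out of", len(edges))
```

Passing duplicates='drop' to pd.cut, as the error message suggests, merges the repeated edges, at the cost of ending up with fewer than ten bins for such a column.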
Both frameworks leverage categorical indices, which may require a slightly different approach compared to xgboost.
The stock Google Colab link in the README.md isn't working correctly. I added a line to download the titanic.csv, then hit run all. Full stack trace below:
Solving a binary_classification problem, maximizing accuracy using tensorflow.
Modeling with field specifications:
PassengerId: numeric
Pclass: categorical
Name: ignore
Sex: categorical
Age: numeric
SibSp: categorical
Parch: categorical
Ticket: ignore
Fare: numeric
Cabin: categorical
Embarked: categorical
0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-5-17dc9e2d602c> in <module>()
2 target_field='Survived',
3 model_name='tpu',
----> 4 tpu_address = tpu_address)
/usr/local/lib/python3.6/dist-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
85 # and append to the metrics CSV.
86 results = pd.read_csv(os.path.join(train_folder,
---> 87 "metadata", "results.csv"))
88 results = results.assign(**params)
89 results.insert(0, 'trial_id', uuid.uuid4())
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
707 skip_blank_lines=skip_blank_lines)
708
--> 709 return _read(filepath_or_buffer, kwds)
710
711 parser_f.__name__ = name
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
447
448 # Create the parser.
--> 449 parser = TextFileReader(filepath_or_buffer, **kwds)
450
451 if chunksize or iterator:
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
816 self.options['has_index_names'] = kwds['has_index_names']
817
--> 818 self._make_engine(self.engine)
819
820 def close(self):
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in _make_engine(self, engine)
1047 def _make_engine(self, engine='c'):
1048 if engine == 'c':
-> 1049 self._engine = CParserWrapper(self.f, **self.options)
1050 else:
1051 if engine == 'python':
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1693 kwds['allow_leading_cols'] = self.index_col is not False
1694
-> 1695 self._reader = parsers.TextReader(src, **kwds)
1696
1697 # XXX
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
FileNotFoundError: File b'tpu_train/metadata/results.csv' does not exist
While trying out automl_gs with Jupyter, I got a file-not-found error:
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
When I try it from the terminal, before the missing-file error it returns:
AttributeError: 'float' object has no attribute 'lower'
Searching Stack Overflow, I found that the problem is how pandas converts inputs to Python data types.
Is it possible to prevent this behaviour in automl_gs?
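One common way to get AttributeError: 'float' object has no attribute 'lower' is an empty cell in a text column: pandas reads it as NaN, which is a float. Whether that is the cause here is a guess. On the pandas side, read_csv(..., keep_default_na=False) keeps such cells as empty strings; the hypothetical safe_lower helper below shows the equivalent guard at the point where .lower() is called.

```python
import math


def safe_lower(value):
    # pandas turns an empty cell in a text column into NaN, which is a
    # float, so value.lower() would raise AttributeError. Treat missing
    # values as an empty string and stringify everything else first.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value).lower()
```

Checking the input CSV for blank cells in text fields, and filling them before running automl_gs, is the simplest way to test this theory without modifying the library.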
xgboost supports GPUs by setting gpu_hist instead of hist, and the code is prepared for that. Two problems:
Will keep to CPU support for now, but there has to be a better solution.
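The switch described here amounts to a single parameter. A sketch, assuming a plain params dict that is later handed to xgboost (xgb_params is a hypothetical helper, not code from automl-gs). Note that newer xgboost releases (2.0 and later) deprecate "gpu_hist" in favour of tree_method="hist" combined with device="cuda".

```python
def xgb_params(use_gpu, base=None):
    # Copy so the caller's dict is not modified in place.
    params = dict(base or {})
    # "gpu_hist" is the GPU histogram algorithm; "hist" is its CPU counterpart.
    params["tree_method"] = "gpu_hist" if use_gpu else "hist"
    return params
```

The result would be passed as the first argument to xgboost.train(params, dtrain), so the GPU/CPU choice stays in one place.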