automl / auto-sklearn
Automated Machine Learning with scikit-learn
Home Page: https://automl.github.io/auto-sklearn
License: BSD 3-Clause "New" or "Revised" License
My understanding is that when the predict method is called, the whole ensemble of models needs to be run on the prediction X dataset, and it does so on one core only. Is there a way to fan it out to all the cores so the models run in parallel?
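For illustration, this is a hedged sketch of what such a fan-out could look like, not auto-sklearn's actual API: each ensemble member predicts in its own worker, and the weighted results are combined. `models` and `weights` are illustrative stand-ins for the fitted ensemble members and their ensemble weights.

```python
# Hedged sketch (not auto-sklearn's built-in behaviour): fan the per-model
# predict calls out to a worker pool and combine them with ensemble weights.
from concurrent.futures import ThreadPoolExecutor

def parallel_ensemble_predict(models, weights, X, n_workers=4):
    # Threads suffice when predict releases the GIL, as much NumPy-backed
    # code does; otherwise a ProcessPoolExecutor with picklable models
    # would be needed to use multiple cores.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        predictions = list(pool.map(lambda model: model.predict(X), models))
    # Weighted average of the member predictions, one value per sample.
    return [
        sum(w * p[i] for w, p in zip(weights, predictions))
        for i in range(len(X))
    ]
```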
Hi,
I have a problem with the installation.
After following the steps from the documentation:
pip install scikit-learn==0.15.2
pip install git+https://github.com/mfeurer/HPOlibConfigSpace#egg=HPOlibConfigSpace0.1dev
pip install git+https://[email protected]/mfeurer/paramsklearn.git@73d8643b2849db753ddc7b8909d01e6cee9bafc6 --no-deps
pip install git+https://github.com/automl/HPOlib#egg=HPOlib0.2
pip install --editable git+https://bitbucket.org/mfeurer/pymetalearn/#egg=pyMetaLearn
git clone https://github.com/automl/auto-sklearn.git
cd auto-sklearn
python setup.py install
I get the following error message (including last two output lines before error) after the final install:
Installed /usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg
Processing dependencies for AutoSklearn==0.0.1.dev0
error: scipy 0.15.1 is installed but scipy==0.14.0 is required by set(['ParamSklearn'])
Manually installing sklearn 0.14.0 is also of no help; it then tries to reinstall 0.15.2 during python setup.py install :(
Importing autosklearn fails with this error:
(some directory structure ...)/pymetalearn/pyMetaLearn/metafeatures/metafeatures.py in <module>()
12 import sklearn.metrics
13 import sklearn.cross_validation
---> 14 from sklearn.utils import check_arrays
15
16 from ParamSklearn.implementations.Imputation import Imputer
ImportError: cannot import name check_arrays
Any idea what is going wrong? :)
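For what it's worth, check_arrays exists in the pinned scikit-learn 0.15.x but was removed in later releases, so this ImportError usually means a different scikit-learn than the pinned one is being picked up. A quick diagnostic sketch:

```python
# Diagnostic sketch: check_arrays is present in scikit-learn 0.15.x (the
# pinned version) but was removed in later releases, so its absence usually
# indicates a newer sklearn on the import path than the one you pinned.
def sklearn_has_check_arrays():
    try:
        from sklearn.utils import check_arrays  # noqa: F401
        return True
    except ImportError:  # newer scikit-learn (or none installed)
        return False
```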
Greetings,
I tried to run the example from the README.md. All steps run without errors until I try to score the test set; then I get a ValueError: No models fitted!. The console output shows various runtime warnings, among them indications of missing files. What could be the cause?
(Unfortunately, GitHub refuses to accept a text file for some reason, so I pasted the console output during fitting, and the error trace from trying to score, below.)
My setup is a Python 2.7.6 virtualenv, running from IPython 4.0.0. The installed packages are as follows:
argparse (1.2.1)
AutoSklearn (0.0.1.dev0)
cma (1.1.06)
decorator (4.0.2)
funcsigs (0.4)
HPOlib (0.1.0)
HPOlibConfigSpace (0.1dev)
ipython (4.0.0)
ipython-genutils (0.1.0)
liac-arff (2.1.0)
lockfile (0.10.2)
matplotlib (1.4.3)
mock (1.3.0)
networkx (1.10)
nose (1.3.7)
numpy (1.9.0)
pandas (0.16.2)
ParamSklearn (0.1dev)
path.py (8.1.1)
pbr (1.8.0)
pexpect (3.3)
pickleshare (0.5)
pip (1.5.4)
protobuf (3.0.0-alpha-1)
psutil (3.2.1)
pyMetaLearn (0.1dev)
pymongo (3.0.3)
pyparsing (2.0.3)
python-dateutil (2.4.2)
pytz (2015.6)
PyYAML (3.11)
scikit-learn (0.15.2)
scipy (0.14.0)
setuptools (18.3.2)
simplegeneric (0.8.1)
six (1.9.0)
traitlets (4.0.0)
wheel (0.24.0)
wsgiref (0.1.2)
Thanks for your response.
Console output:
[INFO] [09-24 17:58:47:AutoML_54da6690e2c896d2d9aafe349b066645_1]: Remaining time after reading 54da6690e2c896d2d9aafe349b066645 3600.00 sec
/media/selects/venv/py27/local/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:1057: RuntimeWarning: Degrees of freedom <= 0 for slice.
warnings.warn("Degrees of freedom <= 0 for slice.", RuntimeWarning)
/media/selects/venv/py27/local/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:598: RuntimeWarning: Mean of empty slice
warnings.warn("Mean of empty slice", RuntimeWarning)
[WARNING] [09-24 17:58:47:pyMetaLearn.input.aslib_simple]: Not found: /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/metalearning/files/multiclass.classification_dense_acc_metric/ground_truth.arff (maybe you want to add it)
[WARNING] [09-24 17:58:47:pyMetaLearn.input.aslib_simple]: Not found: /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/metalearning/files/multiclass.classification_dense_acc_metric/citation.bib (maybe you want to add it)
[WARNING] [09-24 17:58:47:pyMetaLearn.input.aslib_simple]: Not found: /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/metalearning/files/multiclass.classification_dense_acc_metric/cv.arff (maybe you want to add it)
[INFO] [09-24 17:58:48:autosklearn.metalearning.metalearning]: Reading meta-data took 0.59 seconds
['133', '132', '131', '130', '137', '136', '135', '134', '139', '138', '24', '25', '26', '27', '20', '21', '22', '23', '28', '29', '4', '8', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '59', '58', '55', '54', '57', '56', '51', '50', '53', '52', '115', '114', '88', '89', '111', '110', '113', '112', '82', '83', '80', '81', '119', '118', '84', '85', '3', '7', '108', '109', '102', '103', '100', '101', '106', '107', '104', '105', '39', '38', '33', '32', '31', '30', '37', '36', '35', '34', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '2', '6', '99', '98', '91', '90', '93', '92', '95', '94', '97', '96', '11', '10', '13', '12', '15', '14', '17', '16', '19', '18', '117', '116', '41', '48', '49', '46', '86', '44', '45', '42', '43', '40', '87', '1', '5', '9', '142', '140', '141', '77', '76', '75', '74', '73', '72', '71', '70', '79', '78', '47']
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 1118_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 314_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 454_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 809_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 948_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 1118_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 948_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 454_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 809_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 314_bac
[INFO] [09-24 17:58:48:AutoML_54da6690e2c896d2d9aafe349b066645_1]: Time left for 54da6690e2c896d2d9aafe349b066645 after finding initial configurations: 3598.94sec
Calling: smac --numRun 1 --scenario /tmp/autosklearn_tmp_14167_1103/54da6690e2c896d2d9aafe349b066645.scenario --initial-challengers " -balancing:strategy 'weighting' -classifier 'lda' -imputation:s
trategy 'median' -kernel_pca:gamma '0.0290194572424' -kernel_pca:kernel 'rbf' -kernel_pca:n_components '1971' -lda:n_components '232' -lda:tol '0.000804876897084' -preprocessor 'kernel_pca' -rescal
ing:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsvm_svc' -imputation:strategy 'median' -liblinear_svc_preprocessor:C '18592.5543358' -liblinear_svc_p
reprocessor:class_weight 'auto' -liblinear_svc_preprocessor:dual 'False' -liblinear_svc_preprocessor:fit_intercept 'True' -liblinear_svc_preprocessor:intercept_scaling '1' -liblinear_svc_preprocess
or:loss 'l2' -liblinear_svc_preprocessor:multi_class 'ovr' -liblinear_svc_preprocessor:penalty 'l2' -liblinear_svc_preprocessor:tol '0.040232270855' -libsvm_svc:C '6111.7121149' -libsvm_svc:class_w
eight 'None' -libsvm_svc:coef0 '0.844884936773' -libsvm_svc:degree '5' -libsvm_svc:gamma '0.117882960246' -libsvm_svc:kernel 'poly' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'False' -libsvm_s
vc:tol '0.00109298090501' -preprocessor 'liblinear_svc_preprocessor' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'liblinear_svc' -imputation:s
trategy 'mean' -kernel_pca:gamma '1.6331524928' -kernel_pca:kernel 'rbf' -kernel_pca:n_components '761' -liblinear_svc:C '44.5016816038' -liblinear_svc:class_weight 'auto' -liblinear_svc:dual 'Fals
e' -liblinear_svc:fit_intercept 'True' -liblinear_svc:intercept_scaling '1' -liblinear_svc:loss 'l2' -liblinear_svc:multi_class 'ovr' -liblinear_svc:penalty 'l2' -liblinear_svc:tol '0.0018788986680
6' -preprocessor 'kernel_pca' -rescaling:strategy 'normalize'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsvm_svc' -imputation:strategy 'mean' -libsvm_svc:C '50.8707992
587' -libsvm_svc:class_weight 'auto' -libsvm_svc:gamma '4.72168867253' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'True' -libsvm_svc:tol '1.67692533041e-05' -preproces
sor 'select_rates' -rescaling:strategy 'normalize' -select_rates:alpha '0.318343160914' -select_rates:mode 'fdr' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:strategy 'n
one' -classifier 'ridge' -imputation:strategy 'median' -kernel_pca:gamma '2.43149422021' -kernel_pca:kernel 'rbf' -kernel_pca:n_components '1194' -preprocessor 'kernel_pca' -rescaling:strategy 'nor
malize' -ridge:alpha '1.30657587648e-05' -ridge:fit_intercept 'True' -ridge:tol '0.000760986834404'" --initial-challengers " -adaboost:algorithm 'SAMME.R' -adaboost:learning_rate '0.400363929326' -
adaboost:max_depth '5' -adaboost:n_estimators '319' -balancing:strategy 'none' -classifier 'adaboost' -imputation:strategy 'most_frequent' -preprocessor 'no_preprocessing' -rescaling:strategy 'min/
max'" --initial-challengers " -balancing:strategy 'none' -classifier 'qda' -imputation:strategy 'mean' -pca:keep_variance '0.748479656855' -pca:whiten 'False' -preprocessor 'pca' -qda:reg_param '3.
82874880102' -qda:tol '0.0130621640728' -rescaling:strategy 'normalize'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsvm_svc' -imputation:strategy 'mean' -libsvm_svc:C '
18807.7593252' -libsvm_svc:class_weight 'None' -libsvm_svc:gamma '0.940704535703' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'True' -libsvm_svc:tol '0.00148731196993'
-preprocessor 'select_rates' -rescaling:strategy 'min/max' -select_rates:alpha '0.126666738937' -select_rates:mode 'fdr' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:str
ategy 'weighting' -classifier 'lda' -imputation:strategy 'mean' -kitchen_sinks:gamma '1.48108179896' -kitchen_sinks:n_components '3450' -lda:n_components '25' -lda:tol '0.0426553560955' -preprocess
or 'kitchen_sinks' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'random_forest' -feature_agglomeration:affinity 'manhattan' -feature_agglomerat
ion:linkage 'average' -feature_agglomeration:n_clusters '76' -imputation:strategy 'median' -preprocessor 'feature_agglomeration' -random_forest:bootstrap 'True' -random_forest:criterion 'entropy' -
random_forest:max_depth 'None' -random_forest:max_features '1.60908385606' -random_forest:max_leaf_nodes 'None' -random_forest:min_samples_leaf '2' -random_forest:min_samples_split '12' -random_for
est:n_estimators '100' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'none' -classifier 'ridge' -imputation:strategy 'mean' -nystroem_sampler:coef0 '0.476829591723' -ny
stroem_sampler:degree '3' -nystroem_sampler:gamma '0.0817500204362' -nystroem_sampler:kernel 'poly' -nystroem_sampler:n_components '7840' -preprocessor 'nystroem_sampler' -rescaling:strategy 'min/m
ax' -ridge:alpha '3.52478796331e-06' -ridge:fit_intercept 'True' -ridge:tol '2.63925768895e-05'" --initial-challengers " -balancing:strategy 'none' -classifier 'random_forest' -imputation:strategy
'mean' -preprocessor 'no_preprocessing' -random_forest:bootstrap 'True' -random_forest:criterion 'gini' -random_forest:max_depth 'None' -random_forest:max_features '1.0' -random_forest:max_leaf_nod
es 'None' -random_forest:min_samples_leaf '1' -random_forest:min_samples_split '2' -random_forest:n_estimators '100' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'none
' -classifier 'lda' -imputation:strategy 'median' -lda:n_components '203' -lda:tol '0.0935342136025' -preprocessor 'select_rates' -rescaling:strategy 'normalize' -select_rates:alpha '0.048178281695
5' -select_rates:mode 'fwe' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:strategy 'mean' -preprocessor 'no_preprocessing' -
rescaling:strategy 'min/max' -sgd:alpha '0.0001' -sgd:eta0 '0.01' -sgd:fit_intercept 'True' -sgd:learning_rate 'optimal' -sgd:loss 'hinge' -sgd:n_iter '20' -sgd:penalty 'l2'" --initial-challengers
" -balancing:strategy 'weighting' -classifier 'liblinear_svc' -imputation:strategy 'mean' -kitchen_sinks:gamma '1.62106650658' -kitchen_sinks:n_components '6034' -liblinear_svc:C '780.976275468' -l
iblinear_svc:class_weight 'auto' -liblinear_svc:dual 'False' -liblinear_svc:fit_intercept 'True' -liblinear_svc:intercept_scaling '1' -liblinear_svc:loss 'l2' -liblinear_svc:multi_class 'ovr' -libl
inear_svc:penalty 'l2' -liblinear_svc:tol '2.60869016302e-05' -preprocessor 'kitchen_sinks' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsv
m_svc' -feature_agglomeration:affinity 'manhattan' -feature_agglomeration:linkage 'average' -feature_agglomeration:n_clusters '89' -imputation:strategy 'most_frequent' -libsvm_svc:C '246.452178174'
-libsvm_svc:class_weight 'auto' -libsvm_svc:gamma '0.0442300193285' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'False' -libsvm_svc:tol '0.0180487670379' -preprocessor
'feature_agglomeration' -rescaling:strategy 'standard'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'passive_aggresive' -imputation:strategy 'median' -passive_aggresive:C '
1.31125616578' -passive_aggresive:fit_intercept 'True' -passive_aggresive:loss 'hinge' -passive_aggresive:n_iter '948' -preprocessor 'select_percentile_classification' -rescaling:strategy 'min/max'
-select_percentile_classification:percentile '83.3669247487' -select_percentile_classification:score_func 'chi2'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:s
trategy 'most_frequent' -preprocessor 'no_preprocessing' -rescaling:strategy 'min/max' -sgd:alpha '0.00292211727831' -sgd:epsilon '0.0116887099622' -sgd:eta0 '0.080560671307' -sgd:fit_intercept 'Tr
ue' -sgd:learning_rate 'invscaling' -sgd:loss 'modified_huber' -sgd:n_iter '754' -sgd:penalty 'l1' -sgd:power_t '0.463498329665'" --initial-challengers " -balancing:strategy 'none' -classifier 'ran
dom_forest' -imputation:strategy 'mean' -preprocessor 'select_rates' -random_forest:bootstrap 'False' -random_forest:criterion 'entropy' -random_forest:max_depth 'None' -random_forest:max_features
'4.67839426105' -random_forest:max_leaf_nodes 'None' -random_forest:min_samples_leaf '10' -random_forest:min_samples_split '10' -random_forest:n_estimators '100' -rescaling:strategy 'standard' -sel
ect_rates:alpha '0.167486470473' -select_rates:mode 'fdr' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:strategy 'most_frequ
ent' -preprocessor 'select_rates' -rescaling:strategy 'min/max' -select_rates:alpha '0.155334914856' -select_rates:mode 'fpr' -select_rates:score_func 'f_classif' -sgd:alpha '6.49185336268e-05' -sg
d:eta0 '0.0665593974375' -sgd:fit_intercept 'True' -sgd:learning_rate 'optimal' -sgd:loss 'log' -sgd:n_iter '189' -sgd:penalty 'l2'" --initial-challengers " -balancing:strategy 'weighting' -classif
ier 'sgd' -imputation:strategy 'median' -preprocessor 'no_preprocessing' -rescaling:strategy 'min/max' -sgd:alpha '0.000134377776157' -sgd:epsilon '0.000256156800074' -sgd:eta0 '0.05222815237' -sgd
:fit_intercept 'True' -sgd:learning_rate 'constant' -sgd:loss 'modified_huber' -sgd:n_iter '429' -sgd:penalty 'l1'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'passive_aggr
esive' -imputation:strategy 'median' -liblinear_svc_preprocessor:C '0.306520222754' -liblinear_svc_preprocessor:class_weight 'None' -liblinear_svc_preprocessor:dual 'False' -liblinear_svc_preproces
sor:fit_intercept 'True' -liblinear_svc_preprocessor:intercept_scaling '1' -liblinear_svc_preprocessor:loss 'l2' -liblinear_svc_preprocessor:multi_class 'ovr' -liblinear_svc_preprocessor:penalty 'l
2' -liblinear_svc_preprocessor:tol '4.83193374386e-05' -passive_aggresive:C '0.000522592495213' -passive_aggresive:fit_intercept 'True' -passive_aggresive:loss 'hinge' -passive_aggresive:n_iter '31
3' -preprocessor 'liblinear_svc_preprocessor' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'none' -classifier 'libsvm_svc' -imputation:strategy 'median' -libsvm_svc:C
'19690.0557441' -libsvm_svc:class_weight 'None' -libsvm_svc:gamma '4.89593584562e-05' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'True' -libsvm_svc:tol '0.019646836528
3' -preprocessor 'random_trees_embedding' -random_trees_embedding:max_depth '4' -random_trees_embedding:max_leaf_nodes 'None' -random_trees_embedding:min_samples_leaf '16' -random_trees_embedding:m
in_samples_split '9' -random_trees_embedding:n_estimators '52' -rescaling:strategy 'standard'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:strategy 'mean' -prep
rocessor 'no_preprocessing' -rescaling:strategy 'min/max' -sgd:alpha '0.0001' -sgd:eta0 '0.01' -sgd:fit_intercept 'True' -sgd:learning_rate 'optimal' -sgd:loss 'hinge' -sgd:n_iter '20' -sgd:penalty
'l2'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'random_forest' -extra_trees_preproc_for_classification:bootstrap 'True' -extra_trees_preproc_for_classification:criterion
'entropy' -extra_trees_preproc_for_classification:max_depth 'None' -extra_trees_preproc_for_classification:max_features '3.61796566599' -extra_trees_preproc_for_classification:min_samples_leaf '6'
-extra_trees_preproc_for_classification:min_samples_split '2' -extra_trees_preproc_for_classification:n_estimators '100' -imputation:strategy 'mean' -preprocessor 'extra_trees_preproc_for_classifi
cation' -random_forest:bootstrap 'True' -random_forest:criterion 'entropy' -random_forest:max_depth 'None' -random_forest:max_features '0.857466092817' -random_forest:max_leaf_nodes 'None' -random_
forest:min_samples_leaf '14' -random_forest:min_samples_split '15' -random_forest:n_estimators '100' -rescaling:strategy 'normalize'"
Calling: runsolver --watcher-data /dev/null -W 3598 -d 5 python /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/ensemble_selection_script.py /tmp/autosklearn_tmp_14167_1103 54da6690e2c896d2d9aafe349b066645 multiclass.classification acc_metric 3593.92797899 /tmp/autosklearn_output_14167_1103 50 1 /tmp/autosklearn_tmp_14167_1103/ensemble_indices_1
Out[16]: <AutoSklearnClassifier(AutoSklearnClassifier-1, initial)>
In [17]: >>> print automl.score(X_test, y_test)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-6a5d99e9c9c3> in <module>()
----> 1 print automl.score(X_test, y_test)
/media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.pyc in score(self, X, y)
358
359 def score(self, X, y):
--> 360 prediction = self.predict(X)
361 return evaluator.calculate_score(y, prediction, self.task_,
362 self.metric_, self.target_num_)
/media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.pyc in predict(self, X)
137 The predicted classes.
138 """
--> 139 return super(AutoSklearnClassifier, self).predict(X)
140
141
/media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.pyc in predict(self, X)
327
328 if len(models) == 0:
--> 329 raise ValueError("No models fitted!")
330
331 if self.ohe_ is not None:
ValueError: No models fitted!
This is probably functioning as designed, but it is not convenient from my point of view. I see tasks started near the end of "time_left_for_this_task" overrunning it by "per_run_time_limit"; they are not killed when "time_left_for_this_task" is up.
In my particular situation I set per_run_time_limit=time_left_for_this_task to accommodate very slow models as well. The result is that the fit time is double "time_left_for_this_task".
This is probably upstream (SMAC), but I just want to file it here so you are aware of it.
For me this is important: I am working on a massively parallel implementation so that the execution time can be rather short (hours), and doubling this time because of a single runaway model goes against the grain of this effort.
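The doubling described above can be written out as simple arithmetic: a run launched just before time_left_for_this_task expires is still allowed its full per_run_time_limit, so the worst-case wall time is the sum of the two budgets.

```python
# Worst-case wall time of a fit: a run started just before the overall
# deadline keeps running for its full per-run budget on top of it.
def worst_case_wall_time(time_left_for_this_task, per_run_time_limit):
    return time_left_for_this_task + per_run_time_limit

# With per_run_time_limit == time_left_for_this_task, as described in this
# issue, the fit time doubles.
```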
There is no error in SMAC logs.
top shows the autosklearn processes idle while occupying a lot of memory:
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 12378092+total, 11410110+used, 9679820 free, 54200 buffers
KiB Swap: 0 total, 0 used, 0 free. 13223792 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14599 ekobylk+ 20 0 42.463g 0.037t 2848 S 0.0 32.0 28:48.66 python
14628 ekobylk+ 20 0 31.582g 0.030t 8 S 0.0 26.3 1:53.21 python
14627 ekobylk+ 20 0 33.422g 0.029t 0 S 0.0 25.2 5:29.03 python
18521 ekobylk+ 20 0 25.770g 0.024t 1016 S 0.0 21.2 0:00.02 python
18514 ekobylk+ 20 0 2897984 2.069g 0 S 0.0 1.8 0:00.04 python
18515 ekobylk+ 20 0 2897984 2.069g 0 S 0.0 1.8 0:00.02 python
Once I issued CTRL+C at the Python command line, the script completed and showed better results than last time, with less execution time. It also shows ensemble building completing at 22:32, so maybe it is not even autosklearn itself?
So I wonder: what is holding up completion?
This is what I find in ensemble_err_1.log:
Start script: /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/ensemble_selection_script.py
[DEBUG] [22:31:57:ensemble_selection_script.py] Time left: -5.000000
[DEBUG] [22:31:57:ensemble_selection_script.py] Time last iteration: 0.000000
[ERROR] [22:31:57:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_1_00000.npy has score: -0.81935483871
[ERROR] [22:31:57:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_1_00001.npy has score: -0.532039976484
SKIP similar lines
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00067.npy has score: -0.446875
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00068.npy has score: -0.870646766169
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00069.npy has score: -0.474626865672
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00070.npy has score: -0.989130434783
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00071.npy has score: -0.483211678832
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00072.npy has score: -0.823788546256
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00073.npy has score: -0.488023952096
[INFO] [22:32:03:ensemble_selection_script.py] Ensemble Selection:
Trajectory: 0: -0.430189 1: -0.421986 2: -0.397436 3: -0.392625 4: -0.392625 5: -0.392625 6: -0.392625 7: -0.392625 8: -0.392625 9: -0.392625 10: -0.392625 11: -0.389610 12: -0.388949 13: -0.388949 14: -0.388949 15: -0.388949 16: -0.388949 17: -0.388949 18: -0.388949 19: -0.388949 20: -0.388949 21: -0.388949 22: -0.388949 23: -0.388949 24: -0.388949 25: -0.388949 26: -0.388949 27: -0.388949 28: -0.388949 29: -0.388949 30: -0.388949 31: -0.388949 32: -0.388949 33: -0.388949 34: -0.388949 35: -0.388949 36: -0.388949 37: -0.388949 38: -0.388949 39: -0.388949 40: -0.388949 41: -0.388949 42: -0.388949 43: -0.388949 44: -0.388949 45: -0.388949 46: -0.388949 47: -0.388949 48: -0.388949 49: -0.388949
Members: [128, 89, 57, 271, 8, 26, 26, 46, 46, 46, 293, 344, 241, 26, 26, 26, 341, 341, 341, 341, 341, 341, 341, 341, 341, 393, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 46, 293, 26, 26, 26, 26, 26, 26, 26, 26]
Weights: [ 0. 0. 0. 0. 0. 0. 0. 0. 0.02 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0.26 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0.08 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.02 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.02
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0.02 0. 0. 0. 0. 0.28 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0.02 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0.02 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0.04 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.18 0. 0. 0.02 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.02 0. 0. 0. 0. 0. 0. 0. ]
Identifiers: (1, 8) (1, 26) (1, 46) (1, 57) (2, 4) (2, 43) (2, 48) (4, 4) (4, 34) (4, 56) (5, 14) (5, 17) (5, 66)
[INFO] [22:32:03:ensemble_selection_script.py] Training performance: -0.388949
[INFO] [22:32:03:ensemble_selection_script.py] Could not find as many validation set predictions (0)as ensemble predictions (401)!.
[INFO] [22:32:03:ensemble_selection_script.py] Could not find as many test set predictions (0) as ensemble predictions (401)!
[DEBUG] [22:32:03:ensemble_selection_script.py] Time left: -10.804320
[DEBUG] [22:32:03:ensemble_selection_script.py] Time last iteration: 5.765388
[DEBUG] [22:32:03:ensemble_selection_script.py] Nothing has changed since the last time
log-run?.txt from SMAC all show "Returning with value: 0"
example:
22:17:08.016 [main] DEBUG c.u.c.b.smac.executors.SMACExecutor - Returning with value: 0
22:17:08.016 [Event Manager Dispatch Thread] DEBUG c.u.c.b.a.eventsystem.EventManager - Event Manager thread done, released 0 pending flushes
22:17:08.019 [FileSharingRunHistory Logger ( outputID:1)] INFO c.u.c.b.a.r.FileSharingRunHistoryDecorator - At shutdown: /data/atskln_tmp/faebb891c2943e03251e86c73d212016/live-rundata-1.json had 85 runs added to it
22:17:08.019 [FileSharingRunHistory Logger ( outputID:1)] INFO c.u.c.b.a.r.FileSharingRunHistoryDecorator - At shutdown: we retrieved atleast 316 runs and added them to our current data set [live-rundata-2.json=>67, live-rundata-3.json=>85, live-rundata-4.json=>90, live-rundata-5.json=>74]
This package is way too noisy: things are going to fail, and when they do, they should do so gracefully, not loudly.
Whenever I try to fit a model using a sparse matrix as data in autosklearn, python hangs and gives me:
self.info['has_missing'] = np.all(np.isfinite(data_x))
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
One can easily reproduce by:
import numpy as np
import scipy.sparse as sp
import autosklearn.automl as autosk

# Any sparse matrix with some filling
spars = sp.csr_matrix((10, 20))
spars[4, 0] = 20; spars[2, 3] = 10; spars[0, 3] = 30
y = np.ones((10, 1))
modl = autosk.AutoML(time_left_for_this_task=300, seed=1, per_run_time_limit=60,
                     tmp_dir='/tmp/autosk_tmp', output_dir='/tmp/autosk_out',
                     delete_tmp_folder_after_terminate=False,
                     initial_configurations_via_metalearning=None,
                     include_estimators=['random_forest'])
# Error happens here!
modl.fit(spars, y)
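A hedged sketch of the kind of guard that would avoid this TypeError (assuming scipy is available; this is not the project's actual fix): np.isfinite cannot be applied to a sparse matrix object, but it can be applied to the matrix's .data array of stored values, since the implicit zeros are finite by definition.

```python
# Sketch of a sparse-aware finiteness check: operate on the stored values
# (.data) of a scipy sparse matrix instead of the matrix object itself,
# which is what raises the TypeError reported in this issue.
import numpy as np
import scipy.sparse as sp

def has_no_missing(X):
    values = X.data if sp.issparse(X) else X
    return bool(np.all(np.isfinite(values)))
```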
AutoSklearnClassifier(include_preprocessors=('no_preprocessing',))
It would be great to have an explicit list of preprocessors and estimators that could be used in include_preprocessors and include_estimators parameters of AutoSklearnClassifier.
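As a generic illustration (not an existing auto-sklearn API), one way such an explicit list could be produced is by enumerating the submodules of a components package with the standard library:

```python
# Generic sketch: enumerate the submodules of a package, the way an
# explicit list of available preprocessors/estimators might be built.
# The package passed in is the caller's choice; nothing here is specific
# to auto-sklearn's internal layout.
import pkgutil

def list_component_modules(package):
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))
```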
Hey.
Which paper should someone cite when using this package? I think that should be in the README or on the website.
Also, there is a mention of a paper at the ICML workshop, but no link to the paper.
Thanks,
Andy
The authors of XGBoost have recently added wrappers so that it has the same signature as sklearn classifiers.
https://github.com/dmlc/xgboost/blob/master/demo/guide-python/sklearn_examples.py
Is there an easy way to add this or other classifiers to the list of trained and optimized classifiers?
I am trying to get autosklearn to run on a cluster of Red Hat machines. Hopefully I will succeed by myself, but I just want to point out that the proper application domain for autosklearn is massive parallelism. So it would be great to have a canonical example of how to set up autosklearn in a cluster environment.
When running a modified version of the script in which I don't want ensembles (i.e., without the ensemble builder script), just the best model making a prediction on my test dataset, I get the obvious error that self._ensemble._get_model_identifiers() is NoneType.
Is that the intended behaviour, such that one must create an ensemble of 1, or should I file it as a bug?
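For illustration, an "ensemble of 1" reduces to selecting the single best model by validation score instead of weighting several; the scores and model names below are made up:

```python
# What an "ensemble of 1" amounts to: pick the single best model by its
# validation score. scored_models is a list of (score, model) pairs.
def best_model(scored_models):
    best_score, model = max(scored_models, key=lambda pair: pair[0])
    return model
```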
clf.fit(X_train_t, y_train, metric = 'f1_metric')
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 271, in fit
feat_type, dataset_name)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 262, in fit
return self._fit(loaded_data_manager)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 348, in _fit
data_manager_path = self._backend.save_datamanager(datamanager)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/util/backend.py", line 124, in save_datamanager
pickle.dump(datamanager, fh, -1)
SystemError: error return without exception set
clf = AutoSklearnClassifier(time_left_for_this_task=300, per_run_time_limit=90, ml_memory_limit=10000, resampling_strategy='cv', resampling_strategy_arguments={'folds': 5})
and pass the dataset name to the constructor, so that it can output the dataset name in the logs.
Using sklearn's OneVsRestClassifier in order to handle multilabel datasets is not possible, as ProjLogitClassifier lacks functionality that is necessary to use sklearn.base.clone:
TypeError: Cannot clone object '<ParamSklearn.implementations.ProjLogit.ProjLogit object at 0x7f7761445450>' (type <class 'ParamSklearn.implementations.ProjLogit.ProjLogit'>): it does not seem to be a scikit-learn estimator it does not implement a 'get_params' methods.
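For context, sklearn.base.clone reconstructs an estimator from the dictionary returned by get_params(). A minimal sketch of the contract an estimator must satisfy to be cloneable (this is not ProjLogit's actual code; the parameter names are illustrative):

```python
# Minimal sketch of the scikit-learn estimator contract that clone() relies
# on: __init__ stores its arguments verbatim, and get_params/set_params
# expose exactly those constructor parameter names.
class CloneableEstimator:
    def __init__(self, alpha=1.0, max_iter=100):
        self.alpha = alpha
        self.max_iter = max_iter

    def get_params(self, deep=True):
        return {"alpha": self.alpha, "max_iter": self.max_iter}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self
```

clone(est) is then roughly equivalent to `type(est)(**est.get_params())`, which is exactly what fails when get_params is missing.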
Hi,
Fantastic Library!
I was just wondering: I am trying to use the library for a binary classifier experiment, using the log loss function to train the model. This is for a university experiment around benchmarking different models. Would you have time to provide an example of how to use the library to achieve this goal, along with a visualisation of how the model learns?
Many thanks,
Best,
Andrew
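A minimal sketch of the requested log-loss setup, using plain scikit-learn (the dataset, model, and split below are placeholders). Note that the auto-sklearn version discussed in this thread only exposed fixed metric strings such as 'f1_metric' and 'acc_metric'; passing a log-loss metric object to AutoSklearnClassifier is, reportedly, only possible in newer releases.

```python
# Sketch: evaluating a binary classifier with log loss in scikit-learn,
# i.e. the quantity an auto-sklearn log-loss metric would optimize.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)       # log loss needs probabilities, not labels
print("log loss: %.4f" % log_loss(y_te, proba))
```

For a learning-curve visualisation, scikit-learn's `learning_curve` helper can be applied to the same data.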
After the run seems to have completed successfully (some completed runs are logged in the SMAC logs), autosklearn fails with the following error message:
Traceback (most recent call last):
File "truffles-autosklearn-multy.py", line 82, in spawn_classifier
c.fit(X_train, y_train, metric='f1_metric')
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 271, in fit
feat_type, dataset_name)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 262, in fit
return self._fit(loaded_data_manager)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 526, in _fit
self._load_models()
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 647, in _load_models
self.models_ = self._backend.load_all_models(seed)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/util/backend.py", line 171, in load_all_models
models = self.load_models_by_file_names(model_files)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/util/backend.py", line 185, in load_models_by_file_names
seed = int(basename_parts[0])
ValueError: invalid literal for int() with base 10: 'tmpJmPf2D'
I noticed that auto-sklearn reserves about one third of the training data for building the ensemble. This means that the individual models could have seen 50% more data than they currently do, which can have a substantial impact if the number of training examples is small. E.g., I tried running auto-sklearn for 7 hours on the Kaggle Titanic competition and got a score of about 0.799, which is not bad, but also not great.
Suggestion: retrain the individual models on the full training data after the ensemble has been built. One difficulty is that the hyperparameter specification would need to be invariant to the number of training examples. On the other hand, it may not matter much either way, since this proposal wouldn't change the number of training examples the models see by more than a factor of 1.5.
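The proposed retraining step can be sketched with plain scikit-learn: clone the best configuration found on the reduced training split and refit it on all available data (the dataset and model below are placeholders; later auto-sklearn releases expose an `automl.refit(X, y)` method for this purpose).

```python
# Sketch of "retrain on full data after the ensemble is built".
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
split = int(2 * len(X) / 3)   # auto-sklearn holds out roughly 1/3 for the ensemble
best_model = DecisionTreeClassifier(max_depth=3).fit(X[:split], y[:split])

# Retrain the same hyperparameter configuration on the full training data.
refit_model = clone(best_model).fit(X, y)
print(refit_model.score(X, y))
```

`clone` copies only the hyperparameters, which is exactly the invariance concern above: the configuration is reused unchanged on 50% more data.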
I am facing a weird error where the autosklearn.classification module cannot be found.
I installed autosklearn into a Miniconda environment as described in the install guide.
Any tips? :)
import autosklearn.classification
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
import autosklearn.classification
ImportError: No module named classification
I can access autosklearn fine via the command line in the environment:
(auto2)[root@CentOS-72-64-minimal ~]# autosklearn
usage: autosklearn [-h] [-c CONFIG] [--output-dir OUTPUT_DIR]
[--temporary-output-directory TEMPORARY_OUTPUT_DIRECTORY]
[--keep-output] [--time-limit TIME_LIMIT]
....
Process PoolWorker-13:
Traceback (most recent call last):
Process PoolWorker-12:
File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/process.py", line 261, in _bootstrap
util._exit_function()
self.run()
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/base_interface.py", line 42, in signal_handler
File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/util.py", line 305, in _exit_function
_run_finalizers(0)
File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/util.py", line 250, in _run_finalizers
def _run_finalizers(minpriority=None):
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/base_interface.py", line 42, in signal_handler
evaluator.finish_up()
AttributeError: 'NoneType' object has no attribute 'finish_up'
evaluator.finish_up()
AttributeError: 'NoneType' object has no attribute 'finish_up'
Can it be done? Is this feature being developed?
In a text classification task, SGDClassifier needs just a few minutes to reach the same result as auto-sklearn running for 20 hours; smaller time budgets for auto-sklearn resulted in complete or near-complete failure to predict.
I wonder if there is a strategy to try the fastest algorithms first and, if time runs out, at least use their results?
Another question is about the recommended per_run_time_limit value. Is there a rule of thumb for choosing it?
SGDClassifier (Test FR): Precision: 0.20, Recall: 0.53, F1: 0.29
Auto-sklearn: Precision: 0.28, Recall: 0.31, F1: 0.29
classifier.fit(X_train, y_train, metric='f1_metric')
AutoSklearnClassifier(time_left_for_this_task=72000, per_run_time_limit=19000, ml_memory_limit=10000)
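One possible heuristic for the per_run_time_limit question above (this is an assumption, not an official rule of thumb from the auto-sklearn authors): cap each configuration at a fixed fraction of the total budget so the optimizer can evaluate a reasonable number of configurations before time runs out.

```python
# Heuristic sketch: allow at least `min_evaluations` full-length runs
# within the total budget, with a floor of 30 seconds per run.
def suggest_per_run_time_limit(time_left_for_this_task, min_evaluations=10):
    return max(30, time_left_for_this_task // min_evaluations)

print(suggest_per_run_time_limit(72000))   # 20-hour budget -> 7200
print(suggest_per_run_time_limit(300))     # 5-minute budget -> 30
```

By this yardstick, the per_run_time_limit=19000 used above would allow fewer than four full-length runs in the 72000-second budget, which may explain the poor use of the time.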
I have removed scikit-learn 0.17 from Anaconda, hoping to be able to install the version autosklearn requires. When running pip install -r https://raw.githubusercontent.com/automl/auto-sklearn/master/requ.txt
I get a rather cryptic message:
Building wheels for collected packages: scikit-learn
Running setup.py bdist_wheel for scikit-learn: started
Running setup.py bdist_wheel for scikit-learn: finished with status 'error'
Complete output from command /home/USERNAME/anaconda2/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-Caf5WC/scikit-learn/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" bdist_wheel -d /tmp/tmpgPVyFnpip-wheel- --python-tag cp27:
Failed building wheel for scikit-learn
The problem is actually a missing compiler. This fixes it:
sudo apt-get update && sudo apt-get upgrade && sudo apt-get install build-essential
For RedHat (6.5): sudo yum groupinstall 'Development Tools'
Is it against the philosophy of this project to return an instance of the most performant model, as hyperopt-sklearn does?
Are there automl settings that would bring the training time (on the digits dataset, for example) down to something negligible (<10 minutes) for quick experimentation and code integration tests?
I'm currently experimenting with putting automl in my code, but the ~1 hour training time is a huge bottleneck. A setting for a quick throwaway run would be really helpful.
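A sketch of a "quick throwaway run" configuration. The parameter names come from the AutoSklearnClassifier constructor as used elsewhere in this thread; the specific values are assumptions tuned for speed, not quality.

```python
# Settings aimed at a short smoke-test run rather than a good model.
quick_settings = dict(
    time_left_for_this_task=300,   # total budget: 5 minutes
    per_run_time_limit=30,         # cap each candidate model at 30 seconds
    ml_memory_limit=3000,          # MB per run
)

# Constructing the classifier (requires auto-sklearn to be installed):
#   from autosklearn.classification import AutoSklearnClassifier
#   automl = AutoSklearnClassifier(**quick_settings)
print(quick_settings["time_left_for_this_task"])
```

Expect noticeably worse models from such a run; it is only meant to verify that the pipeline executes end to end.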
Just ran this snippet to test the accuracy of results with autosklearn and got a terrible 55%. Any tips why? :)
import numpy as np
import urllib
import autosklearn.classification
import sklearn.datasets
url = "http://goo.gl/j0Rvxq"
raw_data = urllib.urlopen(url)
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
X = dataset[:,0:8]  # all eight feature columns; 0:7 would silently drop the last feature
y = dataset[:,8]
indices = np.arange(X.shape[0])
np.random.shuffle(indices)
X = X[indices]
y = y[indices]
X_train = X[:700]
y_train = y[:700]
X_test = X[700:]
y_test = y[700:]
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
Hi, when the explicitly specified folder contains spaces, I get the following exception:
[INFO] [2016-02-05 16:15:58,126:autosklearn.util.submit_process] Calling: runsolver --watcher-data /dev/null -W 42 -d 5 python -m autosklearn.ensemble_selection_script --auto-sklearn-tmp-directory /library/auto ml/autosklearn_tmp_acc_metric --basename 85e3933b9df7e236e76d83ace9b5bbc3 --task multiclass.classification --metric acc_metric --limit 37.1727719307 --output-directory /library/auto ml/autosklearn_output_acc_metric --ensemble-size 50 --ensemble-nbest 50 --auto-sklearn-seed 1 --max-iterations -1
Error occurred while running SMAC
>Error Message:Was passed main parameter 'ml/autosklearn_tmp_acc_metric/85e3933b9df7e236e76d83ace9b5bbc3.scenario' but no main parameter was defined
>Encountered Exception:ParameterException
If I try to escape the space, I get the following exception from SMAC (and a folder "auto\ ml" is created):
[INFO] [2016-02-05 16:24:04,955:autosklearn.util.submit_process] Calling: runsolver --watcher-data /dev/null -W 40 -d 5 python -m autosklearn.ensemble_selection_script --auto-sklearn-tmp-directory /library/auto\ ml/autosklearn_tmp_acc_metric --basename 85e3933b9df7e236e76d83ace9b5bbc3 --task multiclass.classification --metric acc_metric --limit 35.2403550148 --output-directory /library/auto\ ml/autosklearn_output_acc_metric --ensemble-size 50 --ensemble-nbest 50 --auto-sklearn-seed 1 --max-iterations -1
Error occurred while running SMAC
>Error Message:Option File (JCommander @ParameterFile) does not exist: 85e3933b9df7e236e76d83ace9b5bbc3.scenario
>Encountered Exception:ParameterException
Hi all,
I took the example from the docs, and I want to save the classifier to a file.
But when I run:
with open('dump_autosk.pkl', 'wb') as fio:
pickle.dump(automl, fio)
I get:
TypeError: Pickling an AuthenticationString object is disallowed for security reasons
Can I save the model to a file?
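The TypeError above likely comes from AutoSklearnClassifier inheriting from multiprocessing.Process in this version (see the refactoring note elsewhere in this thread), which drags an unpicklable AuthenticationString along. A common workaround is to persist an ordinary fitted scikit-learn estimator instead; here is a sketch with a placeholder model.

```python
# Sketch: pickling a plain fitted scikit-learn estimator to a file and
# restoring it. A plain estimator pickles fine, unlike the Process subclass.
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

path = os.path.join(tempfile.gettempdir(), "dump_model.pkl")
with open(path, "wb") as fio:
    pickle.dump(model, fio)

with open(path, "rb") as fio:
    restored = pickle.load(fio)

assert (restored.predict(X) == model.predict(X)).all()
```

Until the Process inheritance is removed, pickling the whole AutoSklearnClassifier object will keep failing with this security error.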
For a multiclass classification task the output is an index, which makes the result ambiguous, e.g. [0 ... 7] instead of [1 ... 8]:
class AutoML(BaseEstimator, multiprocessing.Process):
...
def predict(self, X):
return np.argmax(self.predict_proba(X), axis=1)
Does it work as expected?
Not critical
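The ambiguity described above can be resolved by keeping a label encoder and inverse-transforming the argmax indices back to the original class labels. A minimal sketch (the labels and probabilities here are made up for illustration):

```python
# Sketch: mapping argmax column indices back to the original class labels
# with a LabelEncoder, so predict() returns classes rather than positions.
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array([1, 3, 8, 3, 1, 8])   # original, non-contiguous classes
enc = LabelEncoder().fit(labels)        # enc.classes_ == [1, 3, 8]

# Suppose predict_proba produced these probabilities (one column per class):
proba = np.array([[0.1, 0.2, 0.7],
                  [0.8, 0.1, 0.1]])
indices = np.argmax(proba, axis=1)      # column positions: [2, 0]
print(enc.inverse_transform(indices))   # original labels:  [8 1]
```

Inside AutoML.predict this would mean wrapping the `np.argmax(...)` result in the encoder's inverse_transform before returning it.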
When adding a UniformFloat or Constant hyperparameter to the configuration space of any component, the string conversion to the 'parameter configuration space file' keeps only 10 decimal digits of precision, which mismatches the native float precision (53 bits) when auto-sklearn runs and sets the hyperparameter for a component.
This generates gobs of output, so a silent mode is needed. Standard context-manager approaches to hiding stdout don't work here.
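The standard approaches fail because much of the output is written directly to file descriptor 1 by C extensions and child processes, bypassing sys.stdout. A sketch of a descriptor-level workaround (this is a generic technique, not an auto-sklearn feature):

```python
# Sketch: silence fd 1 itself, not just the Python-level sys.stdout object,
# so output from C extensions and inherited child-process descriptors is
# suppressed too.
import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_stdout_fd():
    sys.stdout.flush()
    saved_fd = os.dup(1)                      # remember the real stdout
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, 1)                   # point fd 1 at /dev/null
        yield
    finally:
        sys.stdout.flush()
        os.dup2(saved_fd, 1)                  # restore the real stdout
        os.close(devnull)
        os.close(saved_fd)

with suppress_stdout_fd():
    print("you will not see this")
print("back to normal")
```

Child processes spawned inside the context inherit the redirected descriptor, which is what the usual contextlib.redirect_stdout approach cannot achieve.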
I upgraded autosklearn a few weeks ago and it dropped the requirement for sklearn 0.16.1, so I could successfully run it with the current 0.17.x release. Now I am installing autosklearn from scratch and cannot get past the sklearn 0.16.1 requirement.
Is it something that has to be changed in requ.txt, or is there more to it?
P.S. first install was on CentOS now it is on Ubuntu 14.04
Can autosklearn generate a model file that can be reused for classifying new data? This would be useful for classifying big data streams.
Hi all,
I took the example from the docs, and I got this error:
Traceback (most recent call last):
File "/home/warmonger/Develop/AutoInformatisation/TagEmitter/src/pydigest/exampl.py", line 25, in <module>
print automl.score(X_test, y_test)
File "/home/warmonger/Develop/venv/pocket2/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 360, in score
prediction = self.predict(X)
File "/home/warmonger/Develop/venv/pocket2/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 139, in predict
return super(AutoSklearnClassifier, self).predict(X)
File "/home/warmonger/Develop/venv/pocket2/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 329, in predict
raise ValueError("No models fitted!")
ValueError: No models fitted!
pip freeze:
alembic==0.7.6
AutoSklearn==0.0.1.dev0
beautifulsoup4==4.3.2
blinker==1.3
boto==2.38.0
bz2file==0.98
cffi==1.1.0
chardet==2.3.0
cma==1.1.6
cryptography==0.9
cssselect==0.9.1
Cython==0.22
DAWG-Python==0.7.2
decorator==4.0.2
docopt==0.6.2
enum34==1.0.4
Flask==0.10.1
Flask-Admin==1.1.0
Flask-Admin-Profiler==0.0.1
Flask-DebugToolbar==0.10.0
Flask-Migrate==1.4.0
Flask-Script==2.0.5
Flask-SQLAlchemy==2.0
Flask-WTF==0.11
gensim==0.11.1.post1
gevent==1.0.2
greenlet==0.4.7
grequests==0.2.0
HPOlib==0.1.0
HPOlibConfigSpace==0.1.dev0
idna==2.0
ipaddress==1.0.7
itsdangerous==0.24
Jinja2==2.7.3
langdetect==1.0.5
liac-arff==2.0.2
lockfile==0.10.2
lxml==3.4.4
Mako==1.0.1
MarkupSafe==0.23
matplotlib==1.4.3
mock==1.0.1
MySQL-python==1.2.5
naiveBayesClassifier==0.1.3
ndg-httpsclient==0.4.0
networkx==1.10
nltk==3.0.2
nose==1.3.7
numpy==1.9.0
objgraph==2.0.0
pandas==0.16.0
ParamSklearn==0.1.dev0
pocket==0.3.5
protobuf==2.6.1
psutil==3.1.1
psycopg2==2.6
pyasn1==0.1.7
pycparser==2.13
-e git+https://bitbucket.org/mfeurer/pymetalearn/@2767d4d9eca801ad23247ced586a91957ef583e5#egg=pyMetaLearn-master
pymongo==3.0.3
pymorphy2==0.8
pymorphy2-dicts==2.4.393442.3710985
pyOpenSSL==0.15.1
pyparsing==2.0.3
python-dateutil==2.4.2
pytz==2015.4
PyYAML==3.11
readability-lxml==0.5.1
requests==2.7.0
rutermextract==0.2
scikit-learn==0.15.2
scipy==0.14.0
six==1.9.0
smart-open==1.2.1
SQLAlchemy==1.0.4
textblob==0.9.0
topia.termextract==1.1.0
umemcache==1.6.3
Werkzeug==0.10.4
WTForms==2.0.2
zope.interface==4.1.2
All errors thrown in SMAC are not shown to the user. For example, one of the required packages was not properly installed in my case, so every target algorithm run crashed, but this was not shown to me and the code crashed later with an incomprehensible error message. Please add a proper DEBUG mode. (By the way: some modules use logging and some use print statements.)
This is because it loads a pickled data manager; the data manager is loaded when the data manager class is initialized.
I'm completely new to auto-sklearn. I ran the example from the documentation on the digits dataset, and print(automl.score(...)) gives me -0.0121288... instead of 0.98. I noticed that 1 - 0.0121288... is more or less 0.98, but I want to be sure that this is not a weird coincidence.
If this is correct behaviour, there should be a note about it in the documentation.
As of 3be0189, the whole thing is broken, as tested on a fresh dedicated Ubuntu 15.10 installation.
When running example/example1.py, tmp_dir/ensemble_err_1.log is full of messages like this:
[DEBUG] [13:06:20:ensemble_selection_script.py] Prediction directory /tmp/autoslearn_example_tmp/.auto-sklearn/predictions_ensemble does not exist!
No kind of optimization is performed, and the script finishes with:
Traceback (most recent call last):
File "example1.py", line 31, in <module>
main()
File "example1.py", line 26, in main
automl.fit(X_train, y_train, dataset_name='digits')
File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 255, in fit
feat_type, dataset_name)
File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 244, in fit
return self._fit(loaded_data_manager)
File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 456, in _fit
self._load_models()
File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 542, in _load_models
raise ValueError('No models fitted!')
ValueError: No models fitted!
Further investigation showed that tmp_dir/.auto-sklearn/predictions_ensemble indeed doesn't exist; instead, it is created as ./.auto-sklearn/predictions_ensemble in the current directory.
The bug seems to be introduced by 3be0189; installing a version as of previous commit, 7666120, appears to work fine.
After loading an external component feature as directed in the manual with:
from component import DeepFeedNet
add_classifier(DeepFeedNet.DeepFeedNet)
autosk.AutoML(include_estimators=['DeepFeedNet'])
In the main process it does not generate an error or warning, but after reviewing the SMAC logs, this error is reported:
[ Shortened... ]
16:57:30.806 [CLI TAE (STDERR Thread - #0)] WARN c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - [PROCESS-ERR](hp_value, hyperparameter))
16:57:30.807 [CLI TAE (STDERR Thread - #0)] WARN c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - [PROCESS-ERR] ValueError: Hyperparameter instantiation 'DeepFeedNet' is illegal for hyperparameter classifier:choice, Type: Categorical, Choices: {adaboost, bernoulli_nb, decision_tree, extra_trees, gaussian_nb, gradient_boosting, k_nearest_neighbors, lda, liblinear_svc, libsvm_svc, multinomial_nb, passive_aggressive, proj_logit, qda, random_forest, sgd}, Default: random_forest
After further review, the error (presumably) occurs when loading the classifiers into the data manager's configuration in the subprocesses:
cs = get_configuration_space(D.info)
configuration = configuration_space.Configuration(cs, params)
13:12:38.766 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - > Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
Is this expected?
MKL seems to be installed by Anaconda:
conda install mkl
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ....
Solving package specifications: .........
# All requested packages already installed.
# packages in environment at /home/ekobylkin/anaconda2:
#
mkl 11.3.1 0
ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd "/data/atskln_tmp" ; runsolver --watcher-data /dev/null -W 1865 -d 30 -M 22000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout /data/atskln_tmp/.auto-sklearn/datamanager.pkl 1865.0 2147483647 -1 -balancing:strategy 'none' -classifier:__choice__ 'sgd' -classifier:sgd:alpha '1.0E-4' -classifier:sgd:average 'False' -classifier:sgd:eta0 '0.01' -classifier:sgd:fit_intercept 'True' -classifier:sgd:learning_rate 'optimal' -classifier:sgd:loss 'log' -classifier:sgd:n_iter '20' -classifier:sgd:penalty 'l2' -imputation:strategy 'mean' -one_hot_encoding:minimum_fraction '0.01' -one_hot_encoding:use_minimum_fraction 'True' -preprocessor:__choice__ 'no_preprocessing' -rescaling:__choice__ 'min/max'
13:12:38.766 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The last 1 lines of output we saw were:
13:12:38.766 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - > Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
13:12:38.774 [CLI TAE (Master Thread - #0)] DEBUG c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - Run <Instance:1, Seed:-1, Config:0x000F, Kappa:1865.0, Execution Config: 0x0001> ==> <CRASHED, 0.0, 0.0, 0.0, -1,ERROR: Wrapper did not output anything that matched the expected output ("Result of algorithm run:..."). Please try executing the wrapper directly> W:(198209.0) is completed
13:12:38.783 [main] DEBUG c.u.c.b.a.i.c.ClassicInitializationProcedure - Initialization: Completed run for config (0x000F) on instance 1 with seed -1 and captime 1865.0 => Result: CRASHED, 0.0, 0.0, 2.0, -1,ERROR: Wrapper did not output anything that matched the expected output ("Result of algorithm run:..."). Please try executing the wrapper directly, wallclock time: 198209.0 seconds
automl.py serves two purposes: it is the base class of a scikit-learn-like estimator object, and it is directly usable with a data manager. It should rather be an abstract base class, with all functionality regarding data loading and special fit semantics moved into a dedicated subclass. This would also free AutoSklearnClassifier from being a subclass of multiprocessing.Process.
Hi all,
First off: Great project! I was very excited to come across this project and I'm eager to try it out. However, when I attempted to install auto-sklearn, I ran into several difficulties. In particular, I noticed that HPOlib -- a dependency of this project -- requires Python 2.7, which entails that auto-sklearn also requires Python 2.7.
Is there any way around this other than setting up a 2.7 environment? I'd love to integrate this into a Python 3 workflow.
Spotted by @hmendozap here. Thanks!
Might just be me, but it's been stuck for a while; is anyone else experiencing this?
~/GitHub/cabinet/auto-sklearn $ python setup.py install
/Library/Python/2.7/site-packages/setuptools/dist.py:285: UserWarning: Normalizing '0.0.1dev' to '0.0.1.dev0'
normalized_version,
running install
Building runsolver
Makefile:23: runsolver.d: No such file or directory
Makefile:23: SignalNames.d: No such file or directory
Makefile:23: SyscallNames.d: No such file or directory
grep '#define[[:space:]]*__NR' | grep -v '^/' | awk '{print $2}' | sed -e 's/^__NR_//' | awk '{printf "list[__NR_%s]=\"%s\";\n",$1,$1}' > tmpSyscallList.cc
Dear auto-sklearn team,
I have just learned about this project and am very excited to try to include it into my modelling flow!
It seems the command line option names for autosklearn are not the same as what the AutoSklearnClassifier() constructor accepts. I have reverse-engineered a few, but I still cannot figure out whether it is possible to specify task_type, for example.
task_type="binary.classification"
is rejected by the AutoSklearnClassifier() constructor.
I understand this is a very young project that is actively worked on. I will be happy to supply you with feedback from the field, as I am actively running modelling experiments on various datasets available at my company; currently I am successfully using scikit-learn with the SGD classifier for one of them. Is there a better place to reach you, a forum or a chat somewhere, to ask questions or give feedback?
As the title says, when using the add_component() method, the name that auto-sklearn sees is the class name, but on closer review, the name that auto-sklearn assigns to internal components is the name of the module that imports the class.
I am running AutoSklearn on my own data loaded with http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_files.html#sklearn.datasets.load_files with the same structure as in http://scikit-learn.org/stable/datasets/twenty_newsgroups.html
c = SGDClassifier(alpha=0.01, n_iter=10, penalty='l2', loss="log" , random_state=42, class_weight='auto')
works fine but
c = AutoSklearnClassifier( time_left_for_this_task=300, per_run_time_limit=90, ml_memory_limit=10000)
fails
in predict
classifier.fit(X_train, y_train)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 271, in fit
feat_type, dataset_name)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 260, in fit
encode_labels=False)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/data/xy_data_manager.py", line 27, in init
self.info['has_missing'] = np.all(np.isfinite(data_x))
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
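The TypeError above arises because np.isfinite is only defined for numeric dtypes, and data loaded with sklearn.datasets.load_files arrives as raw text (object/string arrays). A small demonstration (the array contents are made up for illustration):

```python
# Why the TypeError happens: np.isfinite rejects object/string arrays.
import numpy as np

obj_array = np.array([["1.0", "2.5"], ["3.0", "nan"]], dtype=object)
try:
    np.isfinite(obj_array)
except TypeError as e:
    print("fails on object dtype:", e)

# Converting to a numeric dtype first makes the check work:
num_array = obj_array.astype(float)
print(np.isfinite(num_array))   # the "nan" entry becomes False
```

For raw text from load_files, the data must first be turned into a numeric matrix (e.g. with sklearn's TfidfVectorizer) before being passed to AutoSklearnClassifier; casting alone only helps when the strings are already numbers.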