automl / auto-sklearn

Automated Machine Learning with scikit-learn

Home Page: https://automl.github.io/auto-sklearn

License: BSD 3-Clause "New" or "Revised" License

Python 99.54% Makefile 0.17% Shell 0.24% Dockerfile 0.05%
automl scikit-learn automated-machine-learning hyperparameter-optimization hyperparameter-tuning hyperparameter-search bayesian-optimization metalearning meta-learning smac

auto-sklearn's People

Contributors

aaronkl, ahn1340, anatolfernandez, aron-bram, axsapronov, borda, caoyi0905, dependabot[bot], eddiebergman, engelen, franchuterivera, github-actions[bot], gui-miotto, herilalaina, hmendozap, jaidevd, kakawhq, keggensperger, louquinze, mabryj2, mblum, mfeurer, mlindauer, motorrat, partev, rabsr, rcalsaverini, sagar-kaushik, stokasto, tmielika

auto-sklearn's Issues

Is there a way to leverage multiprocessing in .predict()?

My understanding is that when the predict method is called, the whole ensemble of models is run on the prediction dataset X, and it does so on a single core only. Is there a way to fan the work out across all cores so the models run in parallel?
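In the absence of native support, one hedged workaround is to pull the per-model predictions out yourself and fan them across cores with joblib. The models, weights, and data below are stand-ins; this sketch does not touch auto-sklearn's internal ensemble API.

```python
# Sketch: parallel per-model prediction followed by a weighted average.
# `models` and `weights` are placeholders for the fitted ensemble members
# and their ensemble weights (hypothetical names, not auto-sklearn API).
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
models = [DecisionTreeClassifier(max_depth=d, random_state=0).fit(X, y)
          for d in (1, 2, 3)]
weights = [0.2, 0.3, 0.5]

# Each predict_proba call runs in its own worker.
probas = Parallel(n_jobs=2)(delayed(m.predict_proba)(X) for m in models)

# Combine exactly as a weighted soft-voting ensemble would.
ensemble_proba = np.average(probas, axis=0, weights=weights)
y_pred = ensemble_proba.argmax(axis=1)
```

Whether this helps in practice depends on model size and pickling overhead; for small models the serialization cost can outweigh the parallel speedup.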

Install problem scipy 0.15.1/0.14.0 paramSklearn

Hi,
I have a problem with the installation.
After following the steps from the documentation:

pip install scikit-learn==0.15.2
pip install git+https://github.com/mfeurer/HPOlibConfigSpace#egg=HPOlibConfigSpace0.1dev
pip install git+https://[email protected]/mfeurer/paramsklearn.git@73d8643b2849db753ddc7b8909d01e6cee9bafc6 --no-deps
pip install git+https://github.com/automl/HPOlib#egg=HPOlib0.2
pip install --editable git+https://bitbucket.org/mfeurer/pymetalearn/#egg=pyMetaLearn

git clone https://github.com/automl/auto-sklearn.git
cd auto-sklearn
python setup.py install

I get the following error message (including the last two output lines before the error) after the final install:

Installed /usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg
Processing dependencies for AutoSklearn==0.0.1.dev0
error: scipy 0.15.1 is installed but scipy==0.14.0 is required by set(['ParamSklearn'])

Manually installing sklearn 0.14.0 is also no help; it then tries to reinstall 0.15.2 during python setup.py install :(

Importing autosklearn fails with this error:

(some directory structure ...)/pymetalearn/pyMetaLearn/metafeatures/metafeatures.py in <module>()
     12 import sklearn.metrics
     13 import sklearn.cross_validation
---> 14 from sklearn.utils import check_arrays
     15
     16 from ParamSklearn.implementations.Imputation import Imputer

ImportError: cannot import name check_arrays

Any idea what is going wrong? :)

ValueError: No models fitted!

Greetings,

I tried to run the example from the README.md. All steps run without errors until I try to score the test set, at which point I get a ValueError: No models fitted!. The console output shows various runtime warnings, among them indications of missing files. What could be the cause?

(Unfortunately, GitHub refuses to accept a text file attachment for some reason, so I pasted the console output during fitting and the error trace from scoring below.)

My setup is a Python 2.7.6 virtualenv, running from IPython 4.0.0. The installed packages are as follows:

argparse (1.2.1)
AutoSklearn (0.0.1.dev0)
cma (1.1.06)
decorator (4.0.2)
funcsigs (0.4)
HPOlib (0.1.0)
HPOlibConfigSpace (0.1dev)
ipython (4.0.0)
ipython-genutils (0.1.0)
liac-arff (2.1.0)
lockfile (0.10.2)
matplotlib (1.4.3)
mock (1.3.0)
networkx (1.10)
nose (1.3.7)
numpy (1.9.0)
pandas (0.16.2)
ParamSklearn (0.1dev)
path.py (8.1.1)
pbr (1.8.0)
pexpect (3.3)
pickleshare (0.5)
pip (1.5.4)
protobuf (3.0.0-alpha-1)
psutil (3.2.1)
pyMetaLearn (0.1dev)
pymongo (3.0.3)
pyparsing (2.0.3)
python-dateutil (2.4.2)
pytz (2015.6)
PyYAML (3.11)
scikit-learn (0.15.2)
scipy (0.14.0)
setuptools (18.3.2)
simplegeneric (0.8.1)
six (1.9.0)
traitlets (4.0.0)
wheel (0.24.0)
wsgiref (0.1.2)

Thanks for your response.


Console output:
[INFO] [09-24 17:58:47:AutoML_54da6690e2c896d2d9aafe349b066645_1]: Remaining time after reading 54da6690e2c896d2d9aafe349b066645 3600.00 sec
/media/selects/venv/py27/local/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:1057: RuntimeWarning: Degrees of freedom <= 0 for slice.
warnings.warn("Degrees of freedom <= 0 for slice.", RuntimeWarning)
/media/selects/venv/py27/local/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:598: RuntimeWarning: Mean of empty slice
warnings.warn("Mean of empty slice", RuntimeWarning)
[WARNING] [09-24 17:58:47:pyMetaLearn.input.aslib_simple]: Not found: /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/metalearning/files/multiclass.classification_dense_acc_metric/ground_truth.arff (maybe you want to add it)
[WARNING] [09-24 17:58:47:pyMetaLearn.input.aslib_simple]: Not found: /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/metalearning/files/multiclass.classification_dense_acc_metric/citation.bib (maybe you want to add it)
[WARNING] [09-24 17:58:47:pyMetaLearn.input.aslib_simple]: Not found: /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/metalearning/files/multiclass.classification_dense_acc_metric/cv.arff (maybe you want to add it)
[INFO] [09-24 17:58:48:autosklearn.metalearning.metalearning]: Reading meta-data took 0.59 seconds
['133', '132', '131', '130', '137', '136', '135', '134', '139', '138', '24', '25', '26', '27', '20', '21', '22', '23', '28', '29', '4', '8', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '59', '58', '55', '54', '57', '56', '51', '50', '53', '52', '115', '114', '88', '89', '111', '110', '113', '112', '82', '83', '80', '81', '119', '118', '84', '85', '3', '7', '108', '109', '102', '103', '100', '101', '106', '107', '104', '105', '39', '38', '33', '32', '31', '30', '37', '36', '35', '34', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '2', '6', '99', '98', '91', '90', '93', '92', '95', '94', '97', '96', '11', '10', '13', '12', '15', '14', '17', '16', '19', '18', '117', '116', '41', '48', '49', '46', '86', '44', '45', '42', '43', '40', '87', '1', '5', '9', '142', '140', '141', '77', '76', '75', '74', '73', '72', '71', '70', '79', '78', '47']
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 1118_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 314_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 454_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 809_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.optimizers.metalearn_optimizer.metalearner]: Could not find runs for instance 948_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 1118_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 948_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 454_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 809_bac
[WARNING] [09-24 17:58:48:pyMetaLearn.metalearning.kNearestDatasets.kND]: Found no best configuration for instance 314_bac
[INFO] [09-24 17:58:48:AutoML_54da6690e2c896d2d9aafe349b066645_1]: Time left for 54da6690e2c896d2d9aafe349b066645 after finding initial configurations: 3598.94sec
Calling: smac --numRun 1 --scenario /tmp/autosklearn_tmp_14167_1103/54da6690e2c896d2d9aafe349b066645.scenario --initial-challengers " -balancing:strategy 'weighting' -classifier 'lda' -imputation:s
trategy 'median' -kernel_pca:gamma '0.0290194572424' -kernel_pca:kernel 'rbf' -kernel_pca:n_components '1971' -lda:n_components '232' -lda:tol '0.000804876897084' -preprocessor 'kernel_pca' -rescal
ing:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsvm_svc' -imputation:strategy 'median' -liblinear_svc_preprocessor:C '18592.5543358' -liblinear_svc_p
reprocessor:class_weight 'auto' -liblinear_svc_preprocessor:dual 'False' -liblinear_svc_preprocessor:fit_intercept 'True' -liblinear_svc_preprocessor:intercept_scaling '1' -liblinear_svc_preprocess
or:loss 'l2' -liblinear_svc_preprocessor:multi_class 'ovr' -liblinear_svc_preprocessor:penalty 'l2' -liblinear_svc_preprocessor:tol '0.040232270855' -libsvm_svc:C '6111.7121149' -libsvm_svc:class_w
eight 'None' -libsvm_svc:coef0 '0.844884936773' -libsvm_svc:degree '5' -libsvm_svc:gamma '0.117882960246' -libsvm_svc:kernel 'poly' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'False' -libsvm_s
vc:tol '0.00109298090501' -preprocessor 'liblinear_svc_preprocessor' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'liblinear_svc' -imputation:s
trategy 'mean' -kernel_pca:gamma '1.6331524928' -kernel_pca:kernel 'rbf' -kernel_pca:n_components '761' -liblinear_svc:C '44.5016816038' -liblinear_svc:class_weight 'auto' -liblinear_svc:dual 'Fals
e' -liblinear_svc:fit_intercept 'True' -liblinear_svc:intercept_scaling '1' -liblinear_svc:loss 'l2' -liblinear_svc:multi_class 'ovr' -liblinear_svc:penalty 'l2' -liblinear_svc:tol '0.0018788986680
6' -preprocessor 'kernel_pca' -rescaling:strategy 'normalize'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsvm_svc' -imputation:strategy 'mean' -libsvm_svc:C '50.8707992
587' -libsvm_svc:class_weight 'auto' -libsvm_svc:gamma '4.72168867253' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'True' -libsvm_svc:tol '1.67692533041e-05' -preproces
sor 'select_rates' -rescaling:strategy 'normalize' -select_rates:alpha '0.318343160914' -select_rates:mode 'fdr' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:strategy 'n
one' -classifier 'ridge' -imputation:strategy 'median' -kernel_pca:gamma '2.43149422021' -kernel_pca:kernel 'rbf' -kernel_pca:n_components '1194' -preprocessor 'kernel_pca' -rescaling:strategy 'nor
malize' -ridge:alpha '1.30657587648e-05' -ridge:fit_intercept 'True' -ridge:tol '0.000760986834404'" --initial-challengers " -adaboost:algorithm 'SAMME.R' -adaboost:learning_rate '0.400363929326' -
adaboost:max_depth '5' -adaboost:n_estimators '319' -balancing:strategy 'none' -classifier 'adaboost' -imputation:strategy 'most_frequent' -preprocessor 'no_preprocessing' -rescaling:strategy 'min/
max'" --initial-challengers " -balancing:strategy 'none' -classifier 'qda' -imputation:strategy 'mean' -pca:keep_variance '0.748479656855' -pca:whiten 'False' -preprocessor 'pca' -qda:reg_param '3.
82874880102' -qda:tol '0.0130621640728' -rescaling:strategy 'normalize'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsvm_svc' -imputation:strategy 'mean' -libsvm_svc:C '
18807.7593252' -libsvm_svc:class_weight 'None' -libsvm_svc:gamma '0.940704535703' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'True' -libsvm_svc:tol '0.00148731196993'
-preprocessor 'select_rates' -rescaling:strategy 'min/max' -select_rates:alpha '0.126666738937' -select_rates:mode 'fdr' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:str
ategy 'weighting' -classifier 'lda' -imputation:strategy 'mean' -kitchen_sinks:gamma '1.48108179896' -kitchen_sinks:n_components '3450' -lda:n_components '25' -lda:tol '0.0426553560955' -preprocess
or 'kitchen_sinks' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'random_forest' -feature_agglomeration:affinity 'manhattan' -feature_agglomerat
ion:linkage 'average' -feature_agglomeration:n_clusters '76' -imputation:strategy 'median' -preprocessor 'feature_agglomeration' -random_forest:bootstrap 'True' -random_forest:criterion 'entropy' -
random_forest:max_depth 'None' -random_forest:max_features '1.60908385606' -random_forest:max_leaf_nodes 'None' -random_forest:min_samples_leaf '2' -random_forest:min_samples_split '12' -random_for
est:n_estimators '100' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'none' -classifier 'ridge' -imputation:strategy 'mean' -nystroem_sampler:coef0 '0.476829591723' -ny
stroem_sampler:degree '3' -nystroem_sampler:gamma '0.0817500204362' -nystroem_sampler:kernel 'poly' -nystroem_sampler:n_components '7840' -preprocessor 'nystroem_sampler' -rescaling:strategy 'min/m
ax' -ridge:alpha '3.52478796331e-06' -ridge:fit_intercept 'True' -ridge:tol '2.63925768895e-05'" --initial-challengers " -balancing:strategy 'none' -classifier 'random_forest' -imputation:strategy
'mean' -preprocessor 'no_preprocessing' -random_forest:bootstrap 'True' -random_forest:criterion 'gini' -random_forest:max_depth 'None' -random_forest:max_features '1.0' -random_forest:max_leaf_nod
es 'None' -random_forest:min_samples_leaf '1' -random_forest:min_samples_split '2' -random_forest:n_estimators '100' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'none
' -classifier 'lda' -imputation:strategy 'median' -lda:n_components '203' -lda:tol '0.0935342136025' -preprocessor 'select_rates' -rescaling:strategy 'normalize' -select_rates:alpha '0.048178281695
5' -select_rates:mode 'fwe' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:strategy 'mean' -preprocessor 'no_preprocessing' -
rescaling:strategy 'min/max' -sgd:alpha '0.0001' -sgd:eta0 '0.01' -sgd:fit_intercept 'True' -sgd:learning_rate 'optimal' -sgd:loss 'hinge' -sgd:n_iter '20' -sgd:penalty 'l2'" --initial-challengers
" -balancing:strategy 'weighting' -classifier 'liblinear_svc' -imputation:strategy 'mean' -kitchen_sinks:gamma '1.62106650658' -kitchen_sinks:n_components '6034' -liblinear_svc:C '780.976275468' -l
iblinear_svc:class_weight 'auto' -liblinear_svc:dual 'False' -liblinear_svc:fit_intercept 'True' -liblinear_svc:intercept_scaling '1' -liblinear_svc:loss 'l2' -liblinear_svc:multi_class 'ovr' -libl
inear_svc:penalty 'l2' -liblinear_svc:tol '2.60869016302e-05' -preprocessor 'kitchen_sinks' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'libsv
m_svc' -feature_agglomeration:affinity 'manhattan' -feature_agglomeration:linkage 'average' -feature_agglomeration:n_clusters '89' -imputation:strategy 'most_frequent' -libsvm_svc:C '246.452178174'
-libsvm_svc:class_weight 'auto' -libsvm_svc:gamma '0.0442300193285' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'False' -libsvm_svc:tol '0.0180487670379' -preprocessor
'feature_agglomeration' -rescaling:strategy 'standard'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'passive_aggresive' -imputation:strategy 'median' -passive_aggresive:C '
1.31125616578' -passive_aggresive:fit_intercept 'True' -passive_aggresive:loss 'hinge' -passive_aggresive:n_iter '948' -preprocessor 'select_percentile_classification' -rescaling:strategy 'min/max'
-select_percentile_classification:percentile '83.3669247487' -select_percentile_classification:score_func 'chi2'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:s
trategy 'most_frequent' -preprocessor 'no_preprocessing' -rescaling:strategy 'min/max' -sgd:alpha '0.00292211727831' -sgd:epsilon '0.0116887099622' -sgd:eta0 '0.080560671307' -sgd:fit_intercept 'Tr
ue' -sgd:learning_rate 'invscaling' -sgd:loss 'modified_huber' -sgd:n_iter '754' -sgd:penalty 'l1' -sgd:power_t '0.463498329665'" --initial-challengers " -balancing:strategy 'none' -classifier 'ran
dom_forest' -imputation:strategy 'mean' -preprocessor 'select_rates' -random_forest:bootstrap 'False' -random_forest:criterion 'entropy' -random_forest:max_depth 'None' -random_forest:max_features
'4.67839426105' -random_forest:max_leaf_nodes 'None' -random_forest:min_samples_leaf '10' -random_forest:min_samples_split '10' -random_forest:n_estimators '100' -rescaling:strategy 'standard' -sel
ect_rates:alpha '0.167486470473' -select_rates:mode 'fdr' -select_rates:score_func 'f_classif'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:strategy 'most_frequ
ent' -preprocessor 'select_rates' -rescaling:strategy 'min/max' -select_rates:alpha '0.155334914856' -select_rates:mode 'fpr' -select_rates:score_func 'f_classif' -sgd:alpha '6.49185336268e-05' -sg
d:eta0 '0.0665593974375' -sgd:fit_intercept 'True' -sgd:learning_rate 'optimal' -sgd:loss 'log' -sgd:n_iter '189' -sgd:penalty 'l2'" --initial-challengers " -balancing:strategy 'weighting' -classif
ier 'sgd' -imputation:strategy 'median' -preprocessor 'no_preprocessing' -rescaling:strategy 'min/max' -sgd:alpha '0.000134377776157' -sgd:epsilon '0.000256156800074' -sgd:eta0 '0.05222815237' -sgd
:fit_intercept 'True' -sgd:learning_rate 'constant' -sgd:loss 'modified_huber' -sgd:n_iter '429' -sgd:penalty 'l1'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'passive_aggr
esive' -imputation:strategy 'median' -liblinear_svc_preprocessor:C '0.306520222754' -liblinear_svc_preprocessor:class_weight 'None' -liblinear_svc_preprocessor:dual 'False' -liblinear_svc_preproces
sor:fit_intercept 'True' -liblinear_svc_preprocessor:intercept_scaling '1' -liblinear_svc_preprocessor:loss 'l2' -liblinear_svc_preprocessor:multi_class 'ovr' -liblinear_svc_preprocessor:penalty 'l
2' -liblinear_svc_preprocessor:tol '4.83193374386e-05' -passive_aggresive:C '0.000522592495213' -passive_aggresive:fit_intercept 'True' -passive_aggresive:loss 'hinge' -passive_aggresive:n_iter '31
3' -preprocessor 'liblinear_svc_preprocessor' -rescaling:strategy 'min/max'" --initial-challengers " -balancing:strategy 'none' -classifier 'libsvm_svc' -imputation:strategy 'median' -libsvm_svc:C
'19690.0557441' -libsvm_svc:class_weight 'None' -libsvm_svc:gamma '4.89593584562e-05' -libsvm_svc:kernel 'rbf' -libsvm_svc:max_iter '-1' -libsvm_svc:shrinking 'True' -libsvm_svc:tol '0.019646836528
3' -preprocessor 'random_trees_embedding' -random_trees_embedding:max_depth '4' -random_trees_embedding:max_leaf_nodes 'None' -random_trees_embedding:min_samples_leaf '16' -random_trees_embedding:m
in_samples_split '9' -random_trees_embedding:n_estimators '52' -rescaling:strategy 'standard'" --initial-challengers " -balancing:strategy 'none' -classifier 'sgd' -imputation:strategy 'mean' -prep
rocessor 'no_preprocessing' -rescaling:strategy 'min/max' -sgd:alpha '0.0001' -sgd:eta0 '0.01' -sgd:fit_intercept 'True' -sgd:learning_rate 'optimal' -sgd:loss 'hinge' -sgd:n_iter '20' -sgd:penalty
'l2'" --initial-challengers " -balancing:strategy 'weighting' -classifier 'random_forest' -extra_trees_preproc_for_classification:bootstrap 'True' -extra_trees_preproc_for_classification:criterion
'entropy' -extra_trees_preproc_for_classification:max_depth 'None' -extra_trees_preproc_for_classification:max_features '3.61796566599' -extra_trees_preproc_for_classification:min_samples_leaf '6'
-extra_trees_preproc_for_classification:min_samples_split '2' -extra_trees_preproc_for_classification:n_estimators '100' -imputation:strategy 'mean' -preprocessor 'extra_trees_preproc_for_classifi
cation' -random_forest:bootstrap 'True' -random_forest:criterion 'entropy' -random_forest:max_depth 'None' -random_forest:max_features '0.857466092817' -random_forest:max_leaf_nodes 'None' -random_
forest:min_samples_leaf '14' -random_forest:min_samples_split '15' -random_forest:n_estimators '100' -rescaling:strategy 'normalize'"
Calling: runsolver --watcher-data /dev/null -W 3598 -d 5 python /media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/ensemble_selection_script.py /tmp/autosklearn_tmp_14167_1103 54da6690e2c896d2d9aafe349b066645 multiclass.classification acc_metric 3593.92797899 /tmp/autosklearn_output_14167_1103 50 1 /tmp/autosklearn_tmp_14167_1103/ensemble_indices_1

Out[16]: <AutoSklearnClassifier(AutoSklearnClassifier-1, initial)>

In [17]: >>> print automl.score(X_test, y_test)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-6a5d99e9c9c3> in <module>()
----> 1 print automl.score(X_test, y_test)

/media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.pyc in score(self, X, y)
    358 
    359     def score(self, X, y):
--> 360         prediction = self.predict(X)
    361         return evaluator.calculate_score(y, prediction, self.task_,
    362                                          self.metric_, self.target_num_)

/media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.pyc in predict(self, X)
    137             The predicted classes.
    138         """
--> 139         return super(AutoSklearnClassifier, self).predict(X)
    140 
    141 

/media/selects/venv/py27/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.pyc in predict(self, X)
    327 
    328         if len(models) == 0:
--> 329             raise ValueError("No models fitted!")
    330 
    331         if self.ohe_ is not None:

ValueError: No models fitted!

time_left_for_this_task is not respected

This is probably working as designed, but it is not convenient from my point of view. I see tasks that start near the end of time_left_for_this_task overrun it by per_run_time_limit; they are not killed when time_left_for_this_task is up.

In my particular situation I set per_run_time_limit=time_left_for_this_task to accommodate very slow models as well. The result is that the total fit time can be double time_left_for_this_task.

This is probably an upstream (SMAC) issue, but I want to file it here so you are aware of it.

This matters to me because I am working on a massively parallel implementation where the total execution time should be rather short (hours). Doubling that time because of a single runaway model defeats the purpose.

autosklearn hangs after completing multicore run

There are no errors in the SMAC logs.
top shows the autosklearn processes idle, occupying a lot of memory:

%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  12378092+total, 11410110+used,  9679820 free,    54200 buffers
KiB Swap:        0 total,        0 used,        0 free. 13223792 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14599 ekobylk+  20   0 42.463g 0.037t   2848 S   0.0 32.0  28:48.66 python
14628 ekobylk+  20   0 31.582g 0.030t      8 S   0.0 26.3   1:53.21 python
14627 ekobylk+  20   0 33.422g 0.029t      0 S   0.0 25.2   5:29.03 python
18521 ekobylk+  20   0 25.770g 0.024t   1016 S   0.0 21.2   0:00.02 python
18514 ekobylk+  20   0 2897984 2.069g      0 S   0.0  1.8   0:00.04 python
18515 ekobylk+  20   0 2897984 2.069g      0 S   0.0  1.8   0:00.02 python

Once I issued CTRL+C in the Python command line, the script completed and showed better results than last time, with less execution time. It also shows ensemble building completing at 22:32, so maybe it is not even autosklearn itself?
So I wonder what is holding up completion.

This is what I find in ensemble_err_1.log:

Start script: /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/ensemble_selection_script.py
[DEBUG] [22:31:57:ensemble_selection_script.py] Time left: -5.000000
[DEBUG] [22:31:57:ensemble_selection_script.py] Time last iteration: 0.000000

[ERROR] [22:31:57:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_1_00000.npy has score: -0.81935483871
[ERROR] [22:31:57:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_1_00001.npy has score: -0.532039976484

SKIP similar lines

[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00067.npy has score: -0.446875
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00068.npy has score: -0.870646766169
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00069.npy has score: -0.474626865672
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00070.npy has score: -0.989130434783
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00071.npy has score: -0.483211678832
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00072.npy has score: -0.823788546256
[ERROR] [22:31:59:ensemble_selection_script.py] Model only predicts at random: predictions_ensemble_5_00073.npy has score: -0.488023952096
[INFO] [22:32:03:ensemble_selection_script.py] Ensemble Selection:
        Trajectory: 0: -0.430189 1: -0.421986 2: -0.397436 3: -0.392625 4: -0.392625 5: -0.392625 6: -0.392625 7: -0.392625 8: -0.392625 9: -0.392625 10: -0.392625 11: -0.389610 12: -0.388949 13: -0.388949 14: -0.388949 15: -0.388949 16: -0.388949 17: -0.388949 18: -0.388949 19: -0.388949 20: -0.388949 21: -0.388949 22: -0.388949 23: -0.388949 24: -0.388949 25: -0.388949 26: -0.388949 27: -0.388949 28: -0.388949 29: -0.388949 30: -0.388949 31: -0.388949 32: -0.388949 33: -0.388949 34: -0.388949 35: -0.388949 36: -0.388949 37: -0.388949 38: -0.388949 39: -0.388949 40: -0.388949 41: -0.388949 42: -0.388949 43: -0.388949 44: -0.388949 45: -0.388949 46: -0.388949 47: -0.388949 48: -0.388949 49: -0.388949
        Members: [128, 89, 57, 271, 8, 26, 26, 46, 46, 46, 293, 344, 241, 26, 26, 26, 341, 341, 341, 341, 341, 341, 341, 341, 341, 393, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 46, 293, 26, 26, 26, 26, 26, 26, 26, 26]
        Weights: [ 0.    0.    0.    0.    0.    0.    0.    0.    0.02  0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.26  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.08  0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.02  0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.02
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.02  0.    0.    0.    0.    0.28  0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.02  0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.02  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.04  0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.18  0.    0.    0.02  0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  0.    0.    0.    0.    0.    0.02  0.    0.    0.    0.    0.    0.    0.  ]
        Identifiers: (1, 8) (1, 26) (1, 46) (1, 57) (2, 4) (2, 43) (2, 48) (4, 4) (4, 34) (4, 56) (5, 14) (5, 17) (5, 66)
[INFO] [22:32:03:ensemble_selection_script.py] Training performance: -0.388949
[INFO] [22:32:03:ensemble_selection_script.py] Could not find as many validation set predictions (0)as ensemble predictions (401)!.
[INFO] [22:32:03:ensemble_selection_script.py] Could not find as many test set predictions (0) as ensemble predictions (401)!
[DEBUG] [22:32:03:ensemble_selection_script.py] Time left: -10.804320
[DEBUG] [22:32:03:ensemble_selection_script.py] Time last iteration: 5.765388
[DEBUG] [22:32:03:ensemble_selection_script.py] Nothing has changed since the last time

The log-run?.txt files from SMAC all show "Returning with value: 0". For example:

22:17:08.016 [main] DEBUG c.u.c.b.smac.executors.SMACExecutor - Returning with value: 0
22:17:08.016 [Event Manager Dispatch Thread] DEBUG c.u.c.b.a.eventsystem.EventManager - Event Manager thread done, released 0 pending flushes
22:17:08.019 [FileSharingRunHistory Logger ( outputID:1)] INFO  c.u.c.b.a.r.FileSharingRunHistoryDecorator - At shutdown: /data/atskln_tmp/faebb891c2943e03251e86c73d212016/live-rundata-1.json had 85 runs added to it
22:17:08.019 [FileSharingRunHistory Logger ( outputID:1)] INFO  c.u.c.b.a.r.FileSharingRunHistoryDecorator - At shutdown: we retrieved atleast 316 runs and added them to our current data set [live-rundata-2.json=>67, live-rundata-3.json=>85, live-rundata-4.json=>90, live-rundata-5.json=>74]

Suppress warnings by default

This package is far too noisy. Things are going to fail, and when they do, they should fail gracefully, not loudly.

Sparse Matrix - isfinite hangs when fitting

Whenever I try to fit a model using a sparse matrix as data in auto-sklearn, Python hangs and gives me:

self.info['has_missing'] = np.all(np.isfinite(data_x))
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

One can easily reproduce this with:

import numpy as np
import scipy.sparse as sp
import autosklearn.automl as autosk

# Any sparse matrix with some filling
spars = sp.csr_matrix((10, 20))
spars[4, 0] = 20; spars[2, 3] = 10; spars[0, 3] = 30
y = np.ones((10, 1))

modl = autosk.AutoML(time_left_for_this_task=300, seed=1, per_run_time_limit=60,
                     tmp_dir='/tmp/autosk_tmp', output_dir='/tmp/autosk_out',
                     delete_tmp_folder_after_terminate=False,
                     initial_configurations_via_metalearning=None,
                     include_estimators=['random_forest'])

# Error happens here!
modl.fit(spars, y) 
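A possible workaround (a sketch, not the library's own fix): test finiteness only on the stored values of the sparse matrix, since np.isfinite cannot be applied to the matrix object itself:

```python
import numpy as np
import scipy.sparse as sp

def all_finite(x):
    # np.isfinite does not accept a sparse matrix object, but the
    # stored values live in a plain ndarray; implicit zeros are
    # always finite, so checking the stored values is enough.
    if sp.issparse(x):
        return bool(np.isfinite(x.tocoo().data).all())
    return bool(np.isfinite(np.asarray(x)).all())

m = sp.lil_matrix((10, 20))
m[4, 0] = 20; m[2, 3] = 10; m[0, 3] = 30
print(all_finite(m.tocsr()))                # True: no TypeError
print(all_finite(np.array([1.0, np.inf])))  # False
```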

To turn pre-processing off, use no_preprocessing:

AutoSklearnClassifier(include_preprocessors=('no_preprocessing',))

It would be great to have an explicit list of preprocessors and estimators that could be used in include_preprocessors and include_estimators parameters of AutoSklearnClassifier.

which paper to cite?

Hey.
Which paper should someone cite when using this package?
I think that should be in the readme or on the website.

Also, there is a mention of a paper in the ICML workshop, but no link to the paper.

Thanks,
Andy

Cluster usage primer

I am trying to get auto-sklearn to run on a cluster of Red Hat machines. Hopefully I will succeed by myself, but I want to point out that the natural application domain for auto-sklearn is massive parallelism. So it would be great to have a canonical example of how to set up auto-sklearn in a cluster environment.

Get best model predictions without running ensembles

When running a modified version of the script in which I don't want ensembles (i.e. without the ensemble builder script), just the best model to make a prediction on my test dataset, I get the expected error that self._ensemble._get_model_identifiers() is NoneType.

Is that the intended behaviour, i.e. one must create an ensemble of size 1, or should I file this as a bug?
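If a degenerate ensemble is acceptable, one option is to request an ensemble of size 1, so the "ensemble" contains exactly the single best model. A configuration sketch (parameter names as used by the estimator API; ensemble-size also appears in the SMAC call logs above):

```python
import autosklearn.classification

# Sketch: an "ensemble" that keeps only the single best model.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    ensemble_size=1,  # degenerate ensemble: best model only
)
```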

save_datamanager SystemError: error return without exception set

clf.fit(X_train_t, y_train, metric = 'f1_metric')

File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 271, in fit
feat_type, dataset_name)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 262, in fit
return self._fit(loaded_data_manager)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 348, in _fit
data_manager_path = self._backend.save_datamanager(datamanager)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/util/backend.py", line 124, in save_datamanager
pickle.dump(datamanager, fh, -1)
SystemError: error return without exception set

clf = AutoSklearnClassifier(time_left_for_this_task=300, per_run_time_limit=90, ml_memory_limit=10000, resampling_strategy='cv', resampling_strategy_arguments={'folds': 5})

Make ensemble a class

and pass the dataset name to the constructor, so it can output the dataset name in the logs.

posted on wrong repo

Using sklearn's OneVsRestClassifier in order to handle multilabel datasets is not possible, as ProjLogitClassifier misses functionality that is necessary to use sklearn.base.clone:

TypeError: Cannot clone object '<ParamSklearn.implementations.ProjLogit.ProjLogit object at 0x7f7761445450>' (type <class 'ParamSklearn.implementations.ProjLogit.ProjLogit'>): it does not seem to be a scikit-learn estimator it does not implement a 'get_params' methods.

Binary Classifier experiment - log loss

Hi,
Fantastic Library!
I am trying to use the library for a binary classification experiment, using the log loss function to train the model. This is for a university project benchmarking different models. Would you have time to provide an example of how to use the library to achieve this goal, ideally including a visualisation of how the model learns?
Many thanks,
Best,
Andrew

autosklearn cannot load models by filename

After the run seems to have completed successfully (some completed runs are logged in the SMAC logs), autosklearn fails with the following error message:

Traceback (most recent call last):
  File "truffles-autosklearn-multy.py", line 82, in spawn_classifier
    c.fit(X_train, y_train, metric='f1_metric')
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 271, in fit
    feat_type, dataset_name)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 262, in fit
    return self._fit(loaded_data_manager)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 526, in _fit
    self._load_models()
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 647, in _load_models
    self.models_ = self._backend.load_all_models(seed)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/util/backend.py", line 171, in load_all_models
    models = self.load_models_by_file_names(model_files)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/util/backend.py", line 185, in load_models_by_file_names
    seed = int(basename_parts[0])
ValueError: invalid literal for int() with base 10: 'tmpJmPf2D'

After ensemble has been made, retrain models on full training data

I noticed that auto-sklearn reserves about one third of the training data for building the ensemble. This means that the individual models could have seen 50% more data than they currently do, which can have a substantial impact if the number of training examples is small. E.g., I tried running auto-sklearn for 7 hours on the Kaggle Titanic competition and got a score of about 0.799, which is not bad, but also not great.

Suggestion: retrain the individual models on the full training data after the ensemble has been built. One difficulty is that the hyperparameter specification should be invariant to the number of training examples. On the other hand, it may not matter much either way, since this proposal wouldn't change the number of training examples the models see by more than a factor of 1.5.
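The proposal can be sketched with plain scikit-learn. This is illustrative only; DecisionTreeClassifier and LogisticRegression are stand-ins for whatever members the search actually selected:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_full = rng.randn(120, 4)
y_full = rng.randint(0, 2, 120)
# Pretend the search phase only ever saw two thirds of the data
X_search, y_search = X_full[:80], y_full[:80]

ensemble = [DecisionTreeClassifier(max_depth=3, random_state=0),
            LogisticRegression()]
for model in ensemble:
    model.fit(X_search, y_search)      # what happens today

# Proposed step: once the ensemble members are fixed, refit each one
# on the full training data with identical hyperparameters.
refit_ensemble = [clone(model).fit(X_full, y_full) for model in ensemble]
```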

ImportError: No module named classification

I am facing a weird error whereby autosklearn.classification is missing.

I installed auto-sklearn into a Miniconda environment as described in the install guide.

Any tips? :)

import autosklearn.classification
Traceback (most recent call last):
File "", line 1, in
import autosklearn.classification
ImportError: No module named classification

I can access autosklearn fine via the command line in the environment:

(auto2)[root@CentOS-72-64-minimal ~]# autosklearn
usage: autosklearn [-h] [-c CONFIG] [--output-dir OUTPUT_DIR]
[--temporary-output-directory TEMPORARY_OUTPUT_DIRECTORY]
[--keep-output] [--time-limit TIME_LIMIT]
....

the run completes but exceptions are generated

Process PoolWorker-13:
Traceback (most recent call last):
Process PoolWorker-12:
  File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
  File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/process.py", line 261, in _bootstrap
    util._exit_function()
    self.run()
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/base_interface.py", line 42, in signal_handler
  File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/util.py", line 305, in _exit_function
    _run_finalizers(0)
  File "/home/ekobylkin/anaconda2/lib/python2.7/multiprocessing/util.py", line 250, in _run_finalizers
    def _run_finalizers(minpriority=None):
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/base_interface.py", line 42, in signal_handler
    evaluator.finish_up()
AttributeError: 'NoneType' object has no attribute 'finish_up'
    evaluator.finish_up()
AttributeError: 'NoneType' object has no attribute 'finish_up'

Speed and time budgets

In a text classification task, SGDClassifier needs just a few minutes to reach the same result as auto-sklearn running for 20 hours. A smaller time budget for auto-sklearn resulted in outright failure or much worse predictions.

I wonder if there is a strategy to try the fastest algorithms first and, if time runs out, at least use their results?

Another question is about the recommended per_run_time_limit value. Is there a rule of thumb for choosing it?
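A common heuristic, not an official rule, is to give each single run roughly one tenth of the total budget, so the optimizer can evaluate many configurations while slower models can still finish. As a configuration sketch:

```python
import autosklearn.classification

total_budget = 3600  # seconds for the whole search
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=total_budget,
    per_run_time_limit=total_budget // 10,  # heuristic: ~1/10 of the total
)
```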

SGDClassifier: Precision: 0.20, Test FR Recall: 0.53, F1: 0.29
Auto-sklearn: Precision: 0.28, Recall: 0.31, F1: 0.29

classifier.fit(X_train, y_train, metric='f1_metric')
AutoSklearnClassifier(time_left_for_this_task=72000, per_run_time_limit=19000, ml_memory_limit=10000)

Failed building wheel for scikit-learn when installing from requ.txt

I have removed scikit-learn 0.17 from anaconda hoping to be able to install the version autosklearn requires. When running pip install -r https://raw.githubusercontent.com/automl/auto-sklearn/master/requ.txt

I get a rather cryptic message

Building wheels for collected packages: scikit-learn
Running setup.py bdist_wheel for scikit-learn: started
Running setup.py bdist_wheel for scikit-learn: finished with status 'error'
Complete output from command /home/USERNAME/anaconda2/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-Caf5WC/scikit-learn/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" bdist_wheel -d /tmp/tmpgPVyFnpip-wheel- --python-tag cp27:
Failed building wheel for scikit-learn

The problem is actually a missing compiler. This fixes it:

On Debian/Ubuntu: sudo apt-get update && sudo apt-get upgrade && sudo apt-get install build-essential
On Red Hat (6.5): sudo yum groupinstall 'Development Tools'

Return instance of best model

Is it against the philosophy of this project to return an instance of the best-performing model, as hyperopt-sklearn does?

Settings for short test run?

Are there automl settings that would bring the training time (on the digits dataset, for example) down to something negligible (<10 minutes) for quick experimentation and code-integration tests?

I'm currently experimenting with putting automl in my code, but the ~1 hour training time is a huge bottleneck. A setting for a quick throwaway run would be really helpful.
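For a throwaway smoke test, shrinking every budget works. A sketch only: quality will be poor by design, and passing an integer to initial_configurations_via_metalearning is an assumption about the estimator's signature (the lower-level AutoML class above takes None):

```python
import autosklearn.classification

# Hypothetical smoke-test settings: tiny budgets, no meta-learning
# warm start, small ensemble.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60,              # total budget: 1 minute
    per_run_time_limit=15,
    initial_configurations_via_metalearning=0,
    ensemble_size=5,
)
```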

Pima Indians Diabetes dataset accuracy with AutoSKLearn is 55%?

Just ran this snippet to test the accuracy of results with autosklearn and got a terrible 55%. Any tips why? :)

import numpy as np
import urllib
import autosklearn.classification
import sklearn.datasets
url = "http://goo.gl/j0Rvxq"
raw_data = urllib.urlopen(url)
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
X = dataset[:,0:7]
y = dataset[:,8]
indices = np.arange(X.shape[0])
np.random.shuffle(indices)
X = X[indices]
y = y[indices]
X_train = X[:700]
y_train = y[:700]
X_test = X[700:]
y_test = y[700:]
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
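One thing worth checking before blaming the optimizer: the Pima dataset has 8 feature columns (0..7) plus the label in column 8, so dataset[:,0:7] silently drops the eighth feature. A quick shape check, using a zero-filled stand-in array of the same shape:

```python
import numpy as np

dataset = np.zeros((768, 9))   # stand-in with the Pima dataset's shape
X_wrong = dataset[:, 0:7]      # keeps columns 0..6 only
X_right = dataset[:, 0:8]      # keeps all 8 feature columns
y = dataset[:, 8]
print(X_wrong.shape, X_right.shape, y.shape)
# (768, 7) (768, 8) (768,)
```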

Explicitly specified tmp/output folders exceptions

Hi, with an explicitly specified folder containing spaces, I'm getting the following exception:

[INFO] [2016-02-05 16:15:58,126:autosklearn.util.submit_process] Calling: runsolver --watcher-data /dev/null -W 42 -d 5 python -m autosklearn.ensemble_selection_script --auto-sklearn-tmp-directory /library/auto ml/autosklearn_tmp_acc_metric --basename 85e3933b9df7e236e76d83ace9b5bbc3 --task multiclass.classification --metric acc_metric --limit 37.1727719307 --output-directory /library/auto ml/autosklearn_output_acc_metric --ensemble-size 50 --ensemble-nbest 50 --auto-sklearn-seed 1 --max-iterations -1

Error occurred while running SMAC
>Error Message:Was passed main parameter 'ml/autosklearn_tmp_acc_metric/85e3933b9df7e236e76d83ace9b5bbc3.scenario' but no main parameter was defined
>Encountered Exception:ParameterException

If I try to escape the space, I get the following exception from SMAC (and a folder "auto\ ml" is created):

[INFO] [2016-02-05 16:24:04,955:autosklearn.util.submit_process] Calling: runsolver --watcher-data /dev/null -W 40 -d 5 python -m autosklearn.ensemble_selection_script --auto-sklearn-tmp-directory /library/auto\ ml/autosklearn_tmp_acc_metric --basename 85e3933b9df7e236e76d83ace9b5bbc3 --task multiclass.classification --metric acc_metric --limit 35.2403550148 --output-directory /library/auto\ ml/autosklearn_output_acc_metric --ensemble-size 50 --ensemble-nbest 50 --auto-sklearn-seed 1 --max-iterations -1

Error occurred while running SMAC
>Error Message:Option File (JCommander @ParameterFile) does not exist: 85e3933b9df7e236e76d83ace9b5bbc3.scenario
>Encountered Exception:ParameterException

How save model in file?

Hi all,
I took the example (from the docs), and I want to save the classifier to a file.
But with:

with open('dump_autosk.pkl', 'wb') as fio:
    pickle.dump(automl, fio)

I get:

TypeError: Pickling an AuthenticationString object is disallowed for security reasons

Can I save the model to a file?
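The error itself is unrelated to the fitted models: AutoML inherits from multiprocessing.Process (see the refactoring note elsewhere in this tracker), and a process object carries an AuthenticationString authentication key that refuses to be pickled. A minimal demonstration of both sides:

```python
import pickle
import multiprocessing

# A plain fitted-state object pickles without trouble...
blob = pickle.dumps({"model": "fitted state"})
restored = pickle.loads(blob)

# ...but a multiprocessing.Process does not: its authkey is an
# AuthenticationString, which raises TypeError when pickled outside
# of process spawning.  Pickling the whole estimator hits this.
try:
    pickle.dumps(multiprocessing.Process())
    failed = False
except TypeError as err:
    failed = True
    print(err)
```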

Multiclass prediction outputs index instead of class label

For a multiclass classification task, the output is an index, which makes the result ambiguous, e.g. [0 ... 7] instead of [1 ... 8]:

class AutoML(BaseEstimator, multiprocessing.Process):
    ...
    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)

Does it work as expected?
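No: argmax yields positions into predict_proba's columns, not the original labels. The usual fix is to keep the classes seen during fit and index into them. An illustrative numpy sketch (classes_ is an assumed attribute name, following scikit-learn's convention):

```python
import numpy as np

proba = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])
classes_ = np.array([1, 5, 8])     # original labels seen during fit

idx = np.argmax(proba, axis=1)     # what predict() returns today
labels = classes_[idx]             # what callers actually expect
print(idx.tolist(), labels.tolist())   # [1, 0] [5, 1]
```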

Hyperparameter floating point precision in configuration search space

Not critical

When adding a UniformFloat or Constant hyperparameter to the configuration space of any component, the string conversion to the 'parameter configuration space file' keeps only 10 decimal digits of precision. This mismatches native float precision (53 bits) when auto-sklearn runs and sets the hyperparameter for a component.
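The mismatch is easy to demonstrate: ten decimal digits cannot represent every 53-bit double exactly, whereas repr produces a string that round-trips:

```python
x = 1.0 / 3.0

ten_digits = "%.10f" % x   # ten decimal digits, as written to the file
full = repr(x)             # shortest string that round-trips exactly

print(ten_digits, full)
# float(full) == x, but float(ten_digits) != x
```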

Silence subprocesses

This generates gobs of output. Need a silent mode. Standard context manager approaches to hiding stdout don't work here.
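The reason sys.stdout tricks fail is that the subprocesses inherit the OS-level file descriptor, not the Python object. A sketch of descriptor-level silencing (assumes a POSIX-like system):

```python
import contextlib
import os

@contextlib.contextmanager
def silence_fd(fd=1):
    # Duplicate the real descriptor, point it at /dev/null, and
    # restore it on exit.  Child processes inherit the OS descriptor,
    # so this silences them too, unlike swapping sys.stdout.
    saved = os.dup(fd)
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, fd)
        yield
    finally:
        os.dup2(saved, fd)
        os.close(saved)
        os.close(devnull)

with silence_fd():
    os.system('echo "you should not see this"')
print("descriptor restored")
```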

Change requirements to reflect the move to sklearn 0.17

I upgraded auto-sklearn a few weeks ago and it dropped the requirement for scikit-learn 0.16.1, so I could successfully run it with the current 0.17.x release. Now I am installing auto-sklearn from scratch and cannot get past the scikit-learn 0.16.1 requirement.
Is this something that has to be changed in requ.txt, or is there more to it?
P.S. first install was on CentOS now it is on Ubuntu 14.04

Error with example

Hi all,
I took the example (from the docs), and I got this error:

Traceback (most recent call last):
  File "/home/warmonger/Develop/AutoInformatisation/TagEmitter/src/pydigest/exampl.py", line 25, in <module>
    print automl.score(X_test, y_test)
  File "/home/warmonger/Develop/venv/pocket2/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 360, in score
    prediction = self.predict(X)
  File "/home/warmonger/Develop/venv/pocket2/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 139, in predict
    return super(AutoSklearnClassifier, self).predict(X)
  File "/home/warmonger/Develop/venv/pocket2/local/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 329, in predict
    raise ValueError("No models fitted!")
ValueError: No models fitted!

pip freeze:

alembic==0.7.6
AutoSklearn==0.0.1.dev0
beautifulsoup4==4.3.2
blinker==1.3
boto==2.38.0
bz2file==0.98
cffi==1.1.0
chardet==2.3.0
cma==1.1.6
cryptography==0.9
cssselect==0.9.1
Cython==0.22
DAWG-Python==0.7.2
decorator==4.0.2
docopt==0.6.2
enum34==1.0.4
Flask==0.10.1
Flask-Admin==1.1.0
Flask-Admin-Profiler==0.0.1
Flask-DebugToolbar==0.10.0
Flask-Migrate==1.4.0
Flask-Script==2.0.5
Flask-SQLAlchemy==2.0
Flask-WTF==0.11
gensim==0.11.1.post1
gevent==1.0.2
greenlet==0.4.7
grequests==0.2.0
HPOlib==0.1.0
HPOlibConfigSpace==0.1.dev0
idna==2.0
ipaddress==1.0.7
itsdangerous==0.24
Jinja2==2.7.3
langdetect==1.0.5
liac-arff==2.0.2
lockfile==0.10.2
lxml==3.4.4
Mako==1.0.1
MarkupSafe==0.23
matplotlib==1.4.3
mock==1.0.1
MySQL-python==1.2.5
naiveBayesClassifier==0.1.3
ndg-httpsclient==0.4.0
networkx==1.10
nltk==3.0.2
nose==1.3.7
numpy==1.9.0
objgraph==2.0.0
pandas==0.16.0
ParamSklearn==0.1.dev0
pocket==0.3.5
protobuf==2.6.1
psutil==3.1.1
psycopg2==2.6
pyasn1==0.1.7
pycparser==2.13
-e git+https://bitbucket.org/mfeurer/pymetalearn/@2767d4d9eca801ad23247ced586a91957ef583e5#egg=pyMetaLearn-master
pymongo==3.0.3
pymorphy2==0.8
pymorphy2-dicts==2.4.393442.3710985
pyOpenSSL==0.15.1
pyparsing==2.0.3
python-dateutil==2.4.2
pytz==2015.4
PyYAML==3.11
readability-lxml==0.5.1
requests==2.7.0
rutermextract==0.2
scikit-learn==0.15.2
scipy==0.14.0
six==1.9.0
smart-open==1.2.1
SQLAlchemy==1.0.4
textblob==0.9.0
topia.termextract==1.1.0
umemcache==1.6.3
Werkzeug==0.10.4
WTForms==2.0.2
zope.interface==4.1.2

improve error handling

All errors thrown in SMAC are hidden from the user. For example, one of the required packages was not properly installed in my case, so all target algorithm runs crashed, but this was not shown to me and the code crashed later with an incomprehensible error message. Please add a proper DEBUG mode. (By the way: some modules use logging and some use print statements.)

Example from documentation (digits) - score

I'm completely new to auto-sklearn. I ran the example from the documentation on the digits dataset, and print(automl.score...) gives me -0.0121288... instead of 0.98. I've noticed that 1 - 0.0121288... is more or less 0.98, but I want to be sure that this is not a weird coincidence.
If everything is correct, there should be a note about this in the documentation.

Confused directories of .auto-sklearn/predictions_ensemble

As of 3be0189, the whole thing is broken, as tested on a fresh dedicated Ubuntu 15.10 installation.

When running example/example1.py, tmp_dir/ensemble_err_1.log is full of messages like this:

[DEBUG] [13:06:20:ensemble_selection_script.py] Prediction directory /tmp/autoslearn_example_tmp/.auto-sklearn/predictions_ensemble does not exist!

No kind of optimization is performed, and the script finishes with:

Traceback (most recent call last):
  File "example1.py", line 31, in <module>
    main()
  File "example1.py", line 26, in main
    automl.fit(X_train, y_train, dataset_name='digits')
  File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 255, in fit
    feat_type, dataset_name)
  File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 244, in fit
    return self._fit(loaded_data_manager)
  File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 456, in _fit
    self._load_models()
  File "/usr/local/lib/python2.7/dist-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 542, in _load_models
    raise ValueError('No models fitted!')
ValueError: No models fitted!

Further investigation showed that tmp_dir/.auto-sklearn/predictions_ensemble indeed doesn't exist; instead, it is created as ./.auto-sklearn/predictions_ensemble, in the current directory.

The bug seems to be introduced by 3be0189; installing the version as of the previous commit, 7666120, appears to work fine.

External components are not loaded into SMAC child processes and external component subclasses

After loading an external component as directed in the manual with:

from component import DeepFeedNet
add_classifier(DeepFeedNet.DeepFeedNet)
autosk.AutoML(include_estimators=['DeepFeedNet'])

In the main process this does not generate an error or warning, but after reviewing the SMAC logs, this error is reported:
[ Shortened... ]
16:57:30.806 [CLI TAE (STDERR Thread - #0)] WARN c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - [PROCESS-ERR](hp_value, hyperparameter))
16:57:30.807 [CLI TAE (STDERR Thread - #0)] WARN c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - [PROCESS-ERR] ValueError: Hyperparameter instantiation 'DeepFeedNet' is illegal for hyperparameter classifier:choice, Type: Categorical, Choices: {adaboost, bernoulli_nb, decision_tree, extra_trees, gaussian_nb, gradient_boosting, k_nearest_neighbors, lda, liblinear_svc, libsvm_svc, multinomial_nb, passive_aggressive, proj_logit, qda, random_forest, sgd}, Default: random_forest

After further review, the error is (presumably) in the loading of the classifiers into the data manager's configuration in the subprocesses:

cs = get_configuration_space(D.info)
configuration = configuration_space.Configuration(cs, params)

SMAC is complaining about the lack of Intel MKL

13:12:38.766 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - > Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.

Is this expected?
MKL seems to be installed by Anaconda:

conda install mkl
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ....
Solving package specifications: .........

# All requested packages already installed.
# packages in environment at /home/ekobylkin/anaconda2:
#
mkl                       11.3.1                        0
 ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd "/data/atskln_tmp" ;  runsolver --watcher-data /dev/null -W 1865 -d 30 -M 22000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout /data/atskln_tmp/.auto-sklearn/datamanager.pkl 1865.0 2147483647 -1 -balancing:strategy 'none' -classifier:__choice__ 'sgd' -classifier:sgd:alpha '1.0E-4' -classifier:sgd:average 'False' -classifier:sgd:eta0 '0.01' -classifier:sgd:fit_intercept 'True' -classifier:sgd:learning_rate 'optimal' -classifier:sgd:loss 'log' -classifier:sgd:n_iter '20' -classifier:sgd:penalty 'l2' -imputation:strategy 'mean' -one_hot_encoding:minimum_fraction '0.01' -one_hot_encoding:use_minimum_fraction 'True' -preprocessor:__choice__ 'no_preprocessing' -rescaling:__choice__ 'min/max'
13:12:38.766 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The last 1 lines of output we saw were:
13:12:38.766 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - > Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
13:12:38.774 [CLI TAE (Master Thread - #0)] DEBUG c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - Run <Instance:1, Seed:-1, Config:0x000F, Kappa:1865.0, Execution Config: 0x0001> ==> <CRASHED, 0.0, 0.0, 0.0, -1,ERROR: Wrapper did not output anything that matched the expected output ("Result of algorithm run:..."). Please try executing the wrapper directly> W:(198209.0) is completed
13:12:38.783 [main] DEBUG c.u.c.b.a.i.c.ClassicInitializationProcedure - Initialization: Completed run for config (0x000F) on instance 1 with seed -1 and captime 1865.0 => Result: CRASHED, 0.0, 0.0, 2.0, -1,ERROR: Wrapper did not output anything that matched the expected output ("Result of algorithm run:..."). Please try executing the wrapper directly, wallclock time: 198209.0 seconds

Refactor automl.py

automl.py serves two purposes: being the base class of a scikit-learn-like estimator object, and being directly usable with a data manager. It should rather be an abstract base class, with all functionality regarding data loading and special fit semantics moved into a dedicated subclass. This would also free AutoSklearnClassifier from being a subclass of multiprocessing.Process.

Python 3 support?

Hi all,

First off: Great project! I was very excited to come across this project and I'm eager to try it out. However, when I attempted to install auto-sklearn, I ran into several difficulties. In particular, I noticed that HPOlib -- a dependency of this project -- requires Python 2.7, which entails that auto-sklearn also requires Python 2.7.

Is there any way around this other than setting up a 2.7 environment? I'd love to integrate this into a Python 3 workflow.

OS X installation?

It might just be me, but the build has been stuck for a while. Is anyone else experiencing this?

~/GitHub/cabinet/auto-sklearn $ python setup.py install
/Library/Python/2.7/site-packages/setuptools/dist.py:285: UserWarning: Normalizing '0.0.1dev' to '0.0.1.dev0'
  normalized_version,
running install
Building runsolver
Makefile:23: runsolver.d: No such file or directory
Makefile:23: SignalNames.d: No such file or directory
Makefile:23: SyscallNames.d: No such file or directory
grep '#define[[:space:]]*__NR'  | grep -v '^/' | awk '{print $2}' | sed -e 's/^__NR_//' | awk '{printf "list[__NR_%s]=\"%s\";\n",$1,$1}' > tmpSyscallList.cc

Document AutoSklearnClassifier constructor options

Dear auto-sklearn team,
I have just learned about this project and am very excited to try to include it into my modelling flow!

It seems the command-line option names for autosklearn are not the same as what the AutoSklearnClassifier() constructor accepts. I have reverse-engineered a few, but I still cannot figure out whether it is possible to specify task_type, for example:
task_type="binary.classification"
is rejected by the AutoSklearnClassifier() constructor.

I understand this is a very young project that is actively worked on. I will be happy to supply you with feedback from the field, as I am actively running modelling experiments on various datasets available at my company; currently I am successfully using scikit-learn with SGDClassifier for one of them. Is there a better way to reach you, a forum or a chat somewhere, to ask questions or give feedback?
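Until the constructor options are documented, one stop-gap is plain Python introspection, which lists exactly the keyword arguments a constructor accepts. The Demo class below is a hypothetical stand-in for AutoSklearnClassifier:

```python
import inspect

# Hypothetical stand-in; apply the same call to
# AutoSklearnClassifier.__init__ once the class is importable.
class Demo:
    def __init__(self, time_left_for_this_task=3600,
                 per_run_time_limit=360, ml_memory_limit=3000):
        pass

# Skip 'self' and keep the real keyword-argument names.
params = list(inspect.signature(Demo.__init__).parameters)[1:]
print(params)
# ['time_left_for_this_task', 'per_run_time_limit', 'ml_memory_limit']
```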

ufunc 'isfinite' not supported for the input types

I am running AutoSklearn on my own data loaded with http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_files.html#sklearn.datasets.load_files with the same structure as in http://scikit-learn.org/stable/datasets/twenty_newsgroups.html

c = SGDClassifier(alpha=0.01, n_iter=10, penalty='l2', loss="log" , random_state=42, class_weight='auto')
works fine but

c = AutoSklearnClassifier(time_left_for_this_task=300, per_run_time_limit=90, ml_memory_limit=10000)

fails

in predict
classifier.fit(X_train, y_train)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 271, in fit
feat_type, dataset_name)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 260, in fit
encode_labels=False)
File "/home/centos/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/data/xy_data_manager.py", line 27, in init
self.info['has_missing'] = np.all(np.isfinite(data_x))
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
