cooka's Introduction

Cooka is a lightweight visual toolkit for managing datasets and designing model learning experiments through a web UI. It uses DeepTables and HyperGBM as experiment engines to perform feature engineering, neural architecture search, and hyperparameter tuning automatically.

Features overview

Through the web UI provided by cooka you can:

  • Add and analyze datasets
  • Design experiments
  • View experiment progress and results
  • Use models
  • Export experiments to Jupyter notebooks

The supported machine learning algorithms are:

  • XGBoost
  • LightGBM
  • CatBoost

The neural networks supported are:

  • WideDeep
  • DeepFM
  • xDeepFM
  • AutoInt
  • DCN
  • FGCNN
  • FiBiNet
  • PNN
  • AFM
  • ...

The search algorithms supported are:

  • Evolution
  • MCTS (Monte Carlo Tree Search)
  • ...

The supported feature engineering methods, provided by scikit-learn and featuretools, are:

  • Scaler

    • StandardScaler
    • MinMaxScaler
    • RobustScaler
    • MaxAbsScaler
    • Normalizer
  • Encoder

    • LabelEncoder
    • OneHotEncoder
    • OrdinalEncoder
  • Discretizer

    • KBinsDiscretizer
    • Binarizer
  • Dimension Reduction

    • PCA
  • Feature derivation

    • featuretools
  • Missing value filling

    • SimpleImputer
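
As a rough illustration of what two of the scalers listed above compute, here is a minimal pure-Python sketch. It is not how cooka or scikit-learn implement them, and the function names `standard_scale` and `min_max_scale` are made up for this example. It assumes the usual definitions: StandardScaler subtracts the mean and divides by the population standard deviation, and MinMaxScaler maps values linearly onto [0, 1].

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant column: return zeros rather than dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: every value maps to the lower bound.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max_scale([1, 2, 3])` gives `[0.0, 0.5, 1.0]`. In practice the scikit-learn classes named in the list would be used instead of hand-written helpers like these.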

The search space can also be extended to support more feature engineering methods and modeling algorithms.

Installation

Using pip

Python 3.6 or later is required. On CentOS, install the required system packages first. Then upgrade pip and install cooka:

pip install --upgrade pip
pip install cooka

Start the web server:

cooka server

Then open http://<your_ip>:8000 in your browser to use cooka.
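
If the page does not load, it can help to first confirm that something is listening on port 8000. The following is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper name and is not part of cooka.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `is_port_open("<your_ip>", 8000)` with your own address should return True once `cooka server` is running. This only checks that the port accepts connections, not that the web UI itself is healthy.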

By default, the cooka configuration file is located at ~/.config/cooka/cooka.py. To generate a template:

mkdir -p ~/.config/cooka/
cooka generate-config > ~/.config/cooka/cooka.py

Using Docker

Launch a Cooka docker container:

docker run -ti -p 8888:8888 -p 8000:8000 -p 9001:9001 -e COOKA_NOTEBOOK_PORTAL=http://<your_ip>:8888 datacanvas/cooka:latest
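
For scripted deployments, the command above can be assembled programmatically. This is a small sketch that only reproduces the ports, environment variable, and image tag shown in the command; `build_docker_command` is a hypothetical helper and is not shipped with cooka.

```python
def build_docker_command(host_ip, image="datacanvas/cooka:latest"):
    """Return the docker argument list shown above with host_ip filled in."""
    ports = [8888, 8000, 9001]  # same host:container mappings as the command above
    cmd = ["docker", "run", "-ti"]
    for port in ports:
        cmd += ["-p", f"{port}:{port}"]
    cmd += ["-e", f"COOKA_NOTEBOOK_PORTAL=http://{host_ip}:8888", image]
    return cmd
```

The resulting list can be passed to `subprocess.run(...)`, which avoids shell quoting issues around the URL.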

Open http://<your_ip>:8000 in your browser to visit cooka.

Citation

If you use Cooka in your research, please cite us as follows:

Haifeng Wu, Jian Yang. Cooka: A lightweight and visual AutoML system. https://github.com/DataCanvasIO/Cooka, 2021. Version 0.1.x

@misc{cooka,
  author={Haifeng Wu and Jian Yang},
  title={{Cooka}: {A lightweight and visual AutoML system}},
  howpublished={https://github.com/DataCanvasIO/Cooka},
  note={Version 0.1.x},
  year={2021}
}

DataCanvas

Cooka is an open source project created by DataCanvas.

cooka's Issues

sklearn problem

Send train process event:
http://localhost:8000/api/dataset/iris_test_1/feature-series/default/train-job/train_job_iris_test_1_HyperGBM_20220705134148885806
{"type": "optimize", "status": "succeed", "took": 0.16978836059570312, "datetime": 1656999719237, "extension": {"trial_no": 10, "status": "succeed", "extension": {"reward": 3.8680241084247187, "elapsed": 0.16978836059570312, "params": {"missing": NaN, "reg_alpha": 0.01, "learning_rate": 0.001, "missing_values": NaN, "n_estimators": 200, "hp_or": 0, "reg_lambda": 0.01, "max_depth": 5, "hp_lazy": 0}}}}
07-05 13:41:59 I hypernets.c.callbacks.py 196 - trial end. reward:3.8680241084247187, improved:False, elapsed:0.16978836059570312
07-05 13:41:59 I hypernets.c.callbacks.py 197 - Total elapsed:3.091360569000244
07-05 13:41:59 I hypernets.c.callbacks.py 99 - Early stopping on trial : 10, best reward: 0.32170346973301056, best_trial: 5
07-05 13:41:59 I hypergbm.experiment.py 837 - fit_transform final_ensemble
07-05 13:41:59 E hypernets.e._experiment.py 85 - ExperiementID:[None] - ensemble: Unknown label type: (68 6.2
31 5.4
107 7.3
25 5.0
12 4.8
133 6.3
17 5.1
111 6.4
79 5.7
129 7.2
35 5.0
105 7.6
18 5.7
57 4.9
27 5.2
Name: tabular-toolbox__Y, dtype: float64,)
Traceback (most recent call last):
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/hypernets/experiment/_experiment.py", line 75, in run
y_eval=self.y_eval, eval_size=self.eval_size, **kwargs)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/hypergbm/experiment.py", line 1116, in train
return super().train(hyper_model, X_train, y_train, X_test, X_eval, y_eval, **kwargs)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/hypergbm/experiment.py", line 839, in train
step.fit_transform(hyper_model, X_train, y_train, X_test=X_test, X_eval=X_eval, y_eval=y_eval, **kwargs)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/hypergbm/experiment.py", line 549, in fit_transform
ensemble.fit(X_eval, y_eval)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/tabular_toolbox/ensemble/base_ensemble.py", line 85, in fit
self.fit_predictions(est_predictions, y)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/tabular_toolbox/ensemble/voting.py", line 106, in fit_predictions
score = self.scorer._score_func(y_true, mean_predictions, **self.scorer._kwargs) * self.scorer._sign
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2237, in log_loss
lb.fit(y_true)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/sklearn/preprocessing/label.py", line 297, in fit
self.classes_ = unique_labels(y)
File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/sklearn/utils/multiclass.py", line 98, in unique_labels
raise ValueError("Unknown label type: %s" % repr(ys))
ValueError: Unknown label type: (68 6.2
31 5.4
107 7.3
25 5.0
12 4.8
133 6.3
17 5.1
111 6.4
79 5.7
129 7.2
35 5.0
105 7.6
18 5.7
57 4.9
27 5.2
Name: tabular-toolbox__Y, dtype: float64,)
[13:41:59] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[13:41:59] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[13:41:59] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Send train process event:
http://localhost:8000/api/dataset/iris_test_1/feature-series/default/train-job/train_job_iris_test_1_HyperGBM_20220705134148885806
{"type": "searched", "status": "succeed", "took": 3.3770127296447754, "datetime": 1656999719469, "extension": null}
Send train process event:
http://localhost:8000/api/dataset/iris_test_1/feature-series/default/train-job/train_job_iris_test_1_HyperGBM_20220705134148885806
{"type": "evaluate", "status": "failed", "took": 2.5987625122070312e-05, "datetime": 1656999719487, "extension": {}}
Traceback (most recent call last):
File "/root/cooka/dataset/iris_test_1/experiments/iris_test_1_2/train.py", line 285, in
raise e
File "/root/cooka/dataset/iris_test_1/experiments/iris_test_1_2/train.py", line 253, in
y_pred = estimator.predict(X_test)
AttributeError: 'NoneType' object has no attribute 'predict'

Installed with: pip install cooka==0.1.5
I have tried sklearn==0.23.1/0.24.2/1.0.0/1.0.5,
but none of them work.
All regression tasks report the same problem.
Thank you.
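
For anyone triaging this: the traceback above shows `log_loss`, a classification metric, being called on a float64 target, and that call is what raises `Unknown label type`. A quick way to check whether a target column looks continuous is sketched below. It is a simplified stand-in written for illustration, not scikit-learn's own check, and `looks_continuous` is a hypothetical helper name.

```python
import math


def looks_continuous(y):
    """Return True if any finite value in y is a float with a fractional part."""
    return any(
        isinstance(v, float) and math.isfinite(v) and not v.is_integer()
        for v in y
    )


# First few target values taken from the log above.
y_eval = [6.2, 5.4, 7.3, 5.0, 4.8]
```

Here `looks_continuous(y_eval)` is True, which is consistent with a regression target being scored by a classification metric during the ensemble step.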

expert mode

expert mode has additional settings:

  • speed priority
  • performance priority
  • test set selection for evaluation

No module named 'tensorflow.python.keras.preprocessing'

Hi,
I just installed Cooka with "pip install cooka"; however, the terminal reports ModuleNotFoundError: No module named 'tensorflow.python.keras.preprocessing' when starting the cooka web server.
Could this be a versioning problem?
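
One way to narrow this down is to check whether the module path from the error can be located in the active environment. The sketch below uses only the standard library; `module_available` is a hypothetical helper name. Note that `find_spec` imports parent packages, so checking a TensorFlow submodule will import TensorFlow itself.

```python
import importlib.util


def module_available(name):
    """Return True if the dotted module name can be located for import."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False
```

Running `module_available("tensorflow.python.keras.preprocessing")` in the environment where cooka is installed shows whether the installed TensorFlow still provides that module path.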

support new features

  • manually skip
  • manually stop
  • upload, save data in parquet
  • add comments
  • use GPU
  • export the notebook of used models

Working with tornado 6.1

Cooka requires tornado==6.0.4, which conflicts with other packages such as jupyterlab.
It should be compatible with the current mainstream release, tornado==6.1.
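
To see which tornado release is actually installed and whether it matches an exact pin, a small check like the following can help. It is a sketch that assumes Python 3.8+ (for `importlib.metadata`) and handles only simple dotted release numbers, not the full version syntax; all function names are hypothetical.

```python
from importlib import metadata


def parse_version(text):
    """Turn a simple dotted release such as '6.0.4' into a tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    # Drop trailing zeros so that '6.1' and '6.1.0' compare equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """Return True if installed equals the pinned release exactly."""
    return parse_version(installed) == parse_version(pinned)
```

For example, `matches_pin("6.1", "6.0.4")` is False, which is the conflict described here: an environment that already has tornado 6.1 cannot satisfy an exact 6.0.4 pin.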

Failed to load data

Hi DataCanvas team,

I'm a surgeon and I'm trying to use Cooka for my research. However, loading data fails when I use the "import" function. Copying the file succeeds, but loading the data gets stuck, and I noticed that the terminal keeps repeating the same message. I have attached screenshots of these errors and the details of my platform.
Any help you could provide is sincerely appreciated.

All the best,
Wenyi Jin

image
image

Details about this error:
Windows 10
python: v3.7
GPU: NVIDIA 3060
Chrome: v93.0.4577.82, 64 bit
File: a CSV file with "," as the separator
File path: D:/BioI/step2_uniLogitExp.csv
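
When an import hangs like this, it can be useful to rule out the file itself before looking at the server. The sketch below checks the delimiter and previews a few rows with the standard library only. It assumes the file is UTF-8 encoded, and `detect_delimiter` and `preview_rows` are hypothetical helper names, not cooka functions.

```python
import csv


def detect_delimiter(path, candidates=",;\t|", sample_size=4096):
    """Guess the delimiter of a CSV file from its first few kilobytes."""
    with open(path, newline="", encoding="utf-8") as f:
        sample = f.read(sample_size)
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


def preview_rows(path, delimiter=",", limit=5):
    """Return the first `limit` rows so the header and fields can be inspected."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        return [row for _, row in zip(range(limit), reader)]
```

If both calls behave as expected on the file, the problem is more likely in the server-side loading step than in the CSV.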
