Hi @yiannis-gkoufas,
I understand that you were able to train ML models with AutoML, and that the problem occurs only when computing predictions. Could you please share the code you use to compute predictions?
from mljar-supervised.
Hi @pplonski!
I use the AutoML constructor below and pass it a pandas DataFrame:
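from supervised import AutoML

# model_directory is defined elsewhere in the reporter's code
automl = AutoML(results_path=str(model_directory),
                mode="Compete",
                total_time_limit=600 * 600,  # 360,000 seconds, i.e. 100 hours
                golden_features=True,
                features_selection=True,
                ml_task="binary_classification")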
Could it be an issue with the ensemble model?
Thank you @yiannis-gkoufas for your response. It looks like a bug in computing predictions for the Stacked Ensemble. Could you share the full code and data needed to reproduce the issue?
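A stacked ensemble feeds each base model's predictions to a second-level model as extra feature columns, so the column names produced at predict time must exactly match those produced during training. Below is a minimal, hypothetical sketch of how such columns can be built; `stacked_name` and `add_stacked_columns` are illustrative helpers, not mljar-supervised functions, and real stacking uses out-of-fold predictions from trained models rather than the hard-coded numbers used here.

```python
# Hypothetical sketch of one stacking level; none of these names come from mljar-supervised.


def stacked_name(model_name):
    # A single shared helper builds the column name for a base model's
    # prediction, so fit-time and predict-time columns cannot drift apart.
    return f"{model_name}_prediction"


def add_stacked_columns(rows, base_predictions):
    """Return copies of `rows` with one extra column per base model.

    rows: list of dicts mapping feature name -> value
    base_predictions: dict mapping model name -> list of predicted
        probabilities, one per row
    """
    out = [dict(row) for row in rows]
    for model_name, preds in base_predictions.items():
        column = stacked_name(model_name)
        for row, p in zip(out, preds):
            row[column] = p
    return out


# At fit time the base models' (out-of-fold) predictions are appended ...
train_rows = add_stacked_columns(
    [{"age": 39}, {"age": 50}],
    {"100_NearestNeighbors": [0.2, 0.7], "102_Xgboost": [0.1, 0.8]},
)
# ... and at predict time the same helper appends their test-set predictions.
test_rows = add_stacked_columns(
    [{"age": 28}],
    {"100_NearestNeighbors": [0.4], "102_Xgboost": [0.3]},
)

fit_columns = list(train_rows[0])
predict_columns = list(test_rows[0])
print(fit_columns == predict_columns)  # True
```

Routing both the fit path and the predict path through one naming helper is what keeps the two sets of names identical. In the traceback later in this thread, the fit-time names carry a label-mapping suffix that the predict-time names lack.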
This code:
from sklearn.model_selection import train_test_split
from supervised import AutoML
import pandas as pd

if __name__ == '__main__':
    df = pd.read_csv(
        "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv",
        skipinitialspace=True,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        df[df.columns[:-1]], df["income"], test_size=0.25
    )
    automl = AutoML(results_path="./model",
                    mode="Compete",
                    total_time_limit=600 * 600,
                    golden_features=True,
                    features_selection=True,
                    ml_task="binary_classification")
    automl.fit(X_train, y_train)
    predictions = automl.predict(X_test)
    print(predictions)
reproduces the issue for me, because the stacked ensemble is selected as the best model.
It takes a while to run, of course. This is the error I got:
Traceback (most recent call last):
  File "/Users/prezi/Code/mljar_issue/mljar_issue/main.py", line 23, in <module>
    predictions = automl.predict(X_test)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/automl.py", line 451, in predict
    return self._predict(X)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/base_automl.py", line 1503, in _predict
    predictions = self._base_predict(X)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/base_automl.py", line 1465, in _base_predict
    predictions = model.predict(X, X_stacked)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/ensemble.py", line 434, in predict
    y_predicted_from_model = model.predict(X_stacked)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/model_framework.py", line 448, in predict
    y_p = learner.predict(X_data)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/algorithms/sklearn.py", line 66, in predict
    return self.model.predict_proba(X)[:, 1]
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/ensemble/_forest.py", line 947, in predict_proba
    X = self._validate_X_predict(X)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/ensemble/_forest.py", line 641, in _validate_X_predict
    X = self._validate_data(
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/base.py", line 608, in _validate_data
    self._check_feature_names(X, reset=reset)
  File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/base.py", line 535, in _check_feature_names
    raise ValueError(message)
ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time:
- 100_NearestNeighbors_prediction
- 101_NearestNeighbors_prediction
- 102_Xgboost_BoostOnErrors_prediction
- 102_Xgboost_prediction
- 103_Xgboost_prediction
- ...
Feature names seen at fit time, yet now missing:
- 100_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K
- 101_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K
- 102_Xgboost_BoostOnErrors_prediction_0_for_<=50K_1_for_>50K
- 102_Xgboost_prediction_0_for_<=50K_1_for_>50K
- 103_Xgboost_prediction_0_for_<=50K_1_for_>50K
- ...
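The ValueError comes from scikit-learn's feature-name check: an estimator fitted on a pandas DataFrame stores its column names (`feature_names_in_`) and rejects input whose names differ. Here the stacked-prediction columns carry a `_0_for_<=50K_1_for_>50K` suffix at fit time but not at predict time. The following is a minimal stdlib sketch of the set comparison behind the two lists in the message, using two names copied from the traceback. `diff_feature_names` is a hypothetical helper, not a scikit-learn or mljar-supervised function, and scikit-learn's real check additionally requires the columns to be in the same order.

```python
def diff_feature_names(fitted, incoming):
    """Compare fit-time column names with predict-time column names.

    Returns (unseen, missing): names present only at predict time, and
    names seen at fit time but now absent. Both are sorted, matching the
    order in which scikit-learn lists them in its error message.
    """
    fitted_set, incoming_set = set(fitted), set(incoming)
    unseen = sorted(incoming_set - fitted_set)
    missing = sorted(fitted_set - incoming_set)
    return unseen, missing


# Column names copied from the traceback (only the first two models).
fit_names = [
    "100_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K",
    "101_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K",
]
predict_names = [
    "100_NearestNeighbors_prediction",
    "101_NearestNeighbors_prediction",
]

unseen, missing = diff_feature_names(fit_names, predict_names)
print("Feature names unseen at fit time:", unseen)
print("Feature names seen at fit time, yet now missing:", missing)
```

Every stacked column shows up in both lists, which points at inconsistent column naming between training and prediction rather than at missing base models.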