
pycaret's Introduction


An open-source, low-code machine learning library in Python

🎉🎉🎉 PyCaret 3.3 is now available. 🎉🎉🎉

pip install --upgrade pycaret

Docs • Tutorials • Blog • LinkedIn • YouTube • Slack


Welcome to PyCaret

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive.

In comparison with the other open-source machine learning libraries, PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with only a few. This makes experiments exponentially fast and efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, Optuna, Hyperopt, Ray, and a few more.

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise. PyCaret was inspired by the caret library in the R programming language.

🚀 Installation

🌐 Option 1: Install via PyPi

PyCaret is tested and supported on 64-bit systems with:

  • Python 3.9, 3.10 and 3.11
  • Ubuntu 16.04 or later
  • Windows 7 or later

You can install PyCaret with Python's pip package manager:

# install pycaret
pip install pycaret

PyCaret's default installation will not install all the optional dependencies automatically. Depending on the use case, you may be interested in one or more extras:

# install analysis extras
pip install pycaret[analysis]

# models extras
pip install pycaret[models]

# install tuner extras
pip install pycaret[tuner]

# install mlops extras
pip install pycaret[mlops]

# install parallel extras
pip install pycaret[parallel]

# install test extras
pip install pycaret[test]


# install multiple extras together
pip install pycaret[analysis,models]

Check out all optional dependencies. If you want to install everything including all the optional dependencies:

# install full version
pip install pycaret[full]

📄 Option 2: Build from Source

Install the development version of the library directly from source. The API may be unstable, and it is not recommended for production use.

pip install git+https://github.com/pycaret/pycaret.git@master --upgrade

📦 Option 3: Docker

Docker uses containers to create virtual environments that keep a PyCaret installation separate from the rest of the system. The PyCaret Docker image comes pre-installed with a Jupyter Notebook, and it can share resources with its host machine (access directories, use the GPU, connect to the Internet, etc.). The PyCaret Docker images are always tested for the latest major releases.

# default version
docker run -p 8888:8888 pycaret/slim

# full version
docker run -p 8888:8888 pycaret/full

๐Ÿƒโ€โ™‚๏ธ Quickstart

1. Functional API

# Classification Functional API Example

# loading sample dataset
from pycaret.datasets import get_data
data = get_data('juice')

# init setup
from pycaret.classification import *
s = setup(data, target = 'Purchase', session_id = 123)

# model training and selection
best = compare_models()

# evaluate trained model
evaluate_model(best)

# predict on hold-out/test set
pred_holdout = predict_model(best)

# predict on new data
new_data = data.copy().drop('Purchase', axis = 1)
predictions = predict_model(best, data = new_data)

# save model
save_model(best, 'best_pipeline')

2. OOP API

# Classification OOP API Example

# loading sample dataset
from pycaret.datasets import get_data
data = get_data('juice')

# init setup
from pycaret.classification import ClassificationExperiment
s = ClassificationExperiment()
s.setup(data, target = 'Purchase', session_id = 123)

# model training and selection
best = s.compare_models()

# evaluate trained model
s.evaluate_model(best)

# predict on hold-out/test set
pred_holdout = s.predict_model(best)

# predict on new data
new_data = data.copy().drop('Purchase', axis = 1)
predictions = s.predict_model(best, data = new_data)

# save model
s.save_model(best, 'best_pipeline')

📍 Modules

  • Classification: Functional API / OOP API
  • Regression: Functional API / OOP API
  • Time Series: Functional API / OOP API
  • Clustering: Functional API / OOP API
  • Anomaly Detection: Functional API / OOP API
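
Every module follows the same setup, train, and use pattern shown in the Quickstart; only the import changes. A minimal clustering sketch ('jewellery' is assumed to be one of PyCaret's bundled sample datasets):

# clustering follows the same pattern as classification
from pycaret.datasets import get_data
from pycaret.clustering import setup, create_model, assign_model

data = get_data('jewellery')
s = setup(data, session_id = 123)   # unsupervised: no target column
kmeans = create_model('kmeans')     # train a k-means model
labeled = assign_model(kmeans)      # original data plus cluster labels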

👥 Who should use PyCaret?

PyCaret is an open-source library that anybody can use. In our view, the ideal target audience of PyCaret is:

  • Experienced Data Scientists who want to increase productivity.
  • Citizen Data Scientists who prefer a low code machine learning solution.
  • Data Science Professionals who want to build rapid prototypes.
  • Data Science and Machine Learning students and enthusiasts.

🎮 Training on GPUs

To train models on the GPU, simply pass use_gpu = True in the setup function. The API itself does not change, although in some cases additional libraries have to be installed. The following models can be trained on GPUs (see the sketch after this list):

  • Extreme Gradient Boosting
  • CatBoost
  • Light Gradient Boosting Machine (requires GPU installation)
  • Logistic Regression, Ridge Classifier, Random Forest, K Neighbors Classifier, K Neighbors Regressor, Support Vector Machine, Linear Regression, Ridge Regression, Lasso Regression (require cuML >= 0.15)
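
The GPU path is identical to the CPU path apart from the one flag mentioned above; a minimal sketch reusing the Quickstart dataset:

# train on GPU by passing use_gpu = True in setup
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123, use_gpu = True)
best = compare_models()   # GPU-capable estimators now train on the GPU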

🖥️ PyCaret Intel sklearnex support

You can apply Intel optimizations for machine learning algorithms to speed up your workflow. To train models with Intel optimizations, use the sklearnex engine. The API itself does not change; however, Intel sklearnex must be installed first:

pip install scikit-learn-intelex
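
A minimal sketch, assuming PyCaret 3's per-model engine parameter: after installing scikit-learn-intelex, request the sklearnex engine for a supported estimator such as logistic regression.

# use the Intel-optimized sklearnex engine for logistic regression
from pycaret.datasets import get_data
from pycaret.classification import setup, create_model

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123)
lr = create_model('lr', engine = 'sklearnex')   # engine parameter assumed (PyCaret 3)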

🤝 Contributors

📝 License

PyCaret is completely free and open-source and licensed under the MIT license.

ℹ️ More Information

Important links:

  • ⭐ Tutorials: Tutorials developed and maintained by core developers
  • 📋 Example Notebooks: Example notebooks created by the community
  • 📙 Blog: Official blog by the creator of PyCaret
  • 📚 Documentation: API docs
  • 📺 Videos: Video resources
  • ✈️ Cheat sheet: Community cheat sheet
  • 📢 Discussions: Community discussion board on GitHub
  • 🛠️ Release Notes: Release notes


pycaret's Issues

int32 issue

Hi,
I tried to use the package for a classification problem. When I passed a dataframe whose target column had dtype 'int32' to setup and ran the code, it showed no dtype for the dataframe. When I changed the dtype to int64, it worked. Can you fix this issue?
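
Until this is fixed, a workaround consistent with the report is to cast the target to int64 before calling setup(); the file and column names below are hypothetical:

# workaround sketch: cast the int32 target to int64 before setup()
import pandas as pd

df = pd.read_csv('data.csv')                 # hypothetical input
df['target'] = df['target'].astype('int64')  # int32 -> int64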

AttributeError

Hi - I really enjoy the package! However, when executing the script on a server, I get the following error:

Traceback (most recent call last):
File "RF_pycaret.py", line 15, in
model1 = create_model('rf')
File "/home/../.conda/envs/pycaret/lib/python3.8/site-packages/pycaret/regression.py", line 1691, in create_model
display_id = display_.display_id
AttributeError: 'NoneType' object has no attribute 'display_id'

Any clue what's going on?
Thanks for your help!

Problem with numba

Hi everyone!
When I try to run a script that uses the pycaret classification module, this happens:

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\numba\core\typeconv\typeconv.py", line 4, in
from numba.core.typeconv import _typeconv
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/tamara.ciric/PycharmProjects/intelliq/model/risk/services/ml/pycaret_train.py", line 76, in
train(date=datetime(2019, 1, 1), sources=config.DATA_SOURCES)
File "C:/Users/tamara.ciric/PycharmProjects/intelliq/model/risk/services/ml/pycaret_train.py", line 43, in train
exp_clf = setup(X, target=config.Y)
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\pycaret\classification.py", line 880, in setup
from pycaret import preprocess
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\pycaret\preprocess.py", line 26, in
from pyod.models.knn import KNN
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\pyod_init_.py", line 4, in
from . import utils
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\pyod\utils_init_.py", line 11, in
from .stat_models import pairwise_distances_no_broadcast
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\pyod\utils\stat_models.py", line 11, in
from numba import njit
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\numba_init_.py", line 20, in
from numba.misc.special import (
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\numba\misc\special.py", line 3, in
from numba.core.typing.typeof import typeof
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\numba\core\typing_init_.py", line 1, in
from .context import BaseContext, Context
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\numba\core\typing\context.py", line 11, in
from numba.core.typeconv import Conversion, rules
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\numba\core\typeconv\rules.py", line 2, in
from .typeconv import TypeManager, TypeCastingRules
File "C:\ProgramData\Anaconda3_all\envs\intelliq_risk\lib\site-packages\numba\core\typeconv\typeconv.py", line 17, in
raise ImportError(msg % (url, reportme, str(e), sys.executable))
ImportError: Numba could not be imported.
If you are seeing this message and are undertaking Numba development work, you may need to re-run:

python setup.py build_ext --inplace

(Also, please check the development set up guide http://numba.pydata.org/numba-doc/latest/developer/contributing.html.)

If you are not working on Numba development:

Please report the error message and traceback, along with a minimal reproducer
at: https://github.com/numba/numba/issues/new

If more help is needed please feel free to speak to the Numba core developers
directly at: https://gitter.im/numba/numba

Thanks in advance for your help in improving Numba!

The original error was: 'DLL load failed: The specified module could not be found.'

If possible please include the following in your error report:

sys.executable: C:\ProgramData\Anaconda3_all\envs\intelliq_risk\python.exe

Tree visualization

Hi,

Thank you for a nice package.

Is there a way to visualize the tree output in PyCaret, like plot_tree or export_graphviz?
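
PyCaret does not document a built-in tree plot here, but since create_model('dt') returns a fitted scikit-learn estimator, sklearn's own plot_tree can be used as a workaround; a minimal sketch:

# visualize a trained decision tree with sklearn's plot_tree
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
from pycaret.datasets import get_data
from pycaret.classification import setup, create_model

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123)
dt = create_model('dt')                      # fitted DecisionTreeClassifier

plt.figure(figsize = (20, 10))
plot_tree(dt, filled = True, max_depth = 3)  # truncate the plot for readability
plt.show()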

Creating separate Models

Please add a dictionary of alternative model names for creating separate models. PyCaret doesn't recognize 'Extra Trees Classifier'.
[screenshot of the error]

It would be helpful to get the names from this table:
[screenshot of the model name table]

Vocab Size showing zero while using NLP (pycaret.nlp)

Can you please provide a solution for this? I have tried many datasets, but the vocab size shows zero and create_model('lda') does not work, showing the error "cannot compute LDA over an empty collection (no terms)".

I attached a screenshot as well; you can see it below:
[screenshot]

interpret_model() problem with XGBoost regression

I was trying interpret_model with XGBoost for a regression model and got this error:

[screenshot: shap_xgboost]

I searched for it, and it seems to be a known issue with shap 0.32.1 (shap/shap#887). I updated to shap 0.35.0 and the problem was fixed. Maybe you should consider not using the shap 0.32.1 version.

Problem installing pycaret

Hi everyone,

I had some issues pip-installing pycaret. Some of my other libraries use the same underlying packages as pycaret, but require more recent versions.

Here are the error messages I get:

ERROR: prettierplot 0.1.1 has requirement scikit-learn>=0.22.1, but you'll have scikit-learn 0.22 which is incompatible.
ERROR: mlmachine 0.1.3 has requirement catboost>=0.22, but you'll have catboost 0.20.2 which is incompatible.
ERROR: mlmachine 0.1.3 has requirement scikit-learn>=0.22.1, but you'll have scikit-learn 0.22 which is incompatible.
ERROR: mlmachine 0.1.3 has requirement shap>=0.35.0, but you'll have shap 0.32.1 which is incompatible.
ERROR: mlmachine 0.1.3 has requirement xgboost>=1.0.2, but you'll have xgboost 0.90 which is incompatible.

Would it be possible to update your requirements file?

Thank you

Bayesian Hyperparameter Optimization

Hi,
I was wondering if we could have a Bayesian hyperparameter optimization technique instead of random grid search. This would speed up tuning and allow us to search a much larger grid scientifically. We could have this enhancement along with the ability to add a custom grid for tuning.

Thanks
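
Later PyCaret releases added alternative search backends to tune_model(); a minimal sketch, assuming PyCaret >= 2.2 with scikit-optimize installed (pip install scikit-optimize):

# Bayesian hyperparameter search instead of random grid search
from pycaret.datasets import get_data
from pycaret.classification import setup, create_model, tune_model

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123)
dt = create_model('dt')
tuned = tune_model(dt, search_library = 'scikit-optimize', search_algorithm = 'bayesian')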

IPython exceptions for Python

When using Python directly, IPython is not supported, so the display function doesn't work. Create an exception rule for this in the future.

xgboost not working

First of all, I love this package. It is great and makes training, testing, and tuning so much simpler.

I am unable to compare or create an xgboost classification or an xgboost regression model. With the same dataset, all of the other models work except xgboost. For classification, I get the error:
"attempt to get argmax of an empty sequence"

For regression, I get the error:
"Found array with 0 feature(s) (shape=(4035, 0)) while a minimum of 1 is required."

Please advise.

Top Models from Compare Models

Hi,

I was wondering if there is a way to select the top 'n' models from compare_models. If we could have this feature, the whole training process could be automated. It would also help users on platforms like KNIME/Power BI that do not support HTML output.

Thanks,
Riaz
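
Later releases added exactly this through the n_select argument; a minimal sketch, assuming PyCaret >= 2.1:

# compare_models returns a list when n_select > 1
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123)
top3 = compare_models(n_select = 3)   # list of the three best models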

Change Fold Strategy

In pycaret, the number of k-folds can be changed by passing fold as an argument. However, the fold strategy is fixed as a random split (i.e. KFold) and cannot be changed by the user. For general problems, there are many cases where you want to use GroupKFold or TimeSeriesSplit.

So I propose allowing the user to change the fold strategy by passing an instance that inherits from _BaseKFold: https://github.com/scikit-learn/scikit-learn/blob/95d4f0841d57e8b5f6b2a570312e9d832e69debc/sklearn/model_selection/_split.py#L269

It would be a fairly widespread change, but I think it would make the project even better.

Best Regards.
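
Later PyCaret versions added this through setup's fold_strategy (any scikit-learn compatible splitter) and fold_groups parameters; a minimal sketch, assuming PyCaret >= 2.2, with a hypothetical 'group_id' column:

# cross-validate with GroupKFold instead of the default KFold
from sklearn.model_selection import GroupKFold
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data('juice')
data['group_id'] = data.index % 5   # hypothetical grouping for the demo
s = setup(data, target = 'Purchase', session_id = 123,
          fold_strategy = GroupKFold(n_splits = 5),  # custom splitter instance
          fold_groups = 'group_id')                  # column holding group labels
best = compare_models()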

Dataframe constructor not properly called

While running a simple linear regression model on Kaggle using pycaret, I get the error "DataFrame constructor not properly called!" when running setup. The dataset contains only two columns.

df = pd.read_csv("/kaggle/input/salary-data-simple-linear-regression/Salary_Data.csv")
setup_data1 = setup(data = df, target = 'Salary', session_id=123)

Logs:

ValueError Traceback (most recent call last)
in
----> 1 setup_data1 = setup(data = df, target = 'Salary', session_id=123)

/opt/conda/lib/python3.6/site-packages/pycaret/regression.py in setup(data, target, train_size, sampling, sample_estimator, categorical_features, categorical_imputation, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_interaction, feature_ratio, interaction_threshold, transform_target, transform_target_method, session_id, silent, profile)
955 target_transformation = transform_target, #new
956 target_transformation_method = transform_target_method_pass, #new
--> 957 random_state = seed)
958
959 progress.value += 1

/opt/conda/lib/python3.6/site-packages/pycaret/preprocess.py in Preprocess_Path_One(train_data, target_variable, ml_usecase, test_data, categorical_features, numerical_features, time_features, features_todrop, display_types, imputation_type, numeric_imputation_strategy, categorical_imputation_strategy, apply_zero_nearZero_variance, club_rare_levels, rara_level_threshold_percentage, apply_untrained_levels_treatment, untrained_levels_treatment_method, apply_ordinal_encoding, ordinal_columns_and_categories, apply_cardinality_reduction, cardinal_method, cardinal_features, apply_binning, features_to_binn, apply_grouping, group_name, features_to_group_ListofList, apply_polynomial_trigonometry_features, max_polynomial, trigonometry_calculations, top_poly_trig_features_to_select_percentage, scale_data, scaling_method, Power_transform_data, Power_transform_method, target_transformation, target_transformation_method, remove_outliers, outlier_contamination_percentage, outlier_methods, apply_feature_selection, feature_selection_top_features_percentage, remove_multicollinearity, maximum_correlation_between_features, remove_perfect_collinearity, apply_feature_interactions, feature_interactions_to_apply, feature_interactions_top_features_to_select_percentage, cluster_entire_data, range_of_clusters_to_try, apply_pca, pca_method, pca_variance_retained_or_number_of_components, random_state)
2538 return(pipe.fit_transform(train_data),pipe.transform(test_data))
2539 else:
-> 2540 return(pipe.fit_transform(train_data))
2541
2542

/opt/conda/lib/python3.6/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
381 """
382 last_step = self._final_estimator
--> 383 Xt, fit_params = self._fit(X, y, **fit_params)
384 with _print_elapsed_time('Pipeline',
385 self._log_message(len(self.steps) - 1)):

/opt/conda/lib/python3.6/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
311 message_clsname='Pipeline',
312 message=self._log_message(step_idx),
--> 313 **fit_params_steps[name])
314 # Replace the transformer of the step with the fitted
315 # transformer. This is necessary when loading the transformer

/opt/conda/lib/python3.6/site-packages/joblib/memory.py in call(self, *args, **kwargs)
353
354 def call(self, *args, **kwargs):
--> 355 return self.func(*args, **kwargs)
356
357 def call_and_shelve(self, *args, **kwargs):

/opt/conda/lib/python3.6/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
724 with _print_elapsed_time(message_clsname, message):
725 if hasattr(transformer, 'fit_transform'):
--> 726 res = transformer.fit_transform(X, y, **fit_params)
727 else:
728 res = transformer.fit(X, y, **fit_params).transform(X)

/opt/conda/lib/python3.6/site-packages/pycaret/preprocess.py in fit_transform(self, dataset, y)
1957 def fit_transform(self,dataset,y=None):
1958 data = dataset.copy()
-> 1959 corr = pd.DataFrame(np.corrcoef(data.drop(self.target,axis=1).T))
1960 corr.columns = data.drop(self.target,axis=1).columns
1961 corr.index = data.drop(self.target,axis=1).columns

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
483 )
484 else:
--> 485 raise ValueError("DataFrame constructor not properly called!")
486
487 NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!

Python 64-bit only

The website and Git documentation should mention that PyCaret supports 64-bit systems only.

Lack of info in the create_model function documentation

In the create_model() function, after the model has been evaluated with k-fold cross-validation and is ready to be used to make predictions on unseen data, one question remained (I've read the function documentation and didn't find an answer): is the model trained on the entire data set?

If yes, it would be great to have this piece of information stated in the function docs.

How to get the values of the compare_models table

When we use the create_model/compare_models/tune_model functions, a table containing a lot of information is returned. Is there any function for getting a specific row/column/value of that table?
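
Later PyCaret versions expose pull(), which returns the most recently displayed score grid as a pandas DataFrame; a minimal sketch, assuming PyCaret >= 2.0:

# fetch the compare_models table as a DataFrame
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models, pull

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123)
best = compare_models()
grid = pull()                  # the comparison table as a DataFrame
print(grid.iloc[0])            # row for the best model
print(grid['Accuracy'].max())  # a single value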

Normalization is not working

Hi, I just noticed that the example at https://pycaret.org/normalization doesn't normalize all numerical columns in the dataset as shown in the example. Instead, it only normalizes the target class if its data type is numeric.

# Importing dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', normalize = True)

Cross validation with/without shuffling rows

Hi. Great package. Thank you.

Is there a way to select whether we want rows to be shuffled or not during CV? What is the default behavior? Not shuffling is critical for time series analysis.

Thanks
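
In later PyCaret versions this is controlled in setup(): fold_shuffle toggles shuffling in the CV splitter and defaults to False; a minimal sketch, assuming PyCaret >= 2.2 (recent versions also accept fold_strategy = 'timeseries'):

# keep row order intact during cross-validation
from pycaret.datasets import get_data
from pycaret.classification import setup

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123, fold_shuffle = False)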

Not able to fill missing values

Hello,

According to the Missing Value Imputation section, I am trying to fill in the missing values, but the syntax is not mentioned. I am searching for the syntax to impute the missing values as described on the page.

I tried the following code, but it does not work.

import pycaret
from pycaret.datasets import get_data
hepatitis = get_data('hepatitis')
# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = hepatitis, target = 'Class',numeric_imputation='mean',categorical_imputation = 'mode')

The above code did not work and the missing values still exist in the dataset. Kindly provide the syntax for this.

Thanks

Specify "n_jobs" Parameters

Currently n_jobs=-1 is set in compare_models, tune_model, and so on. But I think it would be more useful if the user could specify it when the calculation is done on a shared server. I would appreciate it if you could consider this.
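
Later PyCaret versions let you set this globally in setup() via n_jobs (default -1, i.e. all cores); a minimal sketch limiting training to 4 cores:

# restrict PyCaret to 4 cores on a shared server
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123, n_jobs = 4)
best = compare_models()   # downstream training respects n_jobs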

Custom Tuning Grid

Hi,

As I understand it, the current tune_model() uses randomized grid search for tuning a model. However, how can I provide my own tuning grid for selected hyperparameters? I don't think PyCaret currently offers this flexibility. If we could have this feature, it would be very instrumental.

Thanks
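
Later releases added a custom_grid argument to tune_model(); a minimal sketch, assuming PyCaret >= 2.0 (parameter names follow the underlying scikit-learn estimator):

# tune a decision tree over a user-defined grid
from pycaret.datasets import get_data
from pycaret.classification import setup, create_model, tune_model

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123)
dt = create_model('dt')
tuned = tune_model(dt, custom_grid = {
    'max_depth': [2, 4, 6, 8],
    'min_samples_leaf': [1, 5, 10],
})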

Clustering create_model not working

The create_model function for KMeans and other clustering models seems not to be working. I get the following error:

SystemExit: (Value Error): Estimator Not Available. Please see docstring for list of available estimators.

Even the n_clusters argument is not available.

Setup worked fine.

I am using Colab for processing.

Merging Stack_Model and Stacknet

Hi,

stack_model() and create_stacknet() are both wonderful features, something I haven't seen provided by any other package. However, intrinsically I feel the only difference between them is the number of layers. If we had just one function with options for choosing the number of layers and the estimators at each layer, we would not need two functions here.

Thanks.

Confidence interval for performance metrics

Is there any way to compute a 95% CI for all performance metrics (especially on the hold-out test set), since a single value doesn't really help that much in evaluating model robustness? Many thanks for the great work you have done and keep on doing.
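
PyCaret does not report confidence intervals itself, but a bootstrap over the hold-out predictions gives one; a minimal sketch (the 'prediction_label' column name assumes PyCaret 3; it is 'Label' in 2.x):

# bootstrap a 95% CI for hold-out accuracy
import numpy as np
from sklearn.metrics import accuracy_score
from pycaret.datasets import get_data
from pycaret.classification import setup, create_model, predict_model

data = get_data('juice')
s = setup(data, target = 'Purchase', session_id = 123)
model = create_model('lr')
pred = predict_model(model)                   # scores the hold-out set

y_true = pred['Purchase'].to_numpy()
y_pred = pred['prediction_label'].to_numpy()  # 'Label' in PyCaret 2.x

rng = np.random.default_rng(123)
scores = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
    scores.append(accuracy_score(y_true[idx], y_pred[idx]))
lo, hi = np.percentile(scores, [2.5, 97.5])
print(f"Accuracy 95% CI: [{lo:.3f}, {hi:.3f}]")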

ValueError: need at most 63 handles, got a sequence of length 65

When I try to run tune_model() on a Windows server, I receive the following error:

Exception in thread QueueManagerThread: Traceback (most recent call last): File "d:\anaconda3\envs\pycaret\lib\threading.py", line 916, in _bootstrap_inner self.run() File "d:\anaconda3\envs\pycaret\lib\threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "d:\anaconda3\envs\pycaret\lib\site-packages\joblib\externals\loky\process_executor.py", line 615, in _queue_management_worker ready = wait(readers + worker_sentinels) File "d:\anaconda3\envs\pycaret\lib\multiprocessing\connection.py", line 859, in wait ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout) File "d:\anaconda3\envs\pycaret\lib\multiprocessing\connection.py", line 791, in _exhaustive_wait res = _winapi.WaitForMultipleObjects(L, False, timeout) ValueError: need at most 63 handles, got a sequence of length

I tried to run it on two different Windows servers using different Anaconda environments with Python 3.6.10 and 3.7.7 and got the same error.

Estimator Error

I tried regression data and the CatBoost regressor was found to be the best performing model, but after tuning the model I tried to evaluate it to view the hyperparameters and got the error below:
SystemExit: (Estimator Error): CatBoost estimator is not compatible with plot_model function, try using Catboost with interpret_model instead.

Couldn't find any tests

Hello,

First of all, thanks a lot for this comprehensive library.

I wanted to add a cross-platform continuous integration workflow PR but couldn't locate the tests for the implemented functions. Where can I find them?

If there aren't any, you should definitely consider adding unit/integration tests; otherwise it will be extremely hard to identify/localize bugs and errors.

Best regards

Meta model default in stacking

The default meta-model is logistic/linear regression for classification/regression. Though it might work in certain cases, in the places where I have used it, especially with restack = True, the model performance only improves when I choose one of the better performing models as the meta-model. Since most users would begin with compare_models(), if there were a way to have stacking default to the best model from compare_models, it would be more convenient.

Some of the required packages are missing in Conda and Conda-Forge.

conda install -f -y -q -c conda-forge --file requirements.txt

PackagesNotFoundError: The following packages are not available from current channels:

  • cufflinks==0.17.0
  • kmodes==0.10.1
  • datefinder==0.7.0
  • yellowbrick==1.0.1
  • datetime==4.3

Alternatives would be appreciated. Thanks in advance

Parallelizing compare_models

Hi team,

Wonderful work with pycaret! I'm wondering if you might be open to a PR that parallelizes compare_models(), such that one core takes on one model class? Yes, this might change some of the UI elements, but it might also speed up training. I have some experience using Dask and am happy to lend some time to make this happen, though I might also need some guidance through the codebase at some point.

Memory-related crash

Dear All,

Thanks for the great library. I'm facing problems when trying to use pandas to import my CSV, which has 130 rows and 110 columns. It consumes all my RAM (64 GB, MacBook Pro 16") and crashes. Any ideas? Up till now, I was able to use it only with CSV files with a limited number of columns.

Nikos
