blobcity / autoai

Python based framework for Automatic AI for Regression and Classification over numerical data. Performs model search, hyper-parameter tuning, and high-quality Jupyter Notebook code generation.

License: Apache License 2.0

Python 100.00%

automl codegen ai ml autoai machine-learning deep-learning python

autoai's Issues

YAML data fix

If the user provides a DataFrame instead of a file path to the dataset, add a key to the YAML that represents the DataFrame appropriately.

For example:

If the file path is specified:

      data_read:
             type: csv
             class: df
             file: 'something.csv'

If a DataFrame is provided, it should be:
```
     data_read:
             class: df
```
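
One way to produce the two shapes above is to branch on whether the caller passed a path or an in-memory object. This is only a sketch under assumptions: `build_data_read` is a hypothetical helper, not a function from the repository, and the `type` value is simply taken from the file extension.

```python
import os


def build_data_read(source):
    """Return the `data_read` mapping for the generated YAML.

    `source` is either a file path or an in-memory DataFrame.
    (Hypothetical helper, for illustration only.)
    """
    if isinstance(source, (str, os.PathLike)):
        path = os.fspath(source)
        # File input: record the type (from the extension) and the file name.
        extension = os.path.splitext(path)[1].lstrip(".").lower()
        return {"type": extension, "class": "df", "file": path}
    # DataFrame input: there is no file to record, only the class marker.
    return {"class": "df"}


print(build_data_read("something.csv"))
```

Dumping the returned mapping under a `data_read` key would give the two YAML forms shown in this issue.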

Files to refer:

       https://github.com/blobcity/autoai/blob/main/blobcity/main/driver.py
       https://github.com/blobcity/autoai/blob/main/blobcity/store/DictClass.py

Add SGDRegressor

Add SGDRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference SGDRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/SGDRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train an SGDRegressor as a potential best fit model.

Pandas DataFrame Support

files to refer:

      https://github.com/blobcity/autoai/blob/main/blobcity/blobcity.py  
      https://github.com/blobcity/autoai/blob/main/blobcity/utils/FileType.py

Currently, the main driver function train accepts a file path as an argument, fetches the dataset from the user-specified location, and identifies the file type.

Enhancement:
Give the user more flexibility by letting the train function accept a pandas.DataFrame object as an argument. The follow-up functions called inside the driver function must support it as well.
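
A minimal sketch of how the data-loading step could accept either input. The helper name `load_dataset` and the `df` parameter are assumptions made for illustration; the real train signature may differ. pandas is imported lazily so the DataFrame path adds no file handling at all.

```python
def load_dataset(file_path=None, df=None):
    """Return a DataFrame from either an in-memory object or a file path.

    Hypothetical helper: `df` wins when both arguments are given.
    """
    if df is not None:
        # Already a DataFrame: nothing to read, pass it straight through.
        return df
    if file_path is None:
        raise ValueError("Provide either file_path or df")

    import pandas as pd  # only needed when reading from disk

    lowered = str(file_path).lower()
    if lowered.endswith(".csv"):
        return pd.read_csv(file_path)
    if lowered.endswith(".xlsx"):
        return pd.read_excel(file_path)
    raise ValueError(f"Unsupported file type: {file_path}")
```

Every later step would then work on the returned DataFrame and never needs to know where it came from.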

Progress Bar

Add a Python progress bar on the train function, to indicate to the user the current training progress.

model=bc.train("datasetpath","target")

File to refer : https://github.com/blobcity/autoai/blob/main/blobcity/main/driver.py

Example progress bars in Python: https://www.geeksforgeeks.org/progress-bars-in-python/

For accurate progress reporting, create an execution profile to estimate the total number of epochs/steps. Increment the progress bar as each training epoch or step is completed.

The progress bar should display correctly in both terminal / command prompt execution, as well as when executing within a Jupyter Notebook.
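
The idea can be sketched with the standard library alone: render a text bar from a done/total count and redraw it with a carriage return after each step. The function names below are hypothetical, and the "steps" are stand-ins for training epochs. In practice a package such as tqdm (its `tqdm.auto` module picks a notebook or console front end) may be a better fit.

```python
import sys


def render_bar(done, total, width=20):
    """Return a text progress bar such as '[#####-----] 50%'."""
    fraction = 0.0 if total <= 0 else min(max(done / total, 0.0), 1.0)
    filled = round(fraction * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {round(fraction * 100)}%"


def run_with_progress(steps, out=sys.stdout):
    """Run each callable in `steps`, redrawing the bar after every one."""
    total = len(steps)
    results = []
    for index, step in enumerate(steps, start=1):
        results.append(step())
        # "\r" returns to the start of the line so the bar is overwritten.
        out.write("\r" + render_bar(index, total))
        out.flush()
    out.write("\n")
    return results
```

The total passed to `render_bar` would come from the execution profile mentioned above.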

Add BayesianRidge

Add BayesianRidge model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference BayesianRidge Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/BayesianRidgeRegression.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a BayesianRidge as a potential best fit model.

Reset DictClass.py Class Variable

File to refer: https://github.com/blobcity/autoai/blob/main/blobcity/store/DictClass.py
Reset or clear the data assigned to the class variables in DictClass.py on each call to the driver function train.

Estimated Time Calculation

Add functionality to show the Estimated Time to Completion (ETC) for the whole AutoAI process.
The ETC should be displayed when the user calls the driver function train using the following code:

     model=bc.train("dataset path", "target")

The estimate should account for intermediate function calls, especially the model tuning process, since it takes more time than the other intermediate steps.
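
A simple way to estimate the remaining time is to extrapolate from the average time per completed step. The sketch below assumes the work can be counted in steps; both function names are hypothetical.

```python
def estimate_remaining(elapsed_seconds, steps_done, steps_total):
    """Extrapolate the remaining time from the average time per finished step."""
    if steps_done <= 0:
        return None  # nothing finished yet, so there is no basis for an estimate
    per_step = elapsed_seconds / steps_done
    return per_step * max(steps_total - steps_done, 0)


def format_eta(seconds):
    """Format a number of seconds as 'Xm YYs' for display."""
    if seconds is None:
        return "estimating..."
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s"


print(format_eta(estimate_remaining(10.0, 2, 10)))  # 2 of 10 steps took 10 s
```

Because tuning dominates the run time, one option is to count each tuning trial as several steps so that it weighs more in the total.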

files to refer :

Add XGBoost Regressor

Add XGBoostRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference XGBoostRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/XGBoost/XGBoostRegressor.ipynb

Also refer to the official documentation for required parameters:
https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train an XGBoostRegressor as a potential best fit model.

Display Cross Validation Score

files to refer:

    https://github.com/blobcity/autoai/blob/main/blobcity/main/modelSelection.py
    https://github.com/blobcity/autoai/blob/main/blobcity/config/tuner.py

At the end of model selection and parameter tuning, add a log/print option to display the cross-validation score of the selected model.

Execution Instances Management

file to refer:

https://github.com/blobcity/autoai/blob/main/blobcity/store/DictClass.py

        YAML=dict()
        ObjectExist= False
        ObjectList=None
        def __int__(self):
            self.ObjectExist=False
            self.ObjectList=None

https://github.com/blobcity/autoai/blob/main/blobcity/blobcity.py

Reset the class data shown above in DictClass.py on each call to the train function in blobcity.py, to avoid execution failures caused by data carried over between function calls.
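
A sketch of one possible reset, under assumptions. The quoted snippet spells the constructor `__int__`; Python only calls `__init__` on instantiation, so `__init__` is assumed to be the intent. A traceback later on this page shows `dc.resetVar()` being called at the top of train; the method body below is a guess at what such a reset could do, not the repository's code.

```python
class DictClass:
    # Class-level state shared by every instance (and every train() call).
    YAML = dict()
    ObjectExist = False
    ObjectList = None

    def __init__(self):
        self.resetVar()

    def resetVar(self):
        """Rebind the class variables so nothing leaks from an earlier run."""
        DictClass.YAML = dict()
        DictClass.ObjectExist = False
        DictClass.ObjectList = None


dc = DictClass()
DictClass.YAML["data_read"] = {"class": "df"}  # state left behind by a first run
dc.resetVar()                                  # start of the next run
print(DictClass.YAML)
```

Rebinding `YAML` to a fresh dict, rather than mutating the shared one, also protects any caller that still holds the old mapping.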

Add HuberRegressor

Add HuberRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference HuberRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/HuberRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a HuberRegressor as a potential best fit model.

Add AdaBoostRegressor

Add AdaBoostRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference AdaBoostRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Adaptive%20Boosting/AdaBoostRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train an AdaBoostRegressor as a potential best fit model.

Support custom metrics specification for model training

The framework currently optimises for greater accuracy. While accuracy is a widely used metric to assess the efficiency of training, it is not always desired. The framework should default to using accuracy as the training metric, but the user must be provided with a choice to use different optimisation.

Add support for the following optimisations that a user may specify.

Keep in mind that some parameters should be maximised while others should be minimised. An appropriate optimisation direction should be chosen respectively.

How can a user set the optimisation function

bc.optimiseFor("accuracy")

The input can be taken in text form and must be case insensitive. Alternative, more elegant ways of choosing the optimisation metric are encouraged.

Text labels to be used for each: accuracy, precision, recall, f1score, roc, auc, mse and mae
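
The case-insensitive lookup and the optimisation direction can be kept in one table. This is a sketch, not the framework's implementation: the module-level variable and the return value are assumptions, and the direction assigned to each label follows the usual convention (error metrics are minimised, the rest maximised).

```python
# Direction in which each supported metric should be optimised.
_METRIC_DIRECTION = {
    "accuracy": "maximise",
    "precision": "maximise",
    "recall": "maximise",
    "f1score": "maximise",
    "roc": "maximise",
    "auc": "maximise",
    "mse": "minimise",
    "mae": "minimise",
}

_selected_metric = "accuracy"  # default when the user never calls optimiseFor


def optimiseFor(metric):
    """Select the training metric; returns (label, direction)."""
    global _selected_metric
    key = metric.strip().lower()  # case-insensitive, tolerant of stray spaces
    if key not in _METRIC_DIRECTION:
        supported = ", ".join(sorted(_METRIC_DIRECTION))
        raise ValueError(f"Unknown metric {metric!r}; expected one of: {supported}")
    _selected_metric = key
    return key, _METRIC_DIRECTION[key]


print(optimiseFor("MSE"))
```

An enum of metrics would be a stricter alternative to free text, at the cost of a slightly longer call.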

Add RadiusNeighborsRegressor

Add RadiusNeighborsRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference RadiusNeighborsRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Radius%20Neighbors/RadiusNeighborsRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.RadiusNeighborsRegressor.html?highlight=radiusneig#sklearn.neighbors.RadiusNeighborsRegressor

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a RadiusNeighborsRegressor as a potential best fit model.

Add BernoulliNB Classifier

Add BernoulliNB Classifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference BernoulliNB Classifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Naive%20Bayes/BernoulliNB.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a BernoulliNB Classifier as a potential best fit model.

Parallel Processing/Execution

File to refer: https://github.com/blobcity/autoai/blob/main/blobcity/modelSelection.py

      modelScore={m:cvScore(models[m][0](),X,Y,k) for m in best }

The line of code above trains the machine learning models in the dictionary object (variable: best) sequentially. Instead of this sequential loop, execute the step in parallel, using all the CPU cores in the system.
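
The dictionary comprehension can be replaced by a pool that maps a scoring function over the model names. The sketch below uses `concurrent.futures` with a stand-in scoring function; `score_in_parallel` is a hypothetical name. It defaults to threads only to stay self-contained. For CPU-bound training, passing `ProcessPoolExecutor` (with a picklable, module-level scoring function) or using joblib is likely the better choice.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def score_in_parallel(names, score_fn, executor_cls=ThreadPoolExecutor):
    """Score every model name concurrently and return {name: score}."""
    names = list(names)
    # One worker per CPU core; os.cpu_count() may return None, which lets
    # the executor choose its own default.
    with executor_cls(max_workers=os.cpu_count()) as pool:
        scores = list(pool.map(score_fn, names))  # map preserves input order
    return dict(zip(names, scores))


# Stand-in for cvScore(models[m][0](), X, Y, k): here the "score" is just len().
print(score_in_parallel(["ridge", "lasso", "sgd"], len))
```

If the scoring function itself already runs in parallel (for example through an `n_jobs` argument), nesting both levels can oversubscribe the CPU, so only one level should fan out.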

Add RadiusNeighborsClassifier

Add RadiusNeighborsClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference RadiusNeighborsClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Radius%20Neighbors/RadiusNeighborsClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a RadiusNeighborsClassifier as a potential best fit model.

Fix YAML file storage path

File to refer: https://github.com/blobcity/autoai/blob/main/blobcity/utils/YamlGenerator.py
Currently, after the module executes, the generated YAML file is stored in the API directory:

       def writeYml(val):
            with open(r'./yml/Process.yaml', 'w') as file:
                yaml.dump(val, file,sort_keys=False)

Change the storage path to the working directory.
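
A sketch of resolving the output path against the current working directory instead of a fixed relative folder. `yaml_output_path` is a hypothetical helper; the file name is taken from the snippet above.

```python
from pathlib import Path


def yaml_output_path(filename="Process.yaml", base_dir=None):
    """Return where the YAML should be written: the working directory by default."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / filename


print(yaml_output_path())
```

writeYml could then open `yaml_output_path()` in place of the hard-coded `./yml/Process.yaml`, which also removes the need for a `yml` folder to exist beforehand.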

Add SGDClassifier

Add SGDClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference SGDClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Linear%20Models/SGDClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train an SGDClassifier as a potential best fit model.

Add Perceptron Classifier

Add Simple Perceptron Classifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference Perceptron Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Linear%20Models/Perceptron.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a perceptron as a potential best fit model.

Add Lars

Add Lars model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference Lars Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/Lars.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lars.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a Lars as a potential best fit model.

Add NearestCentroid Classifier

Add NearestCentroid Classifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference NearestCentroid Classifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Nearest%20Centroid/NearestCentroidClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a NearestCentroid Classifier as a potential best fit model.

Add CatBoostClassifier

Add CatBoostClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference CatBoostClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/CatBoost/CatBoostClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a CatBoostClassifier as a potential best fit model.

Setup Jekyll docs

Use the Readme + Contributing guide to create necessary project documentation. Host the documentation at /docs within the repository. Use the default Jekyll template that is set up.

Confusion Matrix

Add support to print a Confusion Matrix for Classification type of problems.

Example Use

model = bc.train("classification_data.csv", "target_column")
model.confusionMatrix()

The matrix should be displayed as a matplotlib chart.

Error Conditions

Calling the confusionMatrix() function for a Regression problem must throw an error stating: "Confusion matrix is available only for Classification problems".
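
The guard and the underlying counts can be sketched without any plotting. The class below is a stand-in, not the library's Model class: its constructor arguments, the choice of TypeError, and the returned `(labels, matrix)` pair are assumptions. Drawing the matrix with matplotlib is left out.

```python
class Model:
    """Stand-in for the trained-model wrapper (hypothetical fields)."""

    def __init__(self, problem_type, y_true=(), y_pred=()):
        self.problem_type = problem_type
        self.y_true = list(y_true)
        self.y_pred = list(y_pred)

    def confusionMatrix(self):
        if self.problem_type != "Classification":
            raise TypeError(
                "Confusion matrix is available only for Classification problems"
            )
        labels = sorted(set(self.y_true) | set(self.y_pred))
        index = {label: i for i, label in enumerate(labels)}
        # Rows are true labels, columns are predicted labels.
        matrix = [[0] * len(labels) for _ in labels]
        for truth, predicted in zip(self.y_true, self.y_pred):
            matrix[index[truth]][index[predicted]] += 1
        return labels, matrix


labels, matrix = Model("Classification", [0, 1, 1], [0, 0, 1]).confusionMatrix()
print(labels, matrix)
```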

files to refer:

Add AdaBoostClassifier

Add AdaBoostClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference AdaBoostClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Adaptive%20Boosting/AdaBoostClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train an AdaBoostClassifier as a potential best fit model.

Add ElasticNet

Add ElasticNet model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference ElasticNet Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/ElasticNet.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train an ElasticNet as a potential best fit model.

Add HistGradientBoostingClassifier

Add HistGradientBoostingClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference HistGradientBoostingClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Histogram-Based%20Gradient%20Boosting%20Trees/HistGradientBoostingClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a HistGradientBoostingClassifier as a potential best fit model.

Add GammaRegressor

Add GammaRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference GammaRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/GammaRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.GammaRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a GammaRegressor as a potential best fit model.

Add CategoricalNB Classifier

Add CategoricalNB Classifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference CategoricalNB Classifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Naive%20Bayes/CategoricalNB.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a CategoricalNB Classifier as a potential best fit model.

Add RidgeClassifier

Add RidgeClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference RidgeClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Linear%20Models/RidgeClassifier.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a RidgeClassifier as a potential best fit model.

Add PassiveAggressiveRegressor

Add PassiveAggressiveRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference PassiveAggressiveRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/PassiveAggressiveRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a PassiveAggressiveRegressor as a potential best fit model.

Add Lasso

Add Lasso model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference Lasso Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/LassoRegression.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a Lasso as a potential best fit model.

Add MultinomialNB Classifier

Add MultinomialNB Classifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference MultinomialNB Classifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Naive%20Bayes/MultinomialNB.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a MultinomialNB Classifier as a potential best fit model.

Add LassoLars

Add LassoLars model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference LassoLars Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/LassoLars.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoLars.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a LassoLars as a potential best fit model.

Error loading CSV from URL

bc.train(file_path='https://cdn.blobcity.com/sample-data/TATASTEEL.csv', target='Close')

The above line, when executed, fails to load the DataFrame from the CSV hosted at the URL.

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-12-741a56493e07> in <module>
----> 1 bc.train('https://cdn.blobcity.com/sample-data/TATASTEEL.csv', 'Close')

/opt/conda/lib/python3.8/site-packages/blobcity/main/driver.py in train(file_path, target, features)
     34     dc.resetVar()
     35     #data read
---> 36     if file_path!=None:
     37         dataframe= getDataFrameType(file_path, dc)
     38     else:

/opt/conda/lib/python3.8/site-packages/blobcity/utils/FileType.py in getDataFrameType(file_path, dc)
     35     if(extension==".csv"):
     36         Types = "csv"
---> 37         df=pd.read_csv(file_path)
     38     elif extension==".xlsx":
     39         Types = "xlsx"

/opt/conda/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    584     kwds.update(kwds_defaults)
    585 
--> 586     return _read(filepath_or_buffer, kwds)
    587 
    588 

/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py in _read(filepath_or_buffer, kwds)
    480 
    481     # Create the parser.
--> 482     parser = TextFileReader(filepath_or_buffer, **kwds)
    483 
    484     if chunksize or iterator:

/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py in __init__(self, f, engine, **kwds)
    809             self.options["has_index_names"] = kwds["has_index_names"]
    810 
--> 811         self._engine = self._make_engine(self.engine)
    812 
    813     def close(self):

/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py in _make_engine(self, engine)
   1038             )
   1039         # error: Too many arguments for "ParserBase"
-> 1040         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1041 
   1042     def _failover_to_python(self):

/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py in __init__(self, src, **kwds)
     49 
     50         # open handles
---> 51         self._open_handles(src, kwds)
     52         assert self.handles is not None
     53 

/opt/conda/lib/python3.8/site-packages/pandas/io/parsers/base_parser.py in _open_handles(self, src, kwds)
    220         Let the readers open IOHandles after they are done with their potential raises.
    221         """
--> 222         self.handles = get_handle(
    223             src,
    224             "r",

/opt/conda/lib/python3.8/site-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    607 
    608     # open URLs
--> 609     ioargs = _get_filepath_or_buffer(
    610         path_or_buf,
    611         encoding=encoding,

/opt/conda/lib/python3.8/site-packages/pandas/io/common.py in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    310         # assuming storage_options is to be interpreted as headers
    311         req_info = urllib.request.Request(filepath_or_buffer, headers=storage_options)
--> 312         with urlopen(req_info) as req:
    313             content_encoding = req.headers.get("Content-Encoding", None)
    314             if content_encoding == "gzip":

/opt/conda/lib/python3.8/site-packages/pandas/io/common.py in urlopen(*args, **kwargs)
    210     import urllib.request
    211 
--> 212     return urllib.request.urlopen(*args, **kwargs)
    213 
    214 

/opt/conda/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/opt/conda/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/opt/conda/lib/python3.8/urllib/request.py in http_response(self, request, response)
    638         # request was successfully received, understood, and accepted.
    639         if not (200 <= code < 300):
--> 640             response = self.parent.error(
    641                 'http', request, response, code, msg, hdrs)
    642 

/opt/conda/lib/python3.8/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

/opt/conda/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

/opt/conda/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden
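
One possible cause, offered as an assumption and not confirmed for this server, is that the host rejects the default urllib User-Agent. The sketch below builds a request with an explicit User-Agent header using only the standard library; the helper names are hypothetical. The pandas frame in the traceback passes `storage_options` as request headers, so in that pandas version `pd.read_csv(url, storage_options={"User-Agent": "..."})` may have the same effect.

```python
import urllib.request


def build_csv_request(url, user_agent="Mozilla/5.0"):
    """Build a request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_csv_text(url, timeout=30):
    """Download the CSV body as text (performs a real network request)."""
    with urllib.request.urlopen(build_csv_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The downloaded text could then be wrapped in `io.StringIO` and handed to the CSV reader. If the server still answers 403, the restriction lies elsewhere (for example access control on the file) and no client-side header will help.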

Add LightGBMClassifier

Add LightGBMClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference LightGBMClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/LightGBM/LGBMClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a LightGBMClassifier as a potential best fit model.

Documentation

A requirements.txt listing the required packages and a README.md with the basic information about the project would be appreciated.

P.S.: I will be happy to work on it if someone can give me the basic gist of the project.

Fix Import Statement

Currently, the import statement for the library is:

import blobcity.blobcity as bc

Change this statement to:

import blobcity as bc

Also fix the file paths in the driver module driver.py that are affected by the import change.

Add ARDRegressor

Add ARDRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference ARDRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/ARDRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ARDRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train an ARDRegressor as a potential best fit model.

Metric Statistics

Add statistical metrics for the trained model.

Metrics to use for each problem type:

Regression:
- R2
- MSE
- MAE
- RMSE
Classification:
- Precision
- Recall
- F1-Score

Enhancement:

Add a function named stats to the Model class, which returns the above-mentioned metrics for the problem type.

Files to refer:

   https://github.com/blobcity/autoai/blob/main/blobcity/store/Model.py
   https://github.com/blobcity/autoai/blob/main/blobcity/config/tuner.py
   https://github.com/blobcity/autoai/blob/main/blobcity/main/modelSelection.py

Calculate the appropriate metrics for the problem type, for the selected model with its tuned parameters, on a train/test split with an 80:20 ratio. Store the results in a dictionary in the Model class. When the stats function is called, report/print all the stored stats from the dictionary.
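
The listed metrics can be sketched in plain Python to show what the stored dictionary might contain. The function names and dictionary keys are assumptions, and the classification version handles a single positive label only. In the library itself, the equivalent functions in sklearn.metrics (such as r2_score, mean_squared_error, precision_score, recall_score and f1_score) would be the natural choice.

```python
import math


def regression_stats(y_true, y_pred):
    """Return R2, MSE, MAE and RMSE for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot if ss_tot else 0.0,
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
    }


def classification_stats(y_true, y_pred, positive=1):
    """Return precision, recall and F1 for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"Precision": precision, "Recall": recall, "F1-Score": f1}


print(regression_stats([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
print(classification_stats([1, 1, 0, 0], [1, 0, 1, 0]))
```

A stats method on the Model class could compute one of these dictionaries on the 20% test split during training and simply print it when called.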

Too Many Values to Unpack Error

When training on a large dataset (more than 5000 rows), the train function throws a ValueError: too many values to unpack (expected 3)

Example data to use to reproduce the error: https://cdn.blobcity.com/sample-data/TATASTEEL.csv

Code

import blobcity as bc
bc.train('./TATASTEEL.csv', 'Close')

Output / Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-386bcdfc69c5> in <module>
----> 1 bc.train('./TATASTEEL.csv', 'Close')

/opt/conda/lib/python3.8/site-packages/blobcity/main/driver.py in train(file_path, target, features)
     41 
     42     if(features==None):
---> 43         featureList=AFS.FeatureSelection(dataframe,target,dc)
     44         CleanedDF=dataCleaner(dataframe,featureList,target,dc)
     45     else:

/opt/conda/lib/python3.8/site-packages/blobcity/main/modelSelection.py in modelSearch(dataframe, target, DictClass)
    112     modelsList=classifier_config().models if ptype=="Classification" else regressor_config().models
    113     if dataframe.shape[0]>500:
--> 114         best=trainOnFull(dataframe,target,modelsList,trainOnSample(dataframe,target,modelsList,DictClass),DictClass)
    115     else:
    116         best=trainOnFull(dataframe,target,modelsList,modelsList,DictClass)

/opt/conda/lib/python3.8/site-packages/blobcity/main/modelSelection.py in trainOnSample(dataframe, target, models, DictClass)
     74     X,Y=df.drop(target,axis=1),df[target]
     75     k=getKFold(X)
---> 76     modelScore={m:cvScore(models[m][0](),X,Y,k) for m in models }
     77     return dict(itertools.islice(sortScore(modelScore).items(), 5))
     78 

/opt/conda/lib/python3.8/site-packages/blobcity/main/modelSelection.py in <dictcomp>(.0)
     74     X,Y=df.drop(target,axis=1),df[target]
     75     k=getKFold(X)
---> 76     modelScore={m:cvScore(models[m][0](),X,Y,k) for m in models }
     77     return dict(itertools.islice(sortScore(modelScore).items(), 5))
     78 

/opt/conda/lib/python3.8/site-packages/blobcity/main/modelSelection.py in cvScore(model, X, Y, k)
     47     function get above mentioned argument and uses cross_val_score to calculate average accuracy on specified kfolds
     48     """
---> 49     accuracy = cross_val_score(model, X, Y, cv = k,n_jobs=-1)
     50     return accuracy.mean()
     51 

/opt/conda/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                 for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:])
     71             ]
---> 72             args_msg = ", ".join(args_msg)
     73             warnings.warn(
     74                 f"Pass {args_msg} as keyword args. From version "

/opt/conda/lib/python3.8/site-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, error_score)
    399         The data to fit. Can be for example a list, or an array.
    400 
--> 401     y : array-like of shape (n_samples,) or (n_samples, n_outputs), \
    402             default=None
    403         The target variable to try to predict in the case of

/opt/conda/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                 for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:])
     71             ]
---> 72             args_msg = ", ".join(args_msg)
     73             warnings.warn(
     74                 f"Pass {args_msg} as keyword args. From version "

/opt/conda/lib/python3.8/site-packages/sklearn/model_selection/_validation.py in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, error_score)
    254 
    255     cv = check_cv(cv, y, classifier=is_classifier(estimator))
--> 256 
    257     if callable(scoring):
    258         scorers = scoring

ValueError: too many values to unpack (expected 3)

Add LightGBMRegressor

Add LightGBMRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference LightGBMRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/LightGBM/LGBMRegressor.ipynb

Also refer to the official documentation to select appropriate parameters:
https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a LightGBMRegressor as a potential best fit model.

Add CatBoost Regressor

Add CatBoostRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference CatBoostRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/CatBoost/CatBoostRegressor.ipynb

Also refer to the official documentation for the required parameters:
https://catboost.ai/en/docs/concepts/python-reference_catboostregressor

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a CatBoostRegressor as a potential best fit model.

Add GaussianNB Classifier

Add GaussianNB Classifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference GaussianNB Classifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/Naive%20Bayes/GaussianNB.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train a GaussianNB Classifier as a potential best fit model.

Save and Load a Trained Model

Add functionality to save and load a trained model, using pickle or the Keras functions as appropriate for the model type, i.e. either a scikit-learn model or a TensorFlow model.

File to refer to: https://github.com/blobcity/autoai/blob/main/blobcity/store/Model.py

Add save and load functions to the Model.py file.
Required functionality for the save function:

It takes a single string argument, the path (including the file name) at which to save the model, for example:
```
     save("path/filename")   
```
If no path is specified, it saves to the working directory under a default file name, and prints or logs the absolute path of the saved file.
It identifies which saving strategy to use, i.e. whether to save the model with pickle or with Keras.

Required functionality for the load function:

It takes a single string argument, the path of the saved model file:

      load("path/filename.pkl")   or  load("path/filename.h5")

Based on the file extension, it uses the appropriate loading strategy, either pickle or the Keras load function.
It returns the model.
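
The save and load requirements above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the free-function signatures (with an explicit `model` argument), the `DEFAULT_PATH` name, and the `_is_keras_model` heuristic are all hypothetical, and the real functions in Model.py may be methods that look quite different.

```python
import os
import pickle

# Hypothetical default file name, used when no path is given (assumption).
DEFAULT_PATH = "model"


def _is_keras_model(model):
    """Heuristic (assumption): a model whose class is defined in a
    keras/tensorflow module is treated as a Keras model."""
    module = type(model).__module__
    return module.startswith(("keras", "tensorflow"))


def save(model, path=None):
    """Save `model` and return the absolute path of the file written."""
    # Drop any extension the caller supplied; the strategy below picks one.
    base = os.path.splitext(path or DEFAULT_PATH)[0]
    if _is_keras_model(model):
        target = base + ".h5"
        model.save(target)  # Keras models expose their own save method
    else:
        target = base + ".pkl"
        with open(target, "wb") as fh:
            pickle.dump(model, fh)
    target = os.path.abspath(target)
    print(f"Model saved to {target}")
    return target


def load(path):
    """Load a model, choosing the strategy from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pkl":
        with open(path, "rb") as fh:
            return pickle.load(fh)
    if ext == ".h5":
        # Imported lazily so that pickle-only users do not need TensorFlow.
        from tensorflow import keras
        return keras.models.load_model(path)
    raise ValueError(f"Unsupported model file extension: {ext!r}")
```

Returning the absolute path from `save` (in addition to printing it) makes it easy to pass the result straight to `load`. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.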

Add HistGradientBoostingRegressor

Add HistGradientBoostingRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference HistGradientBoostingRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Histogram-Based%20Gradient%20Boosting%20Trees/HistGradientBoostingRegressor.ipynb

Official API Reference for Parameters:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#sklearn.ensemble.HistGradientBoostingRegressor

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a HistGradientBoostingRegressor as a potential best fit model.

Regression / Classification problem type identification improvement

Refers to file: https://github.com/blobcity/autoai/blob/main/blobcity/utils/ProblemType.py

    target_length = len(np.unique(data))
    if data.dtype in ['int', 'float'] and target_length <= 100:
        return dict({'type': 'Classification'})
    else:
        return dict({'type': 'Regression'})

The above code is not the best way to differentiate between regression and classification.

Change the logic to compare the cardinality of the column against the length of the column. If the cardinality is greater than or equal to 50% of the length, consider the problem as Regression. If the cardinality is less than 50% of the length, consider it as Classification.
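
A minimal sketch of the proposed cardinality rule, for illustration only. The function name `identify_problem_type` and the `threshold` parameter are hypothetical, and the sketch uses plain Python so that it accepts any sequence of target values; it assumes missing values have already been handled upstream. The actual change in ProblemType.py may be structured differently.

```python
def identify_problem_type(target, threshold=0.5):
    """Classify a target column as Regression or Classification by
    comparing its cardinality (number of distinct values) to its length."""
    values = list(target)
    if not values:
        raise ValueError("target column is empty")
    cardinality = len(set(values))
    ratio = cardinality / len(values)
    # Cardinality >= 50% of the length -> Regression, otherwise Classification.
    kind = "Regression" if ratio >= threshold else "Classification"
    return {"type": kind}


# A binary label column has very low cardinality relative to its length.
print(identify_problem_type([0, 1] * 50))
# A column in which every value is distinct has a ratio of 1.0.
print(identify_problem_type([i * 0.5 for i in range(100)]))
```

One consequence of a pure ratio rule is that very small datasets can flip to Regression easily (for example, two classes over four rows gives exactly 50%), so a minimum row count may be worth considering alongside it.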

Add PoissonRegressor

Add PoissonRegressor model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/regressor_config.py

Reference PoissonRegressor Implementation:
https://github.com/blobcity/ai-seed/blob/main/Regression/Linear%20Models/PoissonRegressor.ipynb

Official API Reference for Parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PoissonRegressor.html

Dependencies if any, must be appropriately added. Test run of the train function on a regression problem must pass, and the function must attempt to train a PoissonRegressor as a potential best fit model.

Add XGBoostClassifier

Add XGBoostClassifier model into the library.

Primary File to Change: https://github.com/blobcity/autoai/blob/main/blobcity/config/classifier_config.py

Reference XGBoostClassifier Implementation:
https://github.com/blobcity/ai-seed/blob/main/Classification/XGBoost/XGBoostClassifier.ipynb

Dependencies if any, must be appropriately added. Test run of the train function on a classification problem must pass, and the function must attempt to train an XGBoostClassifier as a potential best fit model.


How can a user set the optimisation function

Example Use

Error Conditions

Code

Output / Error
