
Automated Feature Selection & Importance

PyPI License Downloads

autofeatselect is a Python library that automates and accelerates feature selection for machine learning projects.

It calculates feature importance scores and rankings with several methods, and it detects and removes highly correlated variables.

Installation

You can install from PyPI:

pip install autofeatselect

Key Features

autofeatselect offers a wide range of features to support feature selection and importance analysis:

  • Automated Feature Selection: Multiple automated selection methods, such as LightGBM importance, XGBoost importance, RFECV, and Boruta.
  • Feature Importance Analysis: Calculation and visualization of feature importance scores for each algorithm separately.
  • Correlation Analysis: Automatic detection and removal of highly correlated features.

Full List of Methods

Correlation Calculation Methods

  • Pearson, Spearman & Kendall Correlation Coefficients for Continuous Variables
  • Cramer's V Scores for Categorical Variables
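For reference, Cramer's V can be computed from the chi-squared statistic of a contingency table. The sketch below is an illustrative stand-alone implementation of the plain (uncorrected) form, not the library's internal code:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramer's V between two categorical series (0 = independent, 1 = perfect)."""
    table = pd.crosstab(x, y)
    # Plain chi-squared, no Yates continuity correction.
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

a = pd.Series(['x', 'x', 'y', 'y'] * 25)
b = pd.Series(['u', 'u', 'v', 'v'] * 25)  # perfectly associated with a
print(cramers_v(a, b))
```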

Feature Selection Methods

  • LightGBM Feature Importance Scores
  • XGBoost Feature Importance Scores (with Target Encoding for Categorical Variables)
  • Random Forest Feature Importance Scores (with Target Encoding for Categorical Variables)
  • LassoCV Coefficients (with One Hot Encoding for Categorical Variables)
  • Permutation Importance Scores (LightGBM as the estimator)
  • RFECV Rankings (LightGBM as the estimator)
  • Boruta Rankings (Random Forest as the estimator)

Usage
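
The snippets below assume that training data and feature lists already exist. A minimal, hypothetical setup might look like the following (column names are illustrative only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(24)
n = 200

# Toy frame mixing continuous and categorical columns.
X_train = pd.DataFrame({
    'x1': rng.normal(size=n),
    'x2': rng.normal(size=n),
    'x3': rng.normal(size=n),
    'cat1': rng.choice(['a', 'b', 'c'], size=n),
    'cat2': rng.choice(['u', 'v'], size=n),
})
y_train = pd.Series(rng.integers(0, 2, size=n))  # binary target

num_feats = ['x1', 'x2', 'x3']  # continuous features
cat_feats = ['cat1', 'cat2']    # categorical features
```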

  • Calculating Correlations & Detecting Highly Correlated Features
from autofeatselect import CorrelationCalculator

num_static_feats = ['x1', 'x2'] # Static features to keep regardless of the correlation results.

corr_df_num, remove_list_num = CorrelationCalculator.numeric_correlations(X=X_train,
                                                                          features=num_feats, #List of continuous features
                                                                          static_features=num_static_feats,
                                                                          corr_method='pearson',
                                                                          threshold=0.9)

corr_df_cat, remove_list_cat = CorrelationCalculator.categorical_correlations(X=X_train,
                                                                              features=cat_feats, #List of categorical features
                                                                              static_features=None,
                                                                              threshold=0.9)
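
The returned remove lists can then be applied to the data. A minimal sketch, assuming pandas DataFrames and using dummy lists in place of the CorrelationCalculator output so it runs stand-alone:

```python
import pandas as pd

X_train = pd.DataFrame({'x1': [1, 2], 'x2': [1, 2], 'x3': [2, 4], 'cat1': ['a', 'b']})
remove_list_num = ['x3']  # e.g. x3 flagged as highly correlated with x1
remove_list_cat = []

# Keep every feature that neither correlation pass flagged for removal.
to_drop = set(remove_list_num) | set(remove_list_cat)
X_train_reduced = X_train.drop(columns=sorted(to_drop))
print(list(X_train_reduced.columns))
```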
  • Calculating a Single Feature Importance Score & Plotting Results
from autofeatselect import FeatureSelector

# Create the feature selection object.
feat_selector = FeatureSelector(modeling_type='classification', # 'classification' or 'regression'
                                X_train=X_train,
                                y_train=y_train,
                                X_test=None,
                                y_test=None,
                                numeric_columns=num_feats,
                                categorical_columns=cat_feats,
                                seed=24)

# Train a LightGBM model & return importance results as a pd.DataFrame
lgbm_importance_df = feat_selector.lgbm_importance(hyperparam_dict=None,
                                                   objective=None,
                                                   return_plot=True)


# Apply RFECV using LightGBM as the estimator & return importance results as a pd.DataFrame
lgbm_hyperparams = {'learning_rate': 0.01, 'max_depth': 6, 'n_estimators': 400,
                    'num_leaves': 30, 'random_state':24, 'importance_type':'gain'
                   }
rfecv_hyperparams = {'step':3, 'min_features_to_select':5, 'cv':5}

rfecv_importance_df = feat_selector.rfecv_importance(lgbm_hyperparams=lgbm_hyperparams,
                                                     rfecv_hyperparams=rfecv_hyperparams,
                                                     return_plot=False)
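
In RFECV output, rank 1 conventionally marks a selected feature, so the ranking column can be filtered directly. A sketch on a dummy result (the column names here are illustrative; inspect the returned DataFrame for the actual ones):

```python
import pandas as pd

# Hypothetical RFECV result: rank 1 = feature kept by the selector.
rfecv_importance_df = pd.DataFrame({
    'feature': ['x1', 'x2', 'cat1', 'cat2'],
    'rfecv_rankings': [1, 3, 1, 2],
})
selected = rfecv_importance_df.loc[
    rfecv_importance_df['rfecv_rankings'] == 1, 'feature'
].tolist()
print(selected)
```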
  • Calculating Multiple Feature Importance Scores
from autofeatselect import AutoFeatureSelect

# Automated correlation analysis & multiple feature selection methods applied together.
feat_selector = AutoFeatureSelect(modeling_type='classification',
                                  X_train=X_train,
                                  y_train=y_train,
                                  X_test=X_test,
                                  y_test=y_test,
                                  numeric_columns=num_feats,
                                  categorical_columns=cat_feats,
                                  seed=24)

corr_features = feat_selector.calculate_correlated_features(static_features=None,
                                                            num_threshold=0.9,
                                                            cat_threshold=0.9)

feat_selector.drop_correlated_features()

final_importance_df = feat_selector.apply_feature_selection(selection_methods=['lgbm', 'xgb', 'perimp', 'rfecv', 'boruta'])
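
One simple way to combine scores from several methods into a single ordering is rank averaging. A hedged sketch on a dummy result (the actual column layout of final_importance_df may differ, so inspect it first):

```python
import pandas as pd

# Hypothetical multi-method result; column names are illustrative.
final_importance_df = pd.DataFrame({
    'feature': ['x1', 'x2', 'cat1'],
    'lgbm_importance': [120.0, 30.0, 60.0],
    'xgb_importance': [0.5, 0.1, 0.4],
})

score_cols = ['lgbm_importance', 'xgb_importance']
# Rank per method (1 = most important), then average the ranks.
ranks = final_importance_df[score_cols].rank(ascending=False)
final_importance_df['mean_rank'] = ranks.mean(axis=1)
print(final_importance_df.sort_values('mean_rank')['feature'].tolist())
```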

License

This project is free, open-source software licensed under the MIT License.

Contributors

dorukcanga
