

Feature-Engineering-Handbook

Welcome! This repo provides an interactive, comprehensive, and practical feature engineering tutorial in Jupyter Notebook. It contains three parts: Data Preprocessing, Feature Selection, and Dimension Reduction, each demonstrated in its own notebook. Since some feature selection algorithms, such as Simulated Annealing and Genetic Algorithm, lack complete implementations in Python, we also provide the corresponding Python scripts (Simulated Annealing, Genetic Algorithm) and cover them in the tutorial for your reference.
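For readers new to the idea, the following is a minimal, hedged sketch of simulated annealing as a feature selector. It is not the repo's script. It assumes a NumPy-only setup in which a candidate subset is a boolean mask and its score is the hold-out R² of an ordinary least squares fit. The names `holdout_r2` and `simulated_annealing_select`, and every parameter value (`n_iter`, `t0`, `cooling`), are hypothetical choices made for illustration.

```python
import math
import random

import numpy as np


def holdout_r2(X, y, mask, train_frac=0.7, seed=0):
    """Hypothetical scorer: fit OLS on a training split using only the
    features in `mask`, return R^2 on the held-out split."""
    if not mask.any():
        return -math.inf  # an empty subset is never a valid answer
    rng = np.random.default_rng(seed)  # fixed split so scores are comparable
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])  # add intercept column
    coef, *_ = np.linalg.lstsq(Xs[tr], y[tr], rcond=None)
    resid = y[te] - Xs[te] @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y[te] - y[te].mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def simulated_annealing_select(X, y, n_iter=300, t0=1.0, cooling=0.98, seed=0):
    """Hypothetical SA search over feature subsets (boolean masks)."""
    rnd = random.Random(seed)
    n_features = X.shape[1]
    current = np.array([rnd.random() < 0.5 for _ in range(n_features)])
    current_score = holdout_r2(X, y, current)
    best, best_score = current.copy(), current_score
    t = t0
    for _ in range(n_iter):
        # Neighbour: flip one randomly chosen feature in or out of the subset.
        candidate = current.copy()
        j = rnd.randrange(n_features)
        candidate[j] = not candidate[j]
        cand_score = holdout_r2(X, y, candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with prob exp(delta / t).
        if delta >= 0 or rnd.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current.copy(), current_score
        t *= cooling  # geometric cooling schedule
    return best, best_score


# Usage on synthetic data: only features 0 and 1 drive the target.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 8))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * data_rng.normal(size=200)
mask, score = simulated_annealing_select(X, y)
print("selected features:", np.flatnonzero(mask), "hold-out R^2:", round(score, 3))
```

The acceptance rule is what separates this from greedy hill climbing. Early on, when the temperature `t` is high, the search sometimes accepts a worse subset, which lets it escape local optima. As `t` decays, it behaves more and more greedily.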

Brief Introduction

Table of Contents

  • 1  Data Preprocessing
    • 1.1  Static Continuous Variables
      • 1.1.1  Discretization
        • 1.1.1.1  Binarization
        • 1.1.1.2  Binning
      • 1.1.2  Scaling
        • 1.1.2.1  Standard Scaling (Z-score standardization)
        • 1.1.2.2  MinMaxScaler (Scale to range)
        • 1.1.2.3  RobustScaler (Anti-outliers scaling)
        • 1.1.2.4  Power Transform (Non-linear transformation)
      • 1.1.3  Normalization
      • 1.1.4  Imputation of missing values
        • 1.1.4.1  Univariate feature imputation
        • 1.1.4.2  Multivariate feature imputation
        • 1.1.4.3  Marking imputed values
      • 1.1.5  Feature Transformation
        • 1.1.5.1  Polynomial Transformation
        • 1.1.5.2  Custom Transformation
    • 1.2  Static Categorical Variables
      • 1.2.1  Ordinal Encoding
      • 1.2.2  One-hot Encoding
      • 1.2.3  Hashing Encoding
      • 1.2.4  Helmert Coding
      • 1.2.5  Sum (Deviation) Coding
      • 1.2.6  Target Encoding
      • 1.2.7  M-estimate Encoding
      • 1.2.8  James-Stein Encoder
      • 1.2.9  Weight of Evidence Encoder
      • 1.2.10  Leave One Out Encoder
      • 1.2.11  Catboost Encoder
    • 1.3  Time Series Variables
      • 1.3.1  Time Series Categorical Features
      • 1.3.2  Time Series Continuous Features
      • 1.3.3  Implementation
        • 1.3.3.1  Create EntitySet
        • 1.3.3.2  Set up cut-time
        • 1.3.3.3  Auto Feature Engineering
  • 2  Feature Selection
    • 2.1  Filter Methods
      • 2.1.1  Univariate Filter Methods
        • 2.1.1.1  Variance Threshold
        • 2.1.1.2  Pearson Correlation (regression problem)
        • 2.1.1.3  Distance Correlation (regression problem)
        • 2.1.1.4  F-Score (regression problem)
        • 2.1.1.5  Mutual Information (regression problem)
        • 2.1.1.6  Chi-squared Statistics (classification problem)
        • 2.1.1.7  F-Score (classification problem)
        • 2.1.1.8  Mutual Information (classification problem)
      • 2.1.2  Multivariate Filter Methods
        • 2.1.2.1  Max-Relevance Min-Redundancy (mRMR)
        • 2.1.2.2  Correlation-based Feature Selection (CFS)
        • 2.1.2.3  Fast Correlation-based Filter (FCBF)
        • 2.1.2.4  ReliefF
        • 2.1.2.5  Spectral Feature Selection (SPEC)
    • 2.2  Wrapper Methods
      • 2.2.1  Deterministic Algorithms
        • 2.2.1.1  Recursive Feature Elimination (SBS)
      • 2.2.2  Randomized Algorithms
        • 2.2.2.1  Simulated Annealing (SA)
        • 2.2.2.2  Genetic Algorithm (GA)
    • 2.3  Embedded Methods
      • 2.3.1  Regularization Based Methods
        • 2.3.1.1  Lasso Regression (Linear Regression with L1 Norm)
        • 2.3.1.2  Logistic Regression (with L1 Norm)
        • 2.3.1.3  LinearSVR / LinearSVC
      • 2.3.2  Tree Based Methods
  • 3  Dimension Reduction
    • 3.1  Unsupervised Methods
      • 3.1.1  PCA (Principal Components Analysis)
    • 3.2  Supervised Methods
      • 3.2.1  LDA (Linear Discriminant Analysis)
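The following minimal sketch shows how the three parts above typically chain together in practice. It assumes scikit-learn and its bundled Iris dataset. The specific choices (a `StandardScaler`, an F-score filter keeping `k=3` features, and a 2-component PCA feeding a logistic regression) are illustrative and are not taken from the notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),              # Part 1: preprocessing (z-score scaling)
    ("select", SelectKBest(f_classif, k=3)),  # Part 2: univariate filter (F-score, classification)
    ("reduce", PCA(n_components=2)),          # Part 3: unsupervised dimension reduction
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each CV fold refits the scaler, selector and PCA on its training split only.
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Wrapping the steps in a `Pipeline` means the scaling statistics, the selected features and the principal components are all learned from each training fold alone. This keeps information from the held-out fold from leaking into the evaluation.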

Reference

References have been included in each Jupyter Notebook.

Author

@Yingxiang Chen
@Zihan Yang

Contact

If there are any mistakes, please feel free to reach out and correct us!

Yingxiang Chen E-mail: [email protected]
Zihan Yang E-mail: [email protected]
