packt_featureengineering_cookbook's Introduction

Python Feature Engineering Cookbook - Code Repository

Python 3.6 License

Published January 22nd, 2020

Paperback: 372 pages
Publisher: Packt Publishing
Language: English
ISBN: 9781789806311

Table of Contents and Recipes

  1. Foreseeing Variable problems in building ML models

    1. Identifying numerical and categorical variables
    2. Quantifying missing data
    3. Determining cardinality in categorical variables
    4. Pinpointing rare categories in categorical variables
    5. Identifying a linear relationship
    6. Identifying normal distributions
    7. Distinguishing variable distribution
    8. Highlighting Outliers
    9. Comparing feature magnitude
  2. Missing data imputation

    1. Removing observations with missing data
    2. Performing mean or median imputation
    3. Implementing mode or frequent category imputation
    4. Replacing missing values by an arbitrary number
    5. Capturing missing values in a bespoke category
    6. Replacing missing values by a value at the end of the distribution
    7. Implementing random sample imputation
    8. Adding a missing value indicator variable
    9. Performing multivariate imputation by chained equations, MICE
    10. Assembling an imputation pipeline with Scikit-learn
    11. Assembling an imputation pipeline with feature-engine
  3. Encoding Categorical Variables

    1. Creating binary variables through One Hot Encoding
    2. Performing One hot encoding of frequent categories
    3. Replacing categories by ordinal numbers
    4. Replacing categories by counts or frequency of observations
    5. Encoding with integers in an ordered manner
    6. Encoding with the mean of the target
    7. Encoding with the Weight of evidence
    8. Grouping rare or infrequent categories
    9. Performing Binary encoding
    10. Performing Feature hashing
  4. Transforming Numerical Variables

    1. Transforming variables with the logarithm
    2. Transforming variables with the reciprocal function
    3. Using square and cube root to transform variables
    4. Using power transformations on numerical variables
    5. Performing Box-Cox transformation on numerical variables
    6. Carrying out Yeo-Johnson transformation on numerical variables
  5. Performing Variable Discretisation

    1. Dividing the variable in intervals of equal width
    2. Sorting the variable values in intervals of equal frequency
    3. Performing discretization followed by categorical encoding
    4. Allocating the variable values in arbitrary intervals
    5. Performing discretization with k-means
    6. Using decision trees for discretization
  6. Working with Outliers

    1. Trimming outliers from the data set
    2. Performing Winsorization
    3. Capping the variable at arbitrary maximum and minimum values
    4. Performing zero-coding – capping the variable at zero
  7. Deriving features from Dates and time variables

    1. Extracting date and time parts from a datetime variable
    2. Deriving representations of year and month
    3. Creating representations of day and week
    4. Extracting time parts from a time variable
    5. Capturing elapsed time between datetime variables
    6. Working with time in different timezones
  8. Performing Feature Scaling

    1. Standardizing the features
    2. Performing Mean Normalisation
    3. Scaling to the maximum and minimum values
    4. Implementing maximum absolute scaling
    5. Scaling with the median and quantiles
    6. Scaling to vector unit length
  9. Applying Mathematical Computations to Features

    1. Combining multiple features with statistical operations
    2. Combining pairs of features with mathematical functions
    3. Performing polynomial expansion
    4. Deriving new features with decision trees
    5. Carrying out Principal Component Analysis
  10. Creating Features from Time Series and Transactional Data

    1. Aggregating transactions with mathematical operations
    2. Aggregating transactions in a time window
    3. Determining number of local maxima and minima
    4. Deriving time elapsed between time-stamped events
    5. Creating features from transactions with Featuretools
  11. Extracting features from text variables

    1. Counting characters, words and vocabulary
    2. Estimating text complexity by counting sentences
    3. Creating features with Bag of words and ngrams
    4. Implementing term frequency-inverse document frequency
    5. Cleaning and stemming text variables
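
As a rough illustration of what some of the recipes listed above involve, here is a minimal sketch of median imputation with a missing value indicator (chapter 2) followed by equal-width discretisation (chapter 5). It uses only the Python standard library, whereas the recipe titles mention Scikit-learn and feature-engine; the function names and the `ages` data are hypothetical and are not taken from the book's code.

```python
from statistics import median


def impute_median_with_indicator(values):
    """Replace None with the median of the observed values.

    Returns the imputed list and a 0/1 list flagging which entries were missing.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a median with no observed values")
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


def equal_width_bins(values, n_bins):
    """Assign each value a 0-based index into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum sits on the right edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical toy data with two missing entries.
ages = [22, None, 35, 58, None, 41]
imputed, was_missing = impute_median_with_indicator(ages)
bins = equal_width_bins(imputed, 3)

print(imputed)      # [22, 38.0, 35, 58, 38.0, 41]
print(was_missing)  # [0, 1, 0, 0, 1, 0]
print(bins)         # [0, 1, 1, 2, 1, 1]
```

In a modelling workflow the median and the bin edges would normally be learned on the training set only and then applied unchanged to the test set; the sketch skips that split for brevity.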

packt_featureengineering_cookbook's People

Contributors

sebastianof, solegalli

packt_featureengineering_cookbook's Issues

Incompatibility between jedi version required by spyder and pinned jedi version.

ERROR: spyder 4.0.1 has requirement jedi==0.14.1, but you'll have jedi 0.15.2 which is incompatible.

I think the quickest solution is to remove jedi from the requirements, unless you need the specific version pinned by this library.
A slower solution is to use pip-compile.

Let me know which one you prefer and I can send a PR!
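
The first option could look roughly like the sketch below, which filters an exact `==` pin for one package out of a list of requirement lines. The `drop_pin` helper and the example list are hypothetical; only the jedi and spyder versions come from the error message above, and the repository's actual requirements file is not reproduced here.

```python
def drop_pin(requirement_lines, package):
    """Return the requirement lines without any exact '==' pin for `package`."""
    kept = []
    for line in requirement_lines:
        name = line.split("==")[0].strip().lower()
        if "==" in line and name == package.lower():
            # Skip the pinned entry so pip is free to pick a compatible version.
            continue
        kept.append(line)
    return kept


# Hypothetical requirements, using the versions quoted in the error message.
requirements = ["jedi==0.15.2", "spyder==4.0.1"]
unpinned = drop_pin(requirements, "jedi")
print(unpinned)  # ['spyder==4.0.1']
```

With the pin removed, pip can resolve jedi to the version that spyder requires instead of failing on the conflict.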