
hamoye_stagec's Introduction

Telco Customer Churn

Problem

Predict customer behavior in order to retain customers. Analyzing all relevant customer data makes it possible to develop focused customer retention programs.

Dataset

In the dataset, each row represents a customer and each column contains a customer attribute described in the column metadata.

The data set includes information about:

  • Customers who left within the last month – the column is called Churn

  • Services that each customer has signed up for:

    • phone,

    • multiple lines,

    • Internet,

    • online security,

    • online backup,

    • device protection,

    • tech support, and

    • streaming TV and movies

  • Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges

  • Demographic info about customers – gender, age range, and if they have partners and dependents

Project Instructions

Preprocessing

  • Perform initial data preparation by converting the 'TotalCharges' column to numeric values and filling missing values with 0.
  • Convert the 'Churn' column to binary values, where 'No' is mapped to 0 and 'Yes' is mapped to 1.
  • Split the data into an 80-20 train-test split with a random state of 1. Select these features:
    categorical = ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']
    numerical = ['tenure', 'MonthlyCharges', 'TotalCharges']
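
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's reference solution: the helper names `prepare` and `split` and the toy rows are hypothetical, and the toy frame carries only a few of the listed columns. With the real data, the feature list would be `categorical + numerical`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare(df):
    """Return a copy with numeric TotalCharges and a binary Churn column."""
    df = df.copy()
    # Non-numeric entries (for example blank strings) become NaN, then 0.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0)
    # Map the target: 'No' -> 0, 'Yes' -> 1.
    df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})
    return df


def split(df, features):
    """80-20 train-test split with random_state=1 on the chosen features."""
    return train_test_split(df[features], df["Churn"], test_size=0.2, random_state=1)


# Hypothetical toy rows (assumption); the real data has all listed columns.
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2, 8, 22, 10, 28, 62],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70,
                       99.65, 89.10, 29.75, 104.80, 56.15],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75", "151.65",
                     "820.5", "1949.4", "301.9", "3046.05", "3487.95"],
    "Churn": ["No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"],
})

prepared = prepare(toy)
X_train, X_test, y_train, y_test = split(
    prepared, ["tenure", "MonthlyCharges", "TotalCharges"]
)
print(prepared.dtypes)
print(len(X_train), len(X_test))
```

Passing `errors="coerce"` is what turns unparseable entries into missing values, so that `fillna(0)` can replace them as the instructions require.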

Feature engineering

  • Scale the numerical features using StandardScaler, convert the output back to a dataframe, and restore the column names.
  • One-hot encode the categorical features using OneHotEncoder (set sparse_output to False), convert the output back to a dataframe, and restore the column names.
  • Combine the scaled numerical and one-hot encoded categorical features into train and test set dataframes (use pd.concat).
  • Use scikit-learn to train a random forest classifier and an extra trees classifier, and use xgboost and lightgbm to train an extreme gradient boosting model and a light gradient boosting model. Use random_state = 1 for training all models and evaluate on the test set. Answer the following questions:

Resource
