bank-customer-churn-prediction

A supervised machine learning project built with Kubeflow Pipelines

This repository stores a simple ML classification project that I used to show how to leverage Kubeflow Pipelines to orchestrate an ML pipeline.

  • This project features a classification problem, where we need to predict whether a given bank customer is likely to close their bank account or not.

  • The EDA notebook contains an exploratory data analysis of the dataset. The dataset has both categorical and numerical features, and the target to be predicted is a binary label. The EDA analyzes correlations between the features and the target, as well as among the features themselves.

  • The pipeline is constructed with the following components:

    • Download: The dataset is stored online. We start the pipeline by downloading it from its URL and storing it in a volume that the pipeline can access (in this case, I stored it in my Google Cloud Storage bucket).

    • Train test split: We split the dataset into training and test datasets (8:2). The test dataset serves as a holdout set and is used at the end to evaluate model performance.

    • Data preprocessing: The dataset has both categorical and numerical features, so before feeding them to a model, we one-hot encode the categorical features and standardize the numerical features to keep them on the same scale. The OneHotEncoder and StandardScaler are fit on the training dataset and then saved so that we can use them to transform the test dataset later on.

    • Model training: We train 3 baseline models on the preprocessed training data: Logistic Regression, K Nearest Neighbors, and Random Forest. We perform cross-validation with each model and use Grid Search to find the optimal hyperparameters for it. The selected evaluation metrics can be visualized for each model.

    • Make predictions on the test dataset: Finally, we use whichever of the three models performs best to make predictions on the test dataset and output evaluation metrics. Before making the predictions, the test dataset must first be preprocessed with the previously saved OneHotEncoder and StandardScaler.
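
As a rough illustration of the split, preprocessing, training, and prediction steps above, here is a minimal scikit-learn sketch. It is not the repository's component code: the synthetic data, the column names (age, balance, country, churned), the hyperparameter grids, accuracy as the selection metric, and the file name preprocessor.joblib are all assumptions made for the example. It also bundles the OneHotEncoder and StandardScaler in a ColumnTransformer so that both can be saved as one file, whereas the project may well save them separately.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the bank dataset (hypothetical columns, not the real data).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "balance": rng.normal(50_000, 20_000, n),
    "country": rng.choice(["FR", "DE", "ES"], n),
})
# Binary target: mostly driven by age, with about 10% of labels flipped as noise.
df["churned"] = ((df["age"] > 50) ^ (rng.random(n) < 0.1)).astype(int)

# Train test split (8:2); the test set is held out until the very end.
X = df.drop(columns="churned")
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing: fit the encoder and scaler on the training data only.
preprocessor = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["country"]),
        ("scale", StandardScaler(), ["age", "balance"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_train_prep = preprocessor.fit_transform(X_train)

# Save the fitted transformers so a later step can reuse them.
artifact_path = str(Path(tempfile.mkdtemp()) / "preprocessor.joblib")
joblib.dump(preprocessor, artifact_path)

# Model training: grid search with 5-fold cross-validation for each baseline.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}
searches = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train_prep, y_train)
    searches[name] = search

# Pick the model with the best mean cross-validation score.
best_name = max(searches, key=lambda k: searches[k].best_score_)
best_model = searches[best_name].best_estimator_

# Prediction: reload the saved transformers and apply them to the test set.
loaded_preprocessor = joblib.load(artifact_path)
X_test_prep = loaded_preprocessor.transform(X_test)
test_accuracy = accuracy_score(y_test, best_model.predict(X_test_prep))
print(f"best model: {best_name}, test accuracy: {test_accuracy:.3f}")
```

Fitting the transformers on the training split only, and then reusing the saved copy for the test split, keeps information from the holdout set out of the preprocessing step.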

The pipeline can be compiled with the command python make_pipeline.py. A notebook version is also included in the repository.

Use the pipeline

This pipeline can be generalized to any similar tabular dataset. The user-defined parameters are listed below.

[Image: parameters]

Graph picture of the pipeline

[Image: graph]

Docker

Kubeflow Pipelines runs on Kubernetes, and each component runs in its own container. In this project, all components use the same base Docker image. A simple Dockerfile is included in base_image_docker/.
