Credit_Risk_Analysis

Overview of the Analysis

Our client would like us to use machine learning to predict credit risk for peer-to-peer loans. For the analysis, we built six models across four resampling and ensemble approaches and compared their evaluation metrics to determine whether one model predicts credit risk better than the others.

The analysis was conducted using Python and the following components:

  • Pandas
  • Scikit-learn
  • Imbalanced-learn

Results

The initial data was preprocessed by performing the following steps:

  • columns in which every value is null were dropped
  • rows containing null values were dropped
  • loans with a status of "Issued" were removed
  • the interest rate was converted to a numerical data type
  • the target column was classified as low risk or high risk based on how late the loan was
  • the training features were converted from string to numerical data types using the get_dummies method
  • the target variable was created
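The preprocessing steps above might look roughly like the following in pandas. This is a sketch only: the column names (`loan_amnt`, `int_rate`, `home_ownership`, `loan_status`, `unused`), the tiny in-memory table, and the rule that only "Current" loans count as low risk are assumptions for illustration, since the original column names and status values are not listed here.

```python
import pandas as pd

# Hypothetical miniature loan table; real column names and values may differ.
raw = pd.DataFrame({
    "loan_amnt": [10000.0, 5000.0, 7500.0, 12000.0, None],
    "int_rate": ["13.5%", "7.2%", "19.9%", "10.1%", "8.0%"],
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT", "OWN"],
    "loan_status": ["Current", "Issued", "Late (31-120 days)",
                    "In Grace Period", "Current"],
    "unused": [None, None, None, None, None],
})


def preprocess(df):
    df = df.dropna(axis="columns", how="all")      # drop all-null columns
    df = df.dropna()                               # drop rows with any null
    df = df[df["loan_status"] != "Issued"].copy()  # remove "Issued" loans
    # Convert "13.5%" style strings to a float fraction such as 0.135.
    df["int_rate"] = df["int_rate"].str.rstrip("%").astype(float) / 100
    # Assumed mapping: "Current" is low risk, every other status is high risk.
    is_current = df["loan_status"].eq("Current")
    df["loan_status"] = is_current.map({True: "low_risk", False: "high_risk"})
    X = pd.get_dummies(df.drop(columns="loan_status"))  # encode string features
    y = df["loan_status"]                                # target variable
    return X, y


X, y = preprocess(raw)
print(X.columns.tolist())
print(y.tolist())
```

Encoding the features only after the target column has been dropped keeps the label out of the training matrix.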

After preprocessing, six models were built using four approaches:

  • Oversampling
  • Undersampling
  • Combination (Over and Under) Sampling
  • Ensemble Learning

Naive Random Oversampling

The first model run for the Oversampling method was a Naive Random Oversampling model.

The balanced accuracy score for this model is shown below:

NRO_balance

The classification report showing precision and recall scores for this model is shown below:

NRO_classification
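To make the idea behind random oversampling concrete, here is a minimal standard-library sketch: minority-class rows are duplicated at random until every class is as large as the biggest one. The function name `random_oversample` and the toy data are hypothetical; imbalanced-learn offers this technique as `RandomOverSampler`, and the sketch is not the code used for the results in this report.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate random minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_res.append(rng.choice(rows))  # sample with replacement
            y_res.append(label)
    return X_res, y_res


# Toy data: four low-risk loans and one high-risk loan.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["low_risk"] * 4 + ["high_risk"]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Only the training split should be resampled; the test split keeps its original class balance so the reported scores reflect real-world proportions.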

SMOTE Oversampling

The second model run for the Oversampling method was a SMOTE Oversampling model.

The balanced accuracy score for this model is shown below:

SMOTE_balance

The classification report showing precision and recall scores for this model is shown below:

SMOTE_classification
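Unlike random oversampling, SMOTE creates new minority samples by interpolating between an existing minority point and one of its nearest minority neighbours. The sketch below illustrates that interpolation step with the standard library only; `smote_like`, its parameters, and the toy points are hypothetical, and imbalanced-learn's `SMOTE` class is the full implementation.

```python
import random


def smote_like(minority, n_new, k=2, seed=1):
    """Create n_new synthetic points between minority points and neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base, excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=3)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies.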

Undersampling

The third model run was an Undersampling model.

The balanced accuracy score for this model is shown below:

Under_balance

The classification report showing precision and recall scores for this model is shown below:

Under_classification
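Undersampling balances the classes in the opposite direction, by shrinking the majority class. This report does not say which undersampling algorithm was used, so the sketch below assumes the simplest variant, random undersampling, purely as an illustration; `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=1):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_res, y_res = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):  # sample without replacement
            X_res.append(x)
            y_res.append(label)
    return X_res, y_res


# Toy data: six low-risk loans and two high-risk loans.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.8], [0.9]]
y = ["low_risk"] * 6 + ["high_risk"] * 2
X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

The trade-off is that majority-class rows are discarded, so the model trains on less data than with oversampling.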

Combination (Over and Under) Sampling

The fourth model run was the SMOTEENN model of the Combination (Over and Under) Sampling method.

The balanced accuracy score for this model is shown below:

SMOTEENN_balance

The classification report showing precision and recall scores for this model is shown below:

SMOTEENN_classification

Balanced Random Forest Classifier

The fifth model run was the Balanced Random Forest Classifier model of the Ensemble Classifiers method.

The balanced accuracy score for this model is shown below:

BRF_balance

The classification report showing precision and recall scores for this model is shown below:

BRF_classification

Easy Ensemble AdaBoost Classifier

The sixth and final model, also from the Ensemble Classifiers method, was the Easy Ensemble AdaBoost Classifier.

The balanced accuracy score for this model is shown below:

EEABC_balance

The classification report showing precision and recall scores for this model is shown below:

EEABC_classification
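The idea behind an easy ensemble is to train several AdaBoost learners, each on a balanced random subset of the training data, and then combine their predictions. The hand-rolled sketch below illustrates that idea with scikit-learn on synthetic data. The helper names, the synthetic dataset, the choice of label 1 as the minority (high-risk) class, and all parameter values are assumptions; imbalanced-learn provides the technique as `EasyEnsembleClassifier`, and this is not the code behind the scores reported here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split


def fit_easy_ensemble(X, y, n_subsets=5, seed=1):
    """Train one AdaBoost model per balanced random subset of the data."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)  # assumed: label 1 is the rare class
    majority = np.flatnonzero(y == 0)
    models = []
    for i in range(n_subsets):
        # Keep every minority row and draw an equally sized majority sample.
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        model = AdaBoostClassifier(n_estimators=25, random_state=seed + i)
        models.append(model.fit(X[idx], y[idx]))
    return models


def predict_easy_ensemble(models, X):
    # Average the predicted probability of class 1 across all subset models.
    proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (proba >= 0.5).astype(int)


# Synthetic imbalanced data standing in for the loan features.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.95, 0.05], class_sep=1.5,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1)

models = fit_easy_ensemble(X_train, y_train)
y_pred = predict_easy_ensemble(models, X_test)
score = balanced_accuracy_score(y_test, y_pred)
print(f"balanced accuracy: {score:.3f}")
```

Each learner sees all of the minority rows but a different slice of the majority class, so the ensemble as a whole uses far more of the majority data than a single undersampled model would.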

Summary

The models produced different results, although many of them were quite similar. The Ensemble Classifiers, however, had much higher balanced accuracy scores. Based on these results, the recommended model is the Easy Ensemble AdaBoost Classifier: in addition to a high balanced accuracy score, it had precision and recall scores that were substantially higher than those of the other models.
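For readers less familiar with the three metrics compared in this summary, the sketch below shows how each is derived from the counts of a binary confusion matrix. The labels and predictions are invented toy values, not results from this analysis, and `binary_metrics` is a hypothetical helper; scikit-learn's `balanced_accuracy_score` computes the same balanced accuracy.

```python
def binary_metrics(y_true, y_pred, positive="high_risk"):
    """Return precision, recall and balanced accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0         # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0    # true negative rate
    return {
        "precision": precision,
        "recall": recall,
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Toy example: 2 high-risk and 8 low-risk loans.
y_true = ["high_risk"] * 2 + ["low_risk"] * 8
y_pred = ["high_risk", "low_risk"] + ["high_risk"] * 2 + ["low_risk"] * 6
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case plain accuracy is 0.7 while balanced accuracy is 0.625, which is why balanced accuracy is the more informative figure when one class is rare.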
