Credit_Risk_Analysis

Credit Risk Analysis Project Overview:

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans.
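To see why the imbalance matters, here is a minimal sketch using hypothetical counts (not taken from the dataset). A model that labels every loan low-risk gets a high plain accuracy but a balanced accuracy of only 0.5, because balanced accuracy averages the recall of each class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall values."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical, illustrative class counts: 995 low-risk loans, 5 high-risk loans.
y_true = ["low_risk"] * 995 + ["high_risk"] * 5
# A "model" that never predicts high-risk.
y_pred = ["low_risk"] * 1000

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain:.3f}, balanced accuracy={balanced:.3f}")
```

This is why the per-class precision, recall, and F1 values are reported alongside the accuracy score in the results.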

Using a credit card credit dataset and Python, several machine learning models will be used to evaluate and predict credit risk.

Once these models have been completed, their performance will be evaluated and a written recommendation will be made on whether they should be used to predict credit risk.

Techniques used in this project to predict credit risk:

  • Oversample the data using the RandomOverSampler and SMOTE algorithms.
  • Undersample the data using the ClusterCentroids algorithm.
  • Combine over- and undersampling using the SMOTEENN algorithm.
  • Compare two new machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.
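The idea behind random oversampling can be sketched in plain Python. This illustrates the concept only and is not the imbalanced-learn `RandomOverSampler` implementation. The function name `random_oversample`, the toy rows, and the labels are all hypothetical. Minority-class rows are duplicated at random until every class is as large as the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate randomly chosen minority rows until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Candidate rows to copy for this class.
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy, made-up data: 10 low-risk rows and 2 high-risk rows.
rows = [[i] for i in range(12)]
labels = ["low_risk"] * 10 + ["high_risk"] * 2

new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)
print(balanced_counts)
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class balance.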

Deliverables:

  • Deliverable 1: Use Resampling Models to Predict Credit Risk
  • Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk
  • Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
  • Deliverable 4: A Written Report on the Credit Risk Analysis

Project Resources:

Data Sources:

  • LoanStats_2019Q1.cs

Software:

  • Jupyter Notebook 6.1.4
  • Python 3.8.5

Credit Risk Analysis Project Results:

Deliverable 1 Results: Use Resampling Models to Predict Credit Risk

Oversampling RandomOverSampler Model:

randomoversampler_balanced_accuracy

randomoversampler_confusion_matrix

randomoversampler_classification_report

  • The accuracy score of the RandomOverSampler model is 63%.
  • The precision for the high-risk class is 1% and the F1 score is 2%, neither of which is good enough to conclude that the model classifies well.
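As a rough illustration of how such numbers arise, the sketch below computes precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical and were chosen only to land near the reported 1% precision and 2% F1. They are not the values from the project's confusion matrix.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for the high-risk (positive) class.
tp = 70    # high-risk loans correctly flagged
fn = 30    # high-risk loans missed
fp = 6930  # low-risk loans wrongly flagged as high-risk

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

Even with most high-risk loans caught, the large number of false positives pulls precision, and with it the F1 score, close to zero.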

SMOTE Oversampling Model:

smote_balanced_accuracy

randomoversampler_confusion_matrix

randomoversampler_classification_report

  • The accuracy score of the SMOTE model is slightly better than that of the RandomOverSampler model.
  • The precision for the high-risk class is very low at 1%, which indicates a large number of false positives and therefore an unreliable classification.
  • The F1 score is 2%, which is also very low.

Undersampling ClusterCentroids Model:

undersampling_balanced_accuracy

undersampling_confusion_matrix

undersampling_classification_report

  • With an accuracy score of 51%, the ClusterCentroids model performs poorly compared to the RandomOverSampler and SMOTE models.
  • The precision (1%) and the F1 score (1%) are still very low, just like those of the RandomOverSampler and SMOTE models.
  • The ClusterCentroids model is not good at classifying high-risk loans because both its accuracy (0.516) and its F1 score are low.

Deliverable 2 Results: Use the SMOTEENN Algorithm to Predict Credit Risk

Combination Sampling SMOTEENN Model:

combo_balanced_accuracy

combo_confusion_matrix

combo_classification_report

  • The accuracy score (63%) is higher than that of the ClusterCentroids model (51%) but about the same as those of the RandomOverSampler and SMOTE models.
  • The precision (1%) and the F1 score (1%) are still very low for the high-risk group, just like those of the RandomOverSampler, SMOTE, and ClusterCentroids models.

Deliverable 3 Results: Use Ensemble Classifiers to Predict Credit Risk

Balanced Random Forest Classifier Model:

balanced_random_forest_classification

  • The precision for the high-risk class has improved slightly (4%) but still indicates a large number of false positives and therefore an unreliable positive classification.
  • The F1 score is still low (14%) but improving.

Easy Ensemble AdaBoost Classifier Model:

Easy_Ensemble_AdaBoost_Classifier

  • The accuracy score of the EasyEnsembleClassifier model is 93%, a large improvement over all the other models.
  • The high-risk precision (7%) and F1 score (14%) have improved over all the other models.
  • The low-risk precision (100%), recall (94%), and F1 score (97%) are the highest of all the models.

Credit Risk Analysis Project Summary:

After creating and evaluating these machine learning models, it is clear that all of them show poor precision when predicting whether a credit risk is high.

It wasn't until the ensemble models (especially the Easy Ensemble AdaBoost Classifier) were used that accuracy scores improved. The EasyEnsembleClassifier model showed strong recall for both the high-risk (91%) and low-risk (94%) classes.
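As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the rounded EasyEnsembleClassifier figures reported in the results. This is only a sketch. The inputs are the rounded values, so the high-risk result comes out near 0.13 rather than the reported 14%, a gap that rounding of the inputs can explain.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported for the EasyEnsembleClassifier model.
low_risk_f1 = f1_score(1.00, 0.94)
high_risk_f1 = f1_score(0.07, 0.91)

print(f"low-risk F1  = {low_risk_f1:.2f}")
print(f"high-risk F1 = {high_risk_f1:.2f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a 91% recall cannot compensate for a 7% precision.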

Key takeaways from Predicting Credit Risk Project:

  • All the models showed very poor precision when assessing whether a credit risk is high.
  • A majority of the models had accuracy scores ranging from 51% to 65%.
  • The Balanced Random Forest Classifier and Easy Ensemble AdaBoost Classifier models were the only ones to improve on these accuracy scores.

I would not recommend using any of these models to predict credit risk. Because precision for the high-risk class is so low, using them could lead LendingClub to reject applicants flagged as high-risk who are, in fact, low-risk.
