aidanabekboeva / loan-default-prediction-using-machine-learning

This project used several machine learning models, including Logistic Regression, Decision Trees, Bagging, Random Forest, and AdaBoost, to predict loan sizes for small businesses. The dataset comes from the U.S. Small Business Administration (SBA), an organization that supports and encourages small enterprises in the U.S. credit market.

HTML 71.99% Jupyter Notebook 28.01%

Loan-Default-Prediction-using-Machine-Learning

The problem arises from small businesses and startups defaulting on SBA-guaranteed loans. Given the significant role small businesses play in job creation and economic growth, this project aims to build a predictive model that estimates whether a loan will be large (above $50,000) or not. This predictive capability can guide lenders and policymakers toward well-informed choices and so help reduce the risk of loan defaults.

The proposed solution applies data mining techniques to build machine learning models that forecast whether a loan for a small business will be larger than $50,000. Several models, including Logistic Regression, Decision Trees, Bagging, Random Forest, and AdaBoost, are trained and evaluated. The best-performing model is then selected to give accurate and reliable predictions of loan outcomes.
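The setup described above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the project's notebook code: the feature names (`term_months`, `num_employees`), the data-generating formula, and the default model settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the SBA data (assumption: real columns and values differ).
rng = np.random.default_rng(0)
n = 400
term_months = rng.integers(12, 240, size=n)
num_employees = rng.integers(1, 50, size=n)
loan_amount = 400 * term_months + 1500 * num_employees + rng.normal(0, 15000, size=n)

# The loan amount is used only to build the label, never as a feature.
X = np.column_stack([term_months, num_employees])
y = (loan_amount > 50_000).astype(int)  # 1 = loan larger than $50,000

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

# Fit each model and record MCC and test error on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mcc": float(matthews_corrcoef(y_test, pred)),
        "test_error": float(np.mean(pred != y_test)),
    }

for name, scores in results.items():
    print(f"{name}: MCC={scores['mcc']:.3f}, test error={scores['test_error']:.3f}")
```

Keeping the raw loan amount out of the feature matrix matters here, because the label is derived directly from it.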

Hyperparameter tuning through grid search with cross-validation is used to improve each model's performance and its ability to generalize to new data. Ultimately, the project aims to support the growth and success of small businesses, contributing to positive economic and social impacts.
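A grid search with cross-validation might look like the sketch below. It uses scikit-learn's `GridSearchCV` on synthetic data from `make_classification`; the parameter grid, the 5-fold setting, and the choice of MCC as the selection score are assumptions for illustration, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption: stands in for the SBA loans).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid over the inverse regularization strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,                                   # 5-fold cross-validation
    scoring=make_scorer(matthews_corrcoef),  # select the setting with the best mean MCC
)
search.fit(X_train, y_train)

best_c = search.best_params_["logisticregression__C"]
test_mcc = float(matthews_corrcoef(y_test, search.predict(X_test)))
print(f"best C={best_c}, cross-validated MCC={search.best_score_:.3f}, test MCC={test_mcc:.3f}")
```

The test split is held out of the search entirely, so the final MCC is measured on data that played no part in choosing the hyperparameters.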

The dataset contains 27 variables describing SBA-backed loans to small businesses. These variables cover borrowers, banks, loan amounts, loan status, and other pertinent factors.

The project evaluated Logistic Regression, Decision Trees, Bagging, Random Forest, and AdaBoost on the task of predicting loan size for small businesses. Every model except Logistic Regression achieved a perfect Matthews Correlation Coefficient (MCC) of 1.000 and zero test error. Logistic Regression also performed well, with an MCC of 0.934 and a test error of 0.032. The 100% accuracy of the tree-based models (Decision Trees, Bagging, Random Forest, and AdaBoost) suggests potential overfitting due to the small sample size.

While the tree-based models' accuracy is impressive, it may be a sign of overfitting. The logistic regression model, with an accuracy of 0.9818, gives a more realistic picture of performance and raises fewer overfitting concerns. Its simplicity and interpretability make it a suitable choice for practical applications and stakeholder communication.

In conclusion, selecting logistic regression over models with perfect accuracy is prudent. The project demonstrated that machine learning can effectively predict loan outcomes for small businesses, aiding lending decisions. It emphasizes the importance of considering both accuracy and model interpretability when addressing real-world business challenges.
