An estimated 20-50 million people are injured in auto accidents worldwide each year, at an accrued cost of roughly $518 billion. Since the first insurance claim in 1897, car insurance companies have worked to cover the expenses these incidents incur, from property damage and medical expenses to lost wages and legal fees. Given these enormous costs, insurance companies must take every opportunity to protect their solvency and minimize losses. A large part of this is pricing policies and coverage appropriately: not undercharging drivers who cause catastrophic, costly incidents, and not overcharging safe customers who might never file a claim and could easily switch providers.
The traditional underwriting and risk-evaluation process could take a month, given all the manual steps involved in handling a car insurance claim. With so many incidents and claims, insurance companies already apply technologies such as machine learning to streamline these processes. Today you can receive car insurance in an hour, with the company having already assessed your risk based on factors such as driving history, credit history, and the prevalence of fraud or inclement weather in your location. Part of that automated risk assessment considers the driver's car model and its attributes.
The focus of this project is to develop a binary classification model that better predicts whether a car is a risky one to insure based on its attributes. This will aid the risk-assessment process by enabling more individualized car insurance rates, promoting customer retention and company solvency.
Automobile Data Set (UCI source): This data from the UCI Machine Learning Repository was donated by Carnegie Mellon University and combines three sources (the 1985 Ward's Automotive Yearbook, Insurance Services Personal Auto Manuals, and the IIHS Insurance Collision Report). A 1988 UCI paper (Source) used this data to develop 'instance-based prediction' but did not reference the car insurance context at all. Similarly, a Kaggle competition held in 2020 used this data, with entrants experimenting with a range of predictive models, but the sole focus was model performance rather than deployment.
This establishes a need for our model: although the data has been used to train models before, the Kaggle competition's results, along with its train/test split, remain private, so performances cannot be compared. The data set has 205 records and 26 features (15 continuous, 1 integer, and 10 nominal), including the integer outcome ("symboling"), which ranges from -3 to +3.
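Since the symboling label is an integer from -3 to +3 but the project frames the task as binary classification, the label must be binarized. The sketch below assumes a cutoff at 0 (symboling > 0 is the positive "risky" class); the values shown are illustrative and the notebook's actual cutoff may differ.

```python
import pandas as pd

# The UCI Automobile file ships without a header row; "symboling" is the
# first column and ranges from -3 (safe) to +3 (risky).
# Assumption: binarize at 0, so symboling > 0 becomes the positive class.
symboling = pd.Series([-2, -1, 0, 1, 2, 3, 0, 1])  # illustrative values
risk = (symboling > 0).astype(int)                 # 1 = risky, 0 = not risky
```

With this cutoff, 0 and all negative symbolings fall into the "not risky" class, which also determines the class balance the models below are trained on.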
Results:
(Refer to Notebook for detailed analysis)
| Model | Recall Score | Accuracy Score | False Negative Rate | F-Beta Score | ROC AUC Score |
|---|---|---|---|---|---|
| Decision Tree | 1.0 | 0.795 | 0.0 | 0.913 | 0.821 |
| Random Forest (Bagging) | 1.0 | 0.979 | 0.0 | 0.990 | 1.0 |
| AdaBoost (Boosting) | 0.857 | 0.918 | 0.142 | 0.873 | 0.982 |
| GradientBoost (Boosting) | 1.0 | 0.969 | 0.0 | 0.986 | 1.0 |
| XGBoost (Boosting) | 1.0 | 0.969 | 0.0 | 0.986 | 1.0 |
| k-NN | 0.380 | 0.571 | 0.619 | 0.399 | 0.606 |
| Logistic Regression | 1.0 | 0.428 | 0.0 | 0.789 | 0.840 |
| Multi-Layer Perceptron | 0.952 | 0.816 | 0.047 | 0.892 | 0.914 |
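The metrics in the table can all be computed with scikit-learn. The sketch below uses hypothetical labels and scores, and assumes beta=2 for the F-beta score (weighting recall over precision, consistent with the emphasis on the false negative rate); the notebook's exact beta is not stated here.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             fbeta_score, recall_score, roc_auc_score)

# Hypothetical labels, predictions, and probability scores for illustration.
y_true  = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred  = [1, 1, 0, 0, 1, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.2, 0.6, 0.7, 0.1, 0.95]

recall = recall_score(y_true, y_pred)             # TP / (TP + FN)
acc    = accuracy_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fnr    = fn / (fn + tp)                           # FNR = 1 - recall
fbeta  = fbeta_score(y_true, y_pred, beta=2)      # beta=2 is an assumption
auc    = roc_auc_score(y_true, y_score)           # uses scores, not labels
```

Note that the false negative rate is simply 1 minus recall, and that ROC AUC is computed from predicted probabilities rather than hard class labels.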