omidvheravi / ml_capstone_project

Allstate Insurance is the second-largest personal lines insurer in the United States and the largest that is publicly held, serving approximately 16 million households. In this project, I use machine learning and data analysis techniques to find which columns are the best indicators of the severity of an insurance claim.


ml_capstone_project's Introduction

Machine Learning Nanodegree Capstone

Allstate Insurance Severity Detection

Project Overview

Allstate Insurance is the second-largest personal lines insurer in the United States and the largest that is publicly held, serving approximately 16 million households. For this challenge, Allstate has provided a large dataset of its insurance claims with several dozen key attributes, through which data analysis can help uncover the underlying patterns that determine the severity of an insurance claim. The data is a mix of categorical and continuous features, which makes the challenge a good fit for machine learning. Automating the review of insurance claims should improve productivity and efficiency, which in turn should lead to more satisfied customers.

Problem Statement

The challenge asks data scientists to predict the label column, 'loss', in the data provided. Because the dataset contains both categorical and continuous data, several dozen attributes, and a large number of insurance claim instances (188,318 rows in train.csv), the final algorithm has to take these constraints into consideration. I therefore believe that supervised regression algorithms, in particular decision-tree-based ones, are best suited to this challenge. For reference, some of the best-performing public kernels on the Kaggle challenge page use XGBoost, MLP, and similar models.

Challenge Roadmap

Understanding the Dataset

  • Data Shape
  • Skew
  • Mean, standard deviation, max, etc.
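
The checks listed above can be sketched with pandas roughly as follows. This is a minimal illustration on a small synthetic frame, not the notebook's actual code: the column names `cat1`, `cont1`, and `loss` and the helper `summarize` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for train.csv; the column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cat1": rng.choice(["A", "B"], size=200),
    "cont1": rng.uniform(0.0, 1.0, size=200),
    "loss": rng.lognormal(mean=7.0, sigma=1.0, size=200),
})


def summarize(frame):
    """Return the shape, per-column skew, and descriptive statistics
    of the numeric columns of a DataFrame (hypothetical helper)."""
    numeric = frame.select_dtypes(include="number")
    return {
        "shape": frame.shape,
        "skew": numeric.skew(),
        "stats": numeric.describe(),
    }


summary = summarize(df)
print(summary["shape"])
print(summary["skew"])
print(summary["stats"].loc[["mean", "std", "max"]])
```

A strongly positive skew on the target column, as in this synthetic `loss`, is the kind of signal that usually motivates a transformation step later on.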

Data Visualization

  • Scatter Plots
  • Correlation
  • Density Plots
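
As a rough sketch of the correlation step, the snippet below computes a Pearson correlation matrix and lists strongly correlated column pairs. The data is synthetic, and the column names, the 0.8 threshold, and the helper `correlated_pairs` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic continuous features; the names are hypothetical.
rng = np.random.default_rng(1)
base = rng.uniform(size=300)
df = pd.DataFrame({
    "cont1": base,
    "cont2": base + rng.normal(scale=0.05, size=300),  # nearly a copy of cont1
    "cont3": rng.uniform(size=300),                     # independent of the others
})


def correlated_pairs(frame, threshold=0.8):
    """List (col_a, col_b, r) for column pairs whose absolute
    Pearson correlation exceeds the threshold (hypothetical helper)."""
    corr = frame.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) > threshold:
                pairs.append((a, b, float(r)))
    return pairs


pairs = correlated_pairs(df)
print(pairs)
```

The same matrix returned by `df.corr()` can be passed to `seaborn.heatmap` to draw the correlation plot.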

Data Processing

  • Transformation
  • One-Hot encoding
  • Train/Test Data preparation
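
The three processing steps above might look like the following sketch. It assumes a log transform of the target as the transformation (the text does not say which one the notebook uses), and the column names and the helper `prepare` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training data; the column names are hypothetical.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "cat1": rng.choice(["A", "B", "C"], size=100),
    "cont1": rng.uniform(size=100),
    "loss": rng.lognormal(mean=7.0, sigma=1.0, size=100),
})


def prepare(frame, target="loss"):
    """Log-transform the target and one-hot encode the categorical
    columns (hypothetical helper; the transform is an assumption)."""
    y = np.log1p(frame[target])                       # reduces right skew
    X = pd.get_dummies(frame.drop(columns=[target]))  # one-hot encoding
    return X, y


X, y = prepare(df)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X.columns.tolist())
print(X_train.shape, X_test.shape)
```

Predictions made on the transformed target can be mapped back to the original scale with `np.expm1`.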

Model Evaluation

  • Algorithm implementations
  • Model Evaluations
  • Model Predictions
  • Model Analysis
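
A minimal sketch of the evaluation loop, comparing two of the scikit-learn regressors listed under the dependencies. It runs on synthetic data and assumes mean absolute error as the metric, which this README does not specify; the model settings are arbitrary choices for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem standing in for the prepared claim data.
rng = np.random.default_rng(3)
X = rng.uniform(size=(400, 5))
y = 3.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model and record its error on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

# Report models from best (lowest error) to worst.
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.3f}")
```

Keeping the models in a dictionary makes it easy to add the other listed estimators, such as gradient boosting or an MLP, to the same comparison.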

Appendix

  • Conclusion
  • Personal Thoughts
  • Thanks and Appreciations

Project Highlights

See report.md for an in-depth analysis of the challenge and its highlights.

Libraries and Dependencies used

  • Sklearn
    • metrics
    • Linear Regression
    • GradientBoosting
    • DecisionTree
    • SGD
    • RandomForest
    • MLP
  • Numpy
  • Pandas
  • Matplotlib
  • Seaborn

Core Files

train.csv = training data provided by the Allstate challenge. The original shape of the data is (188318, 131). This dataset includes the 'loss' column, the target that the challenge asks us to predict.

test.csv = test data provided by the challenge. The original shape of the data is (125546, 130). This dataset does not include the 'loss' column, and it contains fewer insurance claim instances than the training dataset.

CapstoneProject.ipynb = the main notebook. It includes all the analysis, model building, and prediction.

Supplementary References and Links

Kaggle Challenge: https://www.kaggle.com/c/allstate-claims-severity
