
titanic-survival-prediction's Introduction

Kaggle titanic analysis

A Jupyter notebook in Python for analysing Titanic passenger data from Kaggle.

Project Summary

Situation
The Titanic sank after colliding with an iceberg on 15 April 1912, resulting in the deaths of 1502 of the 2224 passengers and crew. Although luck played a part in survival, some groups of people appear to have been more likely to survive than others. This project aims to identify the key factors associated with survival and to predict which passengers survived the shipwreck.

Action and Goal
Using the passenger data provided by Kaggle, exploratory data analysis (EDA) and model building are carried out in Python in a Jupyter notebook. The goal is to build a predictive model that answers the question: which kinds of people or groups were more likely to survive? Since the expected output is binary (1: survived, 0: not survived), a binary classifier is built by exploring classification algorithms such as decision tree, random forest, support vector machine (SVM) and a neural network. The resulting model is evaluated by cross-validation and its predictions are submitted to the Kaggle leaderboard.

Data

The "Titanic passenger dataset" from Kaggle is used. This dataset is publicly available (https://www.kaggle.com/c/titanic/data).

Methodology

Exploratory Data Analysis (EDA) / Data engineering

Initial exploration of the training and test data was done in a Jupyter notebook (TitanicSurvivalPrediction_EDA_Model.ipynb) using pandas and pandas_profiling. pandas_profiling generates a ProfileReport, an interactive HTML report of descriptive statistics (data types, missing values, histograms, etc.). Matplotlib, Plotly and Seaborn are used to visualise the data in more detail. Missing values are also imputed at this stage.
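
As an illustration of the imputation step, here is a minimal sketch. The notebook's actual strategy is not described here, so the choices below (median for the numeric columns Age and Fare, most frequent value for Embarked) are assumptions, and the helper name impute_titanic and the toy rows are hypothetical stand-ins for the Kaggle training data.

```python
import pandas as pd


def impute_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with missing values filled (illustrative choices only)."""
    out = df.copy()
    # Assumption: numeric gaps are filled with the column median.
    for col in ("Age", "Fare"):
        if col in out.columns:
            out[col] = out[col].fillna(out[col].median())
    # Assumption: categorical gaps are filled with the most frequent value.
    if "Embarked" in out.columns:
        out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    return out


# Toy rows standing in for the Kaggle training data (not real passenger records).
toy = pd.DataFrame({
    "Age": [22.0, None, 38.0, 26.0],
    "Fare": [7.25, 71.28, None, 8.05],
    "Embarked": ["S", "C", None, "S"],
})
clean = impute_titanic(toy)
print(clean.isna().sum())
```

Working on a copy keeps the raw data available for comparison, which is useful when checking how much imputation shifts the distributions seen in the profile report.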

Classification model

To classify passengers as survived or not survived, binary classifiers were built with Linear Regression, Logistic Regression, Extra Trees, Random Forest, Gradient Boosting, SVM and a Neural Network, using scikit-learn and Keras. The classification results were evaluated using k-fold cross-validation.
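
The evaluation described above might look roughly like the following sketch, which compares two of the listed classifiers with 5-fold cross-validation in scikit-learn. The synthetic data from make_classification, the number of folds, and the hyperparameters are all assumptions made so the example runs on its own; they are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data standing in for the engineered Titanic features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Two of the model families named above; hyperparameters are illustrative.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    fold_acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    scores[name] = fold_acc.mean()
    print(f"{name}: {fold_acc.mean():.3f} +/- {fold_acc.std():.3f}")
```

Reusing the same cv object for every model means each classifier is scored on identical splits, so the mean accuracies are directly comparable.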

Results

The best result came from Random Forest, with an accuracy of 84%; its submission placed within the top 3% of the Kaggle leaderboard (~33,000 teams). The full analysis is shown in the notebook (TitanicSurvivalPrediction_EDA_Model.ipynb).
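
For reference, a Kaggle Titanic submission is a CSV file with two columns, PassengerId and Survived. The sketch below shows one way to produce that layout; the helper name make_submission, the example IDs and the example predictions are hypothetical, and the CSV is written to an in-memory buffer rather than to a file.

```python
import io

import pandas as pd


def make_submission(passenger_ids, predictions) -> pd.DataFrame:
    """Pair each passenger ID with a 0/1 prediction in the submission layout."""
    return pd.DataFrame({
        "PassengerId": list(passenger_ids),
        "Survived": [int(p) for p in predictions],  # force integer labels
    })


# Made-up IDs and predictions, for illustration only.
sub = make_submission([892, 893, 894], [0, 1, 0])

buf = io.StringIO()
sub.to_csv(buf, index=False)  # index=False keeps the file to the two columns
print(buf.getvalue())
```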
