The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1,502 of the 2,224 passengers and crew aboard; hence the nickname "DieTanic" for this unforgettable disaster.
Building the Titanic cost about $7.5 million, yet she sank after a single collision. The Titanic dataset is an excellent starting point for beginners to begin their data science journey and to enter Kaggle competitions.
The objective of this notebook is to give an idea of the workflow of a typical predictive modeling problem: how to analyze features, how to engineer new ones, how to make existing features useful to the model, and some machine learning concepts along the way. I have tried to keep the notebook as basic as possible so that even newcomers can follow every phase of it.
We will predict whether a passenger on the Titanic survived, based on the passenger's data (name, age, ticket fare, etc.).
- Exploratory Data Analysis
- Feature Engineering
- Define and train the ML models
- Evaluate the performance of our trained models on the validation set
- Select the best model and predict the targets for the test set
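The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the notebook's actual pipeline: a tiny hypothetical DataFrame stands in for Kaggle's `train.csv`, and the feature set is a simplified, pre-encoded subset.

```python
# Minimal sketch of the workflow: load data, split off a validation set,
# train a model, and score it. The DataFrame below is synthetic stand-in
# data, not the real Titanic training file.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

train = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2, 3, 1],
    "Sex":    [0, 1, 0, 1, 0, 1, 1, 0],   # already encoded: 0 = female, 1 = male
    "Age":    [29, 22, 35, 4, 58, 30, 19, 41],
    "Fare":   [71.3, 7.9, 26.0, 16.7, 51.9, 13.0, 8.0, 83.5],
    "Survived": [1, 0, 1, 1, 0, 1, 0, 1],
})

features = ["Pclass", "Sex", "Age", "Fare"]
X_train, X_val, y_train, y_val = train_test_split(
    train[features], train["Survived"], test_size=0.25, random_state=0
)

# Fit on the training split, then evaluate on the held-out validation split.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
val_acc = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_acc:.2f}")
```

In the real notebook, the same split-train-evaluate loop is repeated for each candidate model before choosing one for the test-set predictions.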
The Titanic dataset comes from Kaggle's "Titanic: Machine Learning from Disaster" competition.
Random Forest, XGBoost and Logistic Regression.
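The three candidate models can be set up as shown below. This is a sketch with assumed default-ish hyperparameters; XGBoost is an external package, so if it is not installed the sketch falls back to scikit-learn's `GradientBoostingClassifier` as a similar gradient-boosting stand-in.

```python
# Define the three candidate models named above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

try:
    # xgboost is a separate package and may not be available everywhere.
    from xgboost import XGBClassifier
    boosted_model = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    # Fallback: a comparable gradient-boosting model from scikit-learn.
    from sklearn.ensemble import GradientBoostingClassifier
    boosted_model = GradientBoostingClassifier(random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "XGBoost": boosted_model,
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
```

Keeping the models in a dict makes it easy to loop over them and compare their validation scores side by side.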
The best validation score among the models is achieved by the Random Forest model, at 83%. On submitting its test-set predictions to the Kaggle competition, the model achieved a public score of 77.033.
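For reference, a Kaggle submission for this competition is a two-column CSV of `PassengerId` and the predicted `Survived` label. The sketch below uses hypothetical placeholder IDs and predictions; in practice they come from `test.csv` and `model.predict(X_test)`.

```python
# Sketch of building a submission file. PassengerIds and predictions
# here are hypothetical placeholders standing in for the real test set.
import pandas as pd

passenger_ids = [892, 893, 894, 895]   # in practice: test["PassengerId"]
predictions = [0, 1, 0, 1]             # in practice: best_model.predict(X_test)

submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": predictions})
submission.to_csv("submission.csv", index=False)
```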