This project is a classic Kaggle competition. We trained models to find the patterns that determined who survived the Titanic disaster.
This report aims to predict whether a passenger on the Titanic survived. This is a binary classification problem: given some features of a passenger, we classify that passenger as either survived or not survived. A training dataset is provided in which the ground-truth label is already given, so we know whether each passenger in it survived. The variables in this training dataset include PassengerID, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin and Embarked. However, not all of these features have predictive ability, and some are more informative than others. Better features allow us to build a better model. Therefore, this report first describes the feature engineering we implemented, followed by the different classification models we applied, including a support vector machine, logistic regression, a decision tree, a random forest, Naïve Bayes and two different neural networks. In the third part, we present the construction of a dedicated random forest model and the ensemble we used to improve on the accuracy of our individual models, which gives us a final predictive accuracy of 0.83254.
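To make the ensemble idea concrete, the sketch below combines the 0/1 predictions of several models by a hard majority vote. This is only one common way to build an ensemble and is an assumption here, since the combination rule is not stated at this point in the report. The function names `majority_vote` and `accuracy`, the tie-breaking rule, and the toy labels are all illustrative and are not the authors' code or data.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model 0/1 predictions by hard majority vote.

    predictions[m][i] is model m's label for passenger i
    (1 = survived, 0 = not survived).
    Assumption: ties are broken in favour of 0; this choice is arbitrary.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_passengers = len(predictions[0])
    if any(len(p) != n_passengers for p in predictions):
        raise ValueError("all models must predict the same passengers")

    combined = []
    # zip(*predictions) yields, for each passenger, the votes of all models.
    for votes in zip(*predictions):
        survived_votes = sum(votes)
        # Predict 1 only when strictly more than half of the models vote 1.
        combined.append(1 if 2 * survived_votes > len(votes) else 0)
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of passengers whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up predictions from three hypothetical models for three passengers.
toy_predictions = [
    [1, 0, 1],  # hypothetical model A
    [1, 1, 0],  # hypothetical model B
    [0, 0, 1],  # hypothetical model C
]
ensemble_labels = majority_vote(toy_predictions)
```

With an odd number of models, as in the toy example, ties cannot occur. With an even number, the tie-breaking rule matters and could instead be replaced by averaging predicted probabilities.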