Tall ships parade, Castle Island, Boston. Photo: Ozan Aygun

Predicting Survival on the Titanic: a deep dive into feature engineering and building classifiers


I finally found some time to work with this historical data set. It is messier than I expected, with missing values for many of the passengers on the ship. This gives us a great opportunity to build sensible missing value imputation models, engineer unexpected features, and use them for predictive modeling. Here I first explored the Titanic data set using simple, parsimonious linear models, then performed feature engineering, missing value imputation, and dimension reduction. Finally, I trained classifiers using random forest, support vector machine, and gradient boosted tree algorithms. I also explored stacking these classifiers and tuning the models to improve the accuracy of the final model's survival predictions.