Tall ships parade, Castle Island, Boston. Photo: Ozan Aygun
Predicting Survival on the Titanic: a deep dive into feature engineering and building classifiers
I finally found some time to work with this historical data set. It is messier than I expected, and many
passengers on the ship have missing values in one or more fields. This gives us a great opportunity to build
sensible models for imputing missing values, engineer unexpected features, and use them for predictive modeling.
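As a small illustration of how an engineered feature can drive missing value imputation, here is a minimal sketch, not the method used in this analysis. It assumes Kaggle-style Name strings of the form "Surname, Title. Given names", derives a Title feature from them, and fills each missing Age with the median observed age of passengers who share that title (falling back to the overall median). The passenger rows and the helper names extract_title and impute_age_by_title are hypothetical and made up for illustration.

```python
import re
from statistics import median

# Hypothetical toy rows in a Kaggle-like "Name"/"Age" layout (not real passengers).
passengers = [
    {"Name": "Smith, Mr. John", "Age": 22.0},
    {"Name": "Smith, Mrs. John (Mary Brown)", "Age": 38.0},
    {"Name": "Jones, Miss. Anna", "Age": 26.0},
    {"Name": "Brown, Mr. Peter", "Age": 35.0},
    {"Name": "Green, Mr. Paul", "Age": None},
    {"Name": "White, Master. Tom", "Age": 4.0},
    {"Name": "Black, Master. Sam", "Age": None},
]


def extract_title(name):
    """Return the honorific between the comma and the next period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def impute_age_by_title(rows):
    """Add a Title feature and fill missing Age with the per-title median age."""
    rows = [dict(row) for row in rows]  # work on copies, leave the input untouched

    # 1) Engineer the Title feature from the raw Name string.
    for row in rows:
        row["Title"] = extract_title(row["Name"])

    # 2) Collect observed ages per title and compute their medians.
    ages_by_title = {}
    for row in rows:
        if row["Age"] is not None:
            ages_by_title.setdefault(row["Title"], []).append(row["Age"])
    title_medians = {title: median(ages) for title, ages in ages_by_title.items()}
    overall_median = median(age for ages in ages_by_title.values() for age in ages)

    # 3) Fill gaps, falling back to the overall median for unseen titles.
    for row in rows:
        if row["Age"] is None:
            row["Age"] = title_medians.get(row["Title"], overall_median)
    return rows


filled = impute_age_by_title(passengers)
print([(row["Title"], row["Age"]) for row in filled])
# [('Mr', 22.0), ('Mrs', 38.0), ('Miss', 26.0), ('Mr', 35.0), ('Mr', 28.5), ('Master', 4.0), ('Master', 4.0)]
```

Grouping by title rather than using a single global median keeps imputed ages plausible: a missing "Master" (a young boy) gets a child's age instead of an adult's.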
Here I first explored the Titanic data set with simple, parsimonious linear models, then performed feature engineering, missing value imputation
and dimension reduction. Finally, I trained classifiers using random forest, support vector machine,
and gradient boosted tree algorithms. I also explored stacking these classifiers and tuning the models to
improve the accuracy of the final model's survival predictions.