View from Magstræde, Copenhagen, Denmark. Photo: Ozan Aygun

Predicting the housing market in Iowa: an extreme gradient boosting solution to a traditional regression problem


Today I feel like it is time to tackle a regression problem again - this time a traditional one! Let's get our feet wet with the Ames housing data set and perform a fairly comprehensive analysis and predictive modeling to estimate house prices. Here, I explored the training set extensively, then performed feature engineering, missing value imputation and feature selection, all using the training set. Finally, I trained both linear models, such as lasso-regularized regression and PCA regression, and more complex algorithms, including gradient boosting, extreme gradient boosting, random forest, and support vector machines. I obtained a model that predicts house sale prices fairly well, with an RMSE of 0.12717 on the test data set. Feel free to fork the reproducible code from my GitHub page and improve the model with your solution!
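For readers less familiar with the metric, here is a minimal sketch of how an RMSE can be computed. It is not the code from the GitHub repository: the function name `rmse`, the `log_scale` switch and the three sale prices are all made up for illustration. An RMSE as small as the one quoted above suggests the error was measured on log-transformed prices, but that is an assumption here, so the sketch shows both variants.

```python
import math


def rmse(actual, predicted, log_scale=False):
    """Root mean squared error between two equal-length sequences.

    With log_scale=True the error is computed on natural-log values.
    Whether the reported score used log prices is an assumption.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    if log_scale:
        actual = [math.log(a) for a in actual]
        predicted = [math.log(p) for p in predicted]
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sale prices for illustration only (not from the Ames data).
actual_prices = [200000.0, 150000.0, 310000.0]
predicted_prices = [210000.0, 140000.0, 300000.0]

plain = rmse(actual_prices, predicted_prices)                   # error in dollars
logged = rmse(actual_prices, predicted_prices, log_scale=True)  # error in log units

print(f"RMSE on raw prices: {plain:.2f}")
print(f"RMSE on log prices: {logged:.5f}")
```

On the log scale the metric penalizes relative rather than absolute errors, so a 10,000 dollar miss on a cheap house counts for more than the same miss on an expensive one.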