View from downtown Annapolis, Maryland. Photo: Ozan Aygun

Retail Price Suggestion


Here I would like to present an end-to-end machine learning solution to a challenging analytics problem. We are going to have a deep dive into a fairly large dataset from an online retailer, which we will use to predict market value of distinct products. The data mainly consists of text, therefore we will apply a lot of Natural Language Processing (NLP) steps to implement a feature engineering pipeline. We will then use these features to train various machine learning algorithms to use in this regression problem. Note that I omitted data exploration steps in this write up to keep it short and focused on NLP and modeling steps. Keep in mind that the design of the pipeline described here required extensive exploratory analysis of the dataset, which was performed before implementing these steps. It is valuable resource, since it covers a lot of important aspects of predictive modeling concepts implemented by Python.

You can learn:

1. How to establish a Natural Language Processing (NLP) feature engineering pipeline using sklearn.

2. How to train various linear, kernel and tree-based models.

3. Hyperparameter optimization using GridSearchCV and RandomizedSearchCV.

4. How to use Deep Neural Networks in a regression problem.

5. How to develop and Ensemble model with improved predictive performance.