Fall foliage in Green Mountains, Vermont. Photo: Ozan Aygun
Neural Network Optimization using Dropout Regularization
Deep Learning has become the focus of many recent applications of Data Science. Thanks to
open-source libraries like TensorFlow and Keras, implementing predictive models
with deep neural networks has become possible for everyone. The theory behind neural networks is
quite complex and remains the focus of intense academic research.
The basic idea behind neural networks is automating feature engineering. This enables the extraction of
many features composed of interactions between the features present in the
original data set, some of which are beyond typical human intuition.
Within each layer of the network, the arrays (also called tensors) that hold these features
are passed through activation functions and exchanged between layers. Networks learn incrementally through a
specialized process called gradient descent, which updates the weights associated with neurons via a
feedback mechanism called backpropagation.
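As a concrete, minimal illustration of these building blocks, the sketch below assumes the Keras API bundled with TensorFlow and a hypothetical 20-feature input (both choices are mine, not from the original post). It stacks dense layers with activation functions and compiles the model with stochastic gradient descent, which drives the backpropagation-based weight updates described above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical 20-feature binary-classification setup, for illustration only.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),    # hidden layer: weights + activation
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # output layer
])

# Gradient descent (here plain SGD) updates the weights via backpropagation
# of the loss computed on each training batch.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```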
As the network updates its weights, its sole objective is to minimize the loss function,
which translates into better predictive performance. However, just like with any machine learning algorithm,
increasing model complexity to drive down the loss eventually causes overfitting to the training set.
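One way to see this trade-off in practice is to hold out part of the training data and watch the training and validation losses diverge. The sketch below uses hypothetical synthetic data and a deliberately oversized network purely for illustration; none of the sizes or settings come from the original post.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical synthetic data, used only to illustrate the diagnostic.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype("float32")

# A deliberately high-complexity network, prone to overfitting.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A widening gap between loss and val_loss across epochs signals overfitting.
history = model.fit(X, y, epochs=50, batch_size=32,
                    validation_split=0.2, verbose=0)
print("final training loss:  ", history.history["loss"][-1])
print("final validation loss:", history.history["val_loss"][-1])
```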
Dropout regularization is an effective approach for fighting overfitting in deep
neural networks. It randomly drops neurons (at a specified dropout rate) from a
given layer of the network during each learning cycle. Dropped neurons are temporarily removed,
masking the contribution of their weights to the final prediction, and the remaining neurons are expected to
compensate for the missing weights to achieve the same loss.
Used properly, this process leads to better generalization of the model and helps reduce
the impact of overfitting.
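In Keras, dropout is applied by inserting a Dropout layer after the layer whose activations you want to mask. The sketch below is a minimal example; the 30% dropout rate and layer sizes are arbitrary choices for illustration, not recommendations from the post.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Dropout(rate) randomly zeroes the given fraction of the preceding layer's
# activations on each training step; at inference time all units are kept.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),            # drop 30% of activations during training
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```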
Here is a regularization strategy you can use to optimize your deep neural networks (a code sketch follows the list):
1. Start with benchmark models of low, medium, and high complexity. Develop expectations about your model and about overfitting.
2. Apply regularization to your "medium complexity" network and monitor its performance.
3. Tune down regularization and slightly increase network complexity by adding neurons and/or layers; observe overfitting.
4. Turn regularization back on and monitor for any noticeable boost in out-of-sample (validation) performance.
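A minimal sketch of this workflow, assuming a Keras binary classifier with hypothetical layer sizes and dropout rates of my own choosing, might parametrize complexity and the dropout rate in a single builder function so the benchmark, regularized, and enlarged variants can be compared on the same validation data:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hidden_layers, units, dropout_rate, n_features=20):
    """Build a binary classifier of configurable complexity and regularization."""
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for _ in range(hidden_layers):
        model.add(layers.Dense(units, activation="relu"))
        if dropout_rate > 0:
            model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Step 1: benchmark models of increasing complexity, no regularization yet.
benchmarks = {
    "low":    build_model(hidden_layers=1, units=16,  dropout_rate=0.0),
    "medium": build_model(hidden_layers=2, units=64,  dropout_rate=0.0),
    "high":   build_model(hidden_layers=4, units=256, dropout_rate=0.0),
}

# Step 2: regularize the medium-complexity model.
medium_regularized = build_model(hidden_layers=2, units=64, dropout_rate=0.3)

# Steps 3-4: grow the network with dropout tuned down, then turn it back on.
larger_unregularized = build_model(hidden_layers=3, units=128, dropout_rate=0.0)
larger_regularized   = build_model(hidden_layers=3, units=128, dropout_rate=0.3)
```

Fitting each variant with the same validation_split and comparing the val_loss curves makes it easy to see whether turning dropout back on recovers the generalization lost when complexity was increased.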