What is regularization?
Regularization is a technique used to reduce the complexity of a model, thereby preventing overfitting. There are three common types of regularization used in deep neural networks (DNNs):
L2 Regularization: We define the complexity of a model by the sum of the squares of its weights: $W = w_0^2 + w_1^2 + \dots + w_n^2$. We add this term, scaled by a coefficient $\lambda$ that sets the regularization strength, to the loss function to obtain:
$L(\text{data}, \text{model}) = \text{loss}(\text{data}, \text{model}) + \lambda \sum w_i^2$
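We then minimize this total loss. Since the derivative of $W$ with respect to each weight $w_i$ is $2w_i$, the penalty contributes a gradient of $2\lambda w_i$, so each gradient-descent step shrinks every weight in proportion to its size. Larger weights are penalized more, which effectively “decays” them.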
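The decay effect can be sketched in a few lines of plain Python. This is a minimal illustration, not a framework implementation: the function names, the learning rate `lr`, and the example weights are assumptions made for the sketch, and the gradient of the data loss is left out so that only the penalty's contribution is visible.

```python
def l2_penalty(weights, lam):
    """Return lam * sum(w_i^2), the L2 term added to the data loss."""
    return lam * sum(w * w for w in weights)


def l2_penalty_grad(weights, lam):
    """Gradient of the L2 term with respect to each weight: 2 * lam * w_i."""
    return [2.0 * lam * w for w in weights]


def decay_step(weights, lam, lr):
    """One gradient-descent step on the penalty alone.

    The data-loss gradient is omitted (an assumption of this sketch),
    so the step simply multiplies each weight by (1 - 2 * lr * lam).
    """
    grads = l2_penalty_grad(weights, lam)
    return [w - lr * g for w, g in zip(weights, grads)]


weights = [2.0, -1.0, 0.5]          # hypothetical example weights
penalty = l2_penalty(weights, lam=0.1)
shrunk = decay_step(weights, lam=0.1, lr=0.5)
```

With these values every weight is scaled by the same factor of 0.9, so the largest weight loses the most in absolute terms, while none of them reaches zero in a finite number of steps.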
L1 Regularization: This is similar to L2 regularization, but $W$ is defined as the sum of the absolute values of the weights:
$W = \sum |w_i|$
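The derivative of $W$ with respect to $w_i$ is now constant in magnitude ($\pm 1$, depending on the sign of $w_i$), so small weights are pushed toward zero just as strongly as large ones. As a result, weights can be reduced exactly to zero, unlike in L2 regularization, where the pull weakens as a weight shrinks. This often leads to sparse models.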
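One way to see how a constant-size pull produces exact zeros is the sketch below. It uses a soft-thresholding (proximal) update rather than a raw subgradient step, because a plain step of fixed size can overshoot zero and oscillate; that choice, the function names, and the example values are assumptions of this sketch, and the data-loss gradient is again omitted.

```python
def l1_penalty(weights, lam):
    """Return lam * sum(|w_i|), the L1 term added to the data loss."""
    return lam * sum(abs(w) for w in weights)


def soft_threshold_step(weights, lam, lr):
    """Move each weight toward zero by lr * lam, clamping at zero.

    Every weight is shifted by the same amount regardless of its size.
    A weight whose magnitude is at most lr * lam is set to exactly 0.0
    instead of being pushed past zero.
    """
    shrink = lr * lam
    updated = []
    for w in weights:
        if w > shrink:
            updated.append(w - shrink)
        elif w < -shrink:
            updated.append(w + shrink)
        else:
            updated.append(0.0)
    return updated


weights = [2.0, -1.0, 0.04]         # hypothetical example weights
updated = soft_threshold_step(weights, lam=0.1, lr=0.5)
```

Here each weight moves by 0.05 toward zero, and the smallest one, which is already within 0.05 of zero, is clamped to exactly zero. Under the L2 update the same weight would only be scaled down and would stay nonzero.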
Dropout: Unlike the previous two methods, dropout is implemented as a layer within the neural network rather than a modification to the loss function.
A dropout layer randomly sets a subset of activations to zero during training. For example, a dropout layer with a rate of 0.3 zeroes each activation with probability 0.3, so on average 30% of the neurons in that layer are deactivated at each training step.
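A dropout layer along these lines could be sketched as follows. The function name, its parameters, and the `rng` argument are hypothetical. The sketch also assumes the common "inverted dropout" convention, which the text above does not describe: surviving activations are divided by the keep probability during training so that their expected value is unchanged, and the layer passes its input through untouched at inference time.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Zero each activation independently with probability `rate`.

    Assumes inverted dropout: kept activations are scaled by
    1 / (1 - rate) during training; at inference the input is returned as is.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Usage: a fixed seed keeps the illustration reproducible.
layer_output = [1.0] * 10_000       # hypothetical activations
dropped = dropout(layer_output, rate=0.3, rng=random.Random(0))
zero_fraction = sum(1 for a in dropped if a == 0.0) / len(dropped)
at_inference = dropout(layer_output, rate=0.3, training=False)
```

The fraction of zeroed activations fluctuates around the rate from one training step to the next rather than being exactly 30% each time, because each activation is dropped independently.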