Deep Feedforward Networks
Formal textbook analysis of MLP architectures, forward propagation, and the role of depth in function approximation.
1 · Deep Feedforward Networks
Deep feedforward networks, also called feedforward neural networks or multilayer perceptrons (MLPs), are the quintessential deep learning models. The goal of a feedforward network is to approximate some function \( f^* \). A feedforward network defines a mapping \( \mathbf{y} = f(\mathbf{x}; \boldsymbol{\theta}) \) and learns the values of the parameters \( \boldsymbol{\theta} \) that result in the best function approximation.
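These models are called *networks* because they are typically formed by composing many functions in a chain. For example, three layers chained together give

\( f(\mathbf{x}) = f^{(3)}\big(f^{(2)}\big(f^{(1)}(\mathbf{x})\big)\big), \)

where \( f^{(1)} \) is the first layer, \( f^{(2)} \) the second, and so on; the length of the chain gives the *depth* of the model.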
2 · Forward Propagation Algorithm
When the network accepts an input \( \mathbf{x} \) and produces an output \( \hat{\mathbf{y}} \), information flows forward through the network: the input \( \mathbf{x} \) provides the initial information, which propagates through the hidden units at each layer until the output \( \hat{\mathbf{y}} \) is produced. This is called forward propagation. (During training, forward propagation continues onward to produce the value of the cost function.)
For each layer \( i = 1, \dots, n \), with \( \mathbf{h}^{(0)} = \mathbf{x} \) (a runnable sketch follows the recurrence below):
1. \( \mathbf{a}^{(i)} = \mathbf{b}^{(i)} + \mathbf{W}^{(i)}\mathbf{h}^{(i-1)} \)
2. \( \mathbf{h}^{(i)} = f^{(i)}(\mathbf{a}^{(i)}) \)
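The network output is then \( \hat{\mathbf{y}} = \mathbf{h}^{(n)} \). The following is a minimal NumPy sketch of this recurrence; the layer sizes, random seed, and helper names are illustrative assumptions, not from the text:

```python
import numpy as np

def forward(x, weights, biases, activations):
    """Forward propagation: h^(i) = f^(i)(b^(i) + W^(i) h^(i-1))."""
    h = x  # h^(0) = x
    for W, b, f in zip(weights, biases, activations):
        a = b + W @ h   # pre-activation a^(i)
        h = f(a)        # hidden activation h^(i)
    return h            # network output y_hat = h^(n)

relu = lambda a: np.maximum(0.0, a)
identity = lambda a: a

# Hypothetical 2-layer network: 3 inputs -> 4 hidden (ReLU) -> 2 linear outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]

y_hat = forward(np.array([1.0, -2.0, 0.5]), weights, biases, [relu, identity])
print(y_hat.shape)  # (2,)
```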
3 · Architecture Design
Empirically, deeper models tend to generalize better. This is not merely because a deeper model is larger: increasing the number of parameters without increasing depth is not nearly as effective. In the multidigit street-address transcription experiments that motivate this observation, shallow models began to overfit at around 20 million parameters, while deep models continued to benefit from more than 60 million.
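To make the parameters-versus-depth distinction concrete, here is a small sketch that counts parameters for a wide shallow network and a narrow deep one with a similar budget; the layer widths are illustrative assumptions, not figures from the text:

```python
def param_count(layer_sizes):
    """Total weights + biases for a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical MLPs on 1000-dim inputs with 10 outputs:
shallow = [1000, 19000, 10]          # one very wide hidden layer
deep = [1000] + [1400] * 10 + [10]   # ten narrower hidden layers

print(param_count(shallow))  # 19,209,010 (~19.2M parameters)
print(param_count(deep))     # 19,068,010 (~19.1M, similar budget spread over depth)
```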
4 · Hidden Units (ReLU)
Modern feedforward networks use the rectified linear unit (ReLU) as the default hidden unit, defined by the activation function \( g(z) = \max\{0, z\} \). ReLUs are easy to optimize because they are very similar to linear units: wherever the unit is active, the gradient is large and consistent. Generalizations introduce a nonzero slope for negative inputs: Leaky ReLU fixes that slope to a small constant such as 0.01, PReLU (parametric ReLU) learns it as a parameter, and maxout units generalize further by taking the maximum over groups of \( k \) linear pieces.
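A minimal NumPy sketch of these activation functions follows; the parameter shapes and the maxout grouping are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Fixed small slope for z < 0.
    return np.where(z > 0, z, alpha * z)

def prelu(z, alpha):
    # alpha is a learned per-unit slope; here it is passed in explicitly.
    return np.where(z > 0, z, alpha * z)

def maxout(z, k=2):
    # Max over groups of k pre-activations; len(z) must be divisible by k.
    return z.reshape(-1, k).max(axis=1)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z))                            # [0.    0.    0.5   2.  ]
print(leaky_relu(z))                      # [-0.02 -0.005 0.5   2.  ]
print(prelu(z, alpha=np.full(4, 0.25)))   # [-0.5  -0.125 0.5   2.  ]
print(maxout(z, k=2))                     # [-0.5  2. ]
```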