Backpropagation Algorithms
Deep-dive into the mathematics of backpropagation, computational graphs, and symbol-to-symbol differentiation algorithms.
📋 6.5 · Back-Propagation Algorithms
1 · Computational Graphs
To describe back-propagation precisely, we use a computational graph: each node represents a variable (a scalar, vector, matrix, or tensor), and a directed edge from one node to another indicates that the first variable is an input to the operation that computes the second.
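As a concrete illustration, here is one minimal way such a graph might be represented in Python. The `Node` class, its fields, and the example expression \( z = (x + y)\,y \) are all illustrative choices, not from the text:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One variable in the graph: a scalar here, but it could be any tensor."""
    name: str
    op: Optional[str] = None              # operation producing this node; None for inputs
    inputs: List["Node"] = field(default_factory=list)  # parent variables fed to op

# Graph for z = (x + y) * y
x = Node("x")
y = Node("y")
s = Node("s", op="add", inputs=[x, y])
z = Node("z", op="mul", inputs=[s, y])
```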
2 · The Chain Rule
Back-propagation is an algorithm that computes the chain rule with a specific, highly efficient ordering of operations. In vector notation: \( \nabla_{\mathbf{x}} z = \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right)^{\top} \nabla_{\mathbf{y}} z \), where \( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \) is the Jacobian matrix of \( \mathbf{y} \) with respect to \( \mathbf{x} \).
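The identity is easy to check numerically. The sketch below picks an arbitrary map \( f \) and the scalar \( z = \sum_k y_k^2 \) (both assumptions for illustration only), forms the Jacobian-transpose product, and compares it against finite differences:

```python
import numpy as np

# y = f(x) with f(x) = (x0*x1, x0 + x1); scalar z = g(y) = sum(y**2)
def f(x):
    return np.array([x[0] * x[1], x[0] + x[1]])

x = np.array([2.0, 3.0])
y = f(x)

# Jacobian dy/dx, written out by hand for this particular f
J = np.array([[x[1], x[0]],
              [1.0,  1.0]])

grad_y = 2 * y            # \nabla_y z for z = sum(y**2)
grad_x = J.T @ grad_y     # \nabla_x z = (dy/dx)^T \nabla_y z

# Check against a central finite-difference approximation of dz/dx
eps = 1e-6
fd = np.array([(np.sum(f(x + eps * e)**2) - np.sum(f(x - eps * e)**2)) / (2 * eps)
               for e in np.eye(2)])
print(grad_x, fd)  # both should print approximately [46. 34.]
```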
3 · The Back-Prop Procedure
1. Run forward propagation to compute the activation of every node \( u_1, \dots, u_n \), where \( u_n \) is the output.
2. Initialize grad_table[u_n] = 1, since \( \partial u_n / \partial u_n = 1 \).
3. For \( j = n-1 \) down to 1, sum over every node \( u_i \) that consumes \( u_j \) as an input:
   \( \text{grad\_table}[u_j] = \sum_{i : j \in \mathrm{Pa}(u_i)} \text{grad\_table}[u_i] \, \frac{\partial u_i}{\partial u_j} \)
   where \( \mathrm{Pa}(u_i) \) denotes the indices of the inputs (parents) of \( u_i \). A runnable sketch follows the list.
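Here is a minimal runnable sketch of this table-filling procedure on the toy scalar graph \( z = (x + y)\,y \). The node names, the `consumers` map (the reverse of \( \mathrm{Pa} \)), and the helper functions are all illustrative:

```python
# Nodes in topological order: u1 = x, u2 = y, u3 = x + y, u4 = u3 * y = z
values = {}

def forward(x, y):
    """Step 1: forward propagation, storing every node's activation."""
    values["u1"] = x
    values["u2"] = y
    values["u3"] = values["u1"] + values["u2"]
    values["u4"] = values["u3"] * values["u2"]
    return values["u4"]

def local_grad(i, j):
    """Local partial derivative du_i/du_j for one edge of the graph."""
    if (i, j) == ("u3", "u1"): return 1.0            # d(x+y)/dx
    if (i, j) == ("u3", "u2"): return 1.0            # d(x+y)/dy
    if (i, j) == ("u4", "u3"): return values["u2"]   # d(u3*y)/du3
    if (i, j) == ("u4", "u2"): return values["u3"]   # d(u3*y)/dy
    return 0.0

# consumers[j] lists the nodes i with j in Pa(u_i)
consumers = {"u1": ["u3"], "u2": ["u3", "u4"], "u3": ["u4"], "u4": []}

def backprop():
    grad_table = {"u4": 1.0}                          # step 2: dz/dz = 1
    for j in ["u3", "u2", "u1"]:                      # step 3: reverse order
        grad_table[j] = sum(grad_table[i] * local_grad(i, j)
                            for i in consumers[j])
    return grad_table

forward(2.0, 3.0)
print(backprop())  # {'u4': 1.0, 'u3': 3.0, 'u2': 8.0, 'u1': 3.0}
```

At \( x = 2, y = 3 \) this agrees with the analytic gradients \( \partial z / \partial x = y = 3 \) and \( \partial z / \partial y = x + 2y = 8 \).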
4 · Symbol-to-Symbol Derivatives
Modern software implementations use the symbol-to-symbol approach. Rather than consuming numeric input values and returning numeric gradient values (the symbol-to-number approach), the algorithm adds nodes to the computational graph that describe how to compute the derivatives. A generic graph-evaluation engine can then evaluate this enlarged graph to obtain gradients for any specific input values.
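A toy sketch of the idea, assuming a hand-rolled expression type whose grad method emits new graph nodes rather than numbers; none of these classes (Expr, Const, Var, Add, Mul) come from a real library:

```python
class Expr:
    def __add__(self, other): return Add(self, other)
    def __mul__(self, other): return Mul(self, other)

class Const(Expr):
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v
    def grad(self, wrt): return Const(0.0)

class Var(Expr):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]
    def grad(self, wrt): return Const(1.0 if self is wrt else 0.0)

class Add(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) + self.b.eval(env)
    def grad(self, wrt): return self.a.grad(wrt) + self.b.grad(wrt)

class Mul(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) * self.b.eval(env)
    def grad(self, wrt):  # product rule, expressed as new graph nodes
        return self.a.grad(wrt) * self.b + self.a * self.b.grad(wrt)

x, y = Var("x"), Var("y")
z = (x + y) * y        # the original graph
dz_dy = z.grad(y)      # a new symbolic graph for dz/dy, built once

# The same derivative graph evaluates at any specific input values:
print(dz_dy.eval({"x": 2.0, "y": 3.0}))   # x + 2y = 8.0
print(dz_dy.eval({"x": 1.0, "y": 5.0}))   # 11.0
```

Because the derivative is itself a graph, it can be differentiated again in the same way to obtain higher-order derivatives, which is one reason frameworks favor this approach.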