
1 · Computational Graphs

To describe back-propagation precisely, we use a computational graph, in which each node represents a variable (a scalar, vector, matrix, or tensor) and directed edges represent the operations that produce one variable from others.
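As a concrete illustration (not from the source text), here is a minimal Python sketch of a computational graph for \( z = (x + y) \cdot y \). The `Node` class and its fields are illustrative assumptions, not any particular library's API.

```python
# A minimal sketch of a computational graph (hypothetical Node class).
# Each node stores a variable's value; the edges, recorded via `parents`,
# remember which operation produced the node from which inputs.

class Node:
    def __init__(self, value, parents=(), op=None):
        self.value = value      # scalar, vector, matrix, or tensor value
        self.parents = parents  # nodes this one was computed from
        self.op = op            # name of the producing operation, if any

# Build the graph for z = (x + y) * y
x = Node(2.0)
y = Node(3.0)
s = Node(x.value + y.value, parents=(x, y), op="add")
z = Node(s.value * y.value, parents=(s, y), op="mul")

print(z.value)  # 15.0
```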

2 · The Chain Rule

Back-propagation is an algorithm that computes the chain rule, with a specific order of operations that is highly efficient. In vector notation: \( \nabla_{\mathbf{x}} z = \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right)^\top \nabla_{\mathbf{y}} z \), where \( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \) is the Jacobian matrix of \( \mathbf{y} \) with respect to \( \mathbf{x} \).
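To make the formula concrete, here is a small numeric check using NumPy. The choice of library and the example function \( z = \sum_i y_i^2 \) with \( \mathbf{y} = W\mathbf{x} \) are assumptions, not from the source; for this choice the Jacobian \( \partial \mathbf{y} / \partial \mathbf{x} \) is simply \( W \).

```python
# Numeric check of the vector chain rule (illustrative example, not from
# the source): y = W x, z = sum(y**2), so dy/dx = W and grad_y z = 2 y.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))
x = rng.standard_normal(2)

y = W @ x
grad_y = 2 * y         # nabla_y z for z = sum(y**2)
grad_x = W.T @ grad_y  # nabla_x z = (dy/dx)^T nabla_y z

# Compare against a forward finite-difference estimate of nabla_x z
eps = 1e-6
fd = np.array([
    (np.sum((W @ (x + eps * np.eye(2)[i]))**2) - np.sum(y**2)) / eps
    for i in range(2)
])
print(np.allclose(grad_x, fd, atol=1e-4))  # True
```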

3 · The Back-Prop Procedure

Algorithm 6.2: Scalar Backprop

1. Run forward propagation to compute the activations of all nodes \( u_1, \dots, u_n \).
2. Initialize grad_table[u_n] = 1.
3. For \( j = n-1 \) down to 1:
   grad_table[u_j] = \( \sum_{i : j \in \mathrm{Pa}(u_i)} \) grad_table[u_i] \( (\partial u_i / \partial u_j) \), where the sum runs over the nodes \( u_i \) that have \( u_j \) as a parent. A runnable sketch follows the listing.
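Below is a runnable Python sketch of the procedure above for the small graph \( u_3 = u_1 + u_2 \), \( u_4 = u_3 \cdot u_2 \). The graph, the variable names, and the `local_grads` structure are illustrative choices, not part of Algorithm 6.2 itself; the nodes are assumed to be topologically ordered \( u_1, \dots, u_n \).

```python
# Sketch of Algorithm 6.2 on an illustrative graph: u1 = x, u2 = y,
# u3 = u1 + u2, u4 = u3 * u2. local_grads[i][j] holds the local
# derivative du_i/du_j for each edge of the graph.

def backprop(n, local_grads):
    """Return grad_table mapping each node j to d u_n / d u_j."""
    grad_table = {n: 1.0}                      # step 2: d u_n / d u_n = 1
    for j in range(n - 1, 0, -1):              # step 3: j = n-1 down to 1
        grad_table[j] = sum(
            grad_table[i] * local_grads[i][j]  # sum over children i of u_j
            for i in range(j + 1, n + 1)
            if j in local_grads.get(i, {})
        )
    return grad_table

# Forward pass for x = 2, y = 3: u3 = 5, u4 = 15
u = {1: 2.0, 2: 3.0}
u[3] = u[1] + u[2]
u[4] = u[3] * u[2]

# Local derivatives: du3/du1 = du3/du2 = 1; du4/du3 = u2; du4/du2 = u3
local = {3: {1: 1.0, 2: 1.0}, 4: {3: u[2], 2: u[3]}}

print(backprop(4, local))  # {4: 1.0, 3: 3.0, 2: 8.0, 1: 3.0}
```

As a check, \( u_4 = (x + y)\,y = xy + y^2 \), so \( \partial u_4 / \partial x = y = 3 \) and \( \partial u_4 / \partial y = x + 2y = 8 \), matching the table.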

4 · Symbol-to-Symbol Derivatives

Modern software implementations use a symbol-to-symbol approach: rather than computing derivatives on numeric values directly (symbol-to-number), the algorithm adds nodes to the computational graph that describe how to compute the derivatives. A generic graph-evaluation engine can then compute gradients for any specific numeric inputs.
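The following Python sketch illustrates the symbol-to-symbol idea with a toy expression language. All class and function names here are hypothetical, not any real library's API: the point is that differentiating an expression returns new graph nodes, which can be evaluated later for any inputs.

```python
# Toy symbol-to-symbol differentiation (hypothetical mini-language).
# grad() never touches numeric values; it builds new expression nodes
# describing the derivative, extending the computational graph.

class Const:
    def __init__(self, v): self.v = v
    def __repr__(self): return str(self.v)

class Var:
    def __init__(self, name): self.name = name
    def __repr__(self): return self.name

class Add:
    def __init__(self, a, b): self.a, self.b = a, b
    def __repr__(self): return f"({self.a} + {self.b})"

class Mul:
    def __init__(self, a, b): self.a, self.b = a, b
    def __repr__(self): return f"({self.a} * {self.b})"

def grad(expr, wrt):
    """Build a new symbolic expression for d expr / d wrt."""
    if isinstance(expr, Const):
        return Const(0.0)
    if isinstance(expr, Var):
        return Const(1.0) if expr is wrt else Const(0.0)
    if isinstance(expr, Add):
        return Add(grad(expr.a, wrt), grad(expr.b, wrt))
    if isinstance(expr, Mul):  # product rule, expressed as new graph nodes
        return Add(Mul(grad(expr.a, wrt), expr.b),
                   Mul(expr.a, grad(expr.b, wrt)))

x, y = Var("x"), Var("y")
z = Mul(Add(x, y), y)  # z = (x + y) * y
print(grad(z, x))      # (((1.0 + 0.0) * y) + ((x + y) * 0.0))
```

Note that the printed derivative is itself a graph, unsimplified; a real system would simplify it and evaluate it with the same engine that runs the forward pass.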
