Sequence Modeling: RNNs
Deep-dive into recurrent architectures, unfolding computational graphs, and the BPTT algorithm for sequence learning.
1 · Unfolding Graphs
Unfolding is the operation that maps a circuit with recurrent connections to a computational graph with repeated pieces, one per time step. This lets the model handle variable-length histories with a fixed-size state, applying the same transition function with the same parameters at every step.
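For example, with transition function \( f \) and parameters \( \boldsymbol{\theta} \) shared across steps, the recurrence \( \mathbf{h}^{(t)} = f(\mathbf{h}^{(t-1)}, \mathbf{x}^{(t)}; \boldsymbol{\theta}) \) unfolds after three steps into

\( \mathbf{h}^{(3)} = f(f(f(\mathbf{h}^{(0)}, \mathbf{x}^{(1)}; \boldsymbol{\theta}), \mathbf{x}^{(2)}; \boldsymbol{\theta}), \mathbf{x}^{(3)}; \boldsymbol{\theta}) \),

a graph with one copy of \( f \) per time step but a single shared parameter vector.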
3 · RNN Forward Propagation
Assuming a hyperbolic tangent activation and softmax output, the standard RNN update equations are:
- \( \mathbf{a}^{(t)} = \mathbf{b} + \mathbf{W}\mathbf{h}^{(t-1)} + \mathbf{U}\mathbf{x}^{(t)} \)
- \( \mathbf{h}^{(t)} = \tanh(\mathbf{a}^{(t)}) \)
- \( \mathbf{o}^{(t)} = \mathbf{c} + \mathbf{V}\mathbf{h}^{(t)} \)
- \( \hat{\mathbf{y}}^{(t)} = \operatorname{softmax}(\mathbf{o}^{(t)}) \)
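As a concrete illustration, here is a minimal NumPy sketch of these four equations. The parameter names mirror the notation above; the shapes and random initialization are assumptions made only to keep the demo runnable.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())              # shift by the max for numerical stability
    return e / e.sum()

def rnn_forward(xs, h0, U, W, V, b, c):
    """Run the four update equations above over a sequence xs."""
    h, hs, yhats = h0, [], []
    for x in xs:
        a = b + W @ h + U @ x            # a(t) = b + W h(t-1) + U x(t)
        h = np.tanh(a)                   # h(t) = tanh(a(t))
        o = c + V @ h                    # o(t) = c + V h(t)
        yhats.append(softmax(o))         # yhat(t) = softmax(o(t))
        hs.append(h)
    return hs, yhats

# Hypothetical sizes, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
n_x, n_h, n_y, tau = 3, 5, 2, 4
U, W, V = (rng.normal(size=s) for s in [(n_h, n_x), (n_h, n_h), (n_y, n_h)])
hs, yhats = rnn_forward(rng.normal(size=(tau, n_x)), np.zeros(n_h),
                        U, W, V, np.zeros(n_h), np.zeros(n_y))
```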
4 · BPTT Algorithm
Computing the gradient requires a forward pass moving left to right through the unrolled graph, followed by a backward pass moving right to left. Both runtime and memory cost are \( O(\tau) \) for a sequence of length \( \tau \): the states computed in the forward pass must be stored until the backward pass reuses them. The runtime cannot be reduced by parallelizing across time steps, because each state depends on the one before it.
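Below is a sketch of the backward pass paired with the `rnn_forward` sketch above, assuming one-hot targets and a negative log-likelihood loss, so that \( \partial L / \partial \mathbf{o}^{(t)} = \hat{\mathbf{y}}^{(t)} - \mathbf{y}^{(t)} \). The sequential loop over \( t \) is exactly the part that cannot be parallelized.

```python
import numpy as np

def bptt(xs, ys_true, hs, yhats, h0, U, W, V):
    """Backward pass through time for the forward equations above.
    ys_true are one-hot targets; hs and yhats come from the forward pass."""
    tau = len(xs)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros(W.shape[0]), np.zeros(V.shape[0])
    dh_next = np.zeros(W.shape[0])          # gradient flowing back from step t+1
    for t in reversed(range(tau)):
        do = yhats[t] - ys_true[t]          # dL/do(t) for softmax + NLL
        dc += do
        dV += np.outer(do, hs[t])
        dh = V.T @ do + dh_next             # local plus recurrent contribution
        da = (1.0 - hs[t] ** 2) * dh        # through tanh: diag(1 - h(t)^2)
        db += da
        h_prev = hs[t - 1] if t > 0 else h0
        dW += np.outer(da, h_prev)
        dU += np.outer(da, xs[t])
        dh_next = W.T @ da                  # pass gradient back to step t-1
    return dU, dW, dV, db, dc
```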
5 · Teacher Forcing
Teacher forcing is a training technique in which the ground-truth output \( \mathbf{y}^{(t)} \) is fed as input at time \( t+1 \), rather than the model's own (potentially erroneous) output. For models whose only recurrence runs from the output back into the hidden state, this decouples the time steps and allows training to be parallelized across them.
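A minimal sketch of that decoupling in the output-to-hidden case: each training step sees only \( \mathbf{x}^{(t)} \) and the ground-truth \( \mathbf{y}^{(t-1)} \), so the loop iterations are independent. Here \( \mathbf{W} \) maps the fed-back output into the hidden layer, a reuse of the symbol purely for illustration, and all shapes are assumptions.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def teacher_forced_step(x_t, y_prev, U, W, V, b, c):
    """One step of a model whose only recurrence is output-to-hidden:
    the ground-truth y(t-1) is fed in where the model's output would go."""
    h = np.tanh(b + W @ y_prev + U @ x_t)
    return softmax(c + V @ h)

def teacher_forced_losses(xs, ys_true, U, W, V, b, c):
    """Per-step NLL with one-hot targets; each iteration depends only on
    (x(t), y(t-1)), so the steps could be computed in parallel."""
    return [-np.log(teacher_forced_step(xs[t], ys_true[t - 1], U, W, V, b, c)
                    @ ys_true[t])
            for t in range(1, len(xs))]
```

At test time the ground truth is unavailable, so the model's own output is fed back in its place.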