Sequential Modelling: Recurrent and Recursive Nets
Recurrent and Recursive Nets:
• Specialized for processing sequential data.
• Parameters are shared across different parts of the model.
• The time step index refers to the position in the sequence.
Recurrent Neural Networks (RNNs) are a type of neural network designed to handle sequential data — like
time series, sentences, or any input where the order of elements matters.
Just as convolutional neural networks (CNNs) are specialized for images (grids of values), RNNs are
specialized for sequences. They’re particularly useful when processing sequences of varying lengths,
because they can reuse the same parameters (weights) at every time step. This property, called parameter
sharing, allows RNNs to generalize better and efficiently handle inputs of different lengths or structures.
Imagine trying to extract key information (like a year) from a sentence. That information might
appear at any position:
• “I went to Nepal in 2009.”
• “In 2009, I went to Nepal.”
A traditional fully connected feedforward network has separate parameters for each input position, so
it would need to learn the same rule separately at every position. In contrast, an RNN uses the same
parameters at each step, making it much better at recognizing repeated patterns regardless of their
location in the sequence.
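The position-independence argument can be made concrete with a small sketch. The step function here is a hand-written stand-in for a rule an RNN would learn, and the names `step` and `find_year` are hypothetical. The only point is that one rule, applied identically at every position, finds the year wherever it appears.

```python
from functools import reduce

def step(state, token):
    """Apply the same rule at every position.

    `state` is the year found so far (or None); `token` is the current word.
    This hand-written rule stands in for parameters an RNN would learn.
    """
    word = token.strip(".,")
    if state is None and len(word) == 4 and word.isdigit():
        return word
    return state

def find_year(sentence):
    # Fold the shared step function over the tokens, left to right.
    return reduce(step, sentence.split(), None)

print(find_year("I went to Nepal in 2009."))
print(find_year("In 2009, I went to Nepal."))
```

Both calls return the same year even though it sits at a different position in each sentence, because no part of `step` depends on the position index.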
RNNs operate by processing one element of the sequence at a time and using information from
previous steps to influence the current output. This is achieved by feeding the hidden state computed at
one time step back in as part of the input to the next, forming a loop in the computational graph. This
loop allows the network to "remember" what it has seen so far.
This gives RNNs a form of memory — they're not just seeing one word or value at a time, but also
taking into account what came before.
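A minimal sketch of this loop, assuming a scalar state, a tanh nonlinearity, and hand-picked weights (`W_H`, `W_X`, `B` are illustrative values, not from the source). It shows that the final state depends on the order of the inputs, not just on which values occur.

```python
import math

# Hypothetical parameters, shared by every time step.
W_H, W_X, B = 0.5, 1.0, 0.0

def rnn_step(h_prev, x):
    # New state mixes the previous state (memory) with the current input.
    return math.tanh(W_H * h_prev + W_X * x + B)

def run_rnn(xs, h0=0.0):
    h = h0
    for x in xs:          # one element at a time, in order
        h = rnn_step(h, x)
    return h

early = run_rnn([1.0, 0.0, 0.0])  # the 1.0 is seen first, then fades
late = run_rnn([0.0, 0.0, 1.0])   # the 1.0 is seen last
print(early, late)
```

The two sequences contain the same values, yet `early` and `late` differ, because the state carries a decaying trace of earlier inputs forward through the loop.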
• Convolution over time: Similar to CNNs, some models (like time-delay neural networks)
apply convolution across sequences. These models also share parameters across time but
typically consider only nearby elements and are not as deep as RNNs.
• Depth of computation: Because RNNs use the same operation repeatedly over time
steps, they can be thought of as very deep networks — just stretched out over time.
• Not just about time: The "time steps" in RNNs don’t have to correspond to actual time.
They could just represent positions in any ordered sequence, such as characters in text
or frames in a video.
• RNNs can be trained on minibatches of sequences, each potentially with a different
length.
• The underlying structure of RNNs includes cycles in the computational graph,
allowing feedback connections over time.
• These cycles are what distinguish RNNs from feedforward networks and enable
them to process sequences in a way that's sensitive to order and context.
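One common way to put sequences of different lengths into a single minibatch is to pad them to a common length and use a mask to skip the padded steps. The source does not say which scheme is used, so the sketch here is an assumption, with hypothetical helper names (`pad_batch`, `run_batch`) and the same kind of toy scalar RNN step.

```python
import math

W_H, W_X = 0.5, 1.0  # hypothetical shared parameters

def rnn_step(h_prev, x):
    return math.tanh(W_H * h_prev + W_X * x)

def pad_batch(seqs, pad_value=0.0):
    """Pad sequences to a common length; mask marks real (1) vs padded (0) steps."""
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask

def run_batch(seqs):
    padded, mask = pad_batch(seqs)
    states = [0.0] * len(seqs)
    for t in range(len(padded[0])):
        for i in range(len(seqs)):
            if mask[i][t]:               # update only on real time steps
                states[i] = rnn_step(states[i], padded[i][t])
    return states

batch = [[1.0, 2.0, 3.0], [0.5], [1.0, -1.0]]
final_states = run_batch(batch)
print(final_states)
```

Because padded steps leave the state untouched, each sequence ends in the same state it would reach if processed on its own.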
Unfolding Computational Graphs:
• A computational graph represents how computations (like mappings from inputs to
outputs) are structured. In recurrent or recursive systems, these graphs have repetitive
patterns due to their time-dependent nature.
• The concept of unfolding means transforming a recurrent graph into a non-recurrent
one (a directed acyclic graph, or DAG) by expanding it over time steps. This helps
visualize and compute the network more easily, while still sharing the same parameters
at each time step.
The system is defined recursively:
$s^{(t)} = f(s^{(t-1)}; \theta)$
where $s^{(t)}$ is the state at time $t$, and $f$ is a function parameterized by $\theta$.
Unfolding this for $\tau = 3$ time steps gives:
$s^{(3)} = f(s^{(2)}; \theta) = f(f(s^{(1)}; \theta); \theta)$
The recurrence is removed, showing explicit computation over time.
Figure 10.1 illustrates this unfolded graph: each node is a state $s^{(t)}$, and each arrow applies
the same function $f$, with shared parameters $\theta$.
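As a rough check of the unfolding idea, the sketch here computes the third state twice: once from the recursive definition and once from the unfolded chain of explicit applications. The transition function `f` (a tanh of the scaled state) and the values of `theta` and `s1` are arbitrary choices for illustration.

```python
import math

def f(s, theta):
    # Hypothetical transition function; any f(s; theta) would do.
    return math.tanh(theta * s)

def state_recurrent(s1, theta, t):
    """Recursive definition: s(t) = f(s(t-1); theta), with s(1) given."""
    if t == 1:
        return s1
    return f(state_recurrent(s1, theta, t - 1), theta)

def state_unfolded_3(s1, theta):
    """Unfolded form for t = 3: no recursion, just two explicit applications of f."""
    s2 = f(s1, theta)
    s3 = f(s2, theta)
    return s3

theta, s1 = 0.8, 1.0
assert state_recurrent(s1, theta, 3) == state_unfolded_3(s1, theta)
```

Both forms apply the same `f` with the same `theta` at every step; unfolding changes only how the computation is written down, not what is computed.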
Input-Driven Dynamical System:
• The model can also include external inputs $x^{(t)}$:
$s^{(t)} = f(s^{(t-1)}, x^{(t)}; \theta)$
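A small sketch of an input-driven system, assuming an exponential moving average as the transition function. That choice of `f` is hypothetical, picked only because its states are easy to verify by hand. The helper `unfold` applies the same `f` and `theta` at every step and returns every intermediate state.

```python
def f(s_prev, x, theta):
    # Hypothetical choice of f: an exponential moving average with rate theta.
    return (1.0 - theta) * s_prev + theta * x

def unfold(s0, xs, theta):
    """Return the states s(1), ..., s(T) produced by driving the system with xs."""
    states = []
    s = s0
    for x in xs:
        s = f(s, x, theta)   # same f and theta at every step
        states.append(s)
    return states

states = unfold(s0=0.0, xs=[1.0, 1.0, 1.0], theta=0.5)
print(states)  # [0.5, 0.75, 0.875]
```

Each state depends on the current input and, through the previous state, on all earlier inputs.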