
RNNs and LSTMs: Deep Learning Insights

The document discusses Recurrent Neural Networks (RNNs) and their variants, including Bidirectional RNNs and Recursive Neural Networks, highlighting their structures and applications in processing sequential data. It also covers Long Short-Term Memory (LSTM) networks, which address long-term dependencies and the vanishing gradient problem through the use of gates and memory cells. The advantages of these neural network architectures are emphasized, particularly in tasks such as speech recognition, natural language processing, and image captioning.

Uploaded by

kpash4028
Copyright
© All Rights Reserved

BAI701- Deep Learning and Reinforcement Learning

Module – 4 Notes (Recurrent and Recursive Neural Networks)

Recurrent Neural Network:


A Recurrent Neural Network (RNN) processes sequential data by using recurrent
connections that pass information from one time step to the next. The RNN
receives an input sequence 𝑥(1), 𝑥(2), …, 𝑥(𝜏) and produces a sequence of outputs
𝑜(1), 𝑜(2), …, 𝑜(𝜏).

Fig - A recurrent network that produces an output at each time step and has recurrent
connections between hidden units.

Fig - An RNN whose only recurrence is the feedback connection from the output
to the hidden layer.

Figure - Time-unfolded recurrent neural network with a single output at the end of the
sequence.
To handle sequences, the network is unfolded through time, forming a chain of identical
layers, each corresponding to one time step.
1. Parameter Sharing Across Time
The RNN uses the same set of parameters at every time step:
• Input-to-hidden weights: 𝑈
• Hidden-to-hidden recurrent weights: 𝑊
• Hidden-to-output weights: 𝑉
• Bias vectors: 𝑏 (hidden layer) and 𝑐 (output layer)

This shared structure allows the RNN to process sequences of any length with a fixed
number of parameters.
2. Forward Propagation Through Time
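A minimal NumPy sketch of one forward pass, assuming the common formulation 𝑎(𝑡) = 𝑏 + 𝑊ℎ(𝑡 − 1) + 𝑈𝑥(𝑡), ℎ(𝑡) = tanh(𝑎(𝑡)), 𝑜(𝑡) = 𝑐 + 𝑉ℎ(𝑡), followed by a softmax over 𝑜(𝑡). The tanh and softmax choices, the layer sizes, and the function names are assumptions made for illustration; only the parameter names 𝑈, 𝑊, 𝑉, 𝑏, 𝑐 come from these notes.

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(xs, U, W, V, b, c, h0=None):
    """Run a simple RNN over xs (a list of 1-D input vectors).

    Returns the hidden states h(1..tau) and the normalized outputs yhat(1..tau).
    The same U, W, V, b, c are reused at every time step.
    """
    h = np.zeros(W.shape[0]) if h0 is None else h0
    hs, yhats = [], []
    for x in xs:
        a = b + W @ h + U @ x      # pre-activation of the hidden layer
        h = np.tanh(a)             # new hidden state
        o = c + V @ h              # unnormalized output
        hs.append(h)
        yhats.append(softmax(o))
    return hs, yhats

# Illustrative sizes and random parameters (assumed, not from the notes).
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2
U = rng.normal(scale=0.1, size=(n_hid, n_in))
W = rng.normal(scale=0.1, size=(n_hid, n_hid))
V = rng.normal(scale=0.1, size=(n_out, n_hid))
b, c = np.zeros(n_hid), np.zeros(n_out)

xs = [rng.normal(size=n_in) for _ in range(5)]
hs, yhats = rnn_forward(xs, U, W, V, b, c)
print(len(hs), yhats[-1])
```

Because the loop reuses the same arrays at every step, the same call works unchanged for a sequence of any length.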

1. Computing Parameter Gradients:
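One way to compute these gradients is back-propagation through time (BPTT): run the forward pass, then walk the unfolded graph backwards while accumulating gradients for the shared parameters. The sketch below assumes a tanh hidden layer, a softmax output with cross-entropy loss summed over time steps, and integer class labels; the function name and sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss_and_grads(xs, ys, U, W, V, b, c):
    """Total cross-entropy loss and its gradients via BPTT.

    xs: list of input vectors; ys: list of integer class labels, one per step.
    """
    n_hid = W.shape[0]
    hs = [np.zeros(n_hid)]                  # hs[0] is the initial state h(0)
    probs, loss = [], 0.0
    for x, y in zip(xs, ys):                # forward pass
        h = np.tanh(b + W @ hs[-1] + U @ x)
        p = softmax(c + V @ h)
        hs.append(h)
        probs.append(p)
        loss -= np.log(p[y])

    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    dh_next = np.zeros(n_hid)               # gradient arriving from step t+1
    for t in reversed(range(len(xs))):      # backward pass
        do = probs[t].copy()
        do[ys[t]] -= 1.0                    # dL/do(t) for softmax + cross-entropy
        dV += np.outer(do, hs[t + 1])
        dc += do
        dh = V.T @ do + dh_next             # from the output and from the future
        da = dh * (1.0 - hs[t + 1] ** 2)    # back through tanh
        dW += np.outer(da, hs[t])
        dU += np.outer(da, xs[t])
        db += da
        dh_next = W.T @ da
    return loss, {"U": dU, "W": dW, "V": dV, "b": db, "c": dc}

# Illustrative data (assumed).
rng = np.random.default_rng(0)
n_in, n_hid, n_out, tau = 3, 4, 2, 6
U = rng.normal(scale=0.5, size=(n_hid, n_in))
W = rng.normal(scale=0.5, size=(n_hid, n_hid))
V = rng.normal(scale=0.5, size=(n_out, n_hid))
b, c = np.zeros(n_hid), np.zeros(n_out)
xs = [rng.normal(size=n_in) for _ in range(tau)]
ys = [int(rng.integers(n_out)) for _ in range(tau)]

loss, grads = loss_and_grads(xs, ys, U, W, V, b, c)
print(loss, grads["W"].shape)
```

Each parameter gradient is a sum over time steps because the same parameters are used at every step.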

10. Teacher forcing (training technique)

• During training, the true output 𝑦(𝑡) is fed back as input to step 𝑡 + 1 in place of
the model's own predicted output.
• Conditioning each step on the correct previous output helps stabilize learning.
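A small sketch of the idea, assuming an RNN whose only recurrence goes from the previous output to the hidden layer. The matrix `R` (output-to-hidden weights), the function name, and the sizes are hypothetical; passing `targets` switches teacher forcing on.

```python
import numpy as np

def run_output_recurrent_rnn(xs, U, R, V, b, c, targets=None):
    """RNN whose only recurrence is output -> hidden.

    With targets (training), the true y(t) is fed into step t+1 (teacher forcing).
    Without targets, the model's own output o(t) is fed back instead.
    """
    prev = np.zeros(V.shape[0])              # nothing to feed back at the first step
    outputs = []
    for t, x in enumerate(xs):
        h = np.tanh(b + R @ prev + U @ x)
        o = c + V @ h
        outputs.append(o)
        prev = targets[t] if targets is not None else o
    return outputs

# Illustrative data (assumed).
rng = np.random.default_rng(0)
n_in, n_hid, n_out, tau = 3, 4, 2, 5
U = rng.normal(scale=0.5, size=(n_hid, n_in))
R = rng.normal(scale=0.5, size=(n_hid, n_out))   # hypothetical output-to-hidden weights
V = rng.normal(scale=0.5, size=(n_out, n_hid))
b, c = np.zeros(n_hid), np.zeros(n_out)
xs = [rng.normal(size=n_in) for _ in range(tau)]
ys = [rng.normal(size=n_out) for _ in range(tau)]

train_out = run_output_recurrent_rnn(xs, U, R, V, b, c, targets=ys)  # teacher forcing
free_out = run_output_recurrent_rnn(xs, U, R, V, b, c)               # free-running
print(np.allclose(train_out[0], free_out[0]))  # True: the first step has no feedback yet
```

With teacher forcing, step 𝑡 depends only on 𝑥(𝑡) and 𝑦(𝑡 − 1), so an error made at an early step does not propagate into later steps during training.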


Bidirectional RNNs:
A Bidirectional Recurrent Neural Network (BRNN) is a special type of RNN designed to use
information from both the past and the future of a sequence.
Why BRNNs are needed:
• Traditional (causal) RNNs only capture information from the past inputs
𝑥(1), …, 𝑥(𝑡 − 1) and the present input 𝑥(𝑡).

• But in many tasks (speech recognition, handwriting recognition), the correct output
at time t depends not only on the past but also on future inputs.
• Example: Understanding a phoneme in speech may require looking ahead at future
phonemes or even future words.
How a Bidirectional RNN works
• It uses two RNNs:
o A forward RNN moving from the start to the end of the sequence, producing
hidden states ℎ(𝑡).
o A backward RNN moving from the end to the start, producing hidden states 𝑔(𝑡).
• At each time step 𝑡:
o The forward RNN summarizes past information → ℎ(𝑡)
o The backward RNN summarizes future information → 𝑔(𝑡)
o The output unit 𝑜(𝑡) combines both.
This allows the network to compute an output 𝑜 (𝑡) that is influenced by:
• Relevant past (via ℎ(𝑡) )
• Relevant future (via 𝑔(𝑡) )
No fixed-size context window around 𝑡 has to be specified, unlike in feedforward or convolutional networks.
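The two passes can be sketched as below. This is a minimal illustration: the tanh units, the additive way of combining ℎ(𝑡) and 𝑔(𝑡) through hypothetical matrices `Vh` and `Vg`, and all names and sizes are assumptions.

```python
import numpy as np

def run_rnn(xs, U, W, b):
    """Return the hidden state after each element of xs, in the order given."""
    h = np.zeros(W.shape[0])
    states = []
    for x in xs:
        h = np.tanh(b + W @ h + U @ x)
        states.append(h)
    return states

def birnn_outputs(xs, fwd, bwd, Vh, Vg, c):
    """Combine a forward and a backward RNN into one output per time step."""
    hs = run_rnn(xs, *fwd)                  # h(t): summary of x(1..t)
    gs = run_rnn(xs[::-1], *bwd)[::-1]      # g(t): summary of x(t..tau), re-aligned to t
    return [c + Vh @ h + Vg @ g for h, g in zip(hs, gs)]

# Illustrative parameters (assumed).
rng = np.random.default_rng(1)
n_in, n_hid, n_out, tau = 3, 4, 2, 6

def init_rnn():
    # Hypothetical helper: one (U, W, b) triple per direction.
    return (rng.normal(scale=0.5, size=(n_hid, n_in)),
            rng.normal(scale=0.5, size=(n_hid, n_hid)),
            np.zeros(n_hid))

fwd, bwd = init_rnn(), init_rnn()
Vh = rng.normal(size=(n_out, n_hid))
Vg = rng.normal(size=(n_out, n_hid))
c = np.zeros(n_out)
xs = [rng.normal(size=n_in) for _ in range(tau)]

outs = birnn_outputs(xs, fwd, bwd, Vh, Vg, c)
print(len(outs), outs[0].shape)
```

Changing the last input changes the first output here, which a causal RNN cannot do.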
Extended to images
• The idea can be extended to 2-D data (images) by using four RNNs, each moving in one
direction: up, down, left, right.
• This allows each output 𝑂ᵢ,ⱼ to depend on both local and long-range image features.

Fig - Computation of a typical bidirectional recurrent neural network, meant to learn to map
input sequences x to target sequences y, with loss L(t) at each step t.
The figure shows a bidirectional RNN unrolled through time:
✔ Two RNNs (forward and backward)
✔ Their hidden states combining at each time step
✔ Output 𝑜(𝑡) derived from both
✔ Loss calculated at each step
✔ The network learns to map the input sequence 𝑥 to the output sequence 𝑦

Recursive Neural Networks:
A Recursive Neural Network (RecNN) is a generalization of the recurrent neural network,
but instead of having a chain-like structure (like an RNN), it has a tree-structured
computational graph.

Fig - A recursive network has a computational graph that generalizes that of the recurrent
network from a chain to a tree.
→ The diagram represents how a recursive neural network computes the output from a
sequence using a tree structure.

Bottom Layer – Input Nodes
• The leaves are the input sequence:
𝑥(1), 𝑥(2), 𝑥(3), 𝑥(4)
• Each input is transformed by weight matrix 𝑉.
Middle Layers – Tree Composition
• Pairs of inputs are merged to form intermediate nodes using:
o 𝑈 for the left input
o 𝑊 for the right input
• These intermediate nodes form the first internal layer of the tree.
• Then, these two nodes are again combined (using 𝑈 and 𝑊) to form a higher-level
node.
Top Layer – Output
• The top internal node feeds into output unit 𝑜.
• The output 𝑜 is compared with target 𝑦.
• The loss function 𝐿 is computed using both 𝑜 and 𝑦.
So,
• A recursive network builds the representation bottom-up.
• It repeatedly combines child nodes into parent nodes.
• Eventually, a single fixed-size vector (the root node) is produced.
• This vector is used to generate the final output 𝑜.
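The bottom-up composition can be sketched as a short recursive function. It assumes a balanced binary tree, tanh as the nonlinearity at every node, and no bias terms; the function name and sizes are hypothetical, while 𝑈, 𝑉, 𝑊 play the roles described in these notes.

```python
import numpy as np

def encode(xs, U, W, V):
    """Return the root vector of a balanced binary tree built over the list xs."""
    if len(xs) == 1:
        return np.tanh(V @ xs[0])            # leaf: input transformed by V
    mid = len(xs) // 2
    left = encode(xs[:mid], U, W, V)         # representation of the left subtree
    right = encode(xs[mid:], U, W, V)        # representation of the right subtree
    return np.tanh(U @ left + W @ right)     # parent: U for left child, W for right child

# Illustrative parameters (assumed).
rng = np.random.default_rng(2)
n_in, n_hid = 3, 5
V = rng.normal(scale=0.5, size=(n_hid, n_in))
U = rng.normal(scale=0.5, size=(n_hid, n_hid))
W = rng.normal(scale=0.5, size=(n_hid, n_hid))

for tau in (4, 7):
    xs = [rng.normal(size=n_in) for _ in range(tau)]
    print(tau, encode(xs, U, W, V).shape)
```

The root has the same size for both sequence lengths, and the same three matrices are reused at every node.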

The computational structure is a deep tree, not a linear sequence.
• A sequence 𝑥 (1) , 𝑥 (2) , … , 𝑥 (𝜏) of variable length can be mapped to a single fixed-size
output 𝑜.
• This mapping is achieved using a fixed set of weight matrices:
o 𝑈
o 𝑉
o 𝑊
• Recursive networks were introduced by Pollack (1990).
• They have been successfully used in:
o Natural language processing (Socher et al.)
o Vision (Socher et al.)
o Learning structured data (Frasconi et al.)
o Reasoning (Bottou)

Advantages
• For a sequence of length 𝜏, the depth of computation (the number of compositions of
nonlinear operations) drops from 𝜏 in an RNN to 𝑂(log 𝜏) in a recursive net with a balanced tree.
• The shorter paths may make long-term dependencies easier to learn.
• Tree structures can be chosen in different ways:
1. Fixed balanced binary tree (structure does not depend on data).
2. Tree from external methods, such as a parse tree of a sentence for NLP tasks.
3. Ideally, the model learns its own tree structure (open research problem).
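A quick arithmetic check of the depth claim, assuming a balanced binary tree and counting only the merge levels above the leaves. The helper names are hypothetical.

```python
def chain_depth(tau: int) -> int:
    """Nonlinear steps between x(1) and the final state in a chain RNN."""
    return tau

def balanced_tree_depth(tau: int) -> int:
    """Merge levels above the leaves of a balanced binary tree with tau leaves.

    Equals ceil(log2(tau)), computed with integer arithmetic.
    """
    return (tau - 1).bit_length()

for tau in (4, 16, 1024):
    print(tau, chain_depth(tau), balanced_tree_depth(tau))
# prints:
# 4 4 2
# 16 16 4
# 1024 1024 10
```

For 𝜏 = 1024 the longest path shrinks from 1024 steps to 10 merges.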

Variants
• Some recursive nets attach inputs and targets to individual nodes of the tree.
• The computation at each node does not have to be a simple affine transformation
followed by a nonlinearity.
• More complex operations such as tensor operations and bilinear forms may be used
(Socher et al., 2013).

Working principle of an LSTM network (with block diagram and equations)


Long Short-Term Memory (LSTM)
• Long Short-Term Memory (LSTM) networks are a special type of gated recurrent
neural network designed to handle long-term dependencies in sequence data.
• Unlike simple RNNs, LSTMs mitigate the vanishing gradient problem by using gates
and a memory cell that lets gradients flow over long durations.
• The gated design aims to create paths through time along which derivatives neither
vanish nor explode.
• They use gates to store, forget, and output information dynamically.
• The model learns when to remember old information and when to forget it.
• This is achieved through a self-loop in the cell state whose weight is controlled by
a forget gate.

Fig - Block diagram of the LSTM recurrent network “cell.”

The LSTM cell contains:


• Input gate → decides how much new information enters the cell
• Forget gate → decides how much of past memory should be erased
• Output gate → decides how much of cell state becomes output
• State unit → has a linear self-loop to preserve long-term memory
• Input neuron → computes candidate information
• Delay block (black square) → stores previous time step values
Each gate uses a sigmoid activation, so its value lies between 0 and 1.
The cell state has a linear self-loop, which allows gradients to flow over long durations.

Working Principle of LSTM
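A common vectorized statement of the LSTM update, written with the gate names used in these notes (forget gate 𝑓, input gate 𝑔, output gate 𝑞), is given below. The internal state symbol s, the per-gate parameter names (b^f, U^f, W^f and so on), σ for the logistic sigmoid, and ⊙ for element-wise multiplication are assumptions made for this sketch.

```latex
\begin{aligned}
f^{(t)} &= \sigma\left(b^{f} + U^{f} x^{(t)} + W^{f} h^{(t-1)}\right) \\
g^{(t)} &= \sigma\left(b^{g} + U^{g} x^{(t)} + W^{g} h^{(t-1)}\right) \\
q^{(t)} &= \sigma\left(b^{o} + U^{o} x^{(t)} + W^{o} h^{(t-1)}\right) \\
s^{(t)} &= f^{(t)} \odot s^{(t-1)} + g^{(t)} \odot \sigma\left(b + U x^{(t)} + W h^{(t-1)}\right) \\
h^{(t)} &= \tanh\left(s^{(t)}\right) \odot q^{(t)}
\end{aligned}
```

The state update is the linear self-loop: when 𝑓 is close to 1 and 𝑔 is close to 0, s(𝑡) ≈ s(𝑡 − 1), so the stored value and its gradient pass through the step almost unchanged. Many implementations apply tanh rather than σ to the candidate term inside the state update.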

Advantages of LSTM:
• Learns long-term dependencies
• Mitigates vanishing gradients via the gated self-loop
• Learns when to remember and forget
• Performs well in tasks like handwriting recognition, speech, translation, image
captioning, parsing
So,
• LSTM networks extend RNNs by introducing gates and a cell state that allow them to
store, forget, and output information dynamically.
• The forget gate 𝑓(𝑡), input gate 𝑔(𝑡), and output gate 𝑞(𝑡) control the information
flow. The cell state has a linear self-loop, enabling long-term gradient flow.
• The LSTM is mathematically defined by equations (10.40)–(10.44), which specify
the gating and state update mechanism.
• Because of this design, LSTMs can model long-term dependencies more effectively
than simple RNNs.
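A minimal NumPy sketch of a single LSTM step using the gate names above (𝑓, 𝑔, 𝑞). The parameter layout (a dictionary of hypothetical (b, U, W) triples), the function names, and the default tanh squashing of the candidate are assumptions, not the notes' own definition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, p, squash=np.tanh):
    """One LSTM step. p maps a name to a (b, U, W) parameter triple."""
    def affine(name):
        b, U, W = p[name]
        return b + U @ x + W @ h_prev
    f = sigmoid(affine("forget"))        # how much of the old state to keep
    g = sigmoid(affine("input"))         # how much new information to admit
    q = sigmoid(affine("output"))        # how much of the state to expose
    candidate = squash(affine("cell"))   # candidate information (squashing is an assumption)
    s = f * s_prev + g * candidate       # linear self-loop on the cell state
    h = np.tanh(s) * q
    return h, s

# Illustrative parameters (assumed).
rng = np.random.default_rng(3)
n_in, n_hid = 3, 4
p = {name: (np.zeros(n_hid),
            rng.normal(scale=0.5, size=(n_hid, n_in)),
            rng.normal(scale=0.5, size=(n_hid, n_hid)))
     for name in ("forget", "input", "output", "cell")}

h, s = np.zeros(n_hid), np.zeros(n_hid)
for x in [rng.normal(size=n_in) for _ in range(10)]:
    h, s = lstm_step(x, h, s, p)
print(h.shape, s.shape)
```

Setting the forget-gate bias high and the input-gate bias low makes the step copy s forward nearly unchanged, which is the long-term memory behaviour described above.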
