
RNNs and LSTMs: Deep Learning Insights

The document discusses Recurrent Neural Networks (RNNs) and their variants, including Bidirectional RNNs and Recursive Neural Networks, highlighting their structures and applications in processing sequential data. It also covers Long Short-Term Memory (LSTM) networks, which address long-term dependencies and the vanishing gradient problem through the use of gates and memory cells. The advantages of these neural network architectures are emphasized, particularly in tasks such as speech recognition, natural language processing, and image captioning.

Uploaded by

kpash4028


BAI701- Deep Learning and Reinforcement Learning

Module – 4 Notes (Recurrent and Recursive Neural Networks)

Recurrent Neural Network:

Recurrent neural network (RNN) processes data sequences:
A Recurrent Neural Network (RNN) processes sequential data by using recurrent
connections that allow information to be passed from one time step to the next. The RNN
receives an input sequence 𝑥(1), 𝑥(2), … , 𝑥(𝜏) and produces a sequence of outputs
𝑜(1), 𝑜(2), … , 𝑜(𝜏).

Fig - Recurrent network that produces an output at each time step and has recurrent
connections between hidden units.

Fig - An RNN whose only recurrence is the feedback connection from the output
to the hidden layer.

Figure - Time-unfolded recurrent neural network with a single output at the end of the
sequence.
To handle sequences, the network is unfolded through time, forming a chain of identical
layers, each corresponding to one time step.
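Written as a recurrence, with 𝑓 standing for the shared transition function and 𝜃 for its parameters (symbols assumed here for illustration), unfolding three time steps produces a chain of three identical layers:

```latex
\begin{aligned}
h^{(t)} &= f\!\left(h^{(t-1)}, x^{(t)}; \theta\right) \\
h^{(3)} &= f\!\left(f\!\left(f\!\left(h^{(0)}, x^{(1)}; \theta\right), x^{(2)}; \theta\right), x^{(3)}; \theta\right)
\end{aligned}
```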
1. Parameter Sharing Across Time
The RNN uses the same set of parameters at every time step:
• Input-to-hidden weights: 𝑈
• Hidden-to-hidden recurrent weights: 𝑊
• Hidden-to-output weights: 𝑉
• Bias vectors: 𝑏 and 𝑐

This shared structure allows the RNN to process sequences of any length with a fixed
number of parameters.
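A minimal pure-Python sketch of parameter sharing, assuming tanh hidden units, softmax outputs, a zero initial state h(0) and small random weights (illustrative choices, not stated in the notes; `init_params`, `count_params` and `rnn_forward` are hypothetical names). The same U, W, V, b, c are reused at every step, so a 3-step and a 7-step sequence run through the same 43 parameters.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]


def init_params(n_in, n_hid, n_out, seed=0):
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "U": rand(n_hid, n_in),   # input-to-hidden
        "W": rand(n_hid, n_hid),  # hidden-to-hidden (recurrent)
        "V": rand(n_out, n_hid),  # hidden-to-output
        "b": [0.0] * n_hid,       # hidden bias
        "c": [0.0] * n_out,       # output bias
    }


def count_params(p):
    """Total number of scalars in U, W, V, b, c; independent of sequence length."""
    total = 0
    for value in p.values():
        total += len(value) * len(value[0]) if isinstance(value[0], list) else len(value)
    return total


def rnn_forward(p, xs):
    """Apply the SAME parameters at every time step of the input sequence xs."""
    h = [0.0] * len(p["b"])  # assumed initial state h(0) = 0
    outputs = []
    for x in xs:
        h = [math.tanh(a) for a in vadd(p["b"], matvec(p["W"], h), matvec(p["U"], x))]
        outputs.append(softmax(vadd(p["c"], matvec(p["V"], h))))
    return outputs


params = init_params(n_in=2, n_hid=4, n_out=3)
short_seq = rnn_forward(params, [[1.0, 0.0]] * 3)
long_seq = rnn_forward(params, [[0.5, -1.0]] * 7)
print(len(short_seq), len(long_seq), count_params(params))  # 3 7 43
```

The parameter count (4·2 + 4·4 + 3·4 + 4 + 3 = 43) does not change with the sequence length; only the number of loop iterations does.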
2. Forward Propagation Through Time
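The forward equations are not reproduced in this copy of the notes. Using the parameters listed above, and assuming tanh hidden units and a softmax output (a common choice for this architecture), one step of forward propagation is usually written as follows, with the total loss being the sum of the per-step losses 𝐿(𝑡):

```latex
\begin{aligned}
a^{(t)} &= b + W h^{(t-1)} + U x^{(t)} \\
h^{(t)} &= \tanh\!\left(a^{(t)}\right) \\
o^{(t)} &= c + V h^{(t)} \\
\hat{y}^{(t)} &= \operatorname{softmax}\!\left(o^{(t)}\right) \\
L &= \sum_{t} L^{(t)}
\end{aligned}
```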

3. Computing Parameter Gradients (back-propagation through time):
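A simplified sketch of back-propagation through time, assuming a single scalar hidden unit, a linear output and a squared-error loss (simplifications chosen here so the gradients stay short; the notes' own setup uses vector states). The backward loop walks the unrolled graph from the last step to the first, and a finite-difference check confirms the gradients. All function names are hypothetical.

```python
import math


def forward(params, xs):
    """Scalar RNN: h(t) = tanh(b + w*h(t-1) + u*x(t)), o(t) = c + v*h(t)."""
    u, w, v, b, c = params
    hs = [0.0]  # hs[0] is the assumed initial state h(0) = 0
    outs = []
    for x in xs:
        hs.append(math.tanh(b + w * hs[-1] + u * x))
        outs.append(c + v * hs[-1])
    return hs, outs


def loss(params, xs, ys):
    """Squared error summed over time: L = sum_t 0.5 * (o(t) - y(t))^2."""
    _, outs = forward(params, xs)
    return sum(0.5 * (o - y) ** 2 for o, y in zip(outs, ys))


def bptt(params, xs, ys):
    """Gradients [dL/du, dL/dw, dL/dv, dL/db, dL/dc] from t = tau down to 1."""
    u, w, v, b, c = params
    hs, outs = forward(params, xs)
    gu = gw = gv = gb = gc = 0.0
    da_next = 0.0  # dL/da(t+1); zero beyond the last step
    for t in range(len(xs), 0, -1):
        d_o = outs[t - 1] - ys[t - 1]      # dL/do(t)
        gv += d_o * hs[t]
        gc += d_o
        d_h = d_o * v + da_next * w        # h(t) feeds o(t) and a(t+1)
        d_a = d_h * (1.0 - hs[t] ** 2)     # back through tanh
        gu += d_a * xs[t - 1]
        gw += d_a * hs[t - 1]
        gb += d_a
        da_next = d_a
    return [gu, gw, gv, gb, gc]


def numeric_grad(params, xs, ys, eps=1e-6):
    """Central finite differences, used only to check the BPTT gradients."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss(plus, xs, ys) - loss(minus, xs, ys)) / (2 * eps))
    return grads


params = (0.7, 0.5, -1.2, 0.1, 0.05)  # (u, w, v, b, c)
xs = [1.0, -0.5, 0.3, 0.8]
ys = [0.2, -0.1, 0.4, 0.0]
analytic = bptt(params, xs, ys)
numeric = numeric_grad(params, xs, ys)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```

Because the same w is used at every step, its gradient is a sum of contributions from all time steps, which is exactly why gradients can shrink or grow over long sequences.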

4. Teacher forcing (training technique)

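• During training, the ground-truth output 𝑦(𝑡) from the training set is fed as input
to the next time step (𝑡 + 1) instead of the model's own prediction 𝑜(𝑡).
• This stops early prediction errors from compounding during training, which makes
learning more stable; in networks whose only recurrence is from the output to the hidden
layer, it also decouples the time steps so they can be trained in parallel.
• At test time the true outputs are unknown, so the model's own outputs are fed back.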
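A small sketch of teacher forcing for a network like the one in the figure whose only recurrence is from the output to the hidden layer. It assumes a scalar model h(t) = tanh(u·x(t) + w·prev + b), o(t) = v·h(t) + c (an illustrative choice; `step`, `teacher_forced` and `free_running` are hypothetical names). During training `prev` is the true y(t−1); at test time it is the model's own o(t−1).

```python
import math


def step(params, x, prev_output):
    """One time step; prev_output plays the role of the fed-back output."""
    u, w, v, b, c = params
    h = math.tanh(u * x + w * prev_output + b)
    return v * h + c


def teacher_forced(params, xs, ys):
    # Training: feed the ground truth y(t-1); y(0) is taken as 0.
    prevs = [0.0] + list(ys[:-1])
    # No step depends on another step's output, so these could run in parallel.
    return [step(params, x, p) for x, p in zip(xs, prevs)]


def free_running(params, xs):
    # Inference: feed the model's own previous output o(t-1).
    outputs, prev = [], 0.0
    for x in xs:
        prev = step(params, x, prev)
        outputs.append(prev)
    return outputs


params = (0.8, 0.6, 1.5, 0.0, 0.0)  # (u, w, v, b, c)
xs = [1.0, 0.0, -1.0, 0.5]
ys = [0.9, 0.3, -0.7, 0.2]
train_out = teacher_forced(params, xs, ys)
test_out = free_running(params, xs)
print(train_out)
print(test_out)
```

The two runs agree at the first step and diverge afterwards, because from step 2 onward the training run sees the targets while the test run sees its own predictions.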

Bidirectional RNNs:
A Bidirectional Recurrent Neural Network (BRNN) is a special type of RNN designed to use
information from both the past and the future of a sequence.
Why BRNNs are needed:
• Traditional (causal) RNNs only capture information from past inputs 𝑥(1), … , 𝑥(𝑡 − 1)
and the present input 𝑥(𝑡).

• But in many tasks (speech recognition, handwriting recognition), the correct output
at time t depends not only on the past but also on future inputs.
• Example: Understanding a phoneme in speech may require looking ahead at future
phonemes or even future words.
How a Bidirectional RNN works
• It uses two RNNs:
o A forward RNN moving from the start to the end of the sequence, producing
hidden states ℎ(𝑡).
o A backward RNN moving from the end to the start, producing hidden states 𝑔(𝑡).
• At each time step 𝑡:
o The forward RNN summarizes past information → ℎ(𝑡)
o The backward RNN summarizes future information → 𝑔(𝑡)
o The output unit 𝑜(𝑡) combines both.
This allows the network to compute an output 𝑜(𝑡) that is influenced by:
• Relevant past (via ℎ(𝑡))
• Relevant future (via 𝑔(𝑡))
No fixed-size context window around 𝑡 has to be specified, unlike feedforward networks or
CNNs, which only see a fixed-size window of inputs.
Extended to images
• The idea can be extended to 2-D data (images) by using four RNNs moving:
o up, down, left, right
• This allows each output 𝑂𝑖,𝑗 to depend on both local and long-range image features.

Fig - Computation of a typical bidirectional recurrent neural network, meant to learn to map
input sequences x to target sequences y, with loss L(t) at each step t.
The diagram shows a Bidirectional RNN unrolled through time:
✔ Two RNNs (forward & backward)
✔ Their hidden states combining at each time step
✔ Output 𝑜(𝑡) derived from both
✔ Loss calculated at each step
✔ The network learns to map the input sequence 𝑥 to the output sequence 𝑦
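A minimal sketch of the idea, assuming scalar tanh RNNs and an output that is simply a weighted sum 𝑜(𝑡) = α·ℎ(𝑡) + β·𝑔(𝑡) (the weights and the combination rule are illustrative assumptions; `run_rnn` and `brnn_outputs` are hypothetical names). Changing only the last input changes the bidirectional output at the first step, while a causal forward RNN's first state is unaffected.

```python
import math


def run_rnn(xs, u, w):
    """Scalar tanh RNN: returns the hidden state after each input."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(u * x + w * h)
        states.append(h)
    return states


def brnn_outputs(xs, fwd=(0.9, 0.5), bwd=(0.7, 0.4), alpha=1.0, beta=1.0):
    h = run_rnn(xs, *fwd)              # forward RNN: h(t) depends on x(1..t)
    g = run_rnn(xs[::-1], *bwd)[::-1]  # backward RNN: g(t) depends on x(t..tau)
    # Output unit: o(t) = alpha * h(t) + beta * g(t)
    return [alpha * ht + beta * gt for ht, gt in zip(h, g)]


xs = [0.5, -0.2, 0.1, 0.8]
xs_changed_future = xs[:-1] + [-0.8]  # only the LAST input differs
print(brnn_outputs(xs)[0] != brnn_outputs(xs_changed_future)[0])            # True
print(run_rnn(xs, 0.9, 0.5)[0] == run_rnn(xs_changed_future, 0.9, 0.5)[0])  # True
```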

Recursive Neural Networks:
A Recursive Neural Network (RecNN) is a generalization of the recurrent neural network,
but instead of having a chain-like structure (like an RNN), it has a tree-structured
computational graph.

Fig - A recursive network has a computational graph that generalizes that of the recurrent
network from a chain to a tree.
→ The diagram represents how a recursive neural network computes the output from a
sequence using a tree structure.

Bottom Layer – Input Nodes

• The leaves are the input sequence:
𝑥(1), 𝑥(2), 𝑥(3), 𝑥(4)
• Each input is transformed by weight matrix 𝑉.
Middle Layers – Tree Composition
• Pairs of inputs are merged to form intermediate nodes using:
o 𝑈 for the left input
o 𝑊 for the right input
• These intermediate nodes form the first internal layer of the tree.
• Then, these two nodes are again combined (using 𝑈 and 𝑊) to form a higher-level
node.
Top Layer – Output
• The top internal node feeds into output unit 𝑜.
• The output 𝑜 is compared with target 𝑦.
• The loss function 𝐿 is computed using both 𝑜 and 𝑦.
In summary:
• A recursive network builds the representation bottom-up.
• It repeatedly combines child nodes into parent nodes.
• Eventually, a single fixed-size vector (the root node) is produced.
• This vector is used to generate the final output 𝑜.
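A minimal sketch of the bottom-up composition, assuming a balanced binary tree, tanh nonlinearities, no biases, and that an unpaired node at an odd-sized level is carried up unchanged (all illustrative assumptions; `encode` and `w_out` are hypothetical). As in the notes, 𝑉 transforms the leaves, 𝑈 is applied to the left child and 𝑊 to the right child, with the same matrices at every node.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def encode(U, W, V, xs):
    """Combine child nodes into parent nodes until one root remains; return (root, depth)."""
    level = [[math.tanh(z) for z in matvec(V, x)] for x in xs]  # leaves: tanh(V x)
    depth = 0
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            left, right = level[i], level[i + 1]
            # parent = tanh(U * left + W * right), the same U, W at every node
            parents.append([math.tanh(a + b) for a, b in zip(matvec(U, left), matvec(W, right))])
        if len(level) % 2 == 1:
            parents.append(level[-1])  # unpaired node carried up unchanged
        level = parents
        depth += 1
    return level[0], depth


U = [[0.5, -0.3], [0.2, 0.4]]
W = [[0.1, 0.6], [-0.5, 0.3]]
V = [[0.8, -0.2], [0.3, 0.9]]
w_out = [1.0, -1.0]  # hypothetical output weights: o = w_out . root

xs4 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
for xs in (xs4, xs4 * 2, xs4 * 4):
    root, depth = encode(U, W, V, xs)
    o = sum(wi * ri for wi, ri in zip(w_out, root))
    print(len(xs), len(root), depth, round(o, 3))
```

Sequences of 4, 8 and 16 inputs all produce a root vector of the same size (2), while the tree depth grows only as 2, 3 and 4.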

The computational structure is a deep tree, not a linear sequence.
• A sequence 𝑥(1), 𝑥(2), … , 𝑥(𝜏) of variable length can be mapped to a single fixed-size
output 𝑜.
• This mapping is achieved using a fixed set of weight matrices:
o 𝑈
o 𝑉
o 𝑊
• Recursive networks were introduced by Pollack (1990).
• They have been successfully used in:
o Natural language processing (Socher et al.)
o Vision (Socher et al.)
o Learning structured data (Frasconi et al.)
o Reasoning (Bottou)

Advantages
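• For a sequence of length 𝜏, the depth of computation (the number of composed
nonlinear operations) can be reduced from 𝜏 in RNNs to 𝑂(log 𝜏) in recursive nets.
• Because gradients pass through fewer nonlinear steps, this may make long-term
dependencies easier to learn.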
• Tree structures can be chosen in different ways:
1. Fixed balanced binary tree (structure does not depend on data).
2. Tree from external methods, such as a parse tree of a sentence for NLP tasks.
3. Ideally, the model learns its own tree structure (open research problem).
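A quick worked example of the depth reduction, assuming a fixed balanced binary tree:

```latex
\tau = 1024:\qquad
\underbrace{\tau = 1024}_{\text{chain (RNN) depth}}
\quad\text{vs.}\quad
\underbrace{\lceil \log_2 \tau \rceil = 10}_{\text{balanced binary tree depth}}
```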

Variants
• Some recursive nets attach inputs and targets to individual nodes of the tree.
• The computation at each node need not be a simple affine transformation followed by
a nonlinearity.
• More complex operations, such as tensor operations and bilinear forms, may be used
(Socher et al., 2013).

Working principle of an LSTM network (with block diagram and equations)

Long Short-Term Memory (LSTM)
• Long Short-Term Memory (LSTM) networks are a special type of gated recurrent
neural network designed to handle long-term dependencies in sequence data.
• Unlike simple RNNs, LSTMs mitigate the vanishing and exploding gradient problems by
using gates and a memory cell that allows gradients to flow over long durations.
LSTM
• LSTMs create paths through time where derivatives neither vanish nor explode.
• They use gates to store, forget, and output information dynamically.
• The model learns when to remember old information and when to forget it.
• This is achieved through a self-loop in the cell state, controlled by a forget gate.

Fig - Block diagram of the LSTM recurrent network “cell.”

The LSTM cell contains:

• Input gate → decides how much new information enters the cell
• Forget gate → decides how much of past memory should be erased
• Output gate → decides how much of cell state becomes output
• State unit → has a linear self-loop to preserve long-term memory
• Input neuron → computes candidate information
• Delay block (black square) → stores previous time step values
Each gate uses a sigmoid activation (values between 0 and 1).
The cell state uses a linear self-loop, allowing gradients to flow over long durations.
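A minimal pure-Python sketch of one LSTM cell step following the components listed above. Assumptions: small random weights, a tanh input neuron (as most libraries use; the notes only say it "computes candidate information"), and large gate biases chosen purely to force the gates open or shut. `lstm_step`, `make_params` and `run` are hypothetical names.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(p, x, h):
    """b + U x + W h(t-1) for one gate or for the input neuron."""
    return [bi + ux + wh for bi, ux, wh in zip(p["b"], matvec(p["U"], x), matvec(p["W"], h))]


def lstm_step(params, x, h_prev, s_prev):
    f = [sigmoid(z) for z in affine(params["forget"], x, h_prev)]     # forget gate f(t)
    g = [sigmoid(z) for z in affine(params["input"], x, h_prev)]      # input gate g(t)
    q = [sigmoid(z) for z in affine(params["output"], x, h_prev)]     # output gate q(t)
    cand = [math.tanh(z) for z in affine(params["cell"], x, h_prev)]  # input neuron (candidate)
    # Linear self-loop on the state, weighted by the forget gate
    s = [fi * si + gi * ci for fi, si, gi, ci in zip(f, s_prev, g, cand)]
    h = [math.tanh(si) * qi for si, qi in zip(s, q)]                  # gated output
    return h, s


def make_params(n_in, n_hid, forget_bias=0.0, input_bias=0.0, seed=0):
    rng = random.Random(seed)

    def unit(bias):
        return {
            "U": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)],
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_hid)],
            "b": [bias] * n_hid,
        }

    return {"forget": unit(forget_bias), "input": unit(input_bias),
            "output": unit(0.0), "cell": unit(0.0)}


def run(params, xs, s0):
    """Run the cell over xs starting from state s0 and h(0) = 0; return the final state."""
    h, s = [0.0] * len(s0), list(s0)
    for x in xs:
        h, s = lstm_step(params, x, h, s)
    return s


rng = random.Random(1)
xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
s0 = [1.0, -0.5]

remember = make_params(3, 2, forget_bias=10.0, input_bias=-10.0)  # f close to 1, g close to 0
forget = make_params(3, 2, forget_bias=-10.0)                     # f close to 0
print(run(remember, xs, s0))    # stays close to [1.0, -0.5] after 50 steps
print(run(forget, xs[:1], s0))  # old state largely erased after one step
```

With the forget gate near 1 and the input gate near 0, the self-loop carries the stored value across 50 steps almost unchanged; with the forget gate near 0, the old memory is wiped in a single step.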

Working Principle of LSTM
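The gate and state equations are not reproduced in this copy of the notes. In vector form, using the symbols of this section (forget gate 𝑓(𝑡), input gate 𝑔(𝑡), output gate 𝑞(𝑡), cell state 𝑠(𝑡)) and assuming a tanh input neuron (some presentations use a sigmoid there), the update is commonly written as below; this is a reconstruction in standard notation, not a verbatim copy of equations (10.40) to (10.44).

```latex
\begin{aligned}
f^{(t)} &= \sigma\!\left(b^{f} + U^{f} x^{(t)} + W^{f} h^{(t-1)}\right) && \text{forget gate} \\
g^{(t)} &= \sigma\!\left(b^{g} + U^{g} x^{(t)} + W^{g} h^{(t-1)}\right) && \text{input gate} \\
q^{(t)} &= \sigma\!\left(b^{o} + U^{o} x^{(t)} + W^{o} h^{(t-1)}\right) && \text{output gate} \\
s^{(t)} &= f^{(t)} \odot s^{(t-1)} + g^{(t)} \odot \tanh\!\left(b + U x^{(t)} + W h^{(t-1)}\right) && \text{state (self-loop)} \\
h^{(t)} &= \tanh\!\left(s^{(t)}\right) \odot q^{(t)} && \text{output}
\end{aligned}
```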

Advantages of LSTM:
• Learns long-term dependencies
• Avoids vanishing gradients via self-loops
• Learns when to remember and forget
• Performs well in tasks like handwriting recognition, speech, translation, image
captioning, parsing
In summary:
• LSTM networks extend RNNs by introducing gates and a cell state that allow them to
store, forget, and output information dynamically.
• The forget gate 𝑓(𝑡), input gate 𝑔(𝑡), and output gate 𝑞(𝑡) control the information
flow. The cell state has a linear self-loop, enabling long-term gradient flow.
• The LSTM is mathematically defined by the equations (10.40)–(10.44), which specify
the gating and state update mechanism.
• Because of this design, LSTMs can model long-term dependencies more effectively
than simple RNNs.
