Chapter 5

deep learning

Uploaded by

22301011ankit

Module # 5

Recurrent Neural Network

CSDC8011.5
Analyze and compare different types of Recurrent Neural
Networks (RNNs) to select appropriate models for sequential
data applications.
What is a Sequence Learning Problem?
► In all of the networks that we have covered so far (Fully Connected
Neural Network (FCNN), Convolutional Neural Network (CNN)):

► the output for one input is independent of the previous
inputs/outputs

► the input was always of a fixed length/size

Sequence Learning Problem
► In sequence learning problems, these two properties of FCNNs and
CNNs do not hold:

► The output at any time step depends on previous inputs/outputs.

► The length of the input is not fixed.

► Consider auto-completion: the user types the letter ‘d’, and the
model tries to predict the next character.

► A plain FCNN or CNN is a poor fit for this kind of task.

● Data comes in sequence form

○ Text (sentences)

○ Time-series (stock prices)

○ Speech signals

● Problem:

○ Traditional NN cannot remember past inputs

● Need:

○ A model that uses previous information
Unfolding Computational Graphs
A computational graph is a directed graph where:

● Nodes represent variables or intermediate values

● Edges represent operations that transform these values

The beauty of computational graphs is that they allow us to break down complex functions
into simple operations, making it straightforward to compute derivatives using the chain
rule.

► We can unfold a recursive or recurrent computation into a computational graph that
has a repetitive structure
• Corresponding to a chain of events

► Unfolding this graph results in sharing of parameters across a deep network structure
Unfolding Computational Graph

● RNN can be unrolled over time

● Each time step = one layer

👉 Example:
x₁ → h₁ → y₁
x₂ → h₂ → y₂
x₃ → h₃ → y₃
(each hidden state also feeds the next step: h₁ → h₂ → h₃)
● Helps in training and visualization
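To make the unrolling idea concrete, here is a minimal sketch with a scalar hidden state. The function names (`step`, `run_loop`, `run_unrolled`) and the weight values are illustrative assumptions, not part of the slides. It shows that the folded loop and the three-step unfolded chain compute the same result, and that both reuse the same weights at every step.

```python
import math

def step(h_prev, x, w_h=0.5, w_x=1.0):
    """One recurrent step; the weights w_h and w_x are shared by every time step."""
    return math.tanh(w_h * h_prev + w_x * x)

def run_loop(xs, h0=0.0):
    """Recurrent (folded) form: one loop that reuses `step`."""
    h = h0
    for x in xs:
        h = step(h, x)
    return h

def run_unrolled(x1, x2, x3, h0=0.0):
    """Unfolded form for exactly three time steps: one 'layer' per step."""
    h1 = step(h0, x1)
    h2 = step(h1, x2)
    h3 = step(h2, x3)
    return h3

xs = [0.1, -0.2, 0.3]
h_loop = run_loop(xs)
h_unrolled = run_unrolled(*xs)
```

The unrolled version behaves like a three-layer feedforward network whose layers happen to share one set of parameters.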
When training neural networks, we need to compute
gradients of a loss function with respect to model
parameters.

For a network with thousands or millions of
parameters, computing these derivatives manually
would be impractical.

Deep learning frameworks solve this problem using
computational graphs combined with automatic
differentiation.
The real power of computational graphs becomes apparent when we scale to deep
networks with many layers:

x → Block 1 → x₁ → Block 2 → x₂ → … → Block n → xₙ → Final Block → z

Forward Pass
During the forward pass, we compute the output by sequentially applying each
computational block:

● Start with input x

● Apply Block 1 to get x₁
● Apply Block 2 to get x₂
● Continue through all blocks
● Get final output z

Backward Pass
For the backward pass, we compute gradients iteratively using the chain rule,
starting from the output z and moving back through Block n, …, Block 1.
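The forward and backward passes through a chain of blocks can be sketched as follows. The three blocks here (scale, shift, tanh) are hypothetical stand-ins chosen only because their derivatives are easy to write by hand. A real framework records these local derivatives automatically.

```python
import math

# Each block is a pair: (function, derivative of that function).
blocks = [
    (lambda v: 3.0 * v, lambda v: 3.0),               # Block 1: scale
    (lambda v: v + 1.0, lambda v: 1.0),               # Block 2: shift
    (math.tanh, lambda v: 1.0 - math.tanh(v) ** 2),   # Final block
]

def forward(x):
    """Apply the blocks in order, keeping each block's input for the backward pass."""
    inputs = []
    value = x
    for fn, _ in blocks:
        inputs.append(value)
        value = fn(value)
    return value, inputs

def backward(inputs):
    """Multiply local derivatives from the last block back to the first (chain rule)."""
    grad = 1.0  # dz/dz
    for (_, dfn), v in zip(reversed(blocks), reversed(inputs)):
        grad *= dfn(v)
    return grad  # dz/dx

z, saved = forward(0.2)
dz_dx = backward(saved)
```

Storing the intermediate inputs during the forward pass is what lets the backward pass evaluate each local derivative without recomputing the whole chain.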
Dynamic vs Static Graphs

PyTorch uses dynamic computational graphs (built on-the-fly during
execution), while older versions of TensorFlow used static graphs
(defined before execution).

Dynamic graphs are more flexible and intuitive, which is one reason
for PyTorch’s popularity.

What is RNN?

● RNN = Neural Network with memory

● Uses the previous hidden state (its output from the previous step) as an additional input

● Handles sequential data

Example:
● Predict next word in sentence

● “I am going to ___”
Recurrent Neural Network
► A Recurrent Neural Network (RNN) is a type of neural network where
the output from the previous step is fed as input to the current step.

► In traditional neural networks, all the inputs and outputs are
independent of each other. But to predict the next word of a
sentence, the previous words are required, so the network needs to
remember them.

► RNNs address this with the help of a hidden layer that carries a
hidden state from one step to the next.
Recurrent Neural Network
► The main and most important feature of an RNN is its hidden state,
which remembers some information about a sequence.

► The state is also referred to as the memory state, since it retains
information about previous inputs to the network.

► It uses the same parameters for every input, because it performs the
same task at every time step to produce the output.

► This sharing keeps the number of parameters small compared with a
network that has separate weights for every step.
RNN Working

● Input → Hidden State → Output

● Hidden state stores past information

● Same weights used at every step

Key idea:
● Loop (feedback connection)
Architecture of Recurrent Neural Network
► RNNs have the same input and output architecture as any other deep
neural architecture.

► However, differences arise in the way information flows from input to
output.

► Unlike deep neural networks, where each dense layer has its own
weight matrix, in an RNN the weights are the same at every time step.

► It calculates a hidden state H_i for every input X_i.

How RNN works
► The Recurrent Neural Network consists of multiple fixed activation
function units, one for each time step.

► Each unit has an internal state which is called the hidden state of the
unit.

► This hidden state signifies the past knowledge that the network
currently holds at a given time step.

► This hidden state is updated at every time step to signify the change
in the knowledge of the network about the past.

► The hidden state is updated using the following recurrence relation:

How RNN works
► The formula for calculating the current state:

h_t = f(h_{t-1}, x_t)

► where:

► h_t -> current state

► h_{t-1} -> previous state

► x_t -> current input

How RNN works
► Formula for applying the activation function (tanh):

h_t = tanh(W_hh · h_{t-1} + W_xh · x_t)

► where:

► W_hh -> weight at the recurrent neuron

► W_xh -> weight at the input neuron

How RNN works
► The formula for calculating the output:

y_t = W_hy · h_t

► y_t -> output

► W_hy -> weight at the output layer

► These parameters are updated using backpropagation.

► However, since an RNN works on sequential data, we use a modified form of
backpropagation known as Backpropagation Through Time (BPTT).
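The forward computation from the "How RNN works" slides can be sketched in plain Python, assuming the standard forms h_t = tanh(W_hh · h_{t-1} + W_xh · x_t) and y_t = W_hy · h_t with no bias terms. The helper names (`matvec`, `rnn_step`, `rnn_forward`) and all weight values are illustrative assumptions.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def rnn_step(h_prev, x, W_hh, W_xh):
    """h_t = tanh(W_hh h_{t-1} + W_xh x_t)"""
    a = [r + i for r, i in zip(matvec(W_hh, h_prev), matvec(W_xh, x))]
    return [math.tanh(a_i) for a_i in a]

def rnn_forward(xs, h0, W_hh, W_xh, W_hy):
    """Run over a sequence of any length, reusing the same three weight matrices."""
    h, ys = h0, []
    for x in xs:
        h = rnn_step(h, x, W_hh, W_xh)
        ys.append(matvec(W_hy, h))  # y_t = W_hy h_t
    return ys, h

# Illustrative values: 2 hidden units, 1 input feature, 1 output.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_xh = [[1.0], [-1.0]]
W_hy = [[1.0, -1.0]]
ys, h_last = rnn_forward([[1.0], [0.0], [2.0]], [0.0, 0.0], W_hh, W_xh, W_hy)
```

Because the loop only depends on the weights and the running hidden state, the same function handles a sequence of 3 inputs or 300 without any change.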
Backpropagation Through Time
► An RNN is an ordered network: each variable is computed one at a
time in a specified order, first h1, then h2, then h3, and so on.

► Hence we apply backpropagation through all these hidden states in
sequence, from the last time step back to the first.
Backpropagation Through Time
► L(θ)(loss function) depends on h3

► h3 in turn depends on h2 and W

► h2 in turn depends on h1 and W

► h1 in turn depends on h0 and W

► where h0 is a constant starting state.

Backpropagation Through Time (BPTT)
● Training method for RNN

● Errors are propagated back through time steps

Steps:
1. Forward pass

2. Calculate error

3. Backward pass through all time steps
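The three steps above can be sketched for a scalar RNN. This is a minimal illustration under stated assumptions: the recurrence is h_t = tanh(w · h_{t-1} + u · x_t), the loss is a squared error on the last hidden state only, and the names `forward`, `loss`, and `bptt_grad_w` are hypothetical.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Step 1, forward pass: return every hidden state h_0 .. h_T."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, target):
    """Step 2, error: squared error on the last hidden state only."""
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2

def bptt_grad_w(w, u, xs, target):
    """Step 3, backward pass through all time steps, accumulating dL/dw."""
    hs = forward(w, u, xs)
    dL_dh = hs[-1] - target              # error at the last step
    grad_w = 0.0
    for t in range(len(xs), 0, -1):      # t = T, ..., 1
        da = dL_dh * (1.0 - hs[t] ** 2)  # back through tanh
        grad_w += da * hs[t - 1]         # w is shared, so contributions add up
        dL_dh = da * w                   # pass the error on to h_{t-1}
    return grad_w

xs, target = [0.5, -0.3, 0.8], 0.2
g = bptt_grad_w(0.7, 1.1, xs, target)
```

Since the same weight w appears at every time step, its gradient is the sum of one contribution per step, which is the main difference from backpropagation in a feedforward network.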

Need for bidirectionality

► In speech recognition, the correct interpretation of the
current sound may depend on the next few phonemes
because of coarticulation, and on the next few words
because of linguistic dependencies.
► This is also true of handwriting recognition.
A bidirectional RNN

A bidirectional RNN combines an RNN that moves forward through time
from the start of the sequence with another RNN that moves backward
through time from the end of the sequence.
The two RNNs read the same input: one processes the input in its
original order and the other processes the reversed input sequence.
The output is then computed based on the hidden states of both
RNNs.
► A typical bidirectional RNN maps input sequences x to target
sequences y, with loss L(t) at each step t. The h recurrence
propagates information to the right (forward in time); the g
recurrence propagates information to the left (backward in time).
► This allows the output units o(t) to compute a representation that
depends on both the past and the future.
Bidirectional RNN
● Uses:

○ Forward sequence

○ Backward sequence

Advantage:
● Uses past + future context

Example:
● Understanding sentence meaning
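A bidirectional pass can be sketched with two scalar RNNs, one reading left to right and one reading right to left. The names `run_rnn` and `bidirectional` and the weight values are assumptions made for illustration. The states are called h (forward) and g (backward) to match the slide.

```python
import math

def run_rnn(xs, w_h, w_x):
    """Scalar RNN; returns the hidden state at every position."""
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        hs.append(h)
    return hs

def bidirectional(xs, fwd=(0.5, 1.0), bwd=(0.5, 1.0)):
    """Pair each position's forward state (past context) with its backward state (future context)."""
    h_states = run_rnn(xs, *fwd)               # left to right
    g_states = run_rnn(xs[::-1], *bwd)[::-1]   # right to left, then re-aligned
    return list(zip(h_states, g_states))

# The only non-zero input is at the last position.
states = bidirectional([0.0, 0.0, 1.0])
```

At position 0 the forward state is still zero because nothing informative has been seen yet, while the backward state is non-zero because it has already read the final input. An output layer that receives both states therefore sees past and future context.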
► RNNs suffer from exploding and vanishing gradient problems during
backpropagation.

► Gradients are the values used to update a neural network's
weights. In other words, the gradient carries the information the
network learns from.
Vanishing Gradient Problem
● Gradients become very small

● Model stops learning long-term dependencies

Problem:
Cannot remember old information

● The vanishing gradient is a big problem in deep neural networks.
The gradient shrinks quickly as it is propagated back to earlier
layers (earlier time steps in an RNN), which makes the RNN unable to
hold information from longer sequences. The RNN effectively has only
short-term memory.

● If we apply an RNN to a paragraph, it may lose necessary
information because of gradient problems and fail to carry
information from the initial time steps to later time steps.
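A small numeric sketch shows why the gradient fades or blows up over many time steps. It assumes the simplest possible case, a linear scalar recurrence h_t = w · h_{t-1}, so that the gradient of h_T with respect to h_0 is just w multiplied by itself T times. The function name `gradient_factor` is hypothetical.

```python
def gradient_factor(w, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}: w multiplied `steps` times."""
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

vanishing = gradient_factor(0.5, 50)   # |w| < 1: shrinks toward 0
exploding = gradient_factor(1.5, 50)   # |w| > 1: grows very large
```

With a tanh activation each factor is additionally multiplied by a derivative that is at most 1, which pushes the product further toward zero.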
Exploding Gradient Problem
● Gradients become very large

● Model becomes unstable

Solution:
● Gradient clipping
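Gradient clipping can be sketched as rescaling the gradient vector whenever its norm exceeds a threshold. This is one common variant (clipping by L2 norm); the function name `clip_by_norm` and the threshold value are illustrative assumptions.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)   # norm 50, rescaled to norm 5
small = clip_by_norm([0.3, 0.4], max_norm=5.0)       # already small: unchanged
```

Clipping keeps the direction of the gradient and only limits its size, so a single very large update cannot destabilize training.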
► A plain RNN carries relevant and irrelevant information forward
alike. What is needed is a model that can decide which information
in a paragraph is relevant, remember only that, and discard the
rest.
► This is achieved by using gates. The LSTM (Long Short-Term
Memory) and GRU (Gated Recurrent Unit) have gates as
an internal mechanism, which control what information to
keep and what information to throw out. In this way, LSTM and
GRU networks largely mitigate the vanishing gradient problem;
exploding gradients are still usually handled with gradient clipping.

► Most state-of-the-art (SOTA) models based on RNNs use LSTM or
GRU networks for prediction.
LSTM
► A Long Short-Term Memory network (LSTM) is a sequential neural
network that allows information to persist.

► It is a special type of Recurrent Neural Network that is capable of
handling the vanishing gradient problem faced by plain RNNs.

► The shortcoming of a plain RNN is that it cannot remember long-term
dependencies because of the vanishing gradient. LSTMs are explicitly
designed to avoid this long-term dependency problem.
What is LSTM?

LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture
widely used in deep learning. It excels at capturing long-term dependencies,
making it well suited to sequence prediction tasks.

LSTM has become a powerful tool in artificial intelligence and deep learning,
enabling progress in fields that rely on sequential data.
Every LSTM network contains three gates to control the flow of
information, and cells to hold information. The cell state
carries information from initial to later time steps without it
vanishing.

Forget Gate:
–This gate decides what information should be carried
forward and what information should be ignored.
–Information from the previous hidden state and the current
input passes through the sigmoid function.
Values that come out of the sigmoid are always between 0
and 1.
–A value closer to 1 means the information should
proceed forward; a value closer to 0 means the information
should be ignored.
Input Gate:
–After the forget gate has decided what to keep, the input gate
decides which new information is relevant and passes it on,
which updates the cell state.
–The input gate adds the new relevant information to the
existing information by updating the cell state (not the weights).
Output Gate:
–After the cell state has been updated through the input gate,
the output gate comes into play.
–The output gate generates the next hidden state. The cell state
and the hidden state are carried over to the next time step.
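The three gates can be sketched as a single LSTM step with scalar states, using the standard LSTM equations. The function name `lstm_step`, the parameter layout, and the parameter values are hypothetical and chosen only to make the gate behaviour visible.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar states; p maps each name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))        # near 0: drop old cell content, near 1: keep it
    i = sigmoid(pre("input"))         # how much new information to write
    c_tilde = math.tanh(pre("cand"))  # candidate new information
    o = sigmoid(pre("output"))        # how much of the cell to expose
    c = f * c_prev + i * c_tilde      # updated cell state
    h = o * math.tanh(c)              # next hidden state
    return h, c

# Hypothetical parameters: large forget bias (keep memory),
# very negative input bias (write almost nothing).
params = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
          "cand": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 10.0)}
h, c = 0.0, 1.0
for x in [0.3, -0.7, 0.9]:
    h, c = lstm_step(x, h, c, params)
```

With these settings the cell state stays close to its initial value of 1.0 regardless of the inputs, which illustrates how the cell state can carry information across time steps.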
Long Short-Term Memory (LSTM)
● Special type of RNN

● Solves vanishing gradient problem

Uses gates:
● Control information flow
LSTM Gates (Simple)
1. Forget Gate → What to remove

2. Input Gate (Write) → What to store

3. Output Gate (Read) → What to output

Acts like:
Memory box with control switches
Selective Operations in LSTM
● Selective Read → Output important info

● Selective Write → Store useful info

● Selective Forget → Remove useless info

GRU
GRUs (Gated Recurrent Units) are similar to LSTM
networks. The GRU is a newer, simpler gated RNN. However,
there are some differences between GRU and LSTM.
–GRU doesn’t contain a cell state
–GRU uses its hidden state to transport information
–It contains only 2 gates (Reset Gate and Update Gate)
–GRU is usually faster than LSTM
–GRU has fewer tensor operations, which makes it faster
1. Update Gate
–The Update Gate is a combination of the Forget Gate and Input
Gate. It decides what information to ignore and
what information to add to memory.
2. Reset Gate
–The Reset Gate determines how much past information should be
forgotten when the new candidate state is computed.
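The two gates can be sketched as a single GRU step with a scalar hidden state. The function name `gru_step` and the parameter layout are hypothetical. Note that the role of the update gate is a convention: here a value near 0 keeps the old state, while some libraries define it the other way round.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, p):
    """One GRU step with a scalar hidden state; p maps names to (w_x, w_h, bias)."""
    def pre(name, h):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h + b
    z = sigmoid(pre("update", h_prev))            # update gate
    r = sigmoid(pre("reset", h_prev))             # reset gate
    h_tilde = math.tanh(pre("cand", r * h_prev))  # candidate uses the reset-scaled past
    return (1.0 - z) * h_prev + z * h_tilde       # blend old state and candidate

# Hypothetical settings: update gate almost closed, so the old state is kept.
keep = {"update": (0.0, 0.0, -10.0), "reset": (0.0, 0.0, 0.0), "cand": (1.0, 1.0, 0.0)}
h_keep = 0.8
for x in [0.3, -0.7, 0.9]:
    h_keep = gru_step(x, h_keep, keep)

# Update gate open and reset gate almost closed: the past is mostly ignored.
overwrite = {"update": (0.0, 0.0, 10.0), "reset": (0.0, 0.0, -10.0), "cand": (1.0, 1.0, 0.0)}
h_new = gru_step(0.5, 0.8, overwrite)
```

There is no separate cell state: the single hidden state h plays both roles, which is the main structural difference from the LSTM.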
Gated Recurrent Unit (GRU)
● Simpler than LSTM

● Uses:

○ Update Gate

○ Reset Gate

Advantages:
● Faster

● Less complex

● Good performance
LSTM vs GRU
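One way to see why a GRU is lighter than an LSTM is to count parameters. The sketch below assumes each gate or candidate block has input weights, recurrent weights, and one bias vector. The function name `gated_rnn_params` is hypothetical, and real libraries may count biases differently.

```python
def gated_rnn_params(input_size, hidden_size, num_blocks):
    """Weights and biases for `num_blocks` blocks, each with input weights,
    recurrent weights and one bias vector."""
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return num_blocks * per_block

lstm_params = gated_rnn_params(10, 20, num_blocks=4)  # forget, input, output gates + candidate
gru_params = gated_rnn_params(10, 20, num_blocks=3)   # update, reset gates + candidate
```

Under this assumption a GRU has three quarters of the parameters of an LSTM with the same input and hidden sizes.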
Recent Trends & Applications
● NLP (Chatbots, Translation)

● Speech Recognition

● Time-series Forecasting

● Video Analysis

Used in:
● Google Translate

● Voice Assistants
RNN handles sequence data

BPTT used for training

Problems:

● Vanishing gradient

Solutions:

● LSTM, GRU
