Understanding RNNs and Their Variants

The document provides an overview of Recurrent Neural Networks (RNNs) and their variants, including Bidirectional RNNs and Long Short-Term Memory (LSTM) networks, highlighting their ability to process sequential data and capture temporal dependencies. It also discusses the Encoder-Decoder architecture for sequence-to-sequence tasks, the concept of teacher forcing in training, and the challenges of gradient computation in RNNs, particularly the vanishing and exploding gradient problems. Additionally, it introduces Recursive Neural Networks (RvNNs) for structured data and Deep RNNs for learning complex temporal representations.

Uploaded by

udemy6061

1. Explain how the Recurrent Neural Network (RNN) processes data sequences

A Recurrent Neural Network (RNN) is a type of neural network specifically designed to
process sequential data such as time series, text, speech, and video. Unlike feedforward
neural networks, RNNs have memory, which allows them to retain information from
previous inputs while processing the current input.

Basic Idea of Sequence Processing in RNN

An RNN processes a sequence one element at a time. At each time step, the network:

 Takes the current input

 Uses information from the previous time step
 Produces an output and updates its internal state

This enables the network to capture temporal dependencies in data.

Working of RNN on a Data Sequence

Consider an input sequence:

$x_1, x_2, x_3, \dots, x_T$

Step-by-step processing:

1. Input at Time Step t

At time step $t$, the RNN receives input $x_t$.

2. Hidden State Update

 The RNN maintains a hidden state $h_t$, which acts as memory.

 The hidden state is updated using:

$$h_t = f(W_x x_t + W_h h_{t-1} + b)$$

where:
 $x_t$ = current input
 $h_{t-1}$ = previous hidden state
 $W_x, W_h$ = weight matrices
 $b$ = bias
 $f(\cdot)$ = activation function (tanh or ReLU)

3. Output Generation

The output at time step $t$ is computed as:

$$y_t = g(W_y h_t)$$

where:
$W_y$ = output weight matrix
$g(\cdot)$ = activation function (softmax or sigmoid)

4. Information Flow Through Time

The hidden state $h_t$ carries information from:

 Current input $x_t$
 All previous inputs ($x_1, x_2, \dots, x_{t-1}$)

This process repeats for every element in the sequence.
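
The four steps above can be sketched in NumPy. This is only a minimal illustration under stated assumptions: the function name `rnn_forward`, the toy dimensions, the random weights, and the choice of tanh for f and softmax for g are all picked for the example and do not come from the notes.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, W_y, b, h0=None):
    """Run a simple RNN over xs of shape (T, input_dim).

    Returns hidden states of shape (T, hidden_dim) and
    outputs of shape (T, output_dim).
    """
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    hs, ys = [], []
    for x_t in xs:                                # one element per time step
        h = np.tanh(W_x @ x_t + W_h @ h + b)      # h_t = f(W_x x_t + W_h h_{t-1} + b)
        logits = W_y @ h
        e = np.exp(logits - logits.max())         # softmax used as g (assumption)
        ys.append(e / e.sum())                    # y_t = g(W_y h_t)
        hs.append(h)
    return np.stack(hs), np.stack(ys)

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
W_y = rng.normal(scale=0.5, size=(output_dim, hidden_dim))
b = np.zeros(hidden_dim)
xs = rng.normal(size=(T, input_dim))

hs, ys = rnn_forward(xs, W_x, W_h, W_y, b)
print(hs.shape, ys.shape)  # (5, 4) (5, 2)
```

The same three weight matrices are reused at every step of the loop, which is the weight sharing across time steps, and the loop runs for however many rows `xs` has, which is why variable-length sequences need no change to the parameters.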

Key Characteristics of Sequence Processing in RNN

 Weight sharing across all time steps

 Memory of past inputs through hidden states
 Suitable for variable-length sequences
 Captures temporal and sequential dependencies

2. Explain Bidirectional Recurrent Neural Networks (BRNNs) with Suitable Architecture

A Bidirectional Recurrent Neural Network (Bidirectional RNN or BRNN) is an extension of
the standard RNN that processes a data sequence in both forward and backward directions.
Unlike a unidirectional RNN, which considers only past context, a BRNN uses past as well as
future information to make predictions at each time step.

Motivation for Bidirectional RNN

In many sequence processing tasks, the output at a given time depends not only on previous
inputs but also on future inputs.

Examples:

 Speech recognition
 Natural language processing
 Handwriting recognition

To capture this complete context, Bidirectional RNNs are used.

Working Principle of Bidirectional RNN

A Bidirectional RNN consists of two separate RNN layers:

1. Forward RNN

 Processes the input sequence from time step 1 to T

 Generates forward hidden states

2. Backward RNN

 Processes the same sequence from time step T to 1

 Generates backward hidden states

The outputs from both directions are combined to produce the final output.

Mathematical Representation

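Given an input sequence: $x_1, x_2, x_3, \dots, x_T$

Forward hidden state: $\overrightarrow{h}_t = f(\overrightarrow{W}_x x_t + \overrightarrow{W}_h \overrightarrow{h}_{t-1} + \overrightarrow{b})$

Backward hidden state: $\overleftarrow{h}_t = f(\overleftarrow{W}_x x_t + \overleftarrow{W}_h \overleftarrow{h}_{t+1} + \overleftarrow{b})$

The two directions have their own weight matrices and biases; they share only the input sequence.

Output at time step t: $y_t = g(W_y [\overrightarrow{h}_t, \overleftarrow{h}_t])$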
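
The forward and backward passes above can be sketched as two ordinary RNN loops whose hidden states are concatenated per time step. This is an illustrative sketch only: the helper names `run_direction`, `birnn_states`, and `init_params`, the tanh activation, and the toy sizes are assumptions for the example.

```python
import numpy as np

def run_direction(xs, W_x, W_h, b):
    """Hidden states of a tanh RNN run over xs in the order given."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        hs.append(h)
    return np.stack(hs)

def birnn_states(xs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states at every time step."""
    h_fwd = run_direction(xs, *fwd_params)               # reads t = 1 .. T
    h_bwd = run_direction(xs[::-1], *bwd_params)[::-1]   # reads t = T .. 1, then re-aligned
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_params(rng, input_dim, hidden_dim):
    """Random (W_x, W_h, b) for one direction; values are arbitrary."""
    return (rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
            rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 6, 3, 4
xs = rng.normal(size=(T, input_dim))
fwd = init_params(rng, input_dim, hidden_dim)   # separate parameters per direction
bwd = init_params(rng, input_dim, hidden_dim)

states = birnn_states(xs, fwd, bwd)
print(states.shape)  # (6, 8)
```

Each row of `states` holds the forward state (context from the past) followed by the backward state (context from the future); an output layer such as $g(W_y [\cdot])$ would be applied to each row. Summing the two halves instead of concatenating is the other combination the notes mention.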

Architecture of Bidirectional RNN

 Input sequence is fed simultaneously to:

 Forward RNN layer

 Backward RNN layer

 Each layer maintains its own hidden states

 Hidden states from both directions are:
 Concatenated or summed


3. Long Short-Term Memory (LSTM) is a special type of recurrent neural network that
introduces gated self-loops to allow gradients to flow over long time durations, thereby
effectively learning long-term dependencies.

 LSTM was introduced by Hochreiter and Schmidhuber (1997).

 It mitigates the vanishing gradient problem of traditional RNNs.
 Uses a memory cell (state) with a linear self-loop.
 The self-loop weight is not fixed; it is controlled by a forget gate.
 The time scale of memory integration can change dynamically based on input.
 LSTM contains three main gates: Forget gate, Input gate, Output gate
 Gates use sigmoid activation to control information flow.
 The cell output is regulated using a tanh activation.

LSTM is widely used in:

 Speech recognition
 Handwriting recognition and generation
 Machine translation
 Image captioning
 Parsing tasks
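
LSTM Components and Forward Propagation Equations

1. Forget Gate

Controls how much of the previous cell state is retained.

$$f_i^{(t)} = \sigma\Big(b_i^f + \sum_j U_{i,j}^f x_j^{(t)} + \sum_j W_{i,j}^f h_j^{(t-1)}\Big)$$

2. Cell State Update

Updates internal memory using the forget gate $f_i^{(t)}$ and the input gate $g_i^{(t)}$.

$$s_i^{(t)} = f_i^{(t)} s_i^{(t-1)} + g_i^{(t)} \tanh\Big(b_i + \sum_j U_{i,j} x_j^{(t)} + \sum_j W_{i,j} h_j^{(t-1)}\Big)$$

3. Input Gate

Controls how much new information is added to the cell state.

$$g_i^{(t)} = \sigma\Big(b_i^g + \sum_j U_{i,j}^g x_j^{(t)} + \sum_j W_{i,j}^g h_j^{(t-1)}\Big)$$

4. Output Gate and Hidden State

Controls the output of the LSTM cell.

$$q_i^{(t)} = \sigma\Big(b_i^o + \sum_j U_{i,j}^o x_j^{(t)} + \sum_j W_{i,j}^o h_j^{(t-1)}\Big)$$

$$h_i^{(t)} = \tanh\big(s_i^{(t)}\big)\, q_i^{(t)}$$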
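
A single LSTM time step following the gate equations above might look like the sketch below. The parameter layout (a dictionary `p` mapping gate names to `(U, W, b)` triples), the function names, and the toy sizes are hypothetical choices for illustration; the hidden state is taken as `tanh(s) * q`, the usual LSTM output rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, p):
    """One LSTM step; p maps a name to its (U, W, b) parameters (hypothetical layout)."""
    def affine(name):
        U, W, b = p[name]
        return b + U @ x + W @ h_prev

    f = sigmoid(affine("forget"))         # fraction of the old cell state to keep
    g = sigmoid(affine("input"))          # fraction of the new candidate to write
    q = sigmoid(affine("output"))         # fraction of the cell to expose
    candidate = np.tanh(affine("cell"))
    s = f * s_prev + g * candidate        # gated linear self-loop on the cell state
    h = np.tanh(s) * q                    # hidden state passed to the next step
    return h, s

# Random parameters and toy sizes, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
p = {name: (rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
            rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))
     for name in ("forget", "input", "output", "cell")}

h, s = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h, s = lstm_step(x, h, s, p)          # same parameters reused at each step
print(h.shape, s.shape)  # (4,) (4,)
```

When the forget gate `f` is close to 1 and the input gate `g` close to 0, `s` is copied forward almost unchanged, which is how the cell can carry information (and gradient) across many steps.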

 All gates use sigmoid activation (values between 0 and 1).

 The cell state may also be used as an additional input to gates.
 LSTM cells replace standard hidden units in RNNs.
 Same parameters are reused at each time step.

4. The Encoder–Decoder Sequence-to-Sequence (Seq2Seq) architecture is a neural network
framework used to transform an input sequence into an output sequence, where the
lengths of input and output sequences may differ. This architecture is widely used in
applications such as machine translation, text summarization, and speech recognition.

Basic Idea of Sequence-to-Sequence Learning

 In sequence-to-sequence problems:
 Input is a sequence: $x_1, x_2, \dots, x_T$
 Output is another sequence: $y_1, y_2, \dots, y_{T'}$, where $T'$ may differ from $T$
 The encoder–decoder architecture solves this by using:
 Encoder → to encode the input sequence
 Decoder → to generate the output sequence

Encoder–Decoder Architecture Overview

 The architecture consists of two main components:

 Encoder Network
 Decoder Network
 Both are usually implemented using RNN, LSTM, or GRU units.

Encoder

 The encoder processes the input sequence one time step at a time.
 It converts the input sequence into a fixed-length context vector.
 For each time step $t$: $h_t = f(h_{t-1}, x_t)$
 $x_t$ = input at time $t$
 $h_t$ = hidden state
 After the final input, the last hidden state $h_T$ represents the context vector $C$: $C = h_T$

Decoder

 The decoder generates the output sequence using the context vector from the encoder.
 It predicts one output symbol at a time.
 At time step $t$:
 $s_t = f(s_{t-1}, y_{t-1}, C)$
 $y_t = g(s_t)$

 where:
 $s_t$ = decoder hidden state
 $y_{t-1}$ = previous output
 $g(\cdot)$ = output activation function (softmax)

Working Principle of Encoder–Decoder Architecture

 Encoder reads the entire input sequence

 Information is compressed into a context vector
 Decoder uses this context to generate output sequence
 Output is produced step-by-step
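
The encode-then-decode loop described above can be sketched with untrained random weights. Everything named here is an assumption for illustration: the token ids `BOS` and `EOS`, the shared embedding table `E`, the weight names, the tanh cells, and greedy (argmax) decoding. With random weights the output tokens are meaningless; the point is only the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, hid = 6, 4, 5            # toy sizes (assumptions)
BOS, EOS = 0, 1                          # hypothetical special token ids

E = rng.normal(scale=0.5, size=(vocab, emb_dim))      # token embeddings
enc_Wx = rng.normal(scale=0.5, size=(hid, emb_dim))
enc_Wh = rng.normal(scale=0.5, size=(hid, hid))
dec_Ws = rng.normal(scale=0.5, size=(hid, hid))
dec_Wy = rng.normal(scale=0.5, size=(hid, emb_dim))
dec_Wc = rng.normal(scale=0.5, size=(hid, hid))
out_W = rng.normal(scale=0.5, size=(vocab, hid))

def encode(tokens):
    """h_t = f(h_{t-1}, x_t); the context C is the last hidden state h_T."""
    h = np.zeros(hid)
    for tok in tokens:
        h = np.tanh(enc_Wx @ E[tok] + enc_Wh @ h)
    return h

def decode(C, max_len=10):
    """s_t = f(s_{t-1}, y_{t-1}, C); y_t is the most probable token under g(s_t)."""
    s, prev, out = np.zeros(hid), BOS, []
    for _ in range(max_len):
        s = np.tanh(dec_Ws @ s + dec_Wy @ E[prev] + dec_Wc @ C)
        logits = out_W @ s
        prev = int(np.argmax(logits))    # argmax of softmax equals argmax of logits
        if prev == EOS:                  # stop when the end token is produced
            break
        out.append(prev)
    return out

C = encode([2, 3, 4, 5])                 # input sequence of length 4
output_tokens = decode(C)                # output length is decided by the decoder
print(C.shape, len(output_tokens) <= 10)  # (5,) True
```

The context `C` has a fixed size regardless of input length, and the decoder stops on its own end token or length cap, which is why input and output lengths are decoupled.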
5. Teacher Forcing and Networks with Output Recurrence

Teacher forcing is a training technique for recurrent neural networks where, during
training, the true output from the training set is fed back into the network at the next time
step instead of the model’s own prediction.

Networks with Output Recurrence

 These networks have recurrent connections only from output to hidden units.
 They lack hidden-to-hidden recurrence, making them less powerful.
 Such networks cannot simulate a universal Turing machine.
 Output units must store all past information needed for future predictions.
 Training becomes simpler and parallelizable because:

 Each time step is decoupled.

 Gradients can be computed independently.
 No need to wait for previous outputs during training.

Teacher forcing is derived from the maximum likelihood criterion, where the model is trained
by feeding the ground-truth output $y^{(t)}$ as input for predicting the next time step.

Maximum Likelihood Formulation:

$$\log p\big(y^{(1)}, y^{(2)} \mid x^{(1)}, x^{(2)}\big) \tag{10.15}$$

$$= \log p\big(y^{(2)} \mid y^{(1)}, x^{(1)}, x^{(2)}\big) + \log p\big(y^{(1)} \mid x^{(1)}, x^{(2)}\big) \tag{10.16}$$

 At time $t = 2$, the model is trained to maximize the conditional probability of $y^{(2)}$ given the true previous output $y^{(1)}$.
 This shows why ground-truth outputs should be used during training.

Key Points:

 During training:
 True output $y^{(t)}$ is fed into the model at time $t+1$
 During testing:
 True output is unavailable.
 Model’s own output $o^{(t)}$ is fed back.
 Teacher forcing:
 Avoids back-propagation through time (BPTT) when no hidden-to-hidden recurrence exists.
 Can still be used in models with hidden recurrence, but BPTT becomes necessary.
 Some models use both teacher forcing and BPTT.

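Solutions to Teacher Forcing Problem

Strict teacher forcing causes a mismatch: at test time the network is fed its own outputs, which can differ considerably from the ground-truth inputs it saw during training. Ways to reduce this mismatch:

Train using a mix of:

 Teacher-forced inputs
 Free-running (self-generated) inputs

Predict targets multiple steps ahead.

Scheduled sampling (Bengio et al., 2015):

 Randomly choose between true output and generated output.
 Gradually increase use of generated outputs (curriculum learning).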
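
The random choice between true and generated outputs, with a schedule that shifts toward generated outputs, can be sketched as follows. This is a simplified illustration, not the published algorithm in full: the function names, the linear decay, and the fixed `predictions` list are assumptions. In real training each prediction would be produced step by step by the model, conditioned on whichever input was chosen at the previous step.

```python
import random

def teacher_forcing_prob(epoch, num_epochs):
    """Linearly decay the probability of feeding the true output from 1 to 0."""
    return max(0.0, 1.0 - epoch / max(1, num_epochs - 1))

def choose_decoder_inputs(targets, predictions, p_true, rng):
    """For each step, feed the true y(t) with probability p_true, else the model's o(t)."""
    return [y if rng.random() < p_true else o
            for y, o in zip(targets, predictions)]

rng = random.Random(0)
targets = ["the", "cat", "sat", "down"]        # ground-truth outputs
predictions = ["a", "cat", "sits", "down"]     # stand-in for the model's own outputs

for epoch in range(5):
    p = teacher_forcing_prob(epoch, 5)
    print(epoch, round(p, 2), choose_decoder_inputs(targets, predictions, p, rng))
```

In the first epoch every input is the ground truth (pure teacher forcing); in the last epoch every input is the model's own output (free-running), so the network gradually sees the kind of inputs it will face at test time.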

6. Deep Recurrent Neural Networks (Deep RNNs) in Detail

A Deep Recurrent Neural Network (Deep RNN) is an extension of the standard Recurrent
Neural Network in which multiple recurrent layers are stacked on top of each other. This
depth enables the network to learn hierarchical and complex temporal representations
from sequential data such as speech, text, and time series.

Standard (shallow) RNNs have limited representational power because they contain only one
recurrent layer. For complex sequence learning tasks, shallow RNNs may fail to capture:

 High-level temporal patterns

 Long-range dependencies

Deep RNNs overcome this limitation by introducing depth in the temporal model.

Architecture of Deep Recurrent Neural Network

A Deep RNN consists of:

 Multiple recurrent hidden layers

 Each layer processes the output sequence of the previous layer

For a Deep RNN with $L$ layers:

 Layer 1 processes the input sequence

 Higher layers process increasingly abstract temporal features.

Working Principle of Deep RNN

 Consider an input sequence: $x_1, x_2, \dots, x_T$

 Let $h_t^{(l)}$ be the hidden state at time $t$ and layer $l$, with $h_t^{(0)} = x_t$.

 Hidden state update equation: $h_t^{(l)} = f\big(W^{(l)} h_t^{(l-1)} + U^{(l)} h_{t-1}^{(l)} + b^{(l)}\big)$

Output Generation

 The output at time step $t$ is computed from the topmost recurrent layer: $y_t = g\big(h_t^{(L)}\big)$
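
The stacked update can be sketched as a loop over time with an inner loop over layers. This is an illustrative sketch under assumptions: the function name `deep_rnn_forward`, the tanh activation, the convention that layer 0 is the input itself ($h_t^{(0)} = x_t$), and the layer sizes are chosen for the example.

```python
import numpy as np

def deep_rnn_forward(xs, layers):
    """layers: list of (W, U, b); W reads the layer below, U is the recurrent matrix."""
    hs = [np.zeros(U.shape[0]) for _, U, _ in layers]    # initial state of every layer
    top_states = []
    for x_t in xs:
        below = x_t                                      # h_t^{(0)} = x_t (assumption)
        for l, (W, U, b) in enumerate(layers):
            # h_t^{(l)} depends on h_t^{(l-1)} (below) and h_{t-1}^{(l)} (hs[l])
            hs[l] = np.tanh(W @ below + U @ hs[l] + b)
            below = hs[l]
        top_states.append(below)                         # h_t^{(L)}, fed to the output layer
    return np.stack(top_states)

# Three stacked recurrent layers with toy sizes and random weights.
rng = np.random.default_rng(0)
input_dim, sizes = 3, [8, 6, 4]
layers, fan_in = [], input_dim
for n in sizes:
    layers.append((rng.normal(scale=0.3, size=(n, fan_in)),
                   rng.normal(scale=0.3, size=(n, n)),
                   np.zeros(n)))
    fan_in = n

xs = rng.normal(size=(7, input_dim))
top = deep_rnn_forward(xs, layers)
print(top.shape)  # (7, 4)
```

Each layer keeps its own recurrent state, so information flows both upward through layers at a fixed time step and forward through time within a layer.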

Key Characteristics of Deep RNN

 Depth in time and space
 Learns hierarchical temporal features
 Each layer captures different levels of abstraction
 Uses Backpropagation Through Time (BPTT) for training

Explain Recursive Neural Networks (RvNNs)

A Recursive Neural Network (Recursive NN or RvNN) is a type of neural network
designed to process structured and hierarchical data rather than simple sequences. Unlike
Recurrent Neural Networks, which operate over time sequences, Recursive Neural Networks
operate over tree-like or graph structures, making them suitable for data with a recursive
structure such as parse trees in natural language processing.

 The same neural network is applied repeatedly

 The structure of the network follows the structure of the input data

Working Principle of Recursive Neural Network

Consider a tree-structured input (e.g., sentence parse tree).

Step-by-step working:

Leaf Nodes

 Leaf nodes represent basic inputs (words or tokens).

 Each leaf node is converted into a vector representation.

Recursive Composition

 Parent node representation is computed by combining its child nodes.

 The same function is used at every node.

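$$h_p = f\big(W [h_{c_1}, h_{c_2}] + b\big)$$

where $h_{c_1}$ and $h_{c_2}$ are the representations of the two child nodes and $h_p$ is the representation of their parent.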
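
The bottom-up composition can be sketched as a recursive function over a binary tree. The tree encoding (a word string for a leaf, a `(left, right)` tuple for an internal node), the random leaf vectors, the tanh activation, and the example parse are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W = rng.normal(scale=0.5, size=(dim, 2 * dim))   # one matrix shared by every internal node
b = np.zeros(dim)
embeddings = {w: rng.normal(size=dim) for w in ["the", "cat", "sat"]}  # toy leaf vectors

def compose(node):
    """Return the vector for a node; a node is a word or a (left, right) pair."""
    if isinstance(node, str):
        return embeddings[node]                  # leaf: look up its vector
    left, right = node
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ children + b)             # h_p = f(W [h_c1, h_c2] + b)

tree = (("the", "cat"), "sat")                   # hypothetical parse: ((the cat) sat)
root = compose(tree)
print(root.shape)  # (4,)
```

The recursion depth follows the depth of the input tree, so a different parse of the same words, such as `("the", ("cat", "sat"))`, produces a different computation graph and generally a different root vector.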

Key Characteristics of Recursive Neural Networks

 Operate on hierarchical structures

 Use weight sharing across tree nodes
 Process data in a bottom-up manner
 Depth varies depending on input structure

Describe the Computation of Gradient in a Recurrent Neural Network (RNN)

Training a Recurrent Neural Network (RNN) requires computing gradients of the loss
function with respect to network parameters. Since RNNs have recurrent connections
across time steps, gradient computation is more complex than in feedforward networks. This
is performed using a technique called Backpropagation Through Time (BPTT).

Why Gradient Computation is Different in RNN

 RNN parameters are shared across all time steps

 The hidden state at a given time depends on previous hidden states
 Errors must be propagated backward through time
 Thus, gradients must account for temporal dependencies.

Forward Computation in RNN

For an input sequence $x_1, x_2, \dots, x_T$:

 Hidden state: $h_t = f(W_x x_t + W_h h_{t-1} + b)$
 Output: $y_t = g(W_y h_t)$
 Total loss: $L = \sum_{t=1}^{T} L_t(y_t, \hat{y}_t)$

Backpropagation Through Time (BPTT)

BPTT unfolds the RNN across time steps, converting it into a deep feedforward network.
Gradients are then computed using the chain rule.
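
A compact BPTT sketch is given below, together with a finite-difference check on one recurrent weight. To keep the derivatives short it assumes tanh hidden units, a linear output (g is the identity), and a squared-error loss; these choices, the function names, and the toy sizes are assumptions for the example and not taken from the notes.

```python
import numpy as np

def forward(xs, targets, W_x, W_h, W_y, b):
    """Forward pass; returns the total squared-error loss and all hidden states."""
    h, hs, loss = np.zeros(W_h.shape[0]), [], 0.0
    for x_t, target in zip(xs, targets):
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        hs.append(h)
        loss += 0.5 * np.sum((W_y @ h - target) ** 2)
    return loss, hs

def bptt(xs, targets, W_x, W_h, W_y, b):
    """Return the loss and the gradients for W_x, W_h, W_y and b."""
    loss, hs = forward(xs, targets, W_x, W_h, W_y, b)
    gW_x, gW_h, gW_y, gb = (np.zeros_like(p) for p in (W_x, W_h, W_y, b))
    dh_next = np.zeros_like(b)                   # error arriving from step t+1
    for t in reversed(range(len(xs))):           # walk the unrolled network backward
        dy = W_y @ hs[t] - targets[t]            # dL_t / dy_t for squared error
        gW_y += np.outer(dy, hs[t])
        dh = W_y.T @ dy + dh_next                # output error + error from the future
        da = (1.0 - hs[t] ** 2) * dh             # back through tanh
        h_prev = hs[t - 1] if t > 0 else np.zeros_like(b)
        gW_h += np.outer(da, h_prev)             # shared weights: contributions add up
        gW_x += np.outer(da, xs[t])
        gb += da
        dh_next = W_h.T @ da                     # pass the error to step t-1
    return loss, gW_x, gW_h, gW_y, gb

rng = np.random.default_rng(0)
xs, targets = rng.normal(size=(6, 3)), rng.normal(size=(6, 2))
W_x = rng.normal(scale=0.5, size=(4, 3))
W_h = rng.normal(scale=0.5, size=(4, 4))
W_y = rng.normal(scale=0.5, size=(2, 4))
b = np.zeros(4)

loss, gW_x, gW_h, gW_y, gb = bptt(xs, targets, W_x, W_h, W_y, b)

# Finite-difference check on a single entry of W_h.
eps = 1e-6
W_plus, W_minus = W_h.copy(), W_h.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
numeric = (forward(xs, targets, W_x, W_plus, W_y, b)[0]
           - forward(xs, targets, W_x, W_minus, W_y, b)[0]) / (2 * eps)
print(abs(numeric - gW_h[1, 2]) < 1e-5)
```

The backward loop mirrors the forward loop in reverse: `dh_next` is the only quantity carried between steps, and each parameter gradient is a running sum over time.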

Gradient Computation Steps

1. Gradient w.r.t. Output Weights $W_y$

2. Gradient w.r.t. Hidden State

The error at time step $t$ depends on:

 Error from the output at time $t$
 Error propagated from future time steps

3. Gradient w.r.t. Recurrent Weights $W_h$

Because $W_h$ is shared across all time steps, its gradient is a sum of contributions from every time step, so gradients accumulate across time.

4. Gradient w.r.t. Input Weights $W_x$
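
One standard way to write the four gradients listed above is sketched below. The notation is an assumption layered on the forward equations: $a_t = W_x x_t + W_h h_{t-1} + b$ is the pre-activation, $o_t = W_y h_t$ is the output pre-activation, $\delta^{o}_t$ and $\delta_t$ are shorthand introduced here, and $\odot$ is the element-wise product.

```latex
% Shorthand (assumed for this sketch):
%   a_t = W_x x_t + W_h h_{t-1} + b,   h_t = f(a_t),   o_t = W_y h_t,   y_t = g(o_t)
\begin{align*}
\delta^{o}_t &= \frac{\partial L_t}{\partial o_t}
  && \text{(error at the output of step } t\text{)} \\[4pt]
\frac{\partial L}{\partial W_y} &= \sum_{t=1}^{T} \delta^{o}_t \, h_t^{\top}
  && \text{(1. output weights)} \\[4pt]
\frac{\partial L}{\partial h_t} &= W_y^{\top} \delta^{o}_t
  + W_h^{\top}\!\left( f'(a_{t+1}) \odot \frac{\partial L}{\partial h_{t+1}} \right),
  \quad \frac{\partial L}{\partial h_{T+1}} = 0
  && \text{(2. hidden state)} \\[4pt]
\delta_t &= f'(a_t) \odot \frac{\partial L}{\partial h_t} \\[4pt]
\frac{\partial L}{\partial W_h} &= \sum_{t=1}^{T} \delta_t \, h_{t-1}^{\top}
  && \text{(3. recurrent weights)} \\[4pt]
\frac{\partial L}{\partial W_x} &= \sum_{t=1}^{T} \delta_t \, x_t^{\top},
  \qquad \frac{\partial L}{\partial b} = \sum_{t=1}^{T} \delta_t
  && \text{(4. input weights and bias)}
\end{align*}
```

The second line of the hidden-state gradient is the recursion that carries error backward from step $t+1$ to step $t$; unrolling it multiplies by $W_h^{\top}$ and $f'(\cdot)$ once per step.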

Vanishing and Exploding Gradients

 During BPTT:
 Repeated multiplication of gradients can cause:
 Vanishing gradients (values → 0)
 Exploding gradients (values → ∞)
 This makes learning long-term dependencies difficult.

Techniques to Handle Gradient Problems

 Gradient clipping
 Proper weight initialization
 Using LSTM or GRU instead of simple RNN
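
Gradient clipping, the first technique listed, can be sketched as rescaling all gradients whenever their joint norm exceeds a threshold. The function name `clip_by_global_norm`, the threshold value, and the example gradients are assumptions for illustration; clipping addresses exploding gradients only and does nothing for vanishing ones.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))   # leave small gradients untouched
    return [g * scale for g in grads], total_norm

# Example gradients with joint norm sqrt(9 + 16 + 144) = 13.
grads = [np.array([[3.0, 0.0], [0.0, 4.0]]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, round(float(norm_after), 6))  # 13.0 1.0
```

Scaling every gradient by the same factor keeps the update direction and only shortens the step, which is why clipping by the joint norm is usually preferred over clipping each element independently.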
