ANNAMALAI UNIVERSITY
FACULTY OF ENGINEERING AND TECHNOLOGY
Department of Computer Science and Engineering
SEMINAR
RECURRENT NEURAL NETWORK
COURSE CODE: 22CSPESCN
COURSE TITLE: MACHINE LEARNING
STAFF NAME: [Link]
DATE: 07.04.2026
SUBMITTED BY:
THARANI K [2336010095]
STAFF SIGNATURE
INTRODUCTION:
Recurrent Neural Networks (RNNs) are deep learning models
designed to process sequential data by maintaining an internal
memory (hidden state) of previous inputs. Unlike traditional
feedforward neural networks, RNNs share weights across time steps,
which lets them handle variable-length inputs and makes them well
suited to text analysis, speech recognition, and time-series
forecasting.
Designed for sequential and temporal data
Maintains memory of past inputs
Widely used in NLP, forecasting and speech tasks
Key Components of RNNs
There are mainly two components of RNNs that we will discuss.
1. Recurrent Neurons
The fundamental processing unit in an RNN is the recurrent unit.
Recurrent units hold a hidden state that maintains information about
previous inputs in a sequence. They can "remember" information from
prior steps by feeding back their hidden state, allowing them to
capture dependencies across time.
2. RNN Unfolding
RNN unfolding, or unrolling, is the process of expanding the recurrent
structure over time steps. During unfolding, each step of the
sequence is represented as a separate layer in a series, illustrating
how information flows across each time step.
This unrolling enables backpropagation through time (BPTT), a
learning process where errors are propagated across time steps to
adjust the network's weights, enhancing the RNN's ability to learn
dependencies within sequential data.
Recurrent Neural Network Architecture
RNNs share similarities in input and output structures with other
deep learning architectures but differ significantly in how information
flows from input to output. Unlike traditional deep neural networks,
where each dense layer has its own distinct weight matrices, an RNN
reuses the same weights at every time step.
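1. Hidden State Calculation:
h = \sigma(U \cdot X + W \cdot h_{t-1} + B)
Here:
h represents the current hidden state.
X is the current input and h_{t-1} is the hidden state from the
previous time step.
U and W are weight matrices.
B is the bias.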
2. Output Calculation:
Y = O(V \cdot h + C)
The output Y is calculated by applying an activation function O to the
weighted hidden state, where V and C represent the weights and bias.
3. Overall Function:
Y = f(X, h, W, U, V, B, C)
This function defines the entire RNN operation, where the state
matrix S holds each element s_i representing the network's state at
each time step i.
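The two formulas above can be sketched for a single time step as follows. This is a minimal illustration rather than a reference implementation: the sigmoid for \sigma, the softmax for the output activation O, the layer sizes and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def rnn_step(x, h_prev, U, W, V, B, C):
    """One time step: h = sigma(U x + W h_prev + B), Y = O(V h + C)."""
    h = sigmoid(U @ x + W @ h_prev + B)
    y = softmax(V @ h + C)  # O is assumed to be a softmax here
    return h, y

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 2  # illustrative sizes

U = rng.normal(scale=0.1, size=(hidden_size, input_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.1, size=(output_size, hidden_size))
B = np.zeros(hidden_size)
C = np.zeros(output_size)

x = rng.normal(size=input_size)   # current input X
h0 = np.zeros(hidden_size)        # previous hidden state
h1, y1 = rnn_step(x, h0, U, W, V, B, C)
print(h1.shape, y1.shape)
```

With a softmax output the entries of y1 can be read as class probabilities; a different choice of O would only change the last line of rnn_step.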
Updating the Hidden State in RNNs
The current hidden state h_t depends on the previous state h_{t-1} and
the current input x_t and is calculated using the following relations:
1. State Update:
h_t = f(h_{t-1}, x_t)
where:
h_t is the current state
h_{t-1} is the previous state
x_t is the input at the current time step
2. Activation Function Application:
h_t = \tanh(W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_t)
Here, W_{hh} is the weight matrix for the recurrent neuron
and W_{xh} is the weight matrix for the input neuron
3. Output Calculation:
y_t = W_{hy} \cdot h_t
where y_t is the output and W_{hy} is the weight at the
output layer
These parameters are updated using backpropagation.
However, because an RNN operates on sequential data, a
modified form of backpropagation is used, known as
backpropagation through time.
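As a rough sketch of the update relations above, the loop below applies the same three weight matrices at every time step, which is also what unrolling the network amounts to. The names W_hh, W_xh and W_hy, the layer sizes and the random inputs are assumptions made for the illustration.

```python
import numpy as np

def rnn_forward(xs, h0, W_hh, W_xh, W_hy):
    """Run a tanh RNN over a sequence, reusing the same weights each step."""
    h = h0
    hidden_states, outputs = [], []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)   # state update
        hidden_states.append(h)
        outputs.append(W_hy @ h)             # output at this time step
    return np.array(hidden_states), np.array(outputs)

rng = np.random.default_rng(1)
input_size, hidden_size, output_size = 3, 5, 2  # illustrative sizes
W_hh = rng.normal(scale=0.3, size=(hidden_size, hidden_size))
W_xh = rng.normal(scale=0.3, size=(hidden_size, input_size))
W_hy = rng.normal(scale=0.3, size=(output_size, hidden_size))
h0 = np.zeros(hidden_size)

# The same parameters handle sequences of different lengths.
short_seq = rng.normal(size=(4, input_size))
long_seq = rng.normal(size=(9, input_size))
hs_short, ys_short = rnn_forward(short_seq, h0, W_hh, W_xh, W_hy)
hs_long, ys_long = rnn_forward(long_seq, h0, W_hh, W_xh, W_hy)
print(hs_short.shape, ys_short.shape)
print(hs_long.shape, ys_long.shape)
```

Because no parameter depends on the sequence length, a sequence of 4 steps and one of 9 steps pass through the same function unchanged.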
Backpropagation Through Time (BPTT) in RNNs
Since RNNs process sequential data, Backpropagation
Through Time (BPTT) is used to update the network's
parameters. The loss function L(θ) depends on the final
hidden state h_T, and each hidden state relies on the
preceding ones, forming a sequential dependency chain:
h_T depends on h_{T-1}, h_{T-1} depends on
h_{T-2}, ..., h_1 depends on h_0.
In BPTT, gradients are backpropagated through each time step. This is
essential for updating network parameters based on temporal
dependencies.
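1. Simplified Gradient Calculation:
\frac{\partial L(\theta)}{\partial W} = \frac{\partial L(\theta)}{\partial h_T} \cdot \frac{\partial h_T}{\partial W}
where h_T is the final hidden state.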
2. Handling Dependencies in Layers: Each hidden state is updated
based on its dependencies:
h_t = \sigma(W \cdot h_{t-1} + b)
The gradient is then calculated for each state, considering
dependencies from previous hidden states.
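3. Gradient Calculation with Explicit and Implicit Parts: The gradient
is broken down into explicit and implicit parts, summing up the
indirect paths from each hidden state to the weights:
\frac{\partial h_T}{\partial W} = \frac{\partial^{+} h_T}{\partial W} + \frac{\partial h_T}{\partial h_{T-1}} \cdot \frac{\partial h_{T-1}}{\partial W}
Here \partial^{+} h_T / \partial W is the explicit part, taken with
h_{T-1} held constant, and the second term is the implicit part, which
flows through the previous hidden state.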
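4. Final Gradient Expression: The final derivative of the loss function
with respect to the weight matrix W is computed:
\frac{\partial L(\theta)}{\partial W} = \frac{\partial L(\theta)}{\partial h_T} \cdot \sum_{k=1}^{T} \frac{\partial h_T}{\partial h_k} \cdot \frac{\partial^{+} h_k}{\partial W}
where the sum runs over the time steps k and \partial^{+} h_k / \partial W
is the explicit derivative of h_k with h_{k-1} held constant.
This iterative process is the essence of backpropagation through
time.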
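The backward pass described above can be sketched in a few lines and checked against a numerical gradient. The sketch assumes a tanh activation, a squared-error loss on the final hidden state and a hypothetical target vector; none of these are fixed by the text, and only the gradient with respect to the recurrent matrix (called W_hh here) is shown.

```python
import numpy as np

def forward(W_hh, W_xh, xs, h0):
    """Return [h_0, h_1, ..., h_T] for h_t = tanh(W_hh h_{t-1} + W_xh x_t)."""
    hs = [h0]
    for x_t in xs:
        hs.append(np.tanh(W_hh @ hs[-1] + W_xh @ x_t))
    return hs

def loss(W_hh, W_xh, xs, h0, target):
    # Assumed loss: squared error on the final hidden state only.
    h_T = forward(W_hh, W_xh, xs, h0)[-1]
    return 0.5 * np.sum((h_T - target) ** 2)

def bptt_grad_W_hh(W_hh, W_xh, xs, h0, target):
    """Gradient of the loss w.r.t. W_hh, accumulated backwards over time."""
    hs = forward(W_hh, W_xh, xs, h0)
    dW = np.zeros_like(W_hh)
    delta = hs[-1] - target                 # dL/dh_T
    for t in range(len(xs), 0, -1):
        dz = delta * (1.0 - hs[t] ** 2)     # back through tanh at step t
        dW += np.outer(dz, hs[t - 1])       # explicit contribution of step t
        delta = W_hh.T @ dz                 # pass the error on to h_{t-1}
    return dW

def numerical_grad_W_hh(W_hh, W_xh, xs, h0, target, eps=1e-6):
    """Central finite differences, used only to check the BPTT result."""
    grad = np.zeros_like(W_hh)
    for i in range(W_hh.shape[0]):
        for j in range(W_hh.shape[1]):
            W_plus, W_minus = W_hh.copy(), W_hh.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            grad[i, j] = (loss(W_plus, W_xh, xs, h0, target)
                          - loss(W_minus, W_xh, xs, h0, target)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)
n_in, n_hid, T = 3, 4, 6                    # illustrative sizes
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))
W_xh = rng.normal(scale=0.5, size=(n_hid, n_in))
xs = rng.normal(size=(T, n_in))
h0 = rng.normal(scale=0.1, size=n_hid)
target = rng.normal(size=n_hid)

dW_bptt = bptt_grad_W_hh(W_hh, W_xh, xs, h0, target)
dW_num = numerical_grad_W_hh(W_hh, W_xh, xs, h0, target)
print("max abs difference:", np.max(np.abs(dW_bptt - dW_num)))
```

Each pass through the backward loop adds the explicit contribution of one time step, so the accumulated dW corresponds to the sum over time steps in the final gradient expression.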
Types Of Recurrent Neural Networks
There are four types of RNNs based on the number of inputs and
outputs in the network:
1. One-to-One RNN
This is the simplest type of neural network architecture where there
is a single input and a single output. It is used for straightforward
classification tasks such as binary classification where no sequential
data is involved.
2. One-to-Many RNN
In a One-to-Many RNN the network processes a single input to
produce multiple outputs over time. This is useful in tasks where one
input triggers a sequence of predictions (outputs). For example, in
image captioning a single image can be used as input to generate a
sequence of words as a caption.
3. Many-to-One RNN
The Many-to-One RNN receives a sequence of inputs and generates a
single output. This type is useful when the overall context of the
input sequence is needed to make one prediction. In sentiment
analysis the model receives a sequence of words (like a sentence)
and produces a single output like positive, negative or neutral.
4. Many-to-Many RNN
The Many-to-Many RNN type processes a sequence of inputs and
generates a sequence of outputs. In a language translation task, a
sequence of words in one language is given as input and a
corresponding sequence in another language is generated as output.
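The four input/output patterns can be illustrated with one shared tanh cell, differing only in what is fed in and what is read out. This is a simplified sketch: the weights are random and untrained, the helper names are hypothetical, the one-to-many variant simply feeds zeros after the first step, and the many-to-many variant emits one output per input (practical translation systems usually use an encoder-decoder arrangement instead).

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_hid, n_out = 3, 4, 2               # illustrative sizes
W_xh = rng.normal(scale=0.5, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))
W_hy = rng.normal(scale=0.5, size=(n_out, n_hid))

def step(x_t, h_prev):
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

def one_to_one(x):
    """Single input, single output: one step from a zero state."""
    return W_hy @ step(x, np.zeros(n_hid))

def one_to_many(x, num_steps):
    """Single input at the first step, then the state keeps evolving."""
    h, x_t, ys = np.zeros(n_hid), x, []
    for _ in range(num_steps):
        h = step(x_t, h)
        ys.append(W_hy @ h)
        x_t = np.zeros(n_in)               # assumption: no input after step 1
    return np.array(ys)

def many_to_many(xs):
    """One output per input time step."""
    h, ys = np.zeros(n_hid), []
    for x_t in xs:
        h = step(x_t, h)
        ys.append(W_hy @ h)
    return np.array(ys)

def many_to_one(xs):
    """Read the whole sequence, keep only the final output."""
    return many_to_many(xs)[-1]

xs = rng.normal(size=(5, n_in))
print(one_to_one(xs[0]).shape)             # single output vector
print(one_to_many(xs[0], 4).shape)         # 4 outputs from one input
print(many_to_one(xs).shape)               # one output for 5 inputs
print(many_to_many(xs).shape)              # 5 outputs for 5 inputs
```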
Variants of Recurrent Neural Networks (RNNs)
There are several variations of RNNs, each designed to address
specific challenges or optimize for certain tasks:
1. Vanilla RNN
This simplest form of RNN consists of a single hidden layer where
weights are shared across time steps. Vanilla RNNs are suitable for
learning short-term dependencies but are limited by the vanishing
gradient problem, which hampers long-sequence learning.
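The vanishing gradient problem can be made concrete by tracking how strongly a late hidden state depends on the initial one. The sketch below is a deliberately simplified setting: inputs are omitted, the activation is tanh, and the recurrent matrix is rescaled to a spectral norm of 0.5 so that the decay is guaranteed; these are assumptions for the demonstration, not properties of every RNN.

```python
import numpy as np

rng = np.random.default_rng(4)
n_hid, n_steps = 8, 30                      # illustrative sizes

W_hh = rng.normal(size=(n_hid, n_hid))
W_hh *= 0.5 / np.linalg.norm(W_hh, 2)       # force spectral norm to 0.5

h = rng.normal(size=n_hid)                  # initial state h_0
jacobian = np.eye(n_hid)                    # accumulates dh_t / dh_0
norms = []
for _ in range(n_steps):
    h = np.tanh(W_hh @ h)
    # Jacobian of one step: diag(1 - h_t^2) @ W_hh
    step_jacobian = np.diag(1.0 - h ** 2) @ W_hh
    jacobian = step_jacobian @ jacobian
    norms.append(np.linalg.norm(jacobian, 2))

for t in (1, 10, 20, 30):
    print(f"step {t:2d}: ||dh_t/dh_0|| = {norms[t - 1]:.3e}")
```

Since the derivative of tanh never exceeds 1, each step shrinks the norm by at least a factor of 0.5 in this setting, so the influence of early states on the gradient fades geometrically with distance.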
2. Bidirectional RNNs
Bidirectional RNNs process inputs in both forward and backward
directions, capturing both past and future context for each time step.
This architecture is ideal for tasks where the entire sequence is
available, such as named entity recognition and question answering.
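A bidirectional layer can be sketched as two independent tanh RNNs, one reading the sequence left to right and one right to left, with their hidden states concatenated per time step. The helper names, sizes and random weights below are assumptions for illustration.

```python
import numpy as np

def run_direction(xs, W_hh, W_xh):
    """Plain tanh RNN over xs in the order given; returns all hidden states."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)
        hs.append(h)
    return np.array(hs)

def bidirectional(xs, fwd, bwd):
    """fwd and bwd are (W_hh, W_xh) pairs, one per direction."""
    hs_fwd = run_direction(xs, *fwd)
    # Run on the reversed sequence, then flip back to the original order.
    hs_bwd = run_direction(xs[::-1], *bwd)[::-1]
    return np.concatenate([hs_fwd, hs_bwd], axis=1)

rng = np.random.default_rng(5)
n_in, n_hid, T = 3, 4, 4                    # illustrative sizes
fwd = (rng.normal(scale=0.5, size=(n_hid, n_hid)),
       rng.normal(scale=0.5, size=(n_hid, n_in)))
bwd = (rng.normal(scale=0.5, size=(n_hid, n_hid)),
       rng.normal(scale=0.5, size=(n_hid, n_in)))

xs = rng.normal(size=(T, n_in))
out = bidirectional(xs, fwd, bwd)
print(out.shape)   # (T, 2 * n_hid)
```

The first n_hid columns at a given time step summarize the inputs up to that step, while the remaining columns summarize the inputs from that step to the end, which is why the whole sequence must be available up front.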
3. Long Short-Term Memory Networks (LSTMs)
Long Short-Term Memory Networks (LSTMs) introduce a memory
mechanism to overcome the vanishing gradient problem. Each LSTM
cell has three gates:
Input Gate: Controls how much new information should be
added to the cell state.
Forget Gate: Decides what past information should be
discarded.
Output Gate: Regulates what information should be output at
the current step.
This selective memory enables LSTMs to handle long-term
dependencies, making them ideal for tasks where earlier context
is critical.
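The three gates can be seen at work in a single LSTM step. The sketch below follows the commonly used LSTM formulation; the parameter layout (one weight matrix and bias per gate, stored in a dictionary), the helper names and the sizes are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(n_in, n_hid, rng):
    """Hypothetical layout: one weight matrix and one bias per gate."""
    params = {}
    for name in ("i", "f", "o", "g"):       # input, forget, output, candidate
        params["W_" + name] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
        params["b_" + name] = np.zeros(n_hid)
    return params

def lstm_step(x_t, h_prev, c_prev, p):
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(p["W_i"] @ z + p["b_i"])    # input gate: how much to add
    f = sigmoid(p["W_f"] @ z + p["b_f"])    # forget gate: how much to keep
    o = sigmoid(p["W_o"] @ z + p["b_o"])    # output gate: how much to expose
    g = np.tanh(p["W_g"] @ z + p["b_g"])    # candidate cell content
    c = f * c_prev + i * g                  # updated cell state
    h = o * np.tanh(c)                      # updated hidden state
    return h, c

rng = np.random.default_rng(6)
n_in, n_hid = 3, 4                          # illustrative sizes
params = init_lstm(n_in, n_hid, rng)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

The cell state is updated additively (f * c_prev + i * g), so when the forget gate stays close to 1 earlier content passes through many steps largely unchanged, which is the mechanism behind the improved handling of long-term dependencies.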
4. Gated Recurrent Units (GRUs)
Gated Recurrent Units (GRUs) simplify LSTMs by combining the input
and forget gates into a single update gate and streamlining the
output mechanism. This design is computationally efficient, often
performing similarly to LSTMs, and is useful in tasks where simplicity
and faster training are beneficial.
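A single GRU step can be sketched in the same style. The code uses one common convention, h = (1 - z) * h_prev + z * h_tilde; some references swap the roles of z and 1 - z. The parameter layout, helper names and sizes are assumptions for the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_gru(n_in, n_hid, rng):
    """Hypothetical layout: weights and bias for two gates plus the candidate."""
    params = {}
    for name in ("z", "r", "h"):            # update gate, reset gate, candidate
        params["W_" + name] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
        params["b_" + name] = np.zeros(n_hid)
    return params

def gru_step(x_t, h_prev, p):
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(p["W_z"] @ hx + p["b_z"])   # update gate
    r = sigmoid(p["W_r"] @ hx + p["b_r"])   # reset gate
    h_tilde = np.tanh(p["W_h"] @ np.concatenate([r * h_prev, x_t]) + p["b_h"])
    return (1.0 - z) * h_prev + z * h_tilde

def count_params(p):
    return sum(v.size for v in p.values())

rng = np.random.default_rng(7)
n_in, n_hid = 3, 4                          # illustrative sizes
params = init_gru(n_in, n_hid, rng)

h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h = gru_step(x_t, h, params)

gru_count = count_params(params)
# An LSTM with the same layout would need four such blocks instead of three.
lstm_count = 4 * (n_hid * (n_hid + n_in) + n_hid)
print(h.shape, gru_count, lstm_count)
```

With three weight blocks instead of four and no separate cell state, the GRU has fewer parameters for the same hidden size, which is where its lower computational cost comes from.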