
Understanding RNNs, LSTMs, and GRUs

Uploaded by

Indoritwist
Copyright
© All Rights Reserved

Deep Learning

Recurrent Neural Network
Foods History Sentence History…

.... working with her.

.... working with him.

Recurrent Neural Network (RNN)
A recurrent neural network is a generalization of the feedforward
neural network that has an internal memory. An RNN is recurrent in
nature: it performs the same function for every element of the input
sequence, while the output for the current input depends on the
previous computation. After the output is produced, it is copied and
sent back into the recurrent network. To make a decision, the network
considers the current input together with the output it has learned
from the previous input.
Unlike feedforward neural networks, RNNs can use their internal
state (memory) to process sequences of inputs. This makes them
applicable to tasks such as unsegmented, connected handwriting
recognition or speech recognition. In other neural networks, all
the inputs are treated as independent of each other, but in an RNN
the inputs are related to each other through the hidden state.
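
The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the tanh activation, the weight names (W_xh, W_hh, b_h), and the layer sizes are assumptions chosen for the example and do not come from the slides.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h, h0):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x_t in inputs:
        # The same weights are reused at every step; only the hidden state h changes.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

# Illustrative sizes and random weights (assumptions for this sketch).
rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
sequence = rng.normal(size=(seq_len, input_size))

states = rnn_forward(sequence, W_xh, W_hh, b_h, np.zeros(hidden_size))
print(states.shape)  # (5, 4): one hidden state per time step
```

Each hidden state is computed from the current input and the previous hidden state, which is how earlier inputs can influence later outputs.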
RNN Architecture and Working…

Unfold the RNN Layers

Examples of RNN with respect to the relationships

What is Time Series Analysis, and How Does It Relate to RNNs?
A time series is a series of data points indexed in time order. Most
commonly, a time series is a sequence taken at successive equally
spaced points in time. Thus it is a sequence of discrete-time data.

A time series model depends purely on the idea that past behavior and
price patterns can be used to predict future price behavior. An RNN
fits this setting because it reads the series one time step at a time
and carries its hidden state forward.
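
One common way to present a time series to a sequence model is to cut it into (past window, next value) pairs. The sketch below shows that idea under stated assumptions: the helper name make_windows, the window length, and the sample prices are all made up for illustration.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (past window, next value) training pairs."""
    series = np.asarray(series, dtype=float)
    # Row i holds the `window` values that precede target y[i].
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # made-up data
X, y = make_windows(prices, window=3)
print(X)  # rows: [10, 11, 12], [11, 12, 13], [12, 13, 14]
print(y)  # targets: 13, 14, 15
```

Each row of X would be fed to the network one value per time step, with the matching entry of y as the prediction target.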

Why RNN, and what is the difference between
ANN & RNN?

This is a cat, and _____ is a good pet animal

Vanishing gradient problem
The vanishing gradient makes the gradient very close to zero, so it is difficult to know
which direction to move in parameter space; the exploding gradient makes the gradient
very large, so learning becomes unstable. This problem is more pronounced in
recurrent networks because they apply the same weight matrix at each time step.
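
The effect of reusing one matrix at every step can be seen with a toy calculation. The sketch below is a simplification: it uses a scaled identity matrix as a stand-in for the recurrent weights and ignores the activation derivative, so it illustrates only the repeated multiplication. The function name and the scale values are assumptions for the example.

```python
import numpy as np

def gradient_norms(scale, steps=50, size=4):
    """Track the norm of a gradient multiplied by the same matrix at each step."""
    W = scale * np.eye(size)  # hypothetical recurrent weight matrix
    grad = np.ones(size)
    norms = []
    for _ in range(steps):
        # Backpropagating one time step multiplies by the transposed weights.
        grad = W.T @ grad
        norms.append(np.linalg.norm(grad))
    return norms

small = gradient_norms(0.5)  # factor below 1: the norm shrinks toward zero
large = gradient_norms(1.5)  # factor above 1: the norm grows without bound
print(f"after 50 steps: {small[-1]:.2e} (vanishing), {large[-1]:.2e} (exploding)")
```

With a factor below 1 the norm decays geometrically, and with a factor above 1 it grows geometrically, which matches the two failure modes named above.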

Vanishing Gradient:
When making use of back-propagation, the goal is to calculate the error, which
is found by finding out the difference between the actual output and
the model output and raising…

Exploding Gradient:
The working of the exploding gradient is similar, but the weights
here change drastically instead of negligible change. Notice the small
change.

Basic LSTM

Long short-term memory networks were first introduced in
1997 by Sepp Hochreiter and his supervisor, Jürgen Schmidhuber.
LSTM is a special kind of RNN, capable of
learning long-term dependencies.
Remembering information for long periods of time is its
default behaviour.
The long short-term memory (LSTM) network is the most
popular solution to the vanishing gradient problem.

First Understand How the RNN Works

This is a cat, and _____ is a good pet animal

Looking More Clearly

Looking More Clearly

LSTMs and GRUs as a solution
LSTMs and GRUs were created as a solution to the short-term
memory of plain RNNs. They have internal mechanisms called gates that can
regulate the flow of information.

LSTMs as a solution

LSTMs as a solution (steps)
1. First, the previous hidden state and the current input get concatenated. We’ll call it combine.

2. Combine gets fed into the forget layer. This layer removes non-relevant data.

3. A candidate layer is created using combine. The candidate holds possible values to add to the
cell state.

4. Combine also gets fed into the input layer. This layer decides what data from the candidate
should be added to the new cell state.

5. After computing the forget layer, candidate layer, and the input layer, the cell state is
calculated using those vectors and the previous cell state.

6. The output is then computed.

7. Pointwise multiplying the output and the tanh of the new cell state gives us the new hidden state.
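
The steps above can be sketched as a single LSTM time step in NumPy. This is an illustration under stated assumptions, not a definitive implementation: the parameter names (W_f, W_i, W_c, W_o and their biases), the helper init_params, and the sizes are hypothetical. The new cell state is passed through tanh before being multiplied by the output, as in the standard LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_size, hidden_size, rng):
    """Hypothetical helper: one weight matrix and one bias per layer."""
    params = {}
    for name in ("f", "i", "c", "o"):
        shape = (hidden_size, hidden_size + input_size)
        params["W_" + name] = rng.normal(scale=0.1, size=shape)
        params["b_" + name] = np.zeros(hidden_size)
    return params

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: returns the new hidden state and new cell state."""
    # Concatenate the previous hidden state and the current input ("combine").
    combine = np.concatenate([h_prev, x_t])
    # Forget layer: values near 0 drop cell-state entries, values near 1 keep them.
    forget = sigmoid(params["W_f"] @ combine + params["b_f"])
    # Candidate layer: possible values to add to the cell state.
    candidate = np.tanh(params["W_c"] @ combine + params["b_c"])
    # Input layer: decides how much of the candidate is added.
    input_gate = sigmoid(params["W_i"] @ combine + params["b_i"])
    # New cell state from the forget, candidate, and input vectors plus the old cell state.
    c_new = forget * c_prev + input_gate * candidate
    # Output layer.
    output = sigmoid(params["W_o"] @ combine + params["b_o"])
    # New hidden state: output times the squashed cell state.
    h_new = output * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
params = init_params(input_size=3, hidden_size=4, rng=rng)
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(5, 3)):  # made-up sequence of 5 inputs
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the cell state is updated by addition rather than by repeated matrix multiplication, information can survive across many steps when the forget layer stays close to 1.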
GRUs (Gated Recurrent Units) as a solution
Now that we know how an LSTM works, let’s briefly look at the GRU. The GRU is the newer
generation of recurrent neural networks and is pretty similar to an LSTM. GRUs got rid of the
cell state and use the hidden state to transfer information. A GRU also has only two gates, a reset
gate and an update gate.
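
A single GRU time step can be sketched the same way. The sketch below follows one common formulation; published variants differ in how the update gate blends the old and new state, so treat the blending line as an assumption. The parameter names (W_z, W_r, W_h and their biases) and the sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU time step: returns the new hidden state (there is no cell state)."""
    combine = np.concatenate([h_prev, x_t])
    # Update gate: how much of the new candidate replaces the old hidden state.
    update = sigmoid(params["W_z"] @ combine + params["b_z"])
    # Reset gate: how much of the old hidden state feeds the candidate.
    reset = sigmoid(params["W_r"] @ combine + params["b_r"])
    # Candidate hidden state built from the reset-scaled state and the input.
    gated = np.concatenate([reset * h_prev, x_t])
    candidate = np.tanh(params["W_h"] @ gated + params["b_h"])
    # Blend old state and candidate (convention assumed for this sketch).
    return (1.0 - update) * h_prev + update * candidate

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4  # illustrative sizes
params = {}
for name in ("z", "r", "h"):
    shape = (hidden_size, hidden_size + input_size)
    params["W_" + name] = rng.normal(scale=0.1, size=shape)
    params["b_" + name] = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # made-up sequence of 5 inputs
    h = gru_step(x_t, h, params)
print(h.shape)  # (4,)
```

Compared with the LSTM, the hidden state alone carries information between steps, and only three weight blocks are needed instead of four.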

RNN vs LSTM vs GRU

RNN vs LSTM vs GRU

The key difference between a GRU and an LSTM is that a GRU has two gates
(reset and update gates) whereas an LSTM has three gates (namely input,
output and forget gates).

GRUs train faster and perform better than LSTMs on less training data if you are
doing language modeling (not sure about other tasks).

GRUs are simpler and thus easier to modify, for example adding new gates in
case of additional input to the network. It's just less code in general.

LSTMs should in theory remember longer sequences than GRUs and outperform
them in tasks requiring modeling long-distance relations.
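
The difference in size between the three cell types can be estimated with a quick count. The sketch assumes each gate or candidate layer has one weight matrix over the concatenated hidden state and input plus one bias vector; real libraries may store biases differently, so exact totals can vary. The function name and sizes are made up for illustration.

```python
def recurrent_params(input_size, hidden_size, n_blocks):
    """Count weights and biases, assuming one matrix and one bias per block."""
    per_block = hidden_size * (hidden_size + input_size) + hidden_size
    return n_blocks * per_block

input_size, hidden_size = 10, 20  # illustrative sizes

rnn = recurrent_params(input_size, hidden_size, n_blocks=1)   # single layer
gru = recurrent_params(input_size, hidden_size, n_blocks=3)   # 2 gates + candidate
lstm = recurrent_params(input_size, hidden_size, n_blocks=4)  # 3 gates + candidate

print(rnn, gru, lstm)  # 620 1860 2480
```

Under these assumptions a GRU has three quarters of the parameters of an LSTM of the same size, which is consistent with it being simpler and faster to train.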
Thanks