Recurrent Neural Networks

This lecture discusses Recurrent Neural Networks (RNNs) and their application to variable width inputs, particularly in text analysis. It covers the structure and functioning of RNNs, including the concept of stateful models and the importance of memory, as well as the introduction of Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) as advancements to address limitations in basic RNNs. The lecture also highlights training techniques and the evaluation of different recurrent structures.

Lecture 21

Recurrent Neural Networks

25 April 2016

Taylor B. Arnold
Yale Statistics
STAT 365/665

Notes

• Problem set 6 was handed back yesterday
• Problem sets 7 & 8 will be returned by Thursday
• Problem set 9 is due a week from today

Recurrent neural networks

Recurrent neural networks address a concern with traditional neural networks that
becomes apparent when dealing with, amongst other applications, text analysis: the
issue of variable width inputs.

This is also, of course, a concern with images, but the solution there is quite different
because we can stretch and scale images to fit whatever size we need at the moment.
This is not so with words.

RNNs as time

An equivalent framing of this problem is to think of a string of text as streaming in over
time. Regardless of how many words I have seen in a given document, I want to make as
good an estimate as possible about whatever outcome is of interest at that moment.

Stateful models

Using this idea, we can think of variable width inputs such that each new word simply
updates our current prediction. In this way an RNN has two types of data inside of it:

• fixed weights, just as we have been using with CNNs
• stateful variables that are updated as it observes words in a document

We can also think of this as giving ‘memory’ to the neural network.

A third way of thinking about recurrent neural networks is to think of a network that has
a loop in it. The self-input, however, only gets applied the next time the network is called.

A fourth way of thinking about a recurrent neural network is
mathematically. We now have two parts to the update function in the
RNN:

$$ h_t = W x_t + b + U h_{t-1} $$

Notice that the recurrent weight matrix U must always be a square matrix, because we
could unravel this one time further to yield:

$$ h_t = W x_t + b + U W x_{t-1} + U b + U^2 h_{t-2} $$
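To make the unrolling concrete, here is a minimal NumPy sketch that applies the update twice and compares the result with the expression unrolled one step further. The dimensions and the random values are assumptions chosen purely for illustration; only the recurrence itself comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 4                       # hypothetical sizes, for illustration only

W = rng.normal(size=(d_hid, d_in))       # input weights
U = rng.normal(size=(d_hid, d_hid))      # recurrent weights: square, d_hid x d_hid
b = rng.normal(size=d_hid)               # bias

def step(x_t, h_prev):
    """One update of the linear recurrence h_t = W x_t + b + U h_{t-1}."""
    return W @ x_t + b + U @ h_prev

h_prev2 = rng.normal(size=d_hid)         # h_{t-2}
x_prev = rng.normal(size=d_in)           # x_{t-1}
x_t = rng.normal(size=d_in)              # x_t

# Apply the update twice ...
h_t = step(x_t, step(x_prev, h_prev2))

# ... and compare with the expression unrolled one step further.
h_t_unrolled = W @ x_t + b + U @ W @ x_prev + U @ b + U @ U @ h_prev2

print(np.allclose(h_t, h_t_unrolled))
```

The product `U @ U` is only defined when U is square, which is the point made above.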

A note on time initialization

One confusing bit, at least for me the first time I saw RNNs, is the relationship between
time and samples. We typically restart the state, or memory, of the RNN when we move
on to a new sample. This detail seems to be glossed over in most tutorials on RNNs, but I
think it clarifies a key idea in what these models are capturing.

Unrolling an RNN

In truth, an RNN can be seen as a traditional feedforward neural network by unrolling
the time component (assuming that there is a fixed number of time steps).

Unrolling the recurrent neural network.

Training RNNs

While it is nice that we get a ‘running output’ from the model, when we train RNNs we
typically ignore all but the final output of the model. Getting the right answer after we
have looked at the entire document is the end goal, anyway. To do this,
back-propagation can be used as before.

While we could unroll the RNN into a feedforward (FF) network and apply the algorithms
we saw in Lecture 13, techniques exist to short-cut this approach, for the sake of both
memory consumption and computational efficiency.
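As a rough illustration of training on the final output only, the sketch below runs plain back-propagation through the unrolled time steps of the linear recurrence and checks one gradient entry against a finite difference. This is the naive unrolled approach, not one of the short-cut techniques; the squared-error loss, the target `y`, and all sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_hid, T = 2, 3, 6                     # hypothetical sizes and sequence length
W = rng.normal(size=(d_hid, d_in)) * 0.5
U = rng.normal(size=(d_hid, d_hid)) * 0.5
b = rng.normal(size=d_hid) * 0.1
xs = rng.normal(size=(T, d_in))              # one sample with T time steps
y = rng.normal(size=d_hid)                   # assumed target for the final output

def forward(W, U, b):
    """Return [h_0, h_1, ..., h_T] for h_t = W x_t + b + U h_{t-1}, with h_0 = 0."""
    hs = [np.zeros(d_hid)]
    for x_t in xs:
        hs.append(W @ x_t + b + U @ hs[-1])
    return hs

def loss(W, U, b):
    """Squared error on the final output only; earlier outputs are ignored."""
    h_T = forward(W, U, b)[-1]
    return 0.5 * np.sum((h_T - y) ** 2)

def backward(W, U, b):
    """Back-propagate the final-step error through every time step."""
    hs = forward(W, U, b)
    gW, gU, gb = np.zeros_like(W), np.zeros_like(U), np.zeros_like(b)
    delta = hs[-1] - y                       # dL/dh_T
    for t in range(T, 0, -1):
        gW += np.outer(delta, xs[t - 1])     # x_t is stored at index t-1
        gU += np.outer(delta, hs[t - 1])
        gb += delta
        delta = U.T @ delta                  # dL/dh_{t-1}
    return gW, gU, gb

gW, gU, gb = backward(W, U, b)

# Central finite-difference check on a single entry of U.
eps = 1e-6
U_plus, U_minus = U.copy(), U.copy()
U_plus[0, 1] += eps
U_minus[0, 1] -= eps
numeric = (loss(W, U_plus, b) - loss(W, U_minus, b)) / (2 * eps)
```

Note that the same weights receive a gradient contribution from every time step, which is what distinguishes this from back-propagation in an ordinary feedforward network.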

I. Load IMDB dataset

II. Basic RNN example

Because of the state in the model, words that occur early in the sequence can still have
an inﬂuence on later outputs.

Using a basic dense layer as the RNN unit, however, makes it so that long range effects
are hard to pass on.
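One way to see why, at least in the simplified linear recurrence: an input from k steps back reaches the current state only through the k-th power of the recurrent weight matrix. The sketch below assumes a random matrix rescaled to spectral radius 0.9 (a value picked only for illustration) and tracks how the size of that power shrinks.

```python
import numpy as np

rng = np.random.default_rng(3)
d_hid = 4                                    # hypothetical state size
U = rng.normal(size=(d_hid, d_hid))

# Rescale so the largest eigenvalue magnitude (spectral radius) is 0.9.
U *= 0.9 / np.max(np.abs(np.linalg.eigvals(U)))

# Spectral norm of U^k for increasing lags k.
lags = (1, 10, 50, 100)
norms = [np.linalg.norm(np.linalg.matrix_power(U, k), 2) for k in lags]

for k, n in zip(lags, norms):
    print(f"lag {k:3d}: ||U^k|| = {n:.2e}")
```

With a spectral radius above one the same powers grow instead of shrinking, so neither regime passes long range information along reliably.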

Long short-term memory was originally proposed way back in 1997 in order to alleviate
this problem.

Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural
computation 9, no. 8 (1997): 1735-1780.

Their specific idea has had surprising staying power.

A great reference for dissecting the details of their paper is the blog post by Christopher
Olah:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

I will pull extensively from it throughout the remainder of today’s lecture.

Some people consider LSTMs to be a bit hard to understand; here is a diagram from the
original paper that partially explains where the confusion comes from!

In fact, though, the basic idea of an LSTM layer is exactly the same as a simple RNN layer.

It is just that the internal mechanism is a bit more complex, with two separate
self-loops and several independent weight functions to serve slightly different purposes.

The diagrams use a few simple mechanics, most of which we have seen in some form in
CNNs. The pointwise operation, for example, is used in the ResNet architecture when
creating skip-connections.

A key idea is to separate the response that is passed back into the LSTM and the output
that is emitted; there is no particular reason these need to be the same. The cell state is
the part of the layer that gets passed back, and is changed from iteration to iteration
only by two linear functions (a pointwise multiplication and a pointwise addition).

Next, consider the forget gate. It uses the previous output $h_{t-1}$ and the current input $x_t$
to determine multiplicative weights to apply to the cell state. We use a sigmoid layer here
because it makes sense to have weights between 0 and 1.

Next, we have a choice of how to update the cell state. This is done by multiplying an
input gate (again, with a sigmoid layer) by a tanh activated linear layer.

The cell state of the next iteration is now completely determined, and can be calculated
directly.
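Written out, the pieces so far might look as follows. The notation is borrowed from Olah's post and is an assumption here, since the slides show only diagrams: $[h_{t-1}, x_t]$ is the previous output concatenated with the current input, $C_t$ is the cell state, $\sigma$ is the sigmoid, and $\odot$ is pointwise multiplication.

$$ f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) $$

$$ i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) $$

$$ \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C) $$

$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$

The last line contains the only two operations that touch the cell state directly: a pointwise multiplication by the forget gate and a pointwise addition of the gated update.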

Now, to determine the output of the model, we want to emit a weighted version of the
cell state. This is done by applying a tanh activation and multiplying by the fourth and
final set of weights: the output weights. This is passed both as an output of the LSTM layer
and into the next time step of the LSTM.
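Putting the four sets of weights together, one LSTM time step can be sketched in NumPy roughly as below. This follows the formulation in Olah's post rather than any particular library; the function name `lstm_step`, the parameter dictionary, and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new output h_t and cell state c_t."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f = sigmoid(params["W_f"] @ z + params["b_f"])         # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])         # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate update
    c = f * c_prev + i * c_tilde                           # new cell state
    o = sigmoid(params["W_o"] @ z + params["b_o"])         # output gate
    h = o * np.tanh(c)                                     # emitted output
    return h, c

rng = np.random.default_rng(4)
d_in, d_hid = 3, 5                                         # hypothetical sizes
params = {}
for name in ("f", "i", "c", "o"):                          # four sets of weights
    params["W_" + name] = rng.normal(size=(d_hid, d_hid + d_in)) * 0.1
    params["b_" + name] = np.zeros(d_hid)

# Run over one sample of 7 time steps, starting from a zero state.
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.normal(size=(7, d_in)):
    h, c = lstm_step(x_t, h, c, params)
```

Both `h` and `c` are carried to the next step, which corresponds to the two separate self-loops in the diagrams.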
III. LSTM

Over the years, variants on the LSTM layers have been given. Confusingly, these are
often presented as LSTM layers rather than minor variants on the original technique.
One modification is to add peepholes so that the input, forget, and output gates also take
the current cell state into account.

One natural extension is to set the input and forget gates to be complements of one
another, so that the two weights always sum to one.
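In the notation of Olah's post (again an assumption, since the slide shows only a diagram), coupling the two gates replaces the input gate with one minus the forget gate in the cell state update:

$$ C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \tilde{C}_t $$

The cell state then only takes in new information in proportion to what it forgets.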

A more dramatically different alternative is known as a Gated Recurrent Unit (GRU),
originally presented in this paper:

Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio.
“On the properties of neural machine translation: Encoder-decoder approaches.”
arXiv preprint arXiv:1409.1259 (2014).

One benefit is that it offers a slight simplification in the model with no systematic
performance penalty. Along with LSTM, it is the only other model implemented in
keras, which should point to its growing popularity.

In short, it combines the cell state and the output (hidden state) together, and combines
the forget and input gates. This results in one fewer set of weight matrices to learn.
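For comparison with the LSTM, a single GRU step can be sketched as below, following the equations as given in Olah's post (biases omitted, as they are there). The function name `gru_step` and the sizes are hypothetical, and library implementations may differ in details.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step; a single state h plays the role of both cell state and output."""
    hx = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    z = sigmoid(W_z @ hx)                                  # update gate (merged forget/input)
    r = sigmoid(W_r @ hx)                                  # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                # blend old state and candidate

rng = np.random.default_rng(5)
d_in, d_hid = 3, 5                                         # hypothetical sizes
W_z, W_r, W_h = (rng.normal(size=(d_hid, d_hid + d_in)) * 0.1 for _ in range(3))

# Run over one sample of 7 time steps, starting from a zero state.
h = np.zeros(d_hid)
for x_t in rng.normal(size=(7, d_in)):
    h = gru_step(x_t, h, W_z, W_r, W_h)
```

There are three weight matrices here against the four sets in the LSTM, which is the "one fewer" mentioned above.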

If you would like a good, comprehensive, and empirical evaluation of the various tweaks
to these recurrent structures, I recommend this paper:

Greff, Klaus, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, and
Jürgen Schmidhuber. “LSTM: A search space odyssey.” arXiv preprint
arXiv:1503.04069 (2015).

As well as this article:

Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. “An empirical exploration
of recurrent network architectures.” In Proceedings of the 32nd International
Conference on Machine Learning (ICML-15), pp. 2342-2350. 2015.

Though, once you fully understand the LSTM model, the specifics amongst the
competing approaches typically do not require understanding any new big ideas.

IV. GRU

V. Evaluating a sequence of inputs

VI. Visualize the output
