Recurrent Neural Networks Overview

The document discusses Recurrent Neural Networks (RNNs) and their applications, including types like Vanilla RNNs, LSTMs, and GRUs. It highlights the limitations of traditional neural networks in modeling sequences and introduces concepts such as backpropagation through time and gradient flow issues. The document also covers the architecture of LSTMs as a solution to the challenges faced by standard RNNs.


Recurrent Neural Nets & Visual Captioning

Lecture 17

Slides from: Dhruv Batra, Fei-Fei Li, Justin Johnson, Serena Yeung, Andrej Karpathy

Recurrent Neural Nets

• Input: No sequence; Output: No sequence
– Example: “standard” classification / regression problems
• Input: No sequence; Output: Sequence
– Example: Im2Caption
• Input: Sequence; Output: No sequence
– Example: sentence classification, multiple-choice question answering
• Input: Sequence; Output: Sequence
– Example: machine translation, video captioning, open-ended question answering, video question answering

(C) Dhruv Batra Image Credit: Andrej Karpathy

Synonyms
• Recurrent Neural Networks (RNNs)

• Types:
– “Vanilla” RNNs
– Long Short Term Memory (LSTMs)
– Gated Recurrent Units (GRUs)
– …

• Algorithms
– BackProp Through Time (BPTT)



What’s wrong with MLPs/ConvNets?
• Problem 1: Can’t model sequences
– Fixed-sized Inputs & Outputs
– No temporal structure

• Problem 2: Pure feed-forward processing
– No “memory”, no feedback

Image Credit: Alex Graves, book


Sequences are everywhere…

Image Credit: Alex Graves and Kevin Gimpel

Even where you might not expect a sequence…

Image Credit: Vinyals et al.


Recurrent Neural Network

We usually want to predict a vector y at some time steps.

We can process a sequence of vectors x by applying a recurrence formula at every time step:

$h_t = f_W(h_{t-1}, x_t)$

where $h_t$ is the new state, $h_{t-1}$ is the old state, $x_t$ is the input vector at some time step, and $f_W$ is some function with parameters W.

Notice: the same function and the same set of parameters are used at every time step.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

(Vanilla) Recurrent Neural Network
The state consists of a single “hidden” vector h:

$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$

$y_t = W_{hy} h_t + b_y$
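The vanilla RNN recurrence can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`matvec`, `rnn_step`), the sizes, and the hand-picked weights are hypothetical and do not come from the lecture.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x, Whh, Wxh, Why, bh, by):
    """One vanilla RNN step: returns (h_t, y_t)."""
    pre = [a + b + c for a, b, c in zip(matvec(Whh, h_prev), matvec(Wxh, x), bh)]
    h = [math.tanh(p) for p in pre]                  # h_t = tanh(Whh h_{t-1} + Wxh x_t + bh)
    y = [a + b for a, b in zip(matvec(Why, h), by)]  # y_t = Why h_t + by
    return h, y

# Hypothetical sizes: 2 hidden units, 3 input features, 1 output.
Whh = [[0.5, -0.1], [0.2, 0.3]]
Wxh = [[0.1, 0.0, -0.2], [0.0, 0.4, 0.1]]
Why = [[1.0, -1.0]]
bh = [0.0, 0.0]
by = [0.0]

h = [0.0, 0.0]  # h0
outputs = []
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h, y = rnn_step(h, x, Whh, Wxh, Why, bh, by)  # same weights at every step
    outputs.append(y)
```

The loop reuses one set of weights for every input, which is the point the slides make about parameter sharing across time steps.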


RNN: Computational Graph

Re-use the same weight matrix at every time-step

h0 → fW → h1 → fW → h2 → fW → h3 → … → hT

[Figure: inputs x1, x2, x3 feed successive fW blocks; a single W is shared by every fW]


RNN: Computational Graph: Many to Many

[Figure: the same unrolled graph, where each hidden state ht produces an output yt with its own loss Lt, and the losses L1, L2, L3, …, LT feed a single loss L]


RNN: Computational Graph: Many to One

[Figure: the same unrolled graph with inputs x1, x2, x3 and shared W, ending at the final hidden state hT]


RNN: Computational Graph: One to Many

[Figure: a single input x with shared W; the unrolled hidden states produce outputs y1, y2, y3, …, yT]


Sequence to Sequence: Many-to-one + one-to-many

Many to one: Encode input sequence in a single vector (weights W1)

One to many: Produce output sequence from single input vector (weights W2)

h0 → fW → h1 → fW → h2 → fW → h3 → … → hT → fW → h1 → fW → h2 → fW → …

[Figure: inputs x1, x2, x3 enter the first (W1) chain; outputs y1, y2 leave the second (W2) chain]


Backpropagation through time

Forward through entire sequence to compute loss, then backward through entire sequence to compute gradient


Truncated Backpropagation through time

Run forward and backward through chunks of the sequence instead of whole sequence

Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
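A minimal sketch of truncated backpropagation through time, assuming a scalar RNN with a squared-error loss so the gradients can be written by hand. The function names, the loss, the chunk length, and the learning rate are all illustrative assumptions, not the lecture's setup.

```python
import math

def tbptt_chunk(w, u, h_start, xs, targets):
    """Forward then backward through one chunk of a scalar RNN.

    h_t = tanh(w * h_{t-1} + u * x_t), loss_t = 0.5 * (h_t - target_t) ** 2.
    Returns (chunk loss, dL/dw, dL/du, final hidden state).
    """
    hs = [h_start]
    for x in xs:                                  # forward through the chunk
        hs.append(math.tanh(w * hs[-1] + u * x))
    loss = sum(0.5 * (h - t) ** 2 for h, t in zip(hs[1:], targets))

    dw = du = 0.0
    dh_next = 0.0                                 # nothing flows in from beyond the chunk
    for t in reversed(range(len(xs))):            # backward through this chunk only
        h, h_prev = hs[t + 1], hs[t]
        dh = (h - targets[t]) + dh_next
        dpre = dh * (1.0 - h * h)                 # derivative of tanh
        dw += dpre * h_prev
        du += dpre * xs[t]
        dh_next = dpre * w                        # gradient passed to the previous step
    return loss, dw, du, hs[-1]

def train_tbptt(xs, targets, chunk_len=4, lr=0.1, epochs=50):
    w, u = 0.5, 0.5                               # hypothetical initial weights
    history = []
    for _ in range(epochs):
        h, total = 0.0, 0.0
        for start in range(0, len(xs), chunk_len):
            cx = xs[start:start + chunk_len]
            ct = targets[start:start + chunk_len]
            # h is carried forward as a plain number, so no gradient crosses chunks.
            loss, dw, du, h = tbptt_chunk(w, u, h, cx, ct)
            w -= lr * dw
            u -= lr * du
            total += loss
        history.append(total)
    return w, u, history

xs = [0.1 * i for i in range(12)]
targets = [math.tanh(0.8 * x) for x in xs]
w, u, history = train_tbptt(xs, targets)
```

The hidden state crosses chunk boundaries in the forward direction only; the backward loop restarts with `dh_next = 0.0` in every chunk, which is what makes the procedure "truncated".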


Example: Character-level Language Model

Vocabulary: [h,e,l,o]

Example training sequence: “hello”

$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$


Example: Character-level Language Model Sampling

Vocabulary: [h,e,l,o]

At test-time sample characters one at a time, feed back to model

Softmax output at each step (rows: h, e, l, o; columns: successive time steps), with samples “e”, “l”, “l”, “o”:

h: .03 .25 .11 .11
e: .13 .20 .17 .02
l: .00 .05 .68 .08
o: .84 .50 .03 .79
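The sampling loop can be sketched as follows. The `toy_step` function is a hypothetical stand-in for a trained RNN (it simply favors the next letter of "hello"); only the softmax and the sample-then-feed-back pattern are meant to mirror the slide.

```python
import math
import random

def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_chars(step_fn, h0, first_char, vocab, n, rng):
    """Sample n characters, feeding each sample back in as the next input."""
    h, ch, out = h0, first_char, []
    for _ in range(n):
        h, scores = step_fn(h, ch)
        probs = softmax(scores)
        ch = rng.choices(vocab, weights=probs, k=1)[0]  # sample, do not take argmax
        out.append(ch)
    return "".join(out)

vocab = ["h", "e", "l", "o"]

def toy_step(h, ch):
    """Hypothetical stand-in for a trained RNN; the 'hidden state' is a step counter."""
    target = "ello"[h % 4]
    scores = [5.0 if c == target else 0.0 for c in vocab]
    return h + 1, scores

text = sample_chars(toy_step, 0, "h", vocab, 4, random.Random(0))
```

Because characters are drawn from the distribution rather than chosen greedily, a low-probability character can still be sampled, as in the slide where “e” is sampled at the first step with probability .13.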


[Link] gist: 112 lines of Python

([Link]
566867f8291f086)

[Figure: text samples generated by the model at first, and again after each round of “train more”]


Multilayer RNNs

[Figure: RNN layers stacked along a depth axis and unrolled along a time axis]


Vanilla RNN Gradient Flow

Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994
Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013

[Figure: ht-1 and xt are stacked, multiplied by W, and passed through tanh to give ht]

Backpropagation from ht to ht-1 multiplies by W (actually Whh^T)

[Figure: unrolled chain h0 → h1 → h2 → h3 → h4 with inputs x1, x2, x3, x4]

Computing gradient of h0 involves many factors of W (and repeated tanh)
• Largest singular value > 1: Exploding gradients
• Largest singular value < 1: Vanishing gradients
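A toy illustration of why repeated factors of W matter, assuming diagonal 2×2 matrices (so the singular values are just the diagonal entries) and ignoring the tanh derivative. The function name and the matrices are made up for this sketch.

```python
def backprop_norms(W, grad, steps):
    """Repeatedly multiply grad by W transpose and record its Euclidean norm.

    The tanh derivative is left out for clarity; since it is at most 1,
    including it could only shrink the gradient further.
    """
    norms = []
    for _ in range(steps):
        grad = [sum(W[i][j] * grad[i] for i in range(len(W))) for j in range(len(W[0]))]
        norms.append(sum(g * g for g in grad) ** 0.5)
    return norms

# Diagonal matrices, so the singular values are simply the diagonal entries.
exploding = backprop_norms([[1.5, 0.0], [0.0, 1.2]], [1.0, 1.0], 20)
vanishing = backprop_norms([[0.5, 0.0], [0.0, 0.8]], [1.0, 1.0], 20)
```

After 20 steps the first norm has grown by roughly 1.5^20 and the second has shrunk to roughly 0.8^20, which is the exploding and vanishing behavior the slide describes.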


Vanilla RNN Gradient Flow

• Largest singular value > 1: Exploding gradients
– Gradient clipping: Scale gradient if its norm is too big
• Largest singular value < 1: Vanishing gradients
– Change RNN architecture
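Gradient clipping by norm can be sketched in a few lines. The function name `clip_by_norm` and the threshold are hypothetical; the rule itself is the one on the slide: rescale the gradient when its norm is too big.

```python
def clip_by_norm(grads, max_norm):
    """Rescale grads so their Euclidean norm is at most max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return list(grads)

clipped = clip_by_norm([3.0, 4.0], 1.0)    # norm 5, rescaled to norm 1
untouched = clip_by_norm([0.3, 0.4], 1.0)  # norm 0.5, left unchanged
```

Clipping changes the length of the gradient but not its direction, so it addresses exploding gradients only; it does nothing for vanishing ones.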


Long Short Term Memory (LSTM)

[Figure: update equations for Vanilla RNN and LSTM, side by side]

Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997


Meet LSTMs

Image Credit: Christopher Olah ([Link])

LSTMs Intuition: Memory
• Cell State / Memory

LSTMs Intuition: Forget Gate
• Should we continue to remember this “bit” of information or not?

LSTMs Intuition: Input Gate
• Should we update this “bit” of information or not?
– If so, with what?

LSTMs Intuition: Memory Update
• Forget that + memorize this

LSTMs Intuition: Output Gate
• Should we output this “bit” of information to “deeper” layers?
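The gate intuitions can be put together in a single LSTM step. This is a sketch of the standard LSTM formulation, not code from the lecture: the gate ordering inside `W` (input i, forget f, output o, candidate g), the stacked-input layout, and the all-zero weights are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step.

    W maps the stacked vector [h_prev, x] to 4*H pre-activations, ordered as
    input gate i, forget gate f, output gate o, candidate g (assumed layout).
    """
    H = len(h_prev)
    stacked = h_prev + x
    pre = [sum(w * s for w, s in zip(row, stacked)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:H]]            # input gate: update this bit?
    f = [sigmoid(p) for p in pre[H:2 * H]]        # forget gate: keep this bit?
    o = [sigmoid(p) for p in pre[2 * H:3 * H]]    # output gate: expose this bit?
    g = [math.tanh(p) for p in pre[3 * H:4 * H]]  # candidate: update with what?
    # Memory update: forget that + memorize this.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Hypothetical sizes: H = 2 hidden units, 1 input feature; all weights zero.
H, D = 2, 1
W = [[0.0] * (H + D) for _ in range(4 * H)]
b = [0.0] * (4 * H)
h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], W, b)
```

With all-zero weights every gate sits at 0.5 and the candidate is 0, so the cell state is simply halved, which makes the forget gate's role easy to see in isolation.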


LSTMs Intuition: Additive Updates
• Backpropagation from ct to ct-1 is only elementwise multiplication by f, no matrix multiply by W
• Uninterrupted gradient flow!
• Similar to ResNet!

[Figure: ResNet layer stack from Input through 7x7 conv, 64 / 2, Pool, repeated 3x3 conv, 64 and 3x3 conv, 128 blocks, Pool, FC 1000, Softmax]

LSTMs
• A pretty sophisticated cell
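To make the additive-update remark concrete, the cell update can be written out and differentiated. The gate symbols below follow the standard LSTM notation and are an assumption here, since the slide's own equations did not survive extraction; the gates are treated as constants with respect to the previous cell state.

```latex
% Standard LSTM cell and hidden-state updates (assumed notation)
c_t = f_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t)

% Holding the gates fixed, the direct path from c_t back to c_{t-1} is elementwise:
\frac{\partial c_t}{\partial c_{t-1}} = \operatorname{diag}(f_t)

% Compare with the vanilla RNN, where every step adds a factor of W_{hh}:
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\left(1 - h_t^{2}\right) W_{hh}
```

Along the cell-state path the gradient over k steps is a product of forget-gate values rather than k copies of the same weight matrix, so it is preserved whenever the forget gates stay close to 1.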


Neural Image Captioning

Image Embedding (VGGNet): Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim

[Figure: the 4096-dim embedding passes through a Linear layer into a chain of RNN steps. Starting from <start>, each step outputs P(next) for the next word: <start> → Two → people → and → two → horses. The step outputs are labeled y1, y2, y3, y4, y5.]

(C) Dhruv Batra
Sequence Model Factor Graph

[Figure: chain of variables y1, y2, y3, y4, y5]

$P(y_t \mid y_1, \ldots, y_{t-1})$



Beam Search Demo
• [Link]
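Beam search can be sketched as below. The toy next-word table is hypothetical and the scoring (sum of log-probabilities, no length normalization) is one common choice, not necessarily what the linked demo uses.

```python
import math

def beam_search(next_probs, start, end, beam_width, max_len):
    """Keep the beam_width highest-scoring partial sequences at each step.

    next_probs(seq) returns {token: P(token | seq)}; a sequence's score is the
    sum of the log-probabilities of its tokens given their prefixes.
    """
    beams = [(0.0, [start])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, p in next_probs(seq).items():
                if p > 0.0:
                    candidates.append((score + math.log(p), seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == end:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # sequences cut off at max_len still count
    return max(finished, key=lambda c: c[0])

# Hypothetical toy model: the next-word distribution depends on the last word only.
TABLE = {
    "<start>": {"two": 0.6, "a": 0.4},
    "two": {"people": 0.5, "horses": 0.5},
    "a": {"horse": 0.9, "person": 0.1},
    "people": {"<end>": 1.0},
    "horses": {"<end>": 1.0},
    "horse": {"<end>": 1.0},
    "person": {"<end>": 1.0},
}

def toy_next_probs(seq):
    return TABLE[seq[-1]]

greedy_score, greedy_seq = beam_search(toy_next_probs, "<start>", "<end>", 1, 5)
beam_score, beam_seq = beam_search(toy_next_probs, "<start>", "<end>", 2, 5)
```

With a beam width of 1 the search is greedy and commits to "two" because it is the most likely first word; with a width of 2 it also keeps "a" alive and finds the sequence with the higher overall probability.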



Image Captioning

Figure from Karpathy et al, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015; figure copyright IEEE, 2015. Reproduced for educational purposes.

• Many recent works on this:
• Baidu/UCLA: Explain Images with Multimodal Recurrent Neural Networks
• Toronto: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
• Berkeley: Long-term Recurrent Convolutional Networks for Visual Recognition and Description
• Google: Show and Tell: A Neural Image Caption Generator
• Stanford: Deep Visual-Semantic Alignments for Generating Image Descriptions
• UML/UT: Translating Videos to Natural Language Using Deep Recurrent Neural Networks
• Microsoft/CMU: Learning a Recurrent Visual Representation for Image Caption Generation
• Microsoft: From Captions to Visual Concepts and Back

[Figure: a Convolutional Neural Network feeding a Recurrent Neural Network]

Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 10 - May 4, 2017



[Figure: test image. This image is CC0 public domain]

[Figure: the test image yields a vector v that enters the RNN through Wih; the first input x0 is the <START> token]

before:
h = tanh(Wxh * x + Whh * h)
now:
h = tanh(Wxh * x + Whh * h + Wih * v)

[Figure: from h0, sample y0 = “straw” and feed it back as the next input; from h1, sample y1 = “hat” and feed it back; from h2, sample y2 = <END> token => finish.]
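The "before" and "now" updates differ only by the extra image term. A small sketch, assuming tiny hand-picked matrices and a 3-dimensional image vector in place of a real CNN feature (all values hypothetical):

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x, h, Wxh, Whh):
    """before: h = tanh(Wxh * x + Whh * h)"""
    return [math.tanh(a + b) for a, b in zip(matvec(Wxh, x), matvec(Whh, h))]

def rnn_step_with_image(x, h, v, Wxh, Whh, Wih):
    """now: h = tanh(Wxh * x + Whh * h + Wih * v)"""
    return [math.tanh(a + b + c)
            for a, b, c in zip(matvec(Wxh, x), matvec(Whh, h), matvec(Wih, v))]

# Hypothetical sizes: 2 hidden units, 2-dim word input, 3-dim image vector.
Wxh = [[0.1, 0.2], [0.0, -0.1]]
Whh = [[0.3, 0.0], [0.0, 0.3]]
Wih = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
x0, h_init, v = [1.0, 0.0], [0.0, 0.0], [1.0, 2.0, 3.0]

h_plain = rnn_step(x0, h_init, Wxh, Whh)
h_image = rnn_step_with_image(x0, h_init, v, Wxh, Whh, Wih)
```

The same word input produces a different hidden state once the image term is added, which is how the caption becomes conditioned on the picture.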
Image Captioning: Example Results

Captions generated using neuraltalk2
All images are CC0 Public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle

• A cat sitting on a suitcase on the floor
• A cat is sitting on a tree branch
• A dog is running in the grass with a frisbee
• A white teddy bear sitting in the grass
• Two people walking on the beach with surfboards
• A tennis player in action on the court
• Two giraffes standing in a grassy field
• A man riding a dirt bike on a dirt track


Image Captioning: Failure Cases

Captions generated using neuraltalk2
All images are CC0 Public domain: fur coat, handstand, spider web, baseball

• A bird is perched on a tree branch
• A woman is holding a cat in her hand
• A man in a baseball uniform throwing a ball
• A woman standing on a beach holding a surfboard
• A person holding a computer mouse on a desk


More Image Captioning Examples

From Captions to Visual Concepts and Back, Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K. Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, CVPR 2015.

Show, Attend and Tell

[Figure: attention weights αt,i over a 3×3 grid of image regions (i = 1, …, 9), with one grid per time step t = 1, 2, 3]

- Set up as reinforcement learning problem:
- Action = which area to attend to next
- Reward = log-likelihood of caption wrt target sentence

Examples

How to Evaluate different captions?

[Slide: Narayan-Chen, Xie, Shan]

BLEU (BiLingual Evaluation Understudy)
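Since the slide's bullet text did not survive, here is a hedged sketch of how BLEU is commonly computed: clipped (modified) n-gram precision combined by a geometric mean, times a brevity penalty. This is an unsmoothed sentence-level variant written for illustration; the function names are hypothetical, and evaluation toolkits differ in tokenization and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it appears in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Closest reference length (ties broken toward the shorter reference).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "a cat is sitting on a tree branch".split()
perfect = bleu(reference, [reference])
repeated = modified_precision("a a a a".split(), [reference], 1)
```

The clipping step is what stops a degenerate caption such as "a a a a" from scoring full unigram precision: the reference contains "a" only twice, so only two of the four matches count.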


METEOR

CIDEr: Consensus-based Image Description Evaluation