Deep Learning
Dr. Sugata Ghosal
BITS Pilani [Link]@[Link]
Pilani Campus
Lecture No. 1 | Introduction
Time: 11 AM – 1 PM
Date: 07/05/2022
These slides are assembled by the instructor with grateful acknowledgement of the many
others who made their course materials freely available online.
Agenda
• Introduction
• Course Objectives and Logistics
• Introduction to Perceptron and MLP
• Approximation Capabilities
• Characteristics of Deep Learning
Neural Networks are taking over!
• Neural networks have recently become a major thrust area in pattern
recognition, prediction, and analysis problems
• In many of these problems they have established the state of the art
– Often exceeding previous benchmarks by large margins
Breakthroughs with neural networks
[Figure: examples of image segmentation and image recognition] [Link]
Successes with neural networks
• Captions generated entirely by a neural
network
• And a variety of other problems:
– From art to astronomy to healthcare
– And even predicting stock markets!
Objectives of this course
• Understanding neural networks
• Comprehending the models that do the previously
mentioned tasks
– And maybe build them
• Design, build and train networks for various tasks
• You will not become an expert in one course
Course objectives: Broad level
Deep Dive into Artificial Neural Networks
• Concepts
– Types of neural networks and underlying ideas
– Learning in neural networks
• Training, concepts, practical issues
– Architectures and applications
• Practical
– Familiarity with training and parameter tuning
– Implement various neural network architectures
• Overall: Set you up for further work in your area
Course learning objectives: Topics
• Basic network formalisms (for classification and
prediction):
– Multi-Layer Perceptron (MLP)
– Convolutional networks (CNN)
– Recurrent networks (RNN)
• Some advanced formalisms (for creation)
– Generative models: VAEs
– Adversarial models: GANs
• Applications we will touch upon:
– Computer vision: recognizing images
– Text processing: modelling and generating language
– ….
Reading
• List of books on Canvas Course Page
• Primary: “Deep Learning”, Goodfellow, Bengio, Courville ([Link])
• Reference: “Deep Learning with Python”, Francois Chollet ([Link]learning-with-python)
• Additional reading material will be posted on
Canvas, if needed
Logistics
• Most relevant info on Canvas
– Handout
– Schedule of Webinars, Quiz, Assignments, ….
– Lecture Slides
– Lab Sheets
– 3 quizzes; the best 2 scores count
– Two assignments
– Quiz and one assignment before midsem
– One assignment after midsem
– Late submissions lose marks for each day past the deadline
(except in medical emergencies)
– Programming in Python with Keras / TensorFlow
Webinars (3-4)
• Held during evenings (around 7:30 PM)
• Will cover details of lab sheet, if
needed, and basic exercises
– Important if you wish to get the maximum out of the course
Questions?
• Please post on Discussions Forum
• TAs and instructors will answer
• Collaborate with your fellow students
So what are neural
networks??
[Figure: three boxes, each mapping an input to an output: voice signal → transcription; image → text caption; game state → next move]
• What are these boxes?
So what are neural
networks??
• It begins with this..
Early Models of Human
Cognition
• Associationism
– Humans learn through association
• 400BC-1900AD: Plato, David Hume, Ivan Pavlov..
What are “Associations”?
• Lightning is generally followed by thunder
– Ergo: “Hey, here’s a bolt of lightning, we’re going to hear
thunder”
– Ergo: “We just heard thunder; did someone get hit by
lightning?”
• Association!
Observation: The Brain
• Mid 1800s: The brain is a mass of
interconnected neurons
Brain: Interconnected
Neurons
• Many neurons connect in to each neuron
• Each neuron connects out to many neurons
Connectionist Machines
• Network of processing elements
• All world knowledge is stored in the connections
between the elements
Connectionist Machines
• Neural networks are connectionist machines
– As opposed to Von Neumann Machines
[Figure: a Von Neumann/Princeton machine (a PROCESSOR as processing unit, with PROGRAM and DATA held in memory) beside a neural network (a single NETWORK)]
• The machine has many non-linear processing units
– The program is the connections between these units
• Connections may also define memory
Modelling the brain
• What are the units?
• A neuron: soma, dendrites, axon
• Signals come in through the dendrites into the Soma
• A signal goes out via the axon to other neurons
– Only one axon per neuron
• Factoid that may only interest me: Neurons do not undergo cell
division
– Neurogenesis occurs from neuronal stem cells, and is minimal after
birth
Rosenblatt’s perceptron
• Original perceptron model
– Groups of sensors (S) on retina combine onto cells in association
area A1
– Groups of A1 cells combine into Association cells A2
– Signals from A2 cells combine into response cells R
– All connections may be excitatory or inhibitory
Simplified mathematical
model of Perceptron
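• A number of inputs combine linearly
– Threshold logic: fire if the combined input equals or exceeds the
threshold T
$$y = \begin{cases} 1 & \text{if } \sum_i w_i x_i \ge T \\ 0 & \text{otherwise} \end{cases}$$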
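The threshold logic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `threshold_unit` and the example weights and threshold are assumptions chosen for readability.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    combined = sum(w * x for w, x in zip(weights, inputs))
    return 1 if combined >= threshold else 0


# Illustrative values only; they are not taken from the slides.
weights = [2.0, 1.0]
threshold = 2.5
print(threshold_unit([1.0, 0.5], weights, threshold))  # combined = 2.5, unit fires
print(threshold_unit([0.5, 0.5], weights, threshold))  # combined = 1.5, unit stays silent
```

The comparison is `>=` rather than `>` so that an input landing exactly on the threshold still fires, matching the "exceeds or equal to" wording.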
His “Simple” Perceptron
• Originally assumed to be able to represent any Boolean circuit and
perform any logic
– “the embryo of an electronic computer that [the Navy] expects
will be able to walk, talk, see, write, reproduce itself and be
conscious of its existence,” New York Times (8 July) 1958
– “Frankenstein Monster Designed by Navy That Thinks,” Tulsa,
Oklahoma Times 1958
Also provided a learning
algorithm
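Sequential learning: $\mathbf{w} \leftarrow \mathbf{w} + \eta\,\big(d(\mathbf{x}) - y(\mathbf{x})\big)\,\mathbf{x}$
– $d(\mathbf{x})$ is the desired output in response to input $\mathbf{x}$
– $y(\mathbf{x})$ is the actual output in response to $\mathbf{x}$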
• Boolean tasks
• Update the weights whenever the perceptron
output is wrong
• Proved convergence for linearly separable classes
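A minimal sketch of this update rule on a Boolean task (the AND gate), which is linearly separable and therefore covered by the convergence result. Several details here are assumptions, not part of the slide: the threshold is folded into an extra bias weight on a constant input of 1, the learning rate `eta` is 1, weights start at zero, and the function names are hypothetical.

```python
def predict(weights, x):
    """Threshold unit; the last weight is a bias applied to a constant input of 1."""
    total = sum(w * xi for w, xi in zip(weights, list(x) + [1]))
    return 1 if total >= 0 else 0


def train_perceptron(samples, eta=1, max_epochs=100):
    """Sweep the samples repeatedly, updating the weights only on wrong outputs."""
    n_inputs = len(samples[0][0])
    weights = [0] * (n_inputs + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, desired in samples:
            actual = predict(weights, x)
            if actual != desired:
                errors += 1
                # w <- w + eta * (desired - actual) * x, with the bias input set to 1
                for i, xi in enumerate(list(x) + [1]):
                    weights[i] += eta * (desired - actual) * xi
        if errors == 0:  # a full pass without mistakes: training has converged
            break
    return weights


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
learned = train_perceptron(AND_SAMPLES)
print(learned, [predict(learned, x) for x, _ in AND_SAMPLES])
```

Running the same loop on XOR samples would hit `max_epochs` without ever completing an error-free pass, since no separating line exists for that task.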
Perceptron
[Figure: perceptrons over Boolean inputs X and Y acting as logic gates]
Values shown on edges are weights, numbers in the circles are thresholds
• Easily shown to mimic any Boolean gate
• But…
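As a sketch of how a single threshold unit can mimic Boolean gates, the block below uses one common choice of weights and thresholds for AND, OR and NOT. These particular values are an assumption (they are the textbook constructions), and the helper name `fires` is hypothetical.

```python
def fires(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def gate_and(x, y):
    return fires([x, y], [1, 1], 2)   # both inputs must be on to reach 2


def gate_or(x, y):
    return fires([x, y], [1, 1], 1)   # a single active input reaches 1


def gate_not(x):
    return fires([x], [-1], 0)        # -x >= 0 holds only when x is 0


for x in (0, 1):
    for y in (0, 1):
        print(x, y, gate_and(x, y), gate_or(x, y))
print(gate_not(0), gate_not(1))
```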
Perceptron
No solution for XOR!
Not universal!
[Figure: a single perceptron over X and Y whose weights and threshold are all marked “?”]
• Minsky and Papert, 1969
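The reason is short. With weights w1, w2 and threshold T, XOR needs T > 0 (input 0,0 must not fire), w1 ≥ T and w2 ≥ T (inputs 1,0 and 0,1 must fire), and w1 + w2 < T (input 1,1 must not fire). The first three give w1 + w2 ≥ 2T > T, which contradicts the fourth. The search below only illustrates this on a coarse grid of candidate values and is not a proof; the grid and the function names are assumptions made for the example.

```python
from itertools import product


def fires(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def count_single_unit_solutions(truth_table, grid):
    """Count (w1, w2, T) triples on the grid that reproduce the truth table."""
    count = 0
    for w1, w2, t in product(grid, repeat=3):
        if all(fires(x, (w1, w2), t) == out for x, out in truth_table.items()):
            count += 1
    return count


XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
GRID = [i / 2 for i in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5

print("XOR solutions:", count_single_unit_solutions(XOR, GRID))
print("OR solutions:", count_single_unit_solutions(OR, GRID))
```

The OR count is included only for contrast: a linearly separable function has many solutions on the same grid, while XOR has none.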
A single neuron is not
enough
• Individual elements are weak computational elements
– Marvin Minsky and Seymour Papert, 1969, Perceptrons:
An Introduction to Computational Geometry
• Networked elements are required
Multi-layer Perceptron!
[Figure: a network over inputs X and Y that computes XOR, with two threshold units in a hidden layer feeding one output unit]
• XOR
– The first layer is a “hidden” layer
– Also originally suggested by Minsky and Papert, 1969
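One way to build XOR from three threshold units is sketched below: one hidden unit computes X OR Y, the other computes NOT (X AND Y), and the output unit ANDs them. The specific weights and thresholds are an assumption (a standard construction), and the function names are hypothetical.

```python
def fires(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def xor_mlp(x, y):
    # Hidden layer
    h_or = fires([x, y], [1, 1], 1)        # on when at least one input is on
    h_nand = fires([x, y], [-1, -1], -1)   # on unless both inputs are on
    # Output layer: AND of the two hidden units
    return fires([h_or, h_nand], [1, 1], 2)


for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_mlp(x, y))
```

Neither hidden unit alone solves the task; the output only becomes XOR once the two linear boundaries are combined.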
A more generic model
[Figure: a multi-layer network of threshold units over inputs X, Y, Z and A, with weights on the edges and thresholds in the units]
• A “multi-layer” perceptron
• Can compose arbitrarily complicated Boolean functions!
– In cognitive terms: Can compute arbitrary Boolean functions over
sensory input
– More on this in the next class
But our brain is not
Boolean
• We have real inputs
• We make non-Boolean inferences/predictions
The perceptron with real
inputs
[Figure: a unit with inputs x1, x2, x3, …, xN]
• x1…xN are real valued
• w1…wN are real valued
• Unit “fires” if the weighted input reaches or exceeds a threshold
The perceptron with real
inputs and a real output
[Figure: a unit with inputs x1, x2, x3, …, xN and bias b]
$$y = \mathrm{sigmoid}\Big(\sum_i w_i x_i + b\Big)$$
• x1…xN are real valued
• w1…wN are real valued
• The output y can also be real valued
– Sometimes viewed as the “probability” of firing
The “real” valued
perceptron
[Figure: a unit with inputs x1, x2, x3, …, xN and bias b, producing f(sum)]
• Any real-valued “activation” function may operate on the weighted-
sum input
– We will see several later
– Output will be real valued
• The perceptron maps real-valued inputs to real-valued outputs
• It is still useful to assume Boolean outputs, though, for interpretation
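A small sketch of a real-valued unit with a pluggable activation. The three activations shown (sigmoid, tanh, ReLU) are common choices picked for illustration; which ones the course covers later is not stated here, and the function name `unit` and the example numbers are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def unit(inputs, weights, bias, activation):
    """Apply an activation function to the weighted sum of the inputs plus a bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


x = [1.0, 2.0]
w = [0.5, -0.25]   # weighted sum is 0.5 - 0.5 = 0.0
for name, f in [("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu)]:
    print(name, unit(x, w, 0.0, f), unit(x, w, 1.0, f))
```

Only the sigmoid output lies in (0, 1), which is why it is the one that lends itself to the “probability of firing” reading.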
A Perceptron on Reals
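[Figure: a perceptron over inputs x1, x2, x3, …, xN; in the (x1, x2) plane the line w1 x1 + w2 x2 = T separates the region with output 1 from the region with output 0]
$$y = \begin{cases} 1 & \text{if } \sum_i w_i x_i \ge T \\ 0 & \text{otherwise} \end{cases}$$
• A perceptron operates on real-valued vectors
– This is a linear classifier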
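To make the linear boundary concrete, the sketch below computes the signed distance of a point from the line w1 x1 + w2 x2 = T and classifies by its sign. The weights (3, 4) and threshold 5 are illustrative assumptions, as are the function names.

```python
import math


def signed_distance(point, weights, threshold):
    """Distance from the boundary w.x = T; positive on the side where the unit fires."""
    dot = sum(w * x for w, x in zip(weights, point))
    return (dot - threshold) / math.hypot(*weights)


def classify(point, weights, threshold):
    return 1 if signed_distance(point, weights, threshold) >= 0 else 0


W, T = (3.0, 4.0), 5.0
for p in [(3.0, 4.0), (0.0, 0.0), (1.0, 0.5)]:
    print(p, signed_distance(p, W, T), classify(p, W, T))
```

The weight vector is the normal to the boundary, so scaling the weights and the threshold by the same positive factor leaves the classifier unchanged.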
Boolean functions with a
real perceptron
[Figure: three unit squares with corners (0,0), (1,0), (0,1) and (1,1), each split by a different linear boundary over X and Y]
• Boolean perceptrons are also linear classifiers
– Purple regions have output 1 in the figures
– What are these functions?
– Why can we not compose an XOR?
Composing complicated
“decision” boundaries
Perceptrons can now be composed into “networks” to compute arbitrary
classification “boundaries”
[Figure: a coloured region in the (x1, x2) plane]
• Build a network of units with a single output
that fires if the input is in the coloured area
Booleans over the
reals
[Figure: a coloured region in the (x1, x2) plane and a network over inputs x1 and x2]
• The network must fire if the input is in the
coloured area
Booleans over the
reals
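[Figure: five boundary lines enclose a coloured pentagon in the (x1, x2) plane. First-layer units y1…y5 each test one boundary, and every region is labelled with how many of them fire there: 5 inside the pentagon, 3 or 4 outside. An AND unit over y1…y5 gives the output.]
$$\sum_{i=1}^{N} y_i \ge N \quad (N = 5)$$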
• The network must fire if the input is in the
coloured area
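A sketch of this two-layer construction for a convex polygon: one threshold unit per edge tests which side of the edge the input is on, and an AND unit fires only when all N of them fire. The pentagon vertices below are made up for the example, and the function names are hypothetical.

```python
def fires(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def edge_units(vertices):
    """One (weights, threshold) pair per edge of a convex polygon listed counter-clockwise."""
    units = []
    for (ax, ay), (bx, by) in zip(vertices, vertices[1:] + vertices[:1]):
        w1, w2 = -(by - ay), (bx - ax)           # normal pointing into the polygon
        units.append(((w1, w2), w1 * ax + w2 * ay))
    return units


def first_layer(point, units):
    return [fires(point, w, t) for w, t in units]


def in_polygon(point, units):
    y = first_layer(point, units)
    return fires(y, [1] * len(y), len(y))        # AND: sum of y_i must reach N


# Hypothetical convex pentagon, counter-clockwise.
PENTAGON = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]
UNITS = edge_units(PENTAGON)
for p in [(2, 2), (6, 1), (2, -1)]:
    print(p, first_layer(p, UNITS), in_polygon(p, UNITS))
```

A point just outside one edge still triggers the other four units, which is why the regions next to the polygon carry the count 4 rather than 0.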
More complex decision
boundaries
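[Figure: two polygons in the (x1, x2) plane; each is detected by an AND unit over its boundary units, and an OR unit combines the two AND outputs]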
• Network to fire if the input is in the yellow area
– “OR” two polygons
– A third layer is required
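The three-layer structure can be sketched as follows: boundary units in the first layer, one AND unit per polygon in the second, and a single OR unit in the third. Two axis-aligned unit squares stand in for the polygons here purely to keep the numbers simple; the shapes and names are assumptions.

```python
def fires(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def box_units(x_lo, x_hi, y_lo, y_hi):
    """Four half-plane units whose intersection is an axis-aligned box."""
    return [((1, 0), x_lo), ((-1, 0), -x_hi), ((0, 1), y_lo), ((0, -1), -y_hi)]


POLYGONS = [box_units(0, 1, 0, 1), box_units(2, 3, 0, 1)]


def network(point):
    # Layer 1: boundary units. Layer 2: one AND unit per polygon.
    and_outputs = []
    for units in POLYGONS:
        y = [fires(point, w, t) for w, t in units]
        and_outputs.append(fires(y, [1] * len(y), len(y)))
    # Layer 3: OR over the polygon detectors.
    return fires(and_outputs, [1] * len(and_outputs), 1)


for p in [(0.5, 0.5), (2.5, 0.5), (1.5, 0.5)]:
    print(p, network(p))
```

The point (1.5, 0.5) lies between the two boxes, so neither AND unit fires and the OR output stays at 0.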
Complex decision
boundaries
• Can compose very complex decision boundaries
[Figure: MNIST digit images, each a point in a 784-dimensional input space]
• Classification problems: finding decision boundaries in
high-dimensional space
– Can be performed by an MLP
• MLPs can classify real-valued inputs
Story so far
• MLPs are connectionist computational models
– Individual perceptrons are the computational equivalent of neurons
– The MLP is a layered composition of many perceptrons
• MLPs can model Boolean functions
– Individual perceptrons can act as Boolean gates
– Networks of perceptrons compute Boolean functions
• MLPs are Boolean machines
– They represent Boolean functions over linear boundaries
– They can represent arbitrary decision boundaries
– They can be used to classify data
But what about continuous
valued output?
• Inputs may be real valued
• Can outputs be continuous-valued too?
MLP as a continuous-valued
regression
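[Figure: input x feeds two threshold units with thresholds T1 and T2; a summing unit adds their outputs with weights 1 and -1, giving f(x), a square pulse between T1 and T2]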
• A simple 3-unit MLP with a “summing” output unit can
generate a “square pulse” over an input
– Output is 1 only if the input lies between T1 and T2
– T1 and T2 can be arbitrarily specified
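A minimal sketch of the 3-unit pulse network: two threshold units and a summing output with weights +1 and -1. The values of T1 and T2 in the example are arbitrary, and the function names are hypothetical.

```python
def step(x, threshold):
    """Threshold unit on a scalar input."""
    return 1 if x >= threshold else 0


def square_pulse(x, t1, t2):
    """Summing output unit: +1 times the T1 unit, -1 times the T2 unit."""
    return 1 * step(x, t1) + (-1) * step(x, t2)


T1, T2 = 2.0, 5.0
for x in [1.0, 2.0, 3.5, 5.0, 6.0]:
    print(x, square_pulse(x, T1, T2))
```

With `>=` in both units the pulse covers T1 ≤ x < T2: the first unit switches the output on at T1 and the second cancels it from T2 onward.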
MLP as a continuous-valued
regression
[Figure: many pulse sub-networks over the input x, each scaled by its own height h1, h2, …, hn and then summed, approximate f(x) as a sequence of narrow bars]
• A simple 3-unit MLP can generate a “square pulse” over an input
• An MLP with many units can model an arbitrary function over an input
– To arbitrary precision
• Simply make the individual pulses narrower
• This generalizes to functions of any number of inputs
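The sketch below sums scaled pulses to approximate a function of one input and shows the error shrinking as the pulses get narrower. The target f(x) = x² on [0, 1), the choice of sampling each height at the pulse midpoint, and the function names are all assumptions made for the illustration.

```python
def step(x, threshold):
    return 1 if x >= threshold else 0


def pulse(x, t1, t2):
    return step(x, t1) - step(x, t2)


def approximate(f, x, lo, hi, n):
    """Sum of n scaled square pulses covering [lo, hi)."""
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t1 = lo + i * width
        t2 = lo + (i + 1) * width
        height = f((t1 + t2) / 2)      # pulse height: target value at the midpoint
        total += height * pulse(x, t1, t2)
    return total


def max_error(f, lo, hi, n, samples=1000):
    xs = [lo + (hi - lo) * k / samples for k in range(samples)]
    return max(abs(f(x) - approximate(f, x, lo, hi, n)) for x in xs)


def square(x):
    return x * x


for n in (10, 100):
    print(n, max_error(square, 0.0, 1.0, n))
```

Each pulse costs two threshold units, so precision is bought with width: ten times narrower pulses need ten times as many units.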
So what does the perceptron
really model?
• Is there a “semantic” interpretation?
– Cognitive version: Is there an interpretation
beyond the simple characterization as Boolean
functions over sensory inputs?
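Let's look at the weights
[Figure: a unit with inputs x1, x2, x3, …, xN and threshold T]
$$y = \begin{cases} 1 & \text{if } \sum_i w_i x_i \ge T \\ 0 & \text{otherwise} \end{cases}$$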
• What do the weights tell us?
– The neuron fires if the inner product between the
weights and the inputs exceeds a threshold
The weight as a “template”
[Figure: a unit with inputs x1, x2, x3, …, xN and weight vector w]
• The perceptron fires if the input is within a specified angle
of the weight
• Neuron fires if the input vector is close enough to the
weight vector.
– If the input pattern matches the weight pattern closely enough
The weight as a template
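[Figure: a weight template W compared with two input patterns, one with correlation = 0.57 and one with correlation = 0.82]
$$y = \begin{cases} 1 & \text{if } \sum_i w_i x_i \ge T \\ 0 & \text{otherwise} \end{cases}$$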
• If the correlation between the weight pattern
and the inputs exceeds a threshold, fire
• The perceptron is a correlation filter!
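A sketch of the template view. Since w·x = |w||x| cos θ, the test w·x ≥ T is the same as cos θ ≥ T / (|w||x|), so for inputs of fixed length the unit fires when the input points close enough to the weight vector. The block below uses cosine similarity as the correlation measure, which is an assumption (the slide does not say how its correlation values were computed), and the patterns and the 0.8 threshold are made up.

```python
import math


def cosine_similarity(w, x):
    """Normalised inner product between the weight pattern and the input pattern."""
    dot = sum(a * b for a, b in zip(w, x))
    norm_w = math.sqrt(sum(a * a for a in w))
    norm_x = math.sqrt(sum(b * b for b in x))
    return dot / (norm_w * norm_x)


def template_unit(w, x, threshold):
    """Fire when the input matches the weight template closely enough."""
    return int(cosine_similarity(w, x) >= threshold)


TEMPLATE = [1.0, 0.0, 1.0, 0.0]
CLOSE = [0.9, 0.1, 1.0, 0.0]   # nearly the same pattern as the template
FAR = [0.0, 1.0, 0.0, 1.0]     # the complementary pattern

for name, x in [("close", CLOSE), ("far", FAR)]:
    print(name, round(cosine_similarity(TEMPLATE, x), 3), template_unit(TEMPLATE, x, 0.8))
```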
The MLP as a Boolean function
over feature detectors
[Figure: a network over an input image answering “DIGIT OR NOT?”]
• The input layer comprises “feature detectors”
– Detect if certain patterns have occurred in the input
• The network is a Boolean function over the feature detectors
• I.e. it is important for the first layer to capture relevant patterns
The MLP as a cascade of
feature detectors
[Figure: a deeper network over an input image answering “DIGIT OR NOT?”]
• The network is a cascade of feature detectors
– Higher level neurons compose complex templates
from features represented by lower-level neurons