Neural Networks
1. Introduction
Neural Networks are taking over!
• Neural networks have become one of the main
approaches to AI
• They have been successfully applied to various pattern
recognition, prediction, and analysis problems
• In many problems they have established the state of
the art
– Often exceeding previous benchmarks by large margins
– Sometimes solving problems you couldn’t solve using
earlier ML methods
Breakthroughs with neural networks
Image segmentation and recognition
Image recognition
Successes with neural networks
• Captions generated entirely by a neural network
Breakthroughs with neural networks
[Link] uses AI to generate endless fake faces
– [Link] fake-people-portraits-thispersondoesnotexist-stylegan
Successes with neural networks
• And a variety of other problems:
– From art to astronomy to healthcare…
– and even predicting stock markets!
So what are neural networks??
[Figure: three boxes: Voice signal → Transcription; Image → Text caption; Game state → Next move]
• What are these boxes?
So what are neural networks??
• It begins with this…
The magical capacity of humans
• Humans can
– Learn
– Solve problems
– Recognize patterns
– Create
– Cogitate
– …
Dante!
• Worthy of emulation
• But how do humans “work”?
Cognition and the brain…
• “If the brain was simple enough to be understood - we would be too simple to understand it!”
– Marvin Minsky
Observation: The Brain
• Mid-1800s: The brain is a mass of interconnected neurons
Brain: Interconnected Neurons
• Many neurons connect into each neuron
• Each neuron connects out to many neurons
Modelling the brain
• What are the units?
• A neuron: [Figure: a neuron with its soma, dendrites and axon labelled]
• Signals come in through the dendrites into the soma
• A signal goes out via the axon to other neurons
– Only one axon per neuron
• Factoid that may only interest me: Neurons do not undergo cell division
– Neurogenesis occurs from neuronal stem cells, and is minimal after birth
McCulloch and Pitts
• The Doctor and the Hobo…
– Warren McCulloch: Neurophysiologist
– Walter Pitts: Homeless wannabe logician who arrived at his door
Perceptron: Simplified model
• A number of inputs combine linearly
– Threshold logic: Fire if the combined input exceeds a threshold
The Universal Model
• Originally assumed to be able to represent any Boolean circuit and perform any logic
– “the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence,” New York Times, 8 July 1958
– “Frankenstein Monster Designed by Navy That Thinks,” Tulsa, Oklahoma Times, 1958
Perceptron
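[Figure: threshold units for AND (inputs X, Y; weights 1, 1; threshold 2), NOT (input X; weight -1; threshold 0) and OR (inputs X, Y; weights 1, 1; threshold 1)]
Values shown on edges are weights, numbers in the circles are thresholds
• Easily shown to mimic any Boolean gate
• But…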
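The gate claim is easy to check in code. This is a minimal sketch of a threshold unit that fires when its weighted input matches or exceeds the threshold; the weights and thresholds are our reading of the figure (AND: 1, 1 with threshold 2; OR: 1, 1 with threshold 1; NOT: -1 with threshold 0), and the function names are illustrative.

```python
def threshold_unit(weights, threshold, inputs):
    """Fire (return 1) if the weighted input sum matches or exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def and_gate(x, y):
    return threshold_unit([1, 1], 2, [x, y])   # both inputs needed to reach 2

def or_gate(x, y):
    return threshold_unit([1, 1], 1, [x, y])   # either input reaches 1

def not_gate(x):
    return threshold_unit([-1], 0, [x])        # -x >= 0 only when x == 0

print([and_gate(x, y) for x in (0, 1) for y in (0, 1)])  # [0, 0, 0, 1]
print([or_gate(x, y) for x in (0, 1) for y in (0, 1)])   # [0, 1, 1, 1]
print([not_gate(x) for x in (0, 1)])                     # [1, 0]
```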
Perceptron
No solution for XOR!
Not universal!
[Figure: a single threshold unit for X XOR Y, with its weights and threshold marked “?”]
• Minsky and Papert, 1969
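Why no single unit works can be shown in a few lines. Assume a unit with weights w1, w2 and threshold T that fires when w1·X + w2·Y ≥ T. The four XOR cases then require:

```latex
\begin{aligned}
(X,Y)=(0,0) \mapsto 0 &:\quad 0 < T \\
(X,Y)=(1,0) \mapsto 1 &:\quad w_1 \ge T \\
(X,Y)=(0,1) \mapsto 1 &:\quad w_2 \ge T \\
(X,Y)=(1,1) \mapsto 0 &:\quad w_1 + w_2 < T
\end{aligned}
```

Adding the two middle inequalities gives w1 + w2 ≥ 2T, and since T > 0 this is greater than T, contradicting the last line. No choice of weights and threshold satisfies all four cases.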
A single neuron is not enough
• Individual elements are weak computational elements
– Marvin Minsky and Seymour Papert, 1969, Perceptrons: An Introduction to Computational Geometry
• Networked elements are required
Multi-layer Perceptron!
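[Figure: XOR network over inputs X and Y. Hidden layer: one unit with weights 1, 1 and threshold 1 (X OR Y), one unit with weights -1, -1 and threshold -1 (NOT (X AND Y)). Output unit: weights 1, 1 and threshold 2 (AND)]
• XOR
– The first layer is a “hidden” layer
– Also originally suggested by Minsky and Papert, 1969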
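The two-layer XOR network can be checked directly. The weights below are our reading of the figure and should be treated as an assumption: one hidden unit computes X OR Y, the other computes NOT (X AND Y), and the output unit ANDs them.

```python
def threshold_unit(weights, threshold, inputs):
    """Fire (1) if the weighted input sum matches or exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def xor_mlp(x, y):
    # Hidden layer
    h1 = threshold_unit([1, 1], 1, [x, y])      # X OR Y: fires unless both are 0
    h2 = threshold_unit([-1, -1], -1, [x, y])   # NOT (X AND Y): fires unless both are 1
    # Output layer: AND of the two hidden units
    return threshold_unit([1, 1], 2, [h1, h2])

print([xor_mlp(x, y) for x in (0, 1) for y in (0, 1)])  # [0, 1, 1, 0]
```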
A more generic model
[Figure: a multi-layer network of threshold units over the Boolean inputs X, Y, Z, A]
• A “multi-layer” perceptron
• Can compose arbitrarily complicated Boolean functions!
– In cognitive terms: Can compute arbitrary Boolean functions over sensory input
– More on this in the next class
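One standard way to see that any Boolean function can be composed is the sum-of-products construction: one hidden unit per input pattern that should output 1, followed by an OR. This is a sketch of that textbook construction, not necessarily the network in the figure; `build_dnf_network` is a hypothetical helper and `target` is an arbitrary example function of X, Y, Z, A.

```python
from itertools import product

def threshold_unit(weights, threshold, inputs):
    """Fire (1) if the weighted input sum matches or exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def build_dnf_network(truth_table):
    """Hypothetical helper: two-layer threshold network from a {inputs: output} table."""
    hidden = []
    for pattern, output in truth_table.items():
        if output:
            # +1 where the pattern has a 1, -1 where it has a 0; the weighted
            # sum reaches the number of 1s only for this exact pattern.
            weights = [1 if bit else -1 for bit in pattern]
            hidden.append((weights, sum(pattern)))

    def network(inputs):
        h = [threshold_unit(w, t, inputs) for w, t in hidden]
        # Output unit is an OR: fire if any hidden unit fires
        return threshold_unit([1] * len(h), 1, h)

    return network

# Arbitrary example function of four Boolean inputs X, Y, Z, A
def target(x, y, z, a):
    return int((x != y) and (z or a))

table = {bits: target(*bits) for bits in product((0, 1), repeat=4)}
net = build_dnf_network(table)
print(all(net(bits) == out for bits, out in table.items()))  # True
```

The hidden layer can grow exponentially with the number of inputs, which is one reason the question of depth comes up later.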
Story so far
• Neural networks began as computational models of the brain
• Neural network models are connectionist machines
– They comprise networks of neural units
• McCulloch and Pitts model: Neurons as Boolean threshold units
– Models the brain as performing propositional logic
– But no learning rule
• Hebb’s learning rule: Neurons that fire together wire together
– Unstable
• Rosenblatt’s perceptron: A variant of the McCulloch and Pitts neuron with a provably convergent learning rule
– But individual perceptrons are limited in their capacity (Minsky and Papert)
• Multi-layer perceptrons can model arbitrarily complex Boolean functions
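The slides do not spell out Rosenblatt’s rule, so the sketch below assumes the common textbook form: a step activation, 0/1 targets, and the update w ← w + η(t − y)x applied to each misclassified sample. It converges on AND, which is linearly separable, and cannot get all four XOR cases right no matter how long it runs.

```python
def predict(w, b, x):
    """Step-activation perceptron: 1 if w.x + b >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

def train_perceptron(samples, n_inputs, lr=1.0, max_epochs=100):
    """Perceptron learning (assumed textbook variant) on (input, 0/1 target) pairs."""
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(w, b, x)   # +1, 0 or -1
            if error:
                # Nudge the boundary toward classifying x correctly
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:   # a full pass with no errors: converged
            break
    return w, b

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_samples = list(zip(inputs, [0, 0, 0, 1]))
xor_samples = list(zip(inputs, [0, 1, 1, 0]))

w, b = train_perceptron(and_samples, 2)
print(all(predict(w, b, x) == t for x, t in and_samples))  # True

w, b = train_perceptron(xor_samples, 2)
print(all(predict(w, b, x) == t for x, t in xor_samples))  # False
```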
But our brain is not Boolean
• We have real inputs
• We make non-Boolean inferences/predictions
The perceptron with real inputs
[Figure: inputs x1, x2, x3, …, xN feeding a single threshold unit]
• x1…xN are real valued
• w1…wN are real valued
• Unit “fires” if the weighted input matches (or exceeds) a threshold
The perceptron with real inputs
[Figure: inputs x1, x2, x3, …, xN feeding a single unit]
• Alternate view:
– A threshold “activation” operates on the weighted sum of inputs plus a bias, $z = \sum_{i=1}^{N} w_i x_i + b$
• z is an affine function of the inputs
– The activation outputs 1 if z is non-negative, 0 otherwise
• Unit “fires” if the weighted input matches or exceeds a threshold
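The two views describe the same unit. Subtracting the threshold T from both sides of the firing condition turns it into a test on z, assuming the bias is taken to be the negated threshold, b = -T:

```latex
y =
\begin{cases}
1 & \text{if } \sum_{i=1}^{N} w_i x_i \ge T \\
0 & \text{otherwise}
\end{cases}
\quad\Longleftrightarrow\quad
y =
\begin{cases}
1 & \text{if } z \ge 0 \\
0 & \text{otherwise}
\end{cases},
\qquad z = \sum_{i=1}^{N} w_i x_i + b, \quad b = -T
```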
The perceptron with real inputs and a real output
[Figure: inputs x1, x2, x3, …, xN and bias b feeding a unit with a sigmoid activation]
• x1…xN are real valued
• w1…wN are real valued
• The output y can also be real valued
– Sometimes viewed as the “probability” of firing
The “real” valued perceptron
[Figure: inputs x1, x2, x3, …, xN and bias b feeding an activation f(sum)]
• Any real-valued “activation” function may operate on the affine function of the input
– We will see several later
– Output will be real valued
• The perceptron maps real-valued inputs to real-valued outputs
• It is useful to continue assuming Boolean outputs, though, for interpretation
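A minimal sketch of the real-valued perceptron, assuming the logistic sigmoid as the activation and arbitrary illustrative weights: the same affine function z feeds either a step or a sigmoid. Because sigmoid(z) ≥ 0.5 exactly when z ≥ 0, reading the sigmoid output as Boolean (threshold at 0.5) recovers the step perceptron’s decision.

```python
import math

def step(z):
    """Threshold activation: 1 if z is non-negative, else 0."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Logistic sigmoid: a smooth, real-valued output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(weights, bias, inputs, activation):
    """Apply an activation function to the affine function z = w.x + b."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

w, b = [0.8, -0.5, 1.2], -0.3   # arbitrary illustrative values
for x in ([1.0, 2.0, 0.0], [0.5, 0.1, 0.4], [0.0, 0.0, 0.0]):
    y_real = perceptron(w, b, x, sigmoid)
    y_bool = perceptron(w, b, x, step)
    # Thresholding the real output at 0.5 agrees with the step unit
    print(round(y_real, 3), y_bool, (y_real >= 0.5) == (y_bool == 1.0))
```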
A Perceptron on Reals
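[Figure: a perceptron on inputs x1…xN; in two dimensions, the line $w_1 x_1 + w_2 x_2 = T$ separates the region where the output is 1 from the region where it is 0]
• A perceptron operates on real-valued vectors
– This is a linear classifier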
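To make the geometry concrete, here is a sketch of a two-input perceptron used as a linear classifier. The boundary x1 + x2 = 1 (w1 = w2 = T = 1) is an arbitrary choice for illustration: points on the line, or on the side that (w1, w2) points toward, give output 1.

```python
def perceptron_2d(w1, w2, T, x1, x2):
    """Output 1 if w1*x1 + w2*x2 >= T (on or beyond the boundary line), else 0."""
    return 1 if w1 * x1 + w2 * x2 >= T else 0

# Illustrative boundary: x1 + x2 = 1
w1, w2, T = 1.0, 1.0, 1.0
points = [(0.0, 0.0), (0.3, 0.3), (0.6, 0.6), (1.0, 0.0), (2.0, -0.5)]
print([perceptron_2d(w1, w2, T, a, c) for a, c in points])  # [0, 0, 1, 1, 1]
```

Restricted to the Boolean corners (0,0), (0,1), (1,0), (1,1), this particular boundary computes OR.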
Boolean functions with a real perceptron
[Figure: three plots of the Boolean inputs (0,0), (0,1), (1,0), (1,1) in the (x1, x2) plane, each split by a linear boundary]
• Boolean perceptrons are also linear classifiers
– Purple regions have output 1 in the figures
– What are these functions?
– Why can we not compose an XOR?
Composing complicated “decision” boundaries
[Figure: a coloured decision region in the (x1, x2) plane]
• Perceptrons can now be composed into “networks” to compute arbitrary classification “boundaries”
• Build a network of units with a single output that fires if the input is in the coloured area
Booleans over the reals
[Figure: a coloured region in the (x1, x2) plane]
• The network must fire if the input is in the coloured area
Booleans over the reals
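[Figure: five perceptrons y1…y5, one per edge of the coloured polygon in the (x1, x2) plane, feed an AND unit that fires only when all five fire; regions outside the polygon are labelled with the number of y units that fire there (3 or 4)]
• The network must fire if the input is in the coloured area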
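The polygon network can be sketched in code. Assumptions: the polygon is a hypothetical convex pentagon given by its vertices in counter-clockwise order, and each edge unit is built from that edge’s inward normal. The output is an AND unit whose threshold equals the number of edges, so it fires only when every edge unit fires.

```python
def threshold_unit(weights, threshold, inputs):
    """Fire (1) if the weighted input sum matches or exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def edge_units(vertices):
    """One (weights, threshold) pair per edge of a convex, counter-clockwise polygon.

    Each unit fires for points on the inner side of its edge, or on the edge itself.
    """
    units = []
    n = len(vertices)
    for i in range(n):
        (px, py), (qx, qy) = vertices[i], vertices[(i + 1) % n]
        w = (-(qy - py), qx - px)    # inward normal for counter-clockwise order
        t = w[0] * px + w[1] * py    # the edge lies on the line w.x = t
        units.append((w, t))
    return units

PENTAGON = [(0, 0), (2, 0), (3, 2), (1, 3), (-1, 2)]   # hypothetical convex pentagon
UNITS = edge_units(PENTAGON)

def in_polygon(point):
    y = [threshold_unit(w, t, point) for w, t in UNITS]   # y1 ... y5
    return threshold_unit([1] * len(y), len(y), y)        # AND: all must fire

print(in_polygon((1, 1)))    # 1: inside
print(in_polygon((5, 5)))    # 0: outside
print(in_polygon((1, -1)))   # 0: outside
```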
More complex decision boundaries
[Figure: two AND units, one per polygon, over inputs x1 and x2, feeding an OR unit]
• Network to fire if the input is in the yellow area
– “OR” two polygons
– A third layer is required
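Extending the same idea to the yellow-area network, here is a hypothetical sketch in which two axis-aligned rectangles stand in for the two polygons. Half-plane units form the first layer, one AND unit per polygon forms the second, and a single OR unit forms the third.

```python
def threshold_unit(weights, threshold, inputs):
    """Fire (1) if the weighted input sum matches or exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def box_units(x0, x1, y0, y1):
    """Four half-plane units whose AND is the rectangle [x0, x1] x [y0, y1]."""
    return [((1, 0), x0), ((-1, 0), -x1), ((0, 1), y0), ((0, -1), -y1)]

POLYGONS = [box_units(0, 1, 0, 1), box_units(2, 3, 0, 1)]   # hypothetical regions

def in_union(point):
    ands = []
    for units in POLYGONS:
        # Layer 1: half-plane units for this polygon's edges
        h = [threshold_unit(w, t, point) for w, t in units]
        # Layer 2: AND unit for this polygon (all edge units must fire)
        ands.append(threshold_unit([1] * len(h), len(h), h))
    # Layer 3: OR unit over the polygons
    return threshold_unit([1] * len(ands), 1, ands)

print(in_union((0.5, 0.5)))   # 1: in the first rectangle
print(in_union((2.5, 0.5)))   # 1: in the second rectangle
print(in_union((1.5, 0.5)))   # 0: between them
```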
Complex decision boundaries
• Can compose very complex decision boundaries
– How complex exactly? More on this in the next class
Complex decision boundaries
[Figure: MNIST digit images as points in a 784-dimensional space]
• Classification problems: finding decision boundaries in high-dimensional space
– Can be performed by an MLP
• MLPs can classify real-valued inputs
Story so far
• MLPs are connectionist computational models
– Individual perceptrons are the computational equivalent of neurons
– The MLP is a layered composition of many perceptrons
• MLPs can model Boolean functions
– Individual perceptrons can act as Boolean gates
– Networks of perceptrons are Boolean functions
• MLPs are Boolean machines
– They represent Boolean functions over linear boundaries
– They can represent arbitrary decision boundaries
– They can be used to classify data
But what about continuous-valued outputs?
• Inputs may be real-valued
• Can outputs be continuous-valued too?
Other things MLPs can do
• Model memory
– Loopy networks can “remember” patterns
• Proposed by Lawrence Kubie in 1930, as a model for memory in the CNS
• Represent probability distributions
– Over integer, real and complex-valued domains
– MLPs can model both a posteriori and a priori distributions of data
• A posteriori: conditioned on other variables
– MLPs can generate data from complicated, or even unknown, distributions
• They can rub their stomachs and pat their heads at the same time…
NNets in AI
• The network is a function
– Given an input, it computes the function layer-wise to predict an output
• More generally, given one or more inputs, it predicts one or more outputs
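A minimal sketch of “the network is a function”: a hypothetical fully connected network with sigmoid units and arbitrary weights, mapping three inputs to two outputs, evaluated one layer at a time. Plain Python lists keep it self-contained; a practical implementation would normally use an array library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(weights, biases, inputs):
    """One layer: each unit applies a sigmoid to its own affine function of the inputs."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp(layers, x):
    """Compute the network function layer by layer; returns one value per output unit."""
    for weights, biases in layers:
        x = dense_layer(weights, biases, x)
    return x

# Hypothetical 3-input, 2-hidden-unit, 2-output network (weights chosen arbitrarily)
layers = [
    ([[0.5, -1.0, 0.25], [1.5, 0.3, -0.7]], [0.1, -0.2]),   # hidden layer
    ([[1.0, -1.0], [-0.5, 2.0]], [0.0, 0.3]),               # output layer
]
print(mlp(layers, [1.0, 0.5, -2.0]))   # two real-valued outputs, each in (0, 1)
```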
These tasks are functions
[Figure: three boxes: Voice signal → Transcription; Image → Text caption; Game state → Next move]
• Each of these boxes is actually a function
– E.g. f: Image → Caption
– It can be approximated by a neural network
Story so far
• Multi-layer perceptrons are connectionist
computational models
• MLPs are classification engines
• MLPs can also model continuous-valued functions
• Interesting AI tasks are functions that can be modelled by the network
Next Up
• More on neural networks as universal
approximators
– And the issue of depth in networks