
Deep Learning Course Introduction Slides

This document provides an introduction to deep learning through a lecture given by Dr. Sugata Ghosal. It discusses the history and development of neural networks from early associationist models of cognition to modern deep learning techniques. Some key points covered include: - Early neural network models like the perceptron that were inspired by the brain's interconnected neurons. - Breakthroughs achieved by neural networks in applications like image recognition, natural language processing, and more. - The objectives of the course, which are to understand neural network models, design and implement networks, and explore applications. - An overview of early neural network research and the development of the multi-layer perceptron capable of learning complex patterns and

Deep Learning

Dr. Sugata Ghosal


BITS Pilani, Pilani Campus

Lecture No. 1 | Introduction

Time: 11 AM – 1 PM
Date: 07/05/2022
These slides are assembled by the instructor with grateful acknowledgement of the many
others who made their course materials freely available online.
Agenda

• Introduction
• Course Objectives and Logistics
• Introduction to Perceptron and MLP
• Approximation Capabilities
• Characteristics of Deep Learning
Neural Networks are taking over!

• Neural networks have recently become one of the major thrust areas in various pattern recognition, prediction, and analysis problems
• In many problems they have established the state of the art
– Often exceeding previous benchmarks by large margins
Breakthroughs with neural networks

• Image segmentation and recognition
• Image recognition

Successes with neural networks

• Captions generated entirely by a neural network
• And a variety of other problems:
– From art to astronomy to healthcare..
– and even predicting stock markets!
Objectives of this course
• Understanding neural networks
• Comprehending the models that do the previously
mentioned tasks
– And maybe build them
• Design, build and train networks for various tasks

• You will not become an expert in one course


Course objectives: Broad level

Deep Dive into Artificial Neural Networks


• Concepts
– Types of neural networks and underlying ideas
– Learning in neural networks
• Training, concepts, practical issues
– Architectures and applications

• Practical
– Familiarity with training and parameter tuning
– Implement various neural network architectures
• Overall: Set you up for further work in your area
Course learning objectives: Topics
• Basic network formalisms (for classification and
prediction):
– Multi-Layer Perceptron (MLP)
– Convolutional networks (CNN)
– Recurrent networks (RNN)
• Some advanced formalisms (for creation)
– Generative models: VAEs
– Adversarial models: GANs
• Applications we will touch upon:
– Computer vision: recognizing images
– Text processing: modelling and generating language
– ….
Reading
• List of books on Canvas Course Page
• Primary: “Deep Learning”, Goodfellow, Bengio, Courville
• Reference: “Deep Learning with Python”, Francois Chollet
• Additional reading material will be posted on Canvas, if needed
Logistics
• Most relevant info on Canvas
– Handout
– Schedule of Webinars, Quiz, Assignments, ….
– Lecture Slides
– Lab Sheets
– 3 quizzes; the best 2 scores will be counted
– Two assignments
– Quiz, one assignment before midsem
– One assignment after midsem
– Submissions after the deadline lose some marks per day (except in medical emergencies)
– Programming using Python, Keras / TensorFlow
Webinars (3-4)

• Held during evenings (around 7:30 PM)
• Will cover details of lab sheet, if needed, and basic exercises
– Important if you wish to get the maximum out of the course
Questions?

• Please post on Discussions Forum


• TAs and instructors will answer
• Collaborate with your fellow students
So what are neural
networks??

[Figure: three boxes, each mapping an input to an output: voice signal → transcription, image → text caption, game state → next move]

• What are these boxes?
• It begins with this..


Early Models of Human
Cognition

• Associationism
– Humans learn through association
• 400BC-1900AD: Plato, David Hume, Ivan Pavlov..
What are “Associations”?

• Lightning is generally followed by thunder
– Ergo – “hey, here’s a bolt of lightning, we’re going to hear thunder”
– Ergo – “We just heard thunder; did someone get hit by lightning?”
• Association!
Observation: The Brain

• Mid 1800s: The brain is a mass of interconnected neurons
Brain: Interconnected Neurons

• Many neurons connect in to each neuron
• Each neuron connects out to many neurons
Connectionist Machines

• Network of processing elements
• All world knowledge is stored in the connections between the elements
• Neural networks are connectionist machines
– As opposed to Von Neumann machines

[Figure: a Von Neumann/Princeton machine (a processor as the processing unit, with a memory holding program and data) beside a neural network (a single network)]

• The machine has many non-linear processing units
– The program is the connections between these units
• Connections may also define memory
Modelling the brain
• What are the units?
• A neuron

[Figure: a neuron with its soma, dendrites, and axon labelled]

• Signals come in through the dendrites into the soma
• A signal goes out via the axon to other neurons
– Only one axon per neuron
• Factoid that may only interest me: Neurons do not undergo cell division
– Neurogenesis occurs from neuronal stem cells, and is minimal after birth
Rosenblatt’s perceptron

• Original perceptron model
– Groups of sensors (S) on retina combine onto cells in association area A1
– Groups of A1 cells combine into Association cells A2
– Signals from A2 cells combine into response cells R
– All connections may be excitatory or inhibitory
Simplified mathematical
model of Perceptron

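• A number of inputs combine linearly
– Threshold logic: Fire if the combined input equals or exceeds the threshold
– With weights $w_i$, inputs $x_i$ and threshold $T$: $Y = 1$ if $\sum_i w_i x_i \ge T$, else $Y = 0$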
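
The threshold logic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `perceptron` and the example weights and threshold are assumptions chosen only to show the "fire if the combined input reaches the threshold" rule.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    combined = sum(w * x for w, x in zip(weights, inputs))
    return 1 if combined >= threshold else 0


# Illustrative values only (assumed, not taken from the slides).
example_weights = [0.5, -1.0, 2.0]
example_threshold = 1.0

fires = perceptron([1, 0, 1], example_weights, example_threshold)   # 0.5 + 2.0 = 2.5
silent = perceptron([1, 1, 0], example_weights, example_threshold)  # 0.5 - 1.0 = -0.5
```

Note that the comparison is `>=`, so a combined input exactly equal to the threshold also fires the unit.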
His “Simple” Perceptron
• It was originally assumed that it could represent any Boolean circuit and perform any logic
– “the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence,” New York Times (8 July) 1958
– “Frankenstein Monster Designed by Navy That Thinks,” Tulsa, Oklahoma Times 1958
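Also provided a learning algorithm

Sequential learning: $\mathbf{w} \leftarrow \mathbf{w} + \eta\,\big(d(\mathbf{x}) - y(\mathbf{x})\big)\,\mathbf{x}$
– $d(\mathbf{x})$ is the desired output in response to input $\mathbf{x}$
– $y(\mathbf{x})$ is the actual output in response to $\mathbf{x}$

• Boolean tasks
• Update the weights whenever the perceptron output is wrong
• Proved convergence for linearly separable classes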
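
A minimal sketch of this kind of sequential, mistake-driven update follows. Several details are assumptions rather than anything stated on the slide: the threshold is folded into the weights as a bias paired with a constant input of 1, the learning rate `eta` defaults to 1, and the training data is the (linearly separable) Boolean AND function.

```python
def predict(weights, x):
    """Threshold unit; the last weight is a bias paired with a constant input of 1."""
    total = sum(w * xi for w, xi in zip(weights, list(x) + [1]))
    return 1 if total >= 0 else 0


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Cycle through the samples, updating the weights only when the output is wrong."""
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, desired in samples:
            actual = predict(weights, x)
            if actual != desired:
                mistakes += 1
                # w <- w + eta * (desired - actual) * x, with the bias input fixed at 1
                for i, xi in enumerate(list(x) + [1]):
                    weights[i] += eta * (desired - actual) * xi
        if mistakes == 0:  # a full pass with no errors: training has converged
            break
    return weights


# Boolean AND: a linearly separable task, so the updates eventually stop.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights = train_perceptron(and_samples)
```

Because AND is linearly separable, the loop exits before `max_epochs`; on a task that is not linearly separable it would simply run out of epochs.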
Perceptron

[Figure: single-unit perceptrons over inputs X and Y implementing Boolean gates. Values shown on edges are weights, numbers in the circles are thresholds.]

• Easily shown to mimic any Boolean gate
• But…
Perceptron

[Figure: a single perceptron over inputs X and Y whose weights and threshold are all marked “?”]

• No solution for XOR!
• Not universal!
• Minsky and Papert, 1969
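
Both claims, that a single threshold unit can mimic the basic gates but not XOR, can be checked with a short sketch. The weight and threshold settings for AND, OR and NOT are assumed values chosen for illustration, and the grid search is only a finite spot check over a coarse grid, not a proof.

```python
from itertools import product


def unit(inputs, weights, threshold):
    """Threshold unit: fire if the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def truth_table(weights, threshold):
    """Outputs for all Boolean inputs, in the order 00, 01, 10, 11 (or 0, 1)."""
    n = len(weights)
    return [unit(bits, weights, threshold) for bits in product([0, 1], repeat=n)]


# Assumed (weights, threshold) settings for the basic gates.
AND_GATE = ([1, 1], 2)
OR_GATE = ([1, 1], 1)
NOT_GATE = ([-1], 0)

# Search a coarse grid of weights and thresholds for a single unit computing XOR.
xor_table = [0, 1, 1, 0]
grid = [v / 2 for v in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
xor_solutions = [
    (w1, w2, t)
    for w1, w2, t in product(grid, repeat=3)
    if truth_table([w1, w2], t) == xor_table
]
```

The search comes back empty. The reason is general: XOR would need `w1 >= T`, `w2 >= T` and `T > 0` while also requiring `w1 + w2 < T`, which is contradictory.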


A single neuron is not
enough

• Individual elements are weak computational elements
– Marvin Minsky and Seymour Papert, 1969, Perceptrons: An Introduction to Computational Geometry
• Networked elements are required


Multi-layer Perceptron!
[Figure: a two-layer network of threshold units over inputs X and Y that computes XOR, with the hidden layer marked]

• XOR
– The first layer is a “hidden” layer
– Also originally suggested by Minsky and Papert, 1969
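
One way to realise XOR with a hidden layer is sketched below. The particular weights and thresholds are an assumption (one workable choice among many): the hidden units compute "X OR Y" and "NOT (X AND Y)", and the output unit ANDs them together.

```python
def unit(inputs, weights, threshold):
    """Threshold unit: fire if the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def xor_mlp(x, y):
    """Two hidden threshold units followed by one output unit."""
    h1 = unit([x, y], [1, 1], 1)      # hidden unit 1: X OR Y
    h2 = unit([x, y], [-1, -1], -1)   # hidden unit 2: NOT (X AND Y)
    return unit([h1, h2], [1, 1], 2)  # output unit: h1 AND h2


xor_outputs = [xor_mlp(x, y) for x in (0, 1) for y in (0, 1)]
```

Each hidden unit draws one line in the input plane; the output fires only for points that lie between the two lines.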
A more generic model

[Figure: a deeper network of threshold units over inputs X, Y, Z and A]

• A “multi-layer” perceptron
• Can compose arbitrarily complicated Boolean functions!
– In cognitive terms: Can compute arbitrary Boolean functions over
sensory input
– More on this in the next class
But our brain is not
Boolean

• We have real inputs
• We make non-Boolean inferences/predictions
The perceptron with real
inputs
[Figure: perceptron with inputs x1 … xN]

• x1…xN are real-valued
• w1…wN are real-valued
• Unit “fires” if weighted input exceeds a threshold
The perceptron with real
inputs and a real output
[Figure: unit with inputs x1 … xN and a bias b, computing $y = \mathrm{sigmoid}\left(\sum_i w_i x_i + b\right)$]

• x1…xN are real-valued
• w1…wN are real-valued
• The output y can also be real-valued
– Sometimes viewed as the “probability” of firing
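
A minimal sketch of a unit with a real-valued output follows. It assumes the logistic sigmoid and assumes the bias `b` is added to the weighted sum; the function names and the example numbers are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_perceptron(inputs, weights, bias):
    """Weighted sum plus bias, passed through a sigmoid instead of a hard threshold."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Illustrative real-valued inputs and weights (assumed values).
weights = [2.0, -1.0]
low = soft_perceptron([-1.0, 1.0], weights, bias=0.0)   # weighted sum is -3.0
mid = soft_perceptron([0.0, 0.0], weights, bias=0.0)    # weighted sum is 0.0
high = soft_perceptron([1.5, -1.0], weights, bias=0.0)  # weighted sum is 4.0
```

The output rises smoothly from near 0 to near 1 as the weighted sum grows, which is what makes the "probability of firing" reading possible.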
The “real” valued
perceptron
[Figure: unit with inputs x1 … xN and bias b, whose output is f(sum)]

• Any real-valued “activation” function may operate on the weighted-sum input
– We will see several later
– Output will be real-valued
• The perceptron maps real-valued inputs to real-valued outputs
• It is useful to continue assuming Boolean outputs though, for interpretation
A Perceptron on Reals

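[Figure: perceptron with inputs x1 … xN that outputs 1 when $\sum_i w_i x_i \ge T$ and 0 otherwise; in the (x1, x2) plane the two output regions are separated by the line $w_1 x_1 + w_2 x_2 = T$]

• A perceptron operates on real-valued vectors
– This is a linear classifier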
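
The linear-classifier view can be illustrated with a small sketch in two dimensions. The weights, threshold and test points are assumed values; the unit simply reports which side of the line `w1*x1 + w2*x2 = T` a point falls on.

```python
def classify(point, weights, threshold):
    """Return 1 on or above the boundary w . x = T, and 0 below it."""
    score = sum(w * x for w, x in zip(weights, point))
    return 1 if score >= threshold else 0


# Assumed example: the boundary is the line x1 + x2 = 1.
weights = [1.0, 1.0]
threshold = 1.0
points = [(0.0, 0.0), (0.25, 0.25), (1.0, 1.0), (2.0, -0.5)]
labels = [classify(p, weights, threshold) for p in points]

# Scaling weights and threshold by the same positive factor leaves the boundary unchanged.
scaled_labels = [classify(p, [3.0, 3.0], 3.0) for p in points]
```

Only the direction of the weight vector and the ratio of threshold to weight magnitude matter for where the boundary sits.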
Boolean functions with a
real perceptron

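[Figure: three panels, each showing the unit square with corners (0,0), (1,0), (0,1), (1,1) over Boolean inputs X and Y, with the region where the perceptron outputs 1 shaded]

• Boolean perceptrons are also linear classifiers
– Purple regions have output 1 in the figures
– What are these functions?
– Why can we not compose an XOR?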
Composing complicated
“decision” boundaries

[Figure: a coloured region in the (x1, x2) plane]

• Perceptrons can now be composed into “networks” to compute arbitrary classification “boundaries”
• Build a network of units with a single output that fires if the input is in the coloured area
Booleans over the
reals

[Figure: a coloured region in the (x1, x2) plane, and a network with inputs x1 and x2]

• The network must fire if the input is in the coloured area
Booleans over the
reals

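[Figure: a pentagon in the (x1, x2) plane bounded by five lines. Hidden units y1 … y5 each test one line, and each region of the plane is labelled with the number of hidden units that fire there (5 inside the pentagon, 3 or 4 outside). An AND output unit fires when $\sum_{i=1}^{N} y_i \ge N$.]

• The network must fire if the input is in the coloured area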
More complex decision
boundaries

[Figure: two polygons in the (x1, x2) plane, and a network over inputs x1 and x2 in which two AND units feed an OR unit]

• Network to fire if the input is in the yellow area
– “OR” two polygons
– A third layer is required
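
The three-layer construction can be sketched as follows. The two regions here are axis-aligned unit squares, an assumption made to keep the half-plane weights easy to read; any convex polygon works the same way, with one first-layer unit per edge.

```python
def unit(inputs, weights, threshold):
    """Threshold unit: fire if the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


# Each edge is a half-plane test (weights, threshold) on the point (x1, x2).
SQUARE_A = [([1, 0], 0), ([-1, 0], -1), ([0, 1], 0), ([0, -1], -1)]  # 0<=x1<=1, 0<=x2<=1
SQUARE_B = [([1, 0], 2), ([-1, 0], -3), ([0, 1], 0), ([0, -1], -1)]  # 2<=x1<=3, 0<=x2<=1


def polygon_unit(point, half_planes):
    """Layer 1 tests every edge; layer 2 ANDs the results (all edges must fire)."""
    edge_outputs = [unit(point, w, t) for w, t in half_planes]
    return unit(edge_outputs, [1] * len(edge_outputs), len(edge_outputs))


def network(point):
    """Layer 3 ORs the two polygon detectors."""
    a = polygon_unit(point, SQUARE_A)
    b = polygon_unit(point, SQUARE_B)
    return unit([a, b], [1, 1], 1)
```

For example, `network((0.5, 0.5))` and `network((2.5, 0.5))` fire, while `network((1.5, 0.5))`, which lies in the gap between the two squares, does not.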
Complex decision boundaries

• Can compose very complex decision boundaries

[Figure: MNIST, 784 dimensions]

• Classification problems: finding decision boundaries in high-dimensional space
– Can be performed by an MLP
• MLPs can classify real-valued inputs
Story so far
• MLPs are connectionist computational models
– Individual perceptrons are the computational equivalent of neurons
– The MLP is a layered composition of many perceptrons
• MLPs can model Boolean functions
– Individual perceptrons can act as Boolean gates
– Networks of perceptrons are Boolean functions
• MLPs are Boolean machines
– They represent Boolean functions over linear boundaries
– They can represent arbitrary decision boundaries
– They can be used to classify data
But what about continuous
valued output?

• Inputs may be real-valued
• Can outputs be continuous-valued too?
MLP as a continuous-valued
regression

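[Figure: input x feeds two threshold units with thresholds T1 and T2; a summing unit combines their outputs with weights 1 and -1, so the output f(x) is a square pulse between T1 and T2]

• A simple 3-unit MLP with a “summing” output unit can generate a “square pulse” over an input
– Output is 1 only if the input lies between T1 and T2
– T1 and T2 can be arbitrarily specified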
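
The square-pulse construction can be sketched directly. The function names are assumptions; the structure follows the slide: two threshold units on the same input, combined by a summing unit with weights 1 and -1.

```python
def step(x, threshold):
    """Threshold unit on a single real input."""
    return 1 if x >= threshold else 0


def square_pulse(x, t1, t2):
    """Summing output unit: +1 times the T1 unit, -1 times the T2 unit."""
    return 1 * step(x, t1) + (-1) * step(x, t2)


# Pulse between T1 = 0 and T2 = 1 (thresholds chosen only for illustration).
inside = square_pulse(0.5, 0.0, 1.0)
below = square_pulse(-0.1, 0.0, 1.0)
above = square_pulse(1.5, 0.0, 1.0)
```

With the `>=` convention used here the pulse covers the half-open interval from T1 up to, but not including, T2.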
MLP as a continuous-valued
regression
[Figure: many square-pulse subnetworks over the input x, each with its own thresholds; their outputs are scaled by heights h1, h2, …, hn and summed to approximate f(x)]

• A simple 3-unit MLP can generate a “square pulse” over an input
• An MLP with many units can model an arbitrary function over an input
– To arbitrary precision
• Simply make the individual pulses narrower
• This generalizes to functions of any number of inputs
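
The effect of narrowing the pulses can be checked numerically. In this sketch the target function `x * x` on the interval [0, 1], the equal pulse widths, and the choice of each pulse height as the function value at the pulse midpoint are all assumptions made for illustration.

```python
def step(x, threshold):
    """Threshold unit on a single real input."""
    return 1.0 if x >= threshold else 0.0


def pulse_approximation(f, lo, hi, n_pulses):
    """Sum of n_pulses scaled square pulses covering [lo, hi)."""
    width = (hi - lo) / n_pulses
    edges = [lo + k * width for k in range(n_pulses + 1)]
    heights = [f((edges[k] + edges[k + 1]) / 2) for k in range(n_pulses)]

    def approx(x):
        # Each term is one pulse (difference of two threshold units) times its height.
        return sum(
            h * (step(x, edges[k]) - step(x, edges[k + 1]))
            for k, h in enumerate(heights)
        )

    return approx


def max_error(f, approx, lo, hi, n_samples=1000):
    """Largest absolute error over evenly spaced sample points inside (lo, hi)."""
    xs = [lo + (hi - lo) * (i + 0.5) / n_samples for i in range(n_samples)]
    return max(abs(f(x) - approx(x)) for x in xs)


def target(x):
    return x * x


coarse_error = max_error(target, pulse_approximation(target, 0.0, 1.0, 4), 0.0, 1.0)
fine_error = max_error(target, pulse_approximation(target, 0.0, 1.0, 64), 0.0, 1.0)
```

Going from 4 pulses to 64 shrinks the worst-case error substantially, at the cost of two threshold units per pulse.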
So what does the perceptron
really model?
• Is there a “semantic” interpretation?
– Cognitive version: Is there an interpretation
beyond the simple characterization as Boolean
functions over sensory inputs?
Let’s look at the weights

[Figure: perceptron with inputs x1 … xN and threshold T, firing when $\sum_i w_i x_i \ge T$]

• What do the weights tell us?
– The neuron fires if the inner product between the weights and the inputs exceeds a threshold
The weight as a “template”

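[Figure: inputs x1 … xN compared with the weight vector w. The unit fires when $\mathbf{x}^{T}\mathbf{w} \ge T$, i.e. when the angle $\theta$ between $\mathbf{x}$ and $\mathbf{w}$ satisfies $\theta \le \cos^{-1}\left(\frac{T}{|\mathbf{x}|\,|\mathbf{w}|}\right)$]

• The perceptron fires if the input is within a specified angle of the weight
• Neuron fires if the input vector is close enough to the weight vector.
– If the input pattern matches the weight pattern closely enough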
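
The "within a specified angle" reading can be checked against the plain inner-product rule. This is a sketch under stated assumptions: non-zero vectors, a firing rule of the form "inner product at least T", and illustrative weights and inputs. The helper names are hypothetical.

```python
import math


def dot(x, w):
    return sum(a * b for a, b in zip(x, w))


def norm(v):
    return math.sqrt(dot(v, v))


def fires(x, w, threshold):
    """Plain perceptron rule: inner product reaches the threshold."""
    return dot(x, w) >= threshold


def fires_by_angle(x, w, threshold):
    """Same decision, expressed as a limit on the angle between x and w."""
    ratio = threshold / (norm(x) * norm(w))
    if ratio > 1.0:    # threshold unreachable for an input of this length
        return False
    if ratio < -1.0:   # threshold met at every angle
        return True
    cos_angle = max(-1.0, min(1.0, dot(x, w) / (norm(x) * norm(w))))
    return math.acos(cos_angle) <= math.acos(ratio)


# Assumed example: the weight "template" points along the first axis.
w = [1.0, 0.0]
T = 0.5
inputs = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.3, 1.0)]
by_product = [fires(x, w, T) for x in inputs]
by_angle = [fires_by_angle(x, w, T) for x in inputs]
```

The allowed angle depends on the length of the input: a longer input vector may sit at a wider angle from the weight vector and still fire.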
The weight as a template
[Figure: a weight template W compared with two input patterns, with correlation = 0.57 and correlation = 0.82]

$y = 1$ if $\sum_i w_i x_i \ge 0$, $0$ otherwise

• If the correlation between the weight pattern and the inputs exceeds a threshold, fire
• The perceptron is a correlation filter!
The MLP as a Boolean function
over feature detectors
[Figure: network with output labelled “DIGIT OR NOT?”]

• The input layer comprises “feature detectors”
– Detect if certain patterns have occurred in the input
• The network is a Boolean function over the feature detectors
• I.e. it is important for the first layer to capture relevant patterns
The MLP as a cascade of
feature detectors
[Figure: network with output labelled “DIGIT OR NOT?”]

• The network is a cascade of feature detectors
– Higher level neurons compose complex templates from features represented by lower-level neurons
