Understanding Convolutional Neural Networks

Convolutional neural networks (CNNs) apply learned filters via convolution to images to extract visual features at different levels of abstraction, from low-level edges to mid-level object parts to high-level objects and scenes. CNNs share parameters across their convolutional filters to learn features directly from data in a hierarchical fashion. Modern CNN architectures have millions of parameters and dozens of layers, applying techniques like residual connections to enable very deep networks for complex tasks like image classification.

Uploaded by

kirti

Today: Convolutional Neural Networks (CNNs)

1. Scene understanding and object recognition for machines (and humans)

– Scene/object recognition challenge. Illusions reveal primitives, conflicting info
– Human neurons/circuits. Visual cortex layers==abstraction. General cognition
2. Classical machine vision foundations: features, scenes, filters, convolution
– Spatial structure primitives: edge detectors & other filters, feature recognition
– Convolution: basics, padding, stride, object recognition, architectures
3. CNN foundations: LeNet, de novo feature learning, parameter sharing
– Key ideas: learn features, hierarchy, re-use parameters, back-prop filter learning
– CNN formalization: representations(Conv+ReLU+Pool)*N layers + Fully-connected
4. Modern CNN architectures: millions of parameters, dozens of layers
– Feature invariance is hard: apply perturbations, learn for each variation
– ImageNet progression of best performers
– AlexNet: First top performer CNN, 60M parameters (from 60k in LeNet-5), ReLU
– VGGNet: simpler but deeper (8→19 layers), 140M parameters, ensembles
– GoogLeNet: new primitive=inception module, 5M params, no FC, efficiency
– ResNet: 152 layers, vanishing gradients → fit residuals to enable learning
5. Countless applications: General architecture, enormous power
– Semantic segmentation, facial detection/recognition, self-driving, image colorization, optimizing pictures/scenes, up-scaling, medicine, biology, genomics
2a. Spatial structure for image recognition
Using Spatial Structure

Input: 2D image. Array of pixel values.

Idea: connect patches of input to neurons in hidden layer. Neuron connected to region of input. Only “sees” these values.
Using Spatial Structure

Connect patch in input layer to a single neuron in subsequent layer.

Use a sliding window to define connections.
How can we weight the patch to detect particular features?
Feature Extraction with Convolution
- Filter of size 4x4 : 16 different weights
- Apply this same filter to 4x4 patches in input
- Shift by 2 pixels for next patch

This “patchy” operation is convolution

1) Apply a set of weights – a filter – to extract local features

2) Use multiple filters to extract different features

3) Spatially share parameters of each filter
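The sliding-window operation described above can be sketched in plain Python. This is a minimal illustration, not the slides' own code: the function name `convolve2d_strided`, the 8x8 input, and the all-ones filter are assumptions chosen for readability. Like most deep learning libraries, it does not flip the kernel (strictly speaking, it computes a cross-correlation).

```python
def convolve2d_strided(image, kernel, stride):
    """Slide a square kernel over a 2D image with the given stride (no padding)."""
    k = len(kernel)
    n_rows = (len(image) - k) // stride + 1
    n_cols = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            top, left = r * stride, c * stride
            # Element-wise multiply the patch by the kernel, then add everything up.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[top + i][left + j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 8x8 input whose pixel value equals its column index.
image = [[col for col in range(8)] for _ in range(8)]
# One 4x4 filter: 16 weights, reused at every patch position.
kernel = [[1] * 4 for _ in range(4)]
# Shift by 2 pixels between patches, as on the slide.
feature_map = convolve2d_strided(image, kernel, stride=2)
print(feature_map)
```

The same 16 weights produce every entry of the 3x3 feature map, which is what "spatially share parameters" means in practice.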

Fully Connected Neural Network

Input:
• 2D image
• Vector of pixel values

Fully Connected:
• Each neuron in hidden layer connected to all neurons in input layer
• No spatial information
• Many, many parameters

Key idea: Use spatial structure in input to inform architecture of the network
High Level Feature Detection

Let’s identify key features in each image category

Nose, Eyes, Mouth
Wheels, License Plate, Headlights
Door, Windows, Steps
Fully Connected Neural Network
2b. Convolutions and filters
Convolution operation is element-wise multiply and add

Filter / Kernel
Producing Feature Maps

[Figure panels: Original, Sharpen, Edge Detect, “Strong” Edge Detect]
A simple pattern: Edges
How can we detect edges with a kernel?

[Figure: Input, Filter (visible entries: -1 -1), Output] (Goodfellow 2016)
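One way to see how a kernel can detect edges is a horizontal difference filter. The sketch below assumes a 1x2 kernel `[-1, 1]`; the exact kernel in the figure is not recoverable from the text, so treat this as an illustration of the idea rather than a reproduction of it. The image values are made up.

```python
def horizontal_difference(image):
    """Apply the 1x2 kernel [-1, 1] along each row: out[c] = row[c + 1] - row[c]."""
    kernel = [-1, 1]  # assumed kernel, not taken from the slide
    return [
        [kernel[0] * row[c] + kernel[1] * row[c + 1] for c in range(len(row) - 1)]
        for row in image
    ]


# Hypothetical image: dark left half (0), bright right half (9).
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]
edges = horizontal_difference(image)
print(edges)
```

The output is zero wherever neighboring pixels agree and non-zero only at the column where the brightness changes, i.e. at the vertical edge.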
Simple Kernels / Filters
X or X?

Image is represented as matrix of pixel values… and computers are literal!

We want to be able to classify an X as an X even if it’s shifted, shrunk, rotated, deformed.

Rohrer How do CNNs work?
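The "computers are literal" point can be made concrete with a pixel-by-pixel comparison. In this sketch the 5x5 images, the `#` encoding, and the helper names are all illustrative assumptions; it only shows that a shifted copy of the same shape can share no "on" pixels with the original.

```python
def to_pixels(rows):
    """Convert rows of '#'/'.' characters into a matrix of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def matching_on_pixels(a, b):
    """Count positions where both images have an 'on' pixel."""
    return sum(pa * pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


x_original = to_pixels([
    "#.#..",
    ".#...",
    "#.#..",
    ".....",
    ".....",
])
# The same small X, shifted one row down and two columns right.
x_shifted = to_pixels([
    ".....",
    "..#.#",
    "...#.",
    "..#.#",
    ".....",
])

self_overlap = matching_on_pixels(x_original, x_original)
shifted_overlap = matching_on_pixels(x_original, x_shifted)
print(self_overlap, shifted_overlap)
```

A literal whole-image match sees the shifted X as a completely different image, which is why local features that can be found anywhere in the image are needed.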

There are three approaches to edge cases in convolution
Zero Padding Controls Output Size (Goodfellow 2016)

• Same convolution: zero pad input so output is same size as input dimensions
• Valid-only convolution: output only when entire kernel contained in input (shrinks output)
• Full convolution: zero pad input so output is produced whenever an output value contains at least one input value (expands output)

x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')

• TF convolution operator takes stride and zero fill option as parameters

• Stride is distance between kernel applications in each dimension
• Padding can be SAME or VALID
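The effect of padding and stride on output size can be checked with a little arithmetic. The helper below is a sketch along one spatial dimension; the function name is an assumption, and the `"FULL"` branch is included only to illustrate the third mode from the list above (the TF operator shown accepts only SAME or VALID).

```python
import math


def conv_output_size(n, k, stride=1, padding="VALID"):
    """Output length along one dimension for input length n and kernel length k."""
    if padding == "SAME":
        # Zero pad so that, at stride 1, the output matches the input size.
        return math.ceil(n / stride)
    if padding == "VALID":
        # Only positions where the whole kernel fits inside the input.
        return math.ceil((n - k + 1) / stride)
    if padding == "FULL":
        # Pad k - 1 zeros on each side: every output touches at least one input value.
        return (n + k - 2) // stride + 1
    raise ValueError(f"unknown padding mode: {padding}")


for mode in ("SAME", "VALID", "FULL"):
    print(mode, conv_output_size(32, 5, stride=1, padding=mode))
print("SAME, stride 2:", conv_output_size(32, 5, stride=2, padding="SAME"))
print("VALID, stride 2:", conv_output_size(32, 5, stride=2, padding="VALID"))
```

For a 32-pixel input and a 5-pixel kernel at stride 1, the three modes keep, shrink, and expand the output respectively.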
3a. Learning Visual Features de novo
Key idea: learn hierarchy of features directly from the data (rather than hand-engineering them)

Low level features: Edges, dark spots
Mid level features: Eyes, ears, nose
High level features: Facial structure

Lee+ ICML 2009

Key idea: re-use parameters
Convolution shares parameters
Example 3x3 convolution on a 5x5 image
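To see what parameter sharing saves in the 3x3-on-5x5 example, the weight counts can be compared directly. This is a back-of-the-envelope sketch with no padding, stride 1, and biases ignored; the variable names are illustrative.

```python
image_h, image_w = 5, 5
kernel_h, kernel_w = 3, 3

# With no padding and stride 1, the filter fits in 3 x 3 positions.
out_h = image_h - kernel_h + 1
out_w = image_w - kernel_w + 1

# Convolution: one set of 9 weights reused at every position.
shared_weights = kernel_h * kernel_w

# Local connectivity without sharing: each output neuron owns its 9 weights.
unshared_local_weights = out_h * out_w * kernel_h * kernel_w

# Fully connected: every output neuron sees all 25 input pixels.
dense_weights = (image_h * image_w) * (out_h * out_w)

print(shared_weights, unshared_local_weights, dense_weights)
```

Even on this tiny image, sharing cuts the count from 225 (fully connected) or 81 (local but unshared) to 9, and the gap grows with image size.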
Feature Extraction with Convolution

1) Apply a set of weights – a filter – to extract local features

2) Use multiple filters to extract different features
3) Spatially share parameters of each filter
LeNet-5
• Gradient-Based Learning Applied to Document Recognition, Y. LeCun, L. Bottou, Y. Bengio, P. Haffner; 1998
• Helped establish how we use CNNs today
• Replaced manual feature extraction

[LeCun et al., 1998]

LeNet-5
32×32×1
→ conv (5×5, s=1) → 28×28×6
→ avg pool (f=2, s=2) → 14×14×6
→ conv (5×5, s=1) → 10×10×16
→ avg pool (f=2, s=2) → 5×5×16
→ FC (120) → FC (84) → ŷ (10 outputs)

Reminder: Output size = (N+2P-F)/stride + 1
This slide is taken from Andrew Ng [LeCun et al., 1998]
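The output-size reminder can be applied layer by layer to reproduce the LeNet-5 shapes listed above. The sketch below assumes no padding (P=0) throughout, as the listed shapes imply; the function and variable names are illustrative.

```python
def output_size(n, f, stride=1, padding=0):
    """Output size = (N + 2P - F) / stride + 1, along one spatial dimension."""
    return (n + 2 * padding - f) // stride + 1


# (layer name, filter size F, stride, output channels or None to keep channels)
layers = [
    ("conv 5x5", 5, 1, 6),
    ("avg pool", 2, 2, None),
    ("conv 5x5", 5, 1, 16),
    ("avg pool", 2, 2, None),
]

size, channels = 32, 1
shapes = [(size, size, channels)]
for name, f, stride, out_channels in layers:
    size = output_size(size, f, stride)
    if out_channels is not None:
        channels = out_channels
    shapes.append((size, size, channels))

# Number of values fed into the first fully connected layer.
flattened = size * size * channels
print(shapes, flattened)
```

Height and width shrink at every step while the channel count grows, ending in 5×5×16 = 400 values for the fully connected layers.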
LeNet-5
• Only 60K parameters
• As we go deeper in the network: N_H ↓, N_W ↓, N_C ↑
• General structure: conv->pool->conv->pool->FC->FC->output

• Different filters look at different channels

• Sigmoid and Tanh nonlinearity

[LeCun et al., 1998]
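The "only 60K parameters" figure can be roughly checked by counting weights and biases per layer. This sketch assumes the simplified architecture with every filter connected to all input channels and a plain 10-way output layer; the 1998 paper used sparser connections in its second convolutional layer, so its exact total differs slightly.

```python
def conv_params(f, in_channels, out_channels):
    """Each filter has f*f*in_channels weights plus one bias."""
    return (f * f * in_channels + 1) * out_channels


def dense_params(n_in, n_out):
    """Each output unit has n_in weights plus one bias."""
    return (n_in + 1) * n_out


counts = {
    "conv1": conv_params(5, 1, 6),
    "conv2": conv_params(5, 6, 16),
    "fc1": dense_params(5 * 5 * 16, 120),
    "fc2": dense_params(120, 84),
    "output": dense_params(84, 10),
}
total = sum(counts.values())
print(counts, total)
```

Under these assumptions most of the parameters sit in the first fully connected layer, not in the convolutional filters. Pooling layers are treated as parameter-free here.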

Backpropagation of convolution

Slide taken from Forward And Backpropagation in Convolutional Neural Network. - Medium
3b. Convolutional Neural Networks (CNNs)
An image classification CNN
Representation Learning in Deep CNNs

Conv Layer 1: Low level features (Edges, dark spots)
Conv Layer 2: Mid level features (Eyes, ears, nose)
Conv Layer 3: High level features (Facial structure)

Lee+ ICML 2009

CNNs for Classification

1. Convolution: Apply filters to generate feature maps. (tf.keras.layers.Conv2D)
2. Non-linearity: Often ReLU. (tf.keras.activations.*)
3. Pooling: Downsampling operation on each feature map. (tf.keras.layers.MaxPool2D)

Train model with image data. Learn weights of filters in convolutional layers.
Example – Six convolutional layers
Convolutional Layers: Local Connectivity

tf.keras.layers.Conv2D

For a neuron in hidden layer:
• Take inputs from patch
• Compute weighted sum
• Apply bias

4x4 filter: matrix of weights wij for neuron (p,q) in hidden layer
1) applying a window of weights
2) computing linear combinations
3) activating with non-linear function
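The three steps for neuron (p,q) can be written as a single expression. The notation below is an assumption made for illustration: x is the input, b the bias, g the non-linear activation, and a the resulting activation; only the weights wij and the neuron index (p,q) come from the slide text.

```latex
a_{p,q} = g\!\left( \sum_{i=1}^{4} \sum_{j=1}^{4} w_{ij}\, x_{i+p,\; j+q} + b \right)
```

The double sum is the window of weights applied to the patch, adding b completes the linear combination, and g is the non-linearity applied last.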
CNNs: Spatial Arrangement of Output Volume
Layer Dimensions: h × w × d, where h and w are spatial dimensions (height, width) and d (depth) = number of filters

Stride: Filter step size

Receptive Field: Locations in input image that a node is path connected to
tf.keras.layers.Conv2D( filters=d, kernel_size=(h,w), strides=s )
Introducing Non-Linearity
Rectified Linear Unit (ReLU)
- Apply after every convolution operation (i.e., after convolutional layers)
- ReLU: pixel-by-pixel operation that replaces all negative values by zero.
- Non-linear operation

[Link]

Karn Intuitive CNNs

Pooling

tf.keras.layers.MaxPool2D( pool_size=(2,2), strides=2 )

1) Reduced dimensionality
2) Spatial invariance

Max Pooling, average pooling

The Rectified Linear Unit (ReLU) is a common non-linear detector stage after convolution

x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
x = tf.nn.bias_add(x, b)
x = tf.nn.relu(x)

f(x) = max(0, x)
When will we backpropagate through this?
Once it “dies” what happens to it?
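The two questions above can be explored with a tiny sketch of ReLU and its derivative. The helper names and sample values are illustrative, and the derivative at exactly 0 is set to 0 by convention, which is an assumption.

```python
def relu(z):
    """f(z) = max(0, z)."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0


pre_activations = [-2.0, -0.5, 0.0, 1.5]
outputs = [relu(z) for z in pre_activations]

# Backpropagation multiplies the upstream gradient by the local derivative.
upstream = 1.0
grads = [upstream * relu_grad(z) for z in pre_activations]
print(outputs, grads)
```

Gradient flows back only through units whose pre-activation is positive. A unit whose pre-activation is negative for every input passes back zero gradient, so its weights stop updating; this is what "dying" refers to.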
Pooling reduces dimensionality by giving up spatial location
• max pooling reports the maximum output within a defined neighborhood
• Padding can be SAME or VALID
x = tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')

The argument x is the input and the returned x is the output. ksize sets the pooling neighborhood and, like strides, follows the input layout [batch, height, width, channels].
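Max pooling over non-overlapping k×k neighborhoods can be sketched in a few lines. This illustration works on a single 2D feature map and assumes its height and width are divisible by k; the function name and the sample values are made up.

```python
def max_pool2d(feature_map, k):
    """Report the maximum within each non-overlapping k x k neighborhood."""
    rows = len(feature_map) // k
    cols = len(feature_map[0]) // k
    return [
        [
            max(feature_map[r * k + i][c * k + j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map, k=2)
print(pooled)
```

The 4×4 map becomes 2×2: each output keeps the strongest response in its neighborhood but no longer records where inside the neighborhood it occurred.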
Dilated Convolution
CNNs for Classification: Feature Learning

1. Learn features in input image through convolution
2. Introduce non-linearity through activation function (real-world data is non-linear!)
3. Reduce dimensionality and preserve spatial invariance with pooling
CNNs for Classification: Class Probabilities

- CONV and POOL layers output high-level features of input

- Fully connected layer uses these features for classifying input image
- Express output as probability of image belonging to a particular class
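Turning the final layer's scores into class probabilities is commonly done with a softmax. The sketch below is a plain-Python version with made-up scores for three classes; subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores from the last fully connected layer, one per class.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The class with the largest score gets the largest probability, and the probabilities can be read as the network's confidence over the classes.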
Putting it all together
import tensorflow as tf

def generate_model():
    model = tf.keras.Sequential([
        # first convolutional layer: 32 filters of size 3x3, then 2x2 max pooling
        tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),

        # second convolutional layer: 64 filters of size 3x3, then 2x2 max pooling
        tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),

        # fully connected classifier
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),  # 10 outputs
    ])
    return model
4a. Real-world feature invariance is hard
How can computers recognize objects?

Challenge:
• Objects can be anywhere in the scene, in any orientation, rotation, color hue, etc.
• How can we overcome this challenge?
Answer:
• Learn a ton of features (millions) from the bottom up
• Learn the convolutional filters, rather than pre-computing them
Feature invariance to perturbation is hard

Detect features to classify

Li/Johnson/Yeung CS231n
