Visual Classification

The document provides an overview of visual classification in computer vision, detailing the image classification pipeline, common classifiers like K-Nearest Neighbor (KNN) and Support Vector Machines (SVM), and challenges in image classification. It explains the process of training classifiers using labeled datasets, evaluating their performance, and highlights the importance of feature extraction and distance measures. Additionally, it includes practical examples and code snippets for implementing KNN and SVM algorithms using datasets such as Fisher's Iris and Caltech 101.


Visual Classification
1

Last updated: 14-04-2024

Computer Vision [NCV], Dr. Zulfiqar Habib, Professor
The National Programme on Technology Enhanced Learning (NPTEL)
[Link] [Link]
Teacher Appreciation
2

“A student's progress should be measured in terms of the questions they are asking, not merely by the answers that they are reciting”

- Robert John Meehan -


Common Approach
3
Figure: common approach. Processing chain: image acquisition, preprocessing, segmentation, feature extraction, classification / regression / clustering, recognition, result. Other labelled components: model generation (training, testing), models, problem domain, knowledge base.
Popular Classifiers
4

▪ K-Nearest Neighbor (KNN)
▪ SVM
o SVMLight
o LIBSVM (Library for Support Vector Machines)
▪ Neural Networks
o Deep Learning
o Transfer Learning
o Extreme Learning Machines (ELM)
o Reinforcement learning
Image Classification
5

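A core task in computer vision

Given an image and a set of discrete labels, e.g., {rabbit, cat, glass, plane, ...}, assign one of the labels to the image.

Example (figure): a picture of a rabbit is assigned the label Rabbit.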
Image Classification: The Problem
6
Human vs. machine perception: to a computer, an image is just an array of numbers. For example, an 8-bit color image is a height × width × d array of integers in [0, 255], where d = 3 for the three color channels (RGB).

Figure: what the machine (computer) sees.
Image Classification: Challenges
7
Viewpoint variation

Illumination

(Artwork shown: Michelangelo, 1475-1564)
Image Classification: Challenges
8

Scale
Image Classification: Challenges
9

Deformation

Occlusion
Image Classification: Challenges
10

Background clutter

Intra-class variation

(Artwork shown: Kilmeny Niland, 1995)


An Image Classifier
11

>> f = imread('[Link]');
>> predict(f)
????

Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or any other class.
An Image Classifier: Data-driven approach
12

▪ Use machine learning to train an image classifier on a portion of the annotated data
▪ Evaluate the classifier on a withheld set of test images
The Image Classification Pipeline
13

(Input) Dataset collection & labelling → (Learning) Training an image classifier → (Evaluation) Testing of the classifier on withheld images
The Image Classification Pipeline
14
Input: Our input consists of a set of N images, each labelled with one
of K different classes. We refer to this data as the training set.
Learning: Our task is to use the training set to learn what every one of
the classes looks like. We refer to this step as training a classifier,
or learning a model.
Evaluation: Evaluate the quality of the classifier by asking it to predict
labels for a new set of images that it has never seen before. We will
then compare the true labels of these images to the ones predicted by
the classifier. Intuitively, we're hoping that a lot of the predictions
match up with the true answers (which we call the ground truth).
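The evaluation step can be sketched in a few lines. This is a minimal Python illustration rather than the course's MATLAB code; the function name `evaluate` and the sample labels are hypothetical. It returns the accuracy and the indices of the misclassified test images, the same two quantities the SVM example later computes with `find(TestLabel-Classes)`.

```python
def evaluate(y_true, y_pred):
    """Compare predicted labels with the ground truth of the withheld images.

    Returns (accuracy, misclassified): accuracy is the fraction of correct
    predictions, misclassified lists the 0-based indices of the errors.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one test image")
    misclassified = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
    accuracy = (len(y_true) - len(misclassified)) / len(y_true)
    return accuracy, misclassified


# Hypothetical ground truth and classifier output for five test images.
acc, errors = evaluate(["cat", "dog", "cat", "plane", "cat"],
                       ["cat", "dog", "dog", "plane", "cat"])
print(acc, errors)
```

Here the third image (index 2) is the only mistake, so four out of five predictions match the ground truth.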
The Machine Learning Framework
15

y = f(x)
where y is the output, f is the prediction function, and x is the image feature.

Training: given a training set of labelled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
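Written as a formula, training is empirical risk minimization: choose the f that minimizes the average loss over the N training examples. This is a sketch assuming a per-example loss \(\mathcal{L}\), a symbol that the slides do not use. With the 0-1 loss, the objective is exactly the training error rate:

```latex
\hat{f} = \arg\min_{f} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f(x_i),\, y_i\big),
\qquad
\mathcal{L}_{0\text{-}1}\big(f(x_i), y_i\big) =
\begin{cases}
0 & \text{if } f(x_i) = y_i,\\
1 & \text{otherwise.}
\end{cases}
```

Testing then reports the same error rate on withheld examples that played no part in the minimization.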
Nearest Neighbor Classifier
16

Assign to each test data point the label of its nearest training data point
Nearest Neighbor Classifier
17

Assign to each test data point the label of its nearest training data point

Figure: partitioning of the feature space for two-category 2D and 3D data


K Nearest Neighbor (KNN)
18

Distance measure: Euclidean

$$D(X_n, X_m) = \sqrt{\sum_{i=1}^{d} \left(X_{n,i} - X_{m,i}\right)^2}$$

where X_n and X_m are the n-th and m-th data points and d is the number of features.

The test sample (green dot) should be classified either as a blue square or as a red triangle. If k = 3 (solid-line circle), it is assigned to the red triangles, because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed-line circle), it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle).
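The voting rule in the figure can be reproduced with a small sketch. This is plain Python written for illustration, not MATLAB's KNN implementation. The point coordinates are hypothetical values chosen so that the query at the origin has the same neighbours as in the figure: 1 square and 2 triangles among the 3 nearest, and 3 squares and 2 triangles among the 5 nearest.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=1):
    """Return the majority label among the k training points nearest to `query`.

    k = 1 gives the plain nearest neighbor classifier.
    """
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tie, most_common(1) returns the label counted first, i.e. the label
    # whose first vote came from the closer neighbour.
    return votes.most_common(1)[0][0]


# Hypothetical points arranged like the figure; the query sits at the origin.
train_x = [(0.9, 0.0), (0.0, 1.0), (-1.1, 0.0), (0.0, -2.0), (2.1, 0.0), (5.0, 5.0)]
train_y = ["square", "triangle", "triangle", "square", "square", "triangle"]
for k in (1, 3, 5):
    print(k, knn_predict(train_x, train_y, (0.0, 0.0), k))
```

With k = 1 only the single closest point decides (a square here), which is the nearest neighbor classifier from the previous slides. Raising k to 3 and then 5 flips the answer twice, just as in the figure.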
KNN vs K-Means Clustering
19

KNN is a supervised classification algorithm: it labels a new data point according to its k closest (labelled) training points. K-Means, in contrast, is an unsupervised clustering algorithm: it groups unlabelled data into k clusters.
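For contrast, K-Means never looks at labels. The sketch below is the standard Lloyd iteration in plain Python, written as an illustration rather than a library implementation. It assigns every point to its nearest centroid, moves each centroid to the mean of its points, and stops when nothing moves. The starting centroids and the points are hypothetical; practical implementations usually pick random starts and restart several times.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm for K-Means. No labels are used; k = len(init_centroids)."""
    centroids = [tuple(c) for c in init_centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignment = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, [(0, 0), (10, 10)])
print(centroids, assignment)
```

The output is a grouping (cluster 0 or 1), not a class name. Turning clusters into class labels would need labelled data, which is exactly what KNN uses from the start.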
Example: Sample of data set
20
Given: Fisher's Iris dataset, DataIris (data size = 150)
Species of Iris: Setosa, Versicolor, Virginica (figure: flowers with the petal and sepal labelled)

o The data set consists of 50 samples from each of three species of Iris.
o Four features were measured from each sample: the length and the width of the
sepals and petals, in centimeters.
Example: Sample of data set
21
No.   Sepal length   Sepal width   Petal length   Petal width   Species
1     5.1            3.5           1.4            0.2           setosa
2     4.9            3.0           1.4            0.2           setosa
3     4.7            3.2           1.3            0.2           setosa
4     4.6            3.1           1.5            0.2           setosa
…
76    6.6            3.0           4.4            1.4           versicolor
…
150   5.9            3.0           5.1            1.8           virginica
(All measurements in cm.)


Example
22

Task: classify a sample of 150 irises into the following 3 species: versicolor, virginica and setosa.
Number of given attributes: 4 characteristics measured on each flower (sepal length, sepal width, petal length and petal width). In this example, only the last 2 attributes (petal length and petal width) are used.
Type of attribute to be predicted: discrete, with 3 classes
Example: Code…
23
% Load the sample data, which includes Fisher's iris data of 4 measurements on a sample of 150 irises.
>> load fisheriris
>> whos
  Name           Size    Bytes  Class     Attributes
  meas         150x4      4800  double
  species      150x1     19300  cell
>> meas
meas =
    5.1000    3.5000    1.4000    0.2000
    4.9000    3.0000    1.4000    0.2000
    4.7000    3.2000    1.3000    0.2000
    4.6000    3.1000    1.5000    0.2000
    5.0000    3.6000    1.4000    0.2000
    ...
>> species
species =
    'setosa'
    'setosa'
    ...
    'versicolor'
    'versicolor'
    ...
    'virginica'
    'virginica'
    ...
Example: Code…
24
>> x = meas(:, 3:4); % use data of last 2 columns for fitting
>> y = species; % response data
>> mdl = [Link](x, y) % 1NN

mdl =

ClassificationKNN:
PredictorNames: {'x1' 'x2'}
ResponseName: 'Y'
ClassNames: {1x3 cell}
ScoreTransform: 'none'
NObservations: 150
Distance: 'euclidean'
NumNeighbors: 1
Example: Code…
25
% Predict the classification of an average flower
>> flwr = mean(x)   % an average flower
flwr =
    3.7580    1.1993
>> flwrClass = predict(mdl, flwr)
flwrClass =
    'versicolor'
>> gscatter(x(:, 1), x(:, 2), species)
>> set(legend, 'location', 'best')
>> line(flwr(1), flwr(2), 'marker', 'x', 'color', 'k', 'markersize', 10, 'linewidth', 2)

Figure: scatter plot of petal length (x-axis) vs. petal width (y-axis) by species, with the average flower marked by a black x.
Example: Code
26
% Predict another flower
>> flwr = [5 1.55];   % given (petal length, petal width)
>> flwrClass = predict(mdl, flwr)   % prediction by 1NN
flwrClass =
    'virginica'
>> md5 = [Link](x,y,'NumNeighbors',5);   % 5NN
>> flwrClass = predict(md5, flwr)   % prediction by 5NN
flwrClass =
    'versicolor'
Why different?
Example: Analysis
27
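The 5 nearest neighbours of flwr = [5 1.55]:

Species      Petal length   Petal width   Distance
virginica    5.0000         1.5000        0.0500
versicolor   4.9000         1.5000        0.1118
versicolor   4.9000         1.5000        0.1118
versicolor   5.1000         1.6000        0.1118
virginica    5.1000         1.5000        0.1118

Votes among the 5 neighbours:

Value        Count   Percent
virginica    2       40.00%
versicolor   3       60.00%

The single nearest neighbour (distance 0.05) is virginica, so 1NN predicts virginica. Among the 5 nearest neighbours, versicolor wins the vote 3 to 2, so 5NN predicts versicolor.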
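The table can be checked by hand. The Python sketch below (an illustration, not part of the MATLAB session) recomputes the Euclidean distances from the query flower [5 1.55] to the five neighbours listed above and tallies their votes. The neighbour coordinates are copied from the table, not searched for in the full data set.

```python
import math
from collections import Counter

# Query flower and its 5 nearest neighbours, copied from the table above:
# (species, petal length, petal width).
query = (5.0, 1.55)
neighbours = [
    ("virginica", 5.0, 1.5),
    ("versicolor", 4.9, 1.5),
    ("versicolor", 4.9, 1.5),
    ("versicolor", 5.1, 1.6),
    ("virginica", 5.1, 1.5),
]

# Euclidean distance from the query to each neighbour, rounded as in the table.
distances = [round(math.dist(query, (length, width)), 4) for _, length, width in neighbours]
print(distances)

# 1NN uses only the closest neighbour; 5NN takes a majority vote over all five.
one_nn = min(zip(distances, (s for s, _, _ in neighbours)))[1]
votes = Counter(s for s, _, _ in neighbours)
print(one_nn, votes.most_common())
```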
K Nearest Neighbor (KNN)
28
Find the k nearest images and have them vote on the label.

What is the best distance to use?
What is the best value of k to use?
i.e., how do we set the hyperparameters?
The answer is very problem-dependent: try several settings (e.g., on a validation set or with cross-validation) and keep the one that works best.
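One common way to pick k is leave-one-out cross-validation: classify every training point using all the other points, and keep the k with the highest accuracy. The Python sketch below assumes that procedure; the helper names and toy points are hypothetical. On two well-separated groups of 4 points, k = 7 fails, because each held-out point then has only 3 neighbours from its own group and 4 from the other.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def loo_accuracy(xs, ys, k):
    """Leave-one-out: classify each point using all the other points."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        correct += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return correct / len(xs)


def choose_k(xs, ys, candidates):
    """Score every candidate k; return the first one with the highest accuracy."""
    scores = {k: loo_accuracy(xs, ys, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores


xs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
ys = ["A"] * 4 + ["B"] * 4
best, scores = choose_k(xs, ys, [1, 3, 5, 7])
print(best, scores)
```

Ties in the score go to the smallest candidate here, a choice made for the sketch. Other tie-breaking rules are equally reasonable.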
SVM
29
Support Vector Machine (SVM) is a supervised learning algorithm developed by Vladimir Vapnik and colleagues. It was first presented in 1992 by Boser, Guyon and Vapnik at COLT-92.
SVM can be used for both classification and regression, but it is mostly used for classification problems.
An SVM finds the separating hyperplane with the largest margin between the classes. When the data are not linearly separable in the input space, a kernel function implicitly maps them into a higher-dimensional feature space in which the classes can be separated by a hyperplane.
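To make the margin idea concrete, here is a small linear SVM sketch in Python. It minimizes the regularized hinge loss by plain full-batch subgradient descent, which is a different solver from the SMO-type methods used by libraries such as LIBSVM. The hyperparameters `lam`, `lr` and `epochs` and the toy data are assumptions chosen for illustration, and no kernel is used.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM trained by full-batch subgradient descent on
        (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)).
    Labels in y must be +1 or -1. Returns the weight vector w and the bias b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge active: point inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Report which side of the hyperplane w . x + b = 0 the point x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [svm_predict(w, b, x) for x in X])
```

Once training settles, only the points closest to the boundary keep the hinge term active; these are the support vectors. A kernel SVM would replace the dot products with kernel evaluations.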
SVM
30

For details, see:
[Link]
Example: Visual Classification by SVM
31
Given: Caltech 101 dataset [1]: 101 object categories with about 40 to 800 images per category (most have about 50), each roughly 300x200 pixels.
We consider only 2 of its categories: “aeroplane” and “faces”.
Make two folders, TrainingSet and TestSet, and copy pictures:
o 01-50 from “aeroplane” to TrainingSet
o 51-100 from “faces” to TrainingSet
o 01-50 from “faces” to TestSet
o 51-100 from “aeroplane” to TestSet
Download from [Link]

[1] L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE CVPR 2004, Workshop on Generative-Model Based Vision, 2004.
Example: Visual Classification by SVM
32

TrainingSet
Example: Visual Classification by SVM
33

TestSet
SVM: Code…
34
clc; clear; clearvars; close all;
% Load datasets
TrainImageSet = 'D:\Classification\Data\101_ObjectCategories\TrainingSet';
TestImageSet = 'D:\Classification\Data\101_ObjectCategories\TestSet';
% struct array with fields: name, date, bytes, isdir, datenum
k = dir(fullfile(TrainImageSet,'*.jpg'));
TrainFN = {k(~[[Link]]).name}; % store file names of all images, not directories
TrainSetSize = length(TrainFN);
k = dir(fullfile(TestImageSet,'*.jpg'));
TestFN = {k(~[[Link]]).name};
TestSetSize = length(TestFN);
SVM: Code…
35
% Labelling, it is always better to label your images numerically
TrainLabel = zeros(TrainSetSize,1);   % one label per training image
TrainLabel(1:50,1) = 1;     % 1 = Airplanes
TrainLabel(51:100,1) = 2;   % 2 = Faces
TestLabel = zeros(TestSetSize,1);     % one label per test image
TestLabel(1:50,1) = 2;      % 2 = Faces
TestLabel(51:100,1) = 1;    % 1 = Airplanes
SVM: Code…
36
% Normalization: convert to grayscale (if not already) and resize all images to the same size
width = 8; height = 8; % Normalized image size
% Normalization of training images
TrainSet = cell(TrainSetSize, 1);   % preallocate one cell per image
for j=1:TrainSetSize
    tempImage = imread(horzcat(TrainImageSet,filesep,TrainFN{j}));
    imgInfo = imfinfo(horzcat(TrainImageSet,filesep,TrainFN{j}));
    if strcmp([Link],'grayscale')
        TrainSet{j} = imresize(tempImage,[height width]);   % imresize expects [rows cols]
    else
        TrainSet{j} = imresize(rgb2gray(tempImage),[height width]);
    end
end

Checking one image in the Command Window:
>> j=1;
>> imshow(tempImage)
>> strcmp([Link],'grayscale')
ans =
  logical
   0
>> imshow(TrainSet{j})
SVM: Code…
37
% Normalization of test images
TestSet = cell(TestSetSize, 1);   % preallocate one cell per image
for j=1:TestSetSize
    tempImage = imread(horzcat(TestImageSet,filesep,TestFN{j}));
    imgInfo = imfinfo(horzcat(TestImageSet,filesep,TestFN{j}));
    if strcmp([Link],'grayscale')
        TestSet{j} = imresize(tempImage,[height width]);   % imresize expects [rows cols]
    else
        TestSet{j} = imresize(rgb2gray(tempImage),[height width]);
    end
end
SVM: Code…
38
% Numeric matrix of features: every row is the feature vector of one image
Training_Set = uint8(zeros(TrainSetSize,width*height)); % preallocate memory for speed
for j=1:length(TrainSet)
    Training_Set(j,:) = reshape(TrainSet{j},1, width*height);
end
Test_Set = uint8(zeros(TestSetSize,width*height));
for j=1:length(TestSet)
    Test_Set(j,:) = reshape(TestSet{j},1, width*height);
end

Feature vector of the first training image:
>> j=1;
>> Training_Set_tmp = reshape(TrainSet{j},1, width*height)
Training_Set_tmp =
  1×64 uint8 row vector
  Columns 1 through 19
   247 249 252 230 202 219 215 228 233 210 184 166 87 122 125 173 225 223 205
  Columns 20 through 38
   156 149 109 129 171 219 218 223 179 143 62 100 169 218 217 217 189 152 92
  Columns 39 through 57
   106 167 215 216 217 189 144 86 113 163 209 202 181 122 84 116 109 153 232
  Columns 58 through 64
   218 179 158 167 198 185 201
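The preprocessing in this example (grayscale conversion, shrinking to 8 × 8, flattening) can be mirrored in plain Python to show exactly what a feature vector contains. This is a sketch under stated assumptions. The grayscale weights are the ones MATLAB documents for rgb2gray. Shrinking uses block averaging rather than imresize's interpolation, so the pixel values will differ. Flattening runs column by column, because MATLAB's reshape reads arrays in column-major order. The function names are hypothetical.

```python
def rgb_to_gray(img):
    """img[r][c] is an (R, G, B) tuple; returns luminance rounded to integers,
    using the weights documented for MATLAB's rgb2gray (0.2989, 0.5870, 0.1140)."""
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row] for row in img]


def block_mean_resize(gray, out_h, out_w):
    """Shrink by averaging equal-sized blocks; assumes the sizes divide evenly.
    (imresize interpolates instead, so its pixel values will differ.)"""
    bh, bw = len(gray) // out_h, len(gray[0]) // out_w
    return [[round(sum(gray[i * bh + di][j * bw + dj]
                       for di in range(bh) for dj in range(bw)) / (bh * bw))
             for j in range(out_w)]
            for i in range(out_h)]


def to_feature_vector(gray):
    """Flatten column by column, the order MATLAB's reshape(A, 1, []) uses."""
    return [gray[r][c] for c in range(len(gray[0])) for r in range(len(gray))]


# A hypothetical 4x4 grayscale image shrunk to 2x2, then flattened into 4 features.
gray = [[0, 2, 10, 10],
        [2, 0, 10, 10],
        [5, 5, 1, 1],
        [5, 5, 1, 1]]
small = block_mean_resize(gray, 2, 2)
print(small, to_feature_vector(small))
```

Because of the column-major order, the second feature is the pixel below the first one, not the pixel to its right. The order does not matter to the SVM, as long as every image is flattened the same way.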
SVM: Code
39
% SVM training
SVMModel = fitcsvm(double(Training_Set), TrainLabel);
% SVM testing
Classes = predict(SVMModel, double(Test_Set));
MissClassified = find(TestLabel-Classes).'
Accuracy = (TestSetSize-length(MissClassified))/TestSetSize*100

MissClassified = 1×7
     1     3    14    15    16    21    45
Accuracy = 93

>> predict(SVMModel, double(Test_Set(50,:)))
ans = 2    % Face
>> predict(SVMModel, double(Test_Set(51,:)))
ans = 1    % Aeroplane

Accuracy is 90% for 5x5 images and 93% for 8x8, 10x10 or 100x100 images.
40
Neural Network => Deep Learning
41
Deep learning is a class of
machine learning algorithms that
uses multiple layers to
progressively extract higher level
features from the raw input.
For example, in image
processing, lower layers may
identify edges, while higher
layers may identify the concepts
relevant to a human such as digits
or letters or faces.
Neural Network => Deep Learning
42
Deep learning (also known as deep structured learning) is part of a
broader family of machine learning methods based on artificial neural
networks with representation learning. Learning can be supervised,
semi-supervised or unsupervised.
Figure labels: Handcrafted Machine Learning; Input layer – Hidden layers – Output layer
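Here is a minimal sketch of what "input layer – hidden layers – output layer" computes: each layer takes a weighted sum of its inputs and, except for the last layer, applies a nonlinearity. This is pure Python for illustration. The weights are hand-picked, hypothetical values, whereas deep learning learns them from data.

```python
def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, a) for a in v]


def dense(v, weights, bias):
    """Fully connected layer: out[j] = sum_i weights[j][i] * v[i] + bias[j]."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def forward(x, layers):
    """Pass x through (weights, bias) layers; ReLU after every layer except the last."""
    h = x
    for idx, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if idx < len(layers) - 1:
            h = relu(h)
    return h  # raw class scores from the output layer


# Hypothetical 2-input network: one hidden layer of 2 units, 2 output scores.
layers = [
    ([[1, -1], [0.5, 0.5]], [0, 0]),  # hidden layer
    ([[1, 0], [0, 2]], [0.5, 0]),     # output layer
]
print(forward([1.0, 2.0], layers))
```

Stacking more (weights, bias) pairs in `layers` makes the network deeper. Training would adjust all of these numbers to reduce a loss.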


Neural Network => Deep Learning
43
Deep learning architectures such as deep neural networks, deep
belief networks, recurrent neural networks and convolutional
neural networks (CNN) have been applied to fields including:
Computer vision, speech recognition, natural language
processing, audio recognition, social network filtering, machine
translation, bioinformatics, drug design, medical image
analysis, material inspection and board game programs.
These architectures have produced results comparable to, and in some cases surpassing, human expert performance.
Neural Network => Deep Learning => Transfer Learning
44
Transfer learning is commonly used in deep learning applications. You
can take a pretrained network and use it as a starting point to learn a new
task.
Fine-tuning a network with transfer learning is usually much faster and
easier than training a network with randomly initialized weights from
scratch. You can quickly transfer learned features to a new task using a
smaller number of training images. Popular pretrained CNN models are:
o AlexNet
o GoogLeNet
o ResNet (18, 50, 101)
Transfer Learning Using AlexNet
45

AlexNet is a pretrained convolutional neural network. It can be fine-tuned to perform classification on a new collection of images.
AlexNet has been trained on over a million images and can
classify images into 1000 object categories (such as keyboard,
coffee mug, pencil, and many animals).
The network has learned rich feature representations for a wide
range of images. The network takes an image as input and
outputs a label for the object in the image together with the
probabilities for each of the object categories.
Transfer Learning Using AlexNet
46
Transfer Learning Using AlexNet
47
Transfer Learning Using AlexNet
48
Transfer Learning Using AlexNet: Example…
49
Transfer Learning Using AlexNet: Example…
50
Load Data: Unzip and load the new images as an image
datastore. imageDatastore automatically labels the images based on
folder names and stores the data as an ImageDatastore object. An image
datastore enables you to store large image data, including data that does
not fit in memory, and efficiently read batches of images during training
of a convolutional neural network.
unzip('[Link]');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

>> numel([Link])
ans = 75
>> I = readimage(imds,5);
>> imshow(I)
>> I = readimage(imds,52);
>> imshow(I)
>> size(I)
ans = 227 227 3

This is a very small data set: it contains only 75 images.
Transfer Learning Using AlexNet: Example…
51
Divide the data into training and validation data sets. Use about 70% of the images for training and 30% for validation. splitEachLabel splits the image datastore into two new datastores.

[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');

>> numTrainImages = numel([Link])
numTrainImages = 55
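splitEachLabel splits every label separately, so each class keeps the same 70/30 proportion. Below is a Python sketch of such a stratified, randomized split. Two things are assumptions of this sketch, not facts about MATLAB: the rounding rule (round half up) and a count of 15 images per class (5 classes × 15 = 75). Together they happen to reproduce numTrainImages = 55.

```python
import math
import random
from collections import defaultdict


def split_each_label(items, labels, fraction, seed=0):
    """Randomized, per-label (stratified) split into training and validation lists.

    Each label contributes round-half-up(fraction * count) items to training;
    this rounding rule is an assumption of the sketch.
    """
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = math.floor(fraction * len(group) + 0.5)
        train += [(x, label) for x in group[:n_train]]
        val += [(x, label) for x in group[n_train:]]
    return train, val


# Hypothetical stand-in for the 75 images: 5 classes of 15 images each.
items = list(range(75))
labels = [f"class{i // 15}" for i in range(75)]
train, val = split_each_label(items, labels, 0.7)
print(len(train), len(val))
```

Splitting per label avoids the situation where a purely random split leaves a small class almost absent from the training set.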
Transfer Learning Using AlexNet: Example…
52
Load the pretrained AlexNet neural network. If Neural Network
Toolbox™ Model for AlexNet Network is not installed, then the
software provides a download link. AlexNet is trained on more than one
million images and can classify images into 1000 object categories, such
as keyboard, mouse, pencil, and many animals. As a result, the model
has learned rich feature representations for a wide range of images.

net = alexnet;
Transfer Learning Using AlexNet: Example…
53
>> [Link]   % or analyzeNetwork(net)
ans =
  25x1 Layer array with layers:
     1   'data'     Image Input                   227x227x3 images with 'zerocenter' normalization (the mean image is subtracted)
     2   'conv1'    Convolution                   96 11x11x3 convolutions with stride [4 4] and padding [0 0 0 0]
     3   'relu1'    ReLU                          ReLU (Rectified Linear Unit)
     4   'norm1'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     5   'pool1'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
     6   'conv2'    Grouped Convolution           2 groups of 128 5x5x48 convolutions with stride [1 1] and padding [2 2 2 2]
     7   'relu2'    ReLU                          ReLU
     8   'norm2'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     9   'pool2'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    10   'conv3'    Convolution                   384 3x3x256 convolutions with stride [1 1] and padding [1 1 1 1]
    11   'relu3'    ReLU                          ReLU
    12   'conv4'    Grouped Convolution           2 groups of 192 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    13   'relu4'    ReLU                          ReLU
    14   'conv5'    Grouped Convolution           2 groups of 128 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    15   'relu5'    ReLU                          ReLU
    16   'pool5'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    17   'fc6'      Fully Connected               4096 fully connected layer
    18   'relu6'    ReLU                          ReLU
    19   'drop6'    Dropout                       50% dropout
    20   'fc7'      Fully Connected               4096 fully connected layer
    21   'relu7'    ReLU                          ReLU
    22   'drop7'    Dropout                       50% dropout
    23   'fc8'      Fully Connected               1000 fully connected layer
    24   'prob'     Softmax                       softmax
    25   'output'   Classification Output         crossentropyex with 'tench' and 999 other classes

The network has five convolutional layers and three fully connected layers.

>> inputSize = [Link](1).InputSize
inputSize = 227 227 3

The first layer, the image input layer, requires input images of size 227-by-227-by-3, where 3 is the number of color channels.
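The spatial sizes in this listing follow the standard output-size formula for convolution and pooling, floor((W + 2P - K) / S) + 1, for input width W, kernel size K, stride S and padding P. The short Python check below skips the ReLU and cross channel normalization layers, because they do not change the size. It shows that the 227 × 227 input shrinks to 6 × 6 by pool5, so fc6 receives 6 × 6 × 256 = 9216 inputs. The formula is standard, but whether MATLAB's layers apply exactly this floor rule is an assumption; every division in this network is exact anyway.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer with symmetric padding:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding) of AlexNet's size-changing layers, read off the listing.
layers = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

size = 227
sizes = {}
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    sizes[name] = size
print(sizes)

# conv5 has 2 groups x 128 = 256 filters, so fc6 sees pool5's 6 x 6 x 256 values.
fc6_inputs = sizes["pool5"] ** 2 * 256
print(fc6_inputs)
```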
Transfer Learning Using AlexNet: Example…
54
Replace Final Layers: The last three layers of the pretrained network net
are configured for 1000 classes. These three layers must be fine-tuned for
the new classification problem. Extract all layers, except the last three,
from the pretrained network.
Transfer the layers to the new classification task by replacing the last three
layers with a fully connected layer, a softmax layer, and a classification
output layer. Specify the options of the new fully connected layer
according to the new data. Set the fully connected layer to have the same
size as the number of classes in the new data. To learn faster in the new
layers than in the transferred layers, increase the WeightLearnRateFactor
and BiasLearnRateFactor values of the fully connected layer.
Transfer Learning Using AlexNet: Example…
55

layersTransfer = [Link](1:end-3);
numClasses = numel(categories([Link]))   % returns numClasses = 5
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
    softmaxLayer
    classificationLayer];
Transfer Learning Using AlexNet: Example…
56
Train Network: The network requires input images of size 227-by-227-by-3,
but the images in the image datastores have different sizes. Use an augmented
image datastore to automatically resize the training images. Specify additional
augmentation operations to perform on the training images: randomly flip the
training images along the vertical axis, and randomly translate them up to 30
pixels horizontally and vertically. Data augmentation helps prevent the network
from overfitting and memorizing the exact details of the training images.
pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
'RandXReflection',true, ...
'RandXTranslation',pixelRange, ...
'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
'DataAugmentation',imageAugmenter);
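The effect of the augmenter on each training image can be sketched on a small image stored as a list of rows. This Python sketch approximates imageDataAugmenter and is not a copy of it. It draws integer shifts in [-30, 30] and fills uncovered pixels with 0; both choices are assumptions of the sketch. The function names are hypothetical.

```python
import random


def flip_left_right(img):
    """Reflect an image (list of rows) about its vertical axis."""
    return [row[::-1] for row in img]


def translate(img, dx, dy, fill=0):
    """Shift an image dx pixels right and dy pixels down; vacated pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out


def augment(img, rng, pixel_range=(-30, 30)):
    """Random left-right reflection followed by a random x/y translation."""
    if rng.random() < 0.5:
        img = flip_left_right(img)
    dx = rng.randint(*pixel_range)
    dy = rng.randint(*pixel_range)
    return translate(img, dx, dy)


rng = random.Random(0)
print(augment([[1, 2, 3], [4, 5, 6], [7, 8, 9]], rng, pixel_range=(-1, 1)))
```

Every epoch sees a slightly different version of each image, while the output size stays the same. This variation is what makes it harder for the network to memorize the training images.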
Transfer Learning Using AlexNet: Example…
57
To automatically resize the validation images without performing further data
augmentation, use an augmented image datastore without specifying any additional
preprocessing operations.
Specify the training options. For transfer learning, keep the features from the early layers of
the pretrained network (the transferred layer weights). To slow down learning in the
transferred layers, set the initial learning rate to a small value. In the previous step, you
increased the learning rate factors for the fully connected layer to speed up learning in the
new final layers. This combination of learning rate settings results in fast learning only in
the new layers and slower learning in the other layers. When performing transfer learning,
you do not need to train for as many epochs. An epoch is a full training cycle on the entire
training data set. Specify the mini-batch size and validation data. The software validates the
network every ValidationFrequency iterations during training.
Transfer Learning Using AlexNet: Example…
58
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
options = trainingOptions('sgdm', ...
    'MiniBatchSize',10, ...
    'MaxEpochs',6, ...
    'InitialLearnRate',1e-4, ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'ValidationPatience',Inf, ...
    'Verbose',false, ...
    'Plots','training-progress');

Train the network that consists of the transferred and new layers. By default, trainNetwork uses a GPU if one is available (requires Parallel Computing Toolbox™ and a CUDA® enabled GPU with compute capability 3.0 or higher). Otherwise, it uses a CPU. You can also specify the execution environment by using the 'ExecutionEnvironment' name-value pair argument of trainingOptions.

netTransfer = trainNetwork(augimdsTrain,layers,options);
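The learning-rate settings above can be sketched as follows. In this Python illustration, a layer's effective learning rate is the global rate times that layer's learn-rate factor. So the transferred layers learn at 1e-4, while the new fully connected layer learns at 1e-4 × 20 = 2e-3. The update rule is stochastic gradient descent with momentum. The momentum value 0.9 and the toy objective θ² are assumptions chosen for the example.

```python
def sgdm_step(theta, velocity, grad, base_lr, lr_factor=1.0, momentum=0.9):
    """One SGD-with-momentum update for a single parameter.

    The effective learning rate is base_lr * lr_factor, which is how a layer's
    learn-rate factor scales the global learning rate.
    """
    lr = base_lr * lr_factor
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity


# Effective rates for a transferred layer (factor 1) and the new layer (factor 20).
print(1e-4 * 1, 1e-4 * 20)

# Minimize f(theta) = theta**2, whose gradient is 2 * theta, starting from theta = 1.
theta, v = 1.0, 0.0
for _ in range(200):
    theta, v = sgdm_step(theta, v, grad=2 * theta, base_lr=0.1)
print(theta)
```

The velocity term accumulates past gradients, which smooths noisy mini-batch updates. A small base rate combined with a large factor on the new layer produces the "fast in new layers, slow elsewhere" behavior described above.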
Transfer Learning Using AlexNet: Example…
59
Transfer Learning Using AlexNet: Example…
60
Classify the validation images using the fine-tuned network.
[YPred, scores] = classify(netTransfer,augimdsValidation);
% Display four sample validation images with their predicted labels.
idx = randperm(numel([Link]),4);
figure
for i = 1:4
subplot(2,2,i)
I = readimage(imdsValidation,idx(i));
imshow(I)
label = YPred(idx(i));
title(string(label));
end
Transfer Learning Using AlexNet: Example
61
Calculate the classification accuracy on the validation set. Accuracy is
the fraction of labels that the network predicts correctly.
This trained network has high accuracy. If the accuracy is not high
enough using transfer learning, then try feature extraction instead.

YValidation = [Link];
accuracy = mean(YPred == YValidation)
accuracy = 0.9500
CNN Based Deep Learning Example
62
Deep Learning: Example…
63
% Handwritten digit recognition using deep learning
clc; clear; close all;
% Load and explore image data
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
dataSize = size([Link])   % returns dataSize = 1×2: 10000 1
% Display some of the images in the datastore.
figure;
perm = randperm(dataSize(1), 20);
for i = 1:20
    subplot(4,5,i);
    imshow([Link]{perm(i)});
end
Deep Learning: Example…
64
labelCount = countEachLabel(imds)
img = readimage(imds,1);
imageSize = size(img)
% Specify training and validation sets
numTrainFiles = 750;
[imdsTrain,imdsValidation] = splitEachLabel(imds,numTrainFiles,'randomize');

Output:
labelCount =
          Label    Count
     1    0        1000
     2    1        1000
     3    2        1000
     4    3        1000
     5    4        1000
     6    5        1000
     7    6        1000
     8    7        1000
     9    8        1000
    10    9        1000
imageSize = 1×2
    28    28
Deep Learning: Example…
65
% Define the convolutional neural network architecture.
layers = [
imageInputLayer([imageSize(1) imageSize(2) 1])
convolution2dLayer(3,8,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,16,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,32,'Padding','same')
batchNormalizationLayer
reluLayer
fullyConnectedLayer(10)
softmaxLayer
classificationLayer
];
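The size of this network can be tallied by hand. Below is a Python sketch of the arithmetic, assuming the usual parameter counts. A k × k convolution with C_in input channels and C_out filters has k·k·C_in·C_out weights plus C_out biases. Batch normalization learns a scale and an offset per channel. 'Same' padding plus the two 2 × 2, stride-2 poolings reduce 28 × 28 to 7 × 7 before the fully connected layer.

```python
def conv_params(k, in_ch, out_ch):
    """k x k convolution: k*k*in_ch weights per filter, plus one bias per filter."""
    return k * k * in_ch * out_ch + out_ch


def bn_params(ch):
    """Batch normalization: a learnable scale and offset per channel."""
    return 2 * ch


def fc_params(n_in, n_out):
    """Fully connected layer: a weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


# 28x28x1 input; 'same' padding keeps the size and each pool halves it: 28 -> 14 -> 7.
counts = {
    "conv1": conv_params(3, 1, 8), "bn1": bn_params(8),
    "conv2": conv_params(3, 8, 16), "bn2": bn_params(16),
    "conv3": conv_params(3, 16, 32), "bn3": bn_params(32),
    "fc": fc_params(7 * 7 * 32, 10),
}
print(counts, sum(counts.values()))
```

Most of the learnable parameters sit in the final fully connected layer, even though it is only one layer out of many.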
Deep Learning: Example…
66
Image Input Layer An imageInputLayer is where you specify the image size, which, in this case, is
28-by-28-by-1. These numbers correspond to the height, width, and the channel size. The digit data
consists of grayscale images, so the channel size (color channel) is 1. For a color image, the channel
size is 3, corresponding to the RGB values. You do not need to shuffle the data because trainNetwork,
by default, shuffles the data at the beginning of training. trainNetwork can also automatically shuffle
the data at the beginning of every epoch during training.

Convolutional Layer In the convolutional layer, the first argument is filterSize, which is the height
and width of the filters the training function uses while scanning along the images. In this example,
the number 3 indicates that the filter size is 3-by-3. You can specify different sizes for the height and
width of the filter. The second argument is the number of filters, numFilters, which is the number of
neurons that connect to the same region of the input. This parameter determines the number of feature
maps. Use the 'Padding' name-value pair to add padding to the input feature map. For a convolutional
layer with a default stride of 1, 'same' padding ensures that the spatial output size is the same as the
input size. You can also define the stride and learning rates for this layer using name-value pair
arguments of convolution2dLayer.
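The sliding-filter computation described above can be written out directly. This plain-Python sketch handles one input channel and one filter, with stride 1 and 'same' zero padding (odd filter sizes only). A real convolutional layer sums such maps over all input channels, adds a bias, and repeats the computation for each of the numFilters filters.

```python
def conv2d_same(img, kernel):
    """Stride-1 2-D convolution with zero 'same' padding (odd kernel sizes).

    Like CNN layers, this computes cross-correlation: the kernel is not flipped.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # positions outside count as zero
                        acc += img[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


# A 3x3 vertical-edge filter applied to a hypothetical image with a dark-to-bright edge.
edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
print(conv2d_same(img, edge))
```

The output has the same 3 × 4 size as the input, which is what 'same' padding guarantees at stride 1. The largest responses appear next to the edge.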
Deep Learning: Example…
67
Batch Normalization Layer Batch normalization layers normalize the activations and gradients
propagating through a network, making network training an easier optimization problem. Use batch
normalization layers between convolutional layers and nonlinearities, such as ReLU layers, to speed
up network training and reduce the sensitivity to network initialization.
Use batchNormalizationLayer to create a batch normalization layer.

ReLU Layer The batch normalization layer is followed by a nonlinear activation function. The most
common activation function is the rectified linear unit (ReLU). Use reluLayer to create a ReLU layer.

Max Pooling Layer Convolutional layers (with activation functions) are sometimes followed by a
down-sampling operation that reduces the spatial size of the feature map and removes redundant
spatial information. Down-sampling makes it possible to increase the number of filters in deeper
convolutional layers without increasing the required amount of computation per layer. One way of
down-sampling is using a max pooling, which you create using maxPooling2dLayer. The max
pooling layer returns the maximum values of rectangular regions of inputs, specified by the first
argument, poolSize. In this example, the size of the rectangular region is [2,2].
The 'Stride' name-value pair argument specifies the step size that the training function takes as it
scans along the input.
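Each of these three operations fits in a few lines of Python. The versions below are simplified, single-channel sketches for illustration. `batch_norm` normalizes the activations of one channel across a mini-batch; the learnable scale `gamma` and offset `beta` are fixed here rather than trained, and `eps` is an assumed small constant. `max_pool` implements the 2 × 2, stride-2 pooling used in this network.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations over a mini-batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(fmap):
    """Rectified linear unit on a 2-D feature map: negative values become 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Maximum over size x size windows, moving `stride` pixels at a time."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, w - size + 1, stride)]
            for r in range(0, h - size + 1, stride)]


fmap = [[1, 2, 5, 6],
        [3, 4, 7, 8],
        [9, 10, 13, 14],
        [11, 12, 15, 16]]
print(batch_norm([1.0, 2.0, 3.0]), relu([[-1.0, 2.0]]), max_pool(fmap))
```

Pooling the 4 × 4 map gives a 2 × 2 map. This is the halving that takes the digit images from 28 × 28 to 14 × 14 and then to 7 × 7.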
Deep Learning: Example…
68
Fully Connected Layer The convolutional and down-sampling layers are followed by one or more
fully connected layers. As its name suggests, a fully connected layer is a layer in which the neurons
connect to all the neurons in the preceding layer. This layer combines all the features learned by the
previous layers across the image to identify the larger patterns. The last fully connected layer
combines the features to classify the images. Therefore, the OutputSize parameter in the last fully
connected layer is equal to the number of classes in the target data. In this example, the output size is
10, corresponding to the 10 classes. Use fullyConnectedLayer to create a fully connected layer.

Softmax Layer The softmax activation function normalizes the output of the fully connected layer.
The output of the softmax layer consists of positive numbers that sum to one, which can then be used
as classification probabilities by the classification layer. Create a softmax layer using
the softmaxLayer function after the last fully connected layer.

Classification Layer The final layer is the classification layer. This layer uses the probabilities
returned by the softmax activation function for each input to assign the input to one of the mutually
exclusive classes and compute the loss. To create a classification layer, use classificationLayer.
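The softmax and the loss computed by the classification layer can be sketched as follows (Python, for illustration). Subtracting the largest score before exponentiating does not change the result but avoids overflow. The cross-entropy loss for one example is minus the logarithm of the probability assigned to its true class. The example scores are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into positive probabilities that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Loss for one example whose correct class index is true_class."""
    return -math.log(probs[true_class])


p = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(p)), key=p.__getitem__)  # class with the highest probability
print(p, predicted, cross_entropy(p, 0))
```

A confident, correct prediction gives a loss near 0. A confident, wrong prediction gives a large loss, which is what drives the weight updates during training.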
Deep Learning: Example…
69
After defining the network structure, specify the training options. Train the network using stochastic gradient descent with momentum (SGDM) with an initial learning rate of 0.01. Set the maximum number of epochs to 4. An epoch is a full training cycle on the entire training data set. Monitor the network accuracy during training by specifying validation data and validation frequency. Shuffle the data every epoch. The software trains the network on the training data and calculates the accuracy on the validation data at regular intervals during training. The validation data is not used to update the network weights. Turn on the training progress plot, and turn off the command window output.

% Specify training options
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsValidation, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'Plots','training-progress');
net = trainNetwork(imdsTrain,layers,options);
Deep Learning: Example…
70
Deep Learning: Example
71
% Classify Validation Images and Compute Accuracy.
YPred = classify(net,imdsValidation);
YValidation = [Link];

accuracy = sum(YPred == YValidation)/numel(YValidation)   % returns accuracy = 0.9944

% Individual testing
figure
idx = 1000;
imshow([Link]{idx})
label = YPred(idx);
title(string(label));
Watch: Lectures 5-7
72
Assignment 73
Find or prepare a dataset of any size with two types of images. Classify
these two types by using the machine learning techniques in the
following table on the same dataset and show their comparison. Also
include a short note on which method works best for this dataset and
why.

#   Machine Learning Algorithm   Accuracy (%)   Training time (s)   Memory use (KB)   Comments
1   KNN
2   SVM
3   Deep Learning
4   Transfer Learning
References 74
1. [Link]
2. [Link]
3. [Link]
4. [Link]
5. [Link]
6. [Link]
7. [Link]
[Link]
Extra Stuff
75
Classification Algorithms
76
Classification Model Training
Each model is trained to detect a specific type of object. The classification
models are trained by extracting features from a set of known images. These
extracted features are then fed into a learning algorithm to train the classification
model. Computer Vision System Toolbox software uses the Viola-Jones cascade
object detector. This detector includes Haar-like [6] features and cascade of
classifiers trained using boosting.
The image size used to train the classifiers defines the smallest region containing
the object. Training image sizes vary according to the application, type of target
object, and available positive images. You must set the MinSize property to a
value greater than or equal to the image size used to train the model…..
Classification Algorithms
77
MATLAB DEMOS
78
Auto Video Surveillance
79

Abandoned Object Detection
A Simple Classifier: Nearest Neighbor
80
1. Collect a dataset of images and label them
2. Use Machine Learning to train an image classifier
3. Evaluate the classifier on a withheld set of test images
Example training set
A Simple Classifier: Nearest Neighbor
81
Example dataset: CIFAR-10
10 labels
50,000 training images
10,000 test images
Figure: for every test image (first column), examples of its nearest neighbors are shown in the same row.
A Simple Classifier: Nearest Neighbor
82
How do we compare the images?
What is the distance metric?
