Visual Classification
[Figure: Problem domain, Knowledge base, Result]
Popular Classifiers
Image Classification: The Problem
Human vs Machine Perception: To a computer, an image is an array of numbers, e.g., a height-by-width-by-d array of integers in [0, 255], where d = 3 is the number of color channels (RGB).
[Figure: What the machine (computer) sees]
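As a small illustration of the array view described above, the following sketch builds a hypothetical 2-by-2 RGB image by hand (the pixel values are made up for this example) and inspects its size and data type.

```matlab
% A hypothetical 2-by-2 RGB image stored as a height-by-width-by-3 uint8 array.
img = zeros(2, 2, 3, 'uint8');
img(:, :, 1) = [255 0; 0 255];     % red channel
img(:, :, 2) = [0 255; 0 255];     % green channel
img(:, :, 3) = [0 0; 255 255];     % blue channel

imgSize = size(img);                        % height, width, number of channels
topLeft = squeeze(img(1, 1, :)).';          % RGB triple of the top-left pixel (pure red)
bottomRight = squeeze(img(2, 2, :)).';      % RGB triple of the bottom-right pixel (white)
```

Each pixel is just three integers, which is all a classifier gets to work with.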
Image Classification: Challenges
Viewpoint variation (example shown: Michelangelo, 1475-1564)
Illumination
Scale
Deformation
Occlusion
Background clutter
Intra-class variation
>> f = imread('[Link]');
>> predict(f)
????
Unlike, e.g., sorting a list of numbers,
[Figure: (Input) Dataset collection & labelling -> (Learning) Learning & training an image classifier -> (Evaluation) Testing of classifier on withheld images]
The Image Classification Pipeline
Input: Our input consists of a set of N images, each labelled with one
of K different classes. We refer to this data as the training set.
Learning: Our task is to use the training set to learn what every one of
the classes looks like. We refer to this step as training a classifier,
or learning a model.
Evaluation: Evaluate the quality of the classifier by asking it to predict
labels for a new set of images that it has never seen before. We will
then compare the true labels of these images to the ones predicted by
the classifier. Intuitively, we're hoping that a lot of the predictions
match up with the true answers (which we call the ground truth).
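The evaluation step can be sketched in a few lines. The label vectors below are hypothetical and only illustrate how predictions are compared against the ground truth.

```matlab
% Hypothetical ground-truth and predicted labels for 8 withheld images (K = 3 classes).
trueLabels      = [1 1 2 2 3 3 1 2];
predictedLabels = [1 2 2 2 3 1 1 2];

isCorrect = (predictedLabels == trueLabels);   % logical vector, one entry per image
accuracy = mean(isCorrect);                    % fraction of matching predictions
misclassifiedIdx = find(~isCorrect);           % which images were predicted wrongly
```

Here 6 of the 8 predictions match, so the accuracy is 0.75 and images 2 and 6 are misclassified.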
The Machine Learning Framework
y = f(x)
where y is the output, f is the prediction function, and x is the image feature.
o The data set consists of 50 samples from each of three species of Iris.
o Four features were measured from each sample: the length and the width of the
sepals and petals, in centimeters.
Example: Sample of data set
No.  Sepal length (cm)  Sepal width (cm)  Petal length (cm)  Petal width (cm)  Species
1    5.1                3.5               1.4                0.2               setosa
2    4.9                3.0               1.4                0.2               setosa
3    4.7                3.2               1.3                0.2               setosa
4    4.6                3.1               1.5                0.2               setosa
…
mdl =
ClassificationKNN:
PredictorNames: {'x1' 'x2'}
ResponseName: 'Y'
ClassNames: {1x3 cell}
ScoreTransform: 'none'
NObservations: 150
Distance: 'euclidean'
NumNeighbors: 1
Example: Code…
% Predict the classification of an average flower
>> flwr = mean(x) % an average flower
flwr =
3.7580 1.1993
>> flwrClass = predict(mdl, flwr)
flwrClass =
'versicolor'
'virginica'
'versicolor'
Why different?
Example: Analysis
Species Length Width Distance
virginica 5.0000 1.5000 0.0500
versicolor 4.9000 1.5000 0.1118
versicolor 4.9000 1.5000 0.1118
versicolor 5.1000 1.6000 0.1118
virginica 5.1000 1.5000 0.1118
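One way to read the table above: the predicted class depends on how many neighbors vote. The following is a minimal sketch of majority voting over the five listed neighbors; the species and distance values are copied from the table, while the variable names and the choice of k = 1 and k = 5 are assumptions made for illustration.

```matlab
% Neighbors copied from the table above, already listed by increasing distance.
species  = {'virginica', 'versicolor', 'versicolor', 'versicolor', 'virginica'};
distance = [0.0500 0.1118 0.1118 0.1118 0.1118];

[~, order] = sort(distance);          % nearest neighbor first
kValues = [1 5];                      % hypothetical numbers of neighbors to compare
predicted = cell(size(kValues));
for n = 1:numel(kValues)
    nearest = species(order(1:kValues(n)));   % the k closest training samples
    [names, ~, idx] = unique(nearest);        % distinct classes among them
    counts = accumarray(idx(:), 1);           % votes per class
    [~, best] = max(counts);                  % class with the most votes
    predicted{n} = names{best};
end
```

With k = 1 the single closest sample decides (virginica); with k = 5 the three versicolor neighbors outvote the two virginica ones. Because four of the distances are tied, intermediate values of k depend on the order in which ties are broken.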
For details, see [Link]
Example: Visual Classification by SVM
Given: Caltech 101 dataset [1], 101 categories of about 50-100 images of 300x200 pixels.
We consider only two of its categories: “aeroplane” and “faces”.
Download from [Link]
Make two folders, TrainSet and TestSet, and copy pictures:
01-50 from “aeroplane” to TrainSet
51-100 from “faces” to TrainSet
01-50 from “faces” to TestSet
51-100 from “aeroplane” to TestSet
[1] L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian
approach tested on 101 object categories. IEEE. CVPR 2004, Workshop on Generative-Model Based Vision. 2004
[Figure: images in TrainingSet]
[Figure: images in TestSet]
SVM: Code…
% Start from a clean workspace (clear already removes all variables)
clc; clear; close all;
% Load datasets
TrainImageSet = 'D:\Classification\Data\101_ObjectCategories\TrainingSet';
TestImageSet = 'D:\Classification\Data\101_ObjectCategories\TestSet';
% dir returns a struct array with fields: name, date, bytes, isdir, datenum
k = dir(fullfile(TrainImageSet,'*.jpg'));
TrainFN = {k(~[k.isdir]).name}; % store file names of all images, not directories
TrainSetSize = length(TrainFN);
k = dir(fullfile(TestImageSet,'*.jpg'));
TestFN = {k(~[k.isdir]).name};
TestSetSize = length(TestFN);
% Labelling: it is always better to label your images numerically
TrainLabel = zeros(TrainSetSize,1); % one label per training image
TrainLabel(1:50,1) = 1; % 1 = Airplanes
TrainLabel(51:100,1) = 2; % 2 = Faces
TestLabel = zeros(TestSetSize,1); % one label per test image
TestLabel(1:50,1) = 2; % 2 = Faces
TestLabel(51:100,1) = 1; % 1 = Airplanes
% Normalization: conversion to grayscale (if not already) and resizing to the same size
width = 8; height = 8; % Normalized image size
% Normalization of training images
TrainSet = cell(TrainSetSize, 1); % preallocate one cell per image
for j=1:TrainSetSize
    tempImage = imread(horzcat(TrainImageSet,filesep,TrainFN{j}));
    imgInfo = imfinfo(horzcat(TrainImageSet,filesep,TrainFN{j}));
    if strcmp(imgInfo.ColorType,'grayscale')
        TrainSet{j} = imresize(tempImage,[height width]); % imresize expects [rows cols]
    else
        TrainSet{j} = imresize(rgb2gray(tempImage),[height width]);
    end
end
% Checking the first image at the command line:
>> j=1;
>> imshow(tempImage)
>> strcmp(imgInfo.ColorType,'grayscale')
ans =
logical
0
>> imshow(TrainSet{j})
% Normalization of test images
TestSet = cell(TestSetSize, 1); % preallocate one cell per image
for j=1:TestSetSize
    tempImage = imread(horzcat(TestImageSet,filesep,TestFN{j}));
    imgInfo = imfinfo(horzcat(TestImageSet,filesep,TestFN{j}));
    if strcmp(imgInfo.ColorType,'grayscale')
        TestSet{j} = imresize(tempImage,[height width]); % imresize expects [rows cols]
    else
        TestSet{j} = imresize(rgb2gray(tempImage),[height width]);
    end
end
% Numeric matrix of features, every row is a feature vector for 1 image
Training_Set = uint8(zeros(TrainSetSize,width*height)); % code optimization by preallocating the memory
for j=1:length(TrainSet)
Training_Set(j,:) = reshape(TrainSet{j},1, width*height);
end
Test_Set = uint8(zeros(TestSetSize,width*height));
for j=1:length(TestSet)
Test_Set(j,:) = reshape(TestSet{j},1, width*height);
end
>> j=1;
>> Training_Set_tmp = reshape(TrainSet{j},1, width*height)
Training_Set_tmp = 1×64 uint8 row vector
Columns 1 through 19
247 249 252 230 202 219 215 228 233 210 184 166 87 122 125 173 225 223 205
Columns 20 through 38
156 149 109 129 171 219 218 223 179 143 62 100 169 218 217 217 189 152 92
Columns 39 through 57
106 167 215 216 217 189 144 86 113 163 209 202 181 122 84 116 109 153 232
Columns 58 through 64
218 179 158 167 198 185 201
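To make the flattening step concrete, here is a toolbox-free sketch on a hypothetical 2-by-3 "image". It shows that reshape reads the pixels column by column, so the feature vector lists the first column of the image, then the second, and so on.

```matlab
% Hypothetical 2-by-3 grayscale image (values chosen to make the ordering visible).
tinyImage = uint8([1 2 3; 4 5 6]);

featureRow = reshape(tinyImage, 1, []);            % column-major order: 1 4 2 5 3 6
rowMajor   = reshape(tinyImage.', 1, []);          % transpose first for row-major: 1 2 3 4 5 6
restored   = reshape(featureRow, size(tinyImage)); % the flattening is reversible
```

The ordering does not matter to the classifier as long as training and test images are flattened the same way.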
% SVM training
SVMModel = fitcsvm(double(Training_Set), TrainLabel);
% SVM testing
Classes = predict(SVMModel, double(Test_Set));
MissClassified = find(TestLabel-Classes).'
Accuracy = (TestSetSize-length(MissClassified))/TestSetSize*100
MissClassified = 1×7
1 3 14 15 16 21 45
Accuracy = 93
>> predict(SVMModel, double(Test_Set(50,:)))
ans = 2 (Face)
>> predict(SVMModel, double(Test_Set(51,:)))
ans = (Aeroplane)
Accuracy is 90% for 5x5, and 93% for 8x8, 10x10 or 100x100 normalized images.
Neural Network => Deep Learning
Deep learning is a class of
machine learning algorithms that
uses multiple layers to
progressively extract higher level
features from the raw input.
For example, in image
processing, lower layers may
identify edges, while higher
layers may identify the concepts
relevant to a human such as digits
or letters or faces.
Deep learning (also known as deep structured learning) is part of a
broader family of machine learning methods based on artificial neural
networks with representation learning. Learning can be supervised,
semi-supervised or unsupervised.
Handcrafted Machine Learning
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');
net = alexnet;
Transfer Learning Using AlexNet: Example…
>> net.Layers % or analyzeNetwork(net)
ans = 25x1 Layer array with layers:
1 'data' Image Input 227x227x3 images with 'zerocenter' normalization (i.e., the mean is subtracted)
2 'conv1' Convolution 96 11x11x3 convolutions with stride [4 4] and padding [0 0 0 0]
3 'relu1' ReLU ReLU (Rectified Linear Unit)
4 'norm1' Cross Channel Normalization cross channel normalization with 5 channels per element
5 'pool1' Max Pooling 3x3 max pooling with stride [2 2] and padding [0 0 0 0]
6 'conv2' Grouped Convolution 2 groups of 128 5x5x48 convolutions with stride [1 1] and padding [2 2 2 2]
7 'relu2' ReLU ReLU
8 'norm2' Cross Channel Normalization cross channel normalization with 5 channels per element
9 'pool2' Max Pooling 3x3 max pooling with stride [2 2] and padding [0 0 0 0]
10 'conv3' Convolution 384 3x3x256 convolutions with stride [1 1] and padding [1 1 1 1]
11 'relu3' ReLU ReLU
12 'conv4' Grouped Convolution 2 groups of 192 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
13 'relu4' ReLU ReLU
14 'conv5' Grouped Convolution 2 groups of 128 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
15 'relu5' ReLU ReLU
16 'pool5' Max Pooling 3x3 max pooling with stride [2 2] and padding [0 0 0 0]
17 'fc6' Fully Connected 4096 fully connected layer
18 'relu6' ReLU ReLU
19 'drop6' Dropout 50% dropout
20 'fc7' Fully Connected 4096 fully connected layer
21 'relu7' ReLU ReLU
22 'drop7' Dropout 50% dropout
23 'fc8' Fully Connected 1000 fully connected layer
24 'prob' Softmax softmax
25 'output' Classification Output crossentropyex with 'tench' and 999 other classes
The network has five convolutional layers and three fully connected layers.
>> inputSize = net.Layers(1).InputSize
inputSize = 227 227 3
The first layer, the image input layer, requires input images of size 227-by-227-by-3, where 3 is the number of color channels.
Replace Final Layers: The last three layers of the pretrained network net
are configured for 1000 classes. These three layers must be fine-tuned for
the new classification problem. Extract all layers, except the last three,
from the pretrained network.
Transfer the layers to the new classification task by replacing the last three
layers with a fully connected layer, a softmax layer, and a classification
output layer. Specify the options of the new fully connected layer
according to the new data. Set the fully connected layer to have the same
size as the number of classes in the new data. To learn faster in the new
layers than in the transferred layers, increase the WeightLearnRateFactor
and BiasLearnRateFactor values of the fully connected layer.
layersTransfer = net.Layers(1:end-3);
numClasses = numel(categories(imdsTrain.Labels))
numClasses = 5
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
    softmaxLayer
    classificationLayer];
Train Network: The network requires input images of size 227-by-227-by-3,
but the images in the image datastores have different sizes. Use an augmented
image datastore to automatically resize the training images. Specify additional
augmentation operations to perform on the training images: randomly flip the
training images along the vertical axis, and randomly translate them up to 30
pixels horizontally and vertically. Data augmentation helps prevent the network
from overfitting and memorizing the exact details of the training images.
pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
'RandXReflection',true, ...
'RandXTranslation',pixelRange, ...
'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
'DataAugmentation',imageAugmenter);
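To show what these two augmentations do to the pixel grid, here is a toolbox-free sketch on a hypothetical 3-by-4 image. It is not how imageDataAugmenter is implemented: the fixed 1-pixel shift and the zero fill for uncovered pixels are assumptions made for this illustration, whereas the augmenter draws random offsets within pixelRange.

```matlab
% Hypothetical 3-by-4 grayscale image; each column holds consecutive values.
img = reshape(uint8(1:12), 3, 4);

% Reflection in the left-right direction (a flip about the vertical axis).
flipped = flip(img, 2);

% Horizontal translation by a fixed 1-pixel shift (assumed for illustration).
shift = 1;
translated = zeros(size(img), 'uint8');            % uncovered pixels filled with 0
translated(:, 1+shift:end) = img(:, 1:end-shift);  % move content to the right
```

The class label of the image is unchanged by either operation, which is why such copies can serve as extra training examples.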
To automatically resize the validation images without performing further data
augmentation, use an augmented image datastore without specifying any additional
preprocessing operations.
Specify the training options. For transfer learning, keep the features from the early layers of
the pretrained network (the transferred layer weights). To slow down learning in the
transferred layers, set the initial learning rate to a small value. In the previous step, you
increased the learning rate factors for the fully connected layer to speed up learning in the
new final layers. This combination of learning rate settings results in fast learning only in
the new layers and slower learning in the other layers. When performing transfer learning,
you do not need to train for as many epochs. An epoch is a full training cycle on the entire
training data set. Specify the mini-batch size and validation data. The software validates the
network every ValidationFrequency iterations during training.
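The combined effect of the two settings is simple arithmetic: a layer's effective learning rate is the global learning rate multiplied by that layer's learn rate factor. The sketch below uses the factor of 20 and the initial learning rate of 1e-4 from this example; the factor of 1 for the transferred layers is an assumption (individual pretrained layers may carry other factors).

```matlab
initialLearnRate = 1e-4;        % global rate passed to trainingOptions in this example
newLayerFactor = 20;            % WeightLearnRateFactor of the new fully connected layer
transferredLayerFactor = 1;     % assumed factor for the transferred layers

newLayerRate = initialLearnRate * newLayerFactor;                  % 2e-3
transferredLayerRate = initialLearnRate * transferredLayerFactor;  % 1e-4
speedUp = newLayerRate / transferredLayerRate;                     % new layer learns 20x faster
```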
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
options = trainingOptions('sgdm', ...
    'MiniBatchSize',10, ...
    'MaxEpochs',6, ...
    'InitialLearnRate',1e-4, ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'ValidationPatience',Inf, ...
    'Verbose',false, ...
    'Plots','training-progress');
Train the network that consists of the transferred and new layers. By default, trainNetwork uses a GPU if one is available (requires Parallel Computing Toolbox™ and a CUDA® enabled GPU with compute capability 3.0 or higher). Otherwise, it uses a CPU. You can also specify the execution environment by using the 'ExecutionEnvironment' name-value pair argument of trainingOptions.
netTransfer = trainNetwork(augimdsTrain,layers,options);
Classify the validation images using the fine-tuned network.
[YPred, scores] = classify(netTransfer,augimdsValidation);
% Display four sample validation images with their predicted labels.
idx = randperm(numel(imdsValidation.Files),4);
figure
for i = 1:4
subplot(2,2,i)
I = readimage(imdsValidation,idx(i));
imshow(I)
label = YPred(idx(i));
title(string(label));
end
Calculate the classification accuracy on the validation set. Accuracy is
the fraction of labels that the network predicts correctly.
This trained network has high accuracy. If the accuracy is not high
enough using transfer learning, then try feature extraction instead.
YValidation = imdsValidation.Labels;
accuracy = mean(YPred == YValidation)
accuracy = 0.9500
CNN Based Deep Learning Example
Deep Learning: Example…
% Handwritten digit recognition using deep learning
clc; clear; close all;
% Load and Explore Image Data
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
dataSize = size(imds.Files)
dataSize = 1×2
10000 1
% Display some of the images in the datastore.
figure;
perm = randperm(dataSize(1), 20);
for i = 1:20
    subplot(4,5,i);
    imshow(imds.Files{perm(i)});
end
labelCount = countEachLabel(imds)
     Label  Count
…
4    3      1000
5    4      1000
6    5      1000
7    6      1000
8    7      1000
9    8      1000
10   9      1000
[imdsTrain,imdsValidation] = splitEachLabel(imds,numTrainFiles,'randomize');
% Define the convolutional neural network architecture.
layers = [
imageInputLayer([imageSize(1) imageSize(2) 1])
convolution2dLayer(3,8,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,16,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,32,'Padding','same')
batchNormalizationLayer
reluLayer
fullyConnectedLayer(10)
softmaxLayer
classificationLayer
];
Image Input Layer An imageInputLayer is where you specify the image size, which, in this case, is
28-by-28-by-1. These numbers correspond to the height, width, and the channel size. The digit data
consists of grayscale images, so the channel size (color channel) is 1. For a color image, the channel
size is 3, corresponding to the RGB values. You do not need to shuffle the data because trainNetwork,
by default, shuffles the data at the beginning of training. trainNetwork can also automatically shuffle
the data at the beginning of every epoch during training.
Convolutional Layer In the convolutional layer, the first argument is filterSize, which is the height
and width of the filters the training function uses while scanning along the images. In this example,
the number 3 indicates that the filter size is 3-by-3. You can specify different sizes for the height and
width of the filter. The second argument is the number of filters, numFilters, which is the number of
neurons that connect to the same region of the input. This parameter determines the number of feature
maps. Use the 'Padding' name-value pair to add padding to the input feature map. For a convolutional
layer with a default stride of 1, 'same' padding ensures that the spatial output size is the same as the
input size. You can also define the stride and learning rates for this layer using name-value pair
arguments of convolution2dLayer.
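The claim about 'same' padding can be checked with the standard output-size formula for a convolution, floor((input + 2*padding - filter)/stride) + 1. The helper name convOutputSize below is hypothetical and introduced only for this sketch; a padding of 1 pixel per side is what 'same' amounts to for a 3-by-3 filter with stride 1.

```matlab
% Hypothetical helper: spatial output size of a convolution along one dimension.
convOutputSize = @(inSize, filterSize, padding, stride) ...
    floor((inSize + 2*padding - filterSize) / stride) + 1;

sameOut  = convOutputSize(28, 3, 1, 1);    % 3x3 filter, 1 pixel of padding per side, stride 1
noPadOut = convOutputSize(28, 3, 0, 1);    % same filter without padding
alexOut  = convOutputSize(227, 11, 0, 4);  % 11x11 filter, stride 4, no padding on a 227-pixel input
```

With padding the 28-pixel input stays at 28; without it the output shrinks to 26. The last line applies the same formula to a 227-pixel input with an 11-by-11 filter and stride 4, which gives 55.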
Batch Normalization Layer Batch normalization layers normalize the activations and gradients
propagating through a network, making network training an easier optimization problem. Use batch
normalization layers between convolutional layers and nonlinearities, such as ReLU layers, to speed
up network training and reduce the sensitivity to network initialization.
Use batchNormalizationLayer to create a batch normalization layer.
ReLU Layer The batch normalization layer is followed by a nonlinear activation function. The most
common activation function is the rectified linear unit (ReLU). Use reluLayer to create a ReLU layer.
Max Pooling Layer Convolutional layers (with activation functions) are sometimes followed by a
down-sampling operation that reduces the spatial size of the feature map and removes redundant
spatial information. Down-sampling makes it possible to increase the number of filters in deeper
convolutional layers without increasing the required amount of computation per layer. One way of
down-sampling is using a max pooling, which you create using maxPooling2dLayer. The max
pooling layer returns the maximum values of rectangular regions of inputs, specified by the first
argument, poolSize. In this example, the size of the rectangular region is [2,2].
The 'Stride' name-value pair argument specifies the step size that the training function takes as it
scans along the input.
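A hand-rolled sketch of 2-by-2 max pooling with stride 2 on a hypothetical 4-by-4 feature map shows the down-sampling. This is only an illustration of the operation, not the implementation behind maxPooling2dLayer.

```matlab
% Hypothetical 4-by-4 feature map.
A = [1 3 2 4;
     5 6 7 8;
     9 2 1 0;
     3 4 5 6];

poolSize = 2; stride = 2;
outSize = floor((size(A, 1) - poolSize) / stride) + 1;   % 2 outputs per dimension
pooled = zeros(outSize, outSize);
for r = 1:outSize
    for c = 1:outSize
        rows = (r-1)*stride + (1:poolSize);   % rows covered by this pooling window
        cols = (c-1)*stride + (1:poolSize);   % columns covered by this pooling window
        block = A(rows, cols);
        pooled(r, c) = max(block(:));         % keep only the largest activation
    end
end
```

The 4-by-4 map becomes the 2-by-2 map [6 8; 9 6], halving each spatial dimension.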
Fully Connected Layer The convolutional and down-sampling layers are followed by one or more
fully connected layers. As its name suggests, a fully connected layer is a layer in which the neurons
connect to all the neurons in the preceding layer. This layer combines all the features learned by the
previous layers across the image to identify the larger patterns. The last fully connected layer
combines the features to classify the images. Therefore, the OutputSize parameter in the last fully
connected layer is equal to the number of classes in the target data. In this example, the output size is
10, corresponding to the 10 classes. Use fullyConnectedLayer to create a fully connected layer.
Softmax Layer The softmax activation function normalizes the output of the fully connected layer.
The output of the softmax layer consists of positive numbers that sum to one, which can then be used
as classification probabilities by the classification layer. Create a softmax layer using
the softmaxLayer function after the last fully connected layer.
Classification Layer The final layer is the classification layer. This layer uses the probabilities
returned by the softmax activation function for each input to assign the input to one of the mutually
exclusive classes and compute the loss. To create a classification layer, use classificationLayer.
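The last two layers can be sketched numerically. The scores below are hypothetical outputs of a fully connected layer for a 3-class problem, and the cross-entropy form of the loss is the usual choice for mutually exclusive classes.

```matlab
% Hypothetical raw scores from the last fully connected layer (3 classes).
scores = [2.0 1.0 0.1];

% Softmax: subtracting the maximum first avoids overflow and does not change the result.
shifted = scores - max(scores);
probs = exp(shifted) / sum(exp(shifted));   % positive numbers that sum to one

% Classification: pick the most probable class and compute the cross-entropy loss.
[~, predictedClass] = max(probs);
trueClass = 1;                              % assumed ground-truth class for this example
loss = -log(probs(trueClass));              % small when the true class gets high probability
```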
Specify Training Options. After defining the network structure, specify the training options. Train the network using stochastic gradient descent with momentum (SGDM) with an initial learning rate of 0.01. Set the maximum number of epochs to 4. An epoch is a full training cycle on the entire training data set. Monitor the network accuracy during training by specifying validation data and validation frequency. Shuffle the data every epoch. The software trains the network on the training data and calculates the accuracy on the validation data at regular intervals during training. The validation data is not used to update the network weights. Turn on the training progress plot, and turn off the command window output.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsValidation, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'Plots','training-progress');
net = trainNetwork(imdsTrain,layers,options);
% Classify Validation Images and Compute Accuracy.
YPred = classify(net,imdsValidation);
YValidation = imdsValidation.Labels;
% Individual testing
figure
idx = 1000;
imshow(imdsValidation.Files{idx})
label = YPred(idx);
title(string(label));
Watch: Lectures 5-7
Assignment
Find or prepare a dataset of any size with two types of images. Classify
these two types by using the machine learning techniques in the
following table on the same dataset and show their comparison. Also
include a short note on which method works best for this dataset and
why.
Abandoned Object Detection
A Simple Classifier: Nearest Neighbor
1. Collect a dataset of images and label them
2. Use Machine Learning to train an image classifier
3. Evaluate the classifier on a withheld set of test images
Example training set
Example dataset: CIFAR-10
10 labels
50,000 training images
10,000 test images
[Figure: for every test image (first column), examples of nearest neighbors in rows]
How do we compare the images?
What is the distance metric?
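One common answer, sketched here under stated assumptions, is to flatten each image into a row of pixel values and compare rows with the L1 (sum of absolute differences) or L2 (Euclidean) distance. The tiny 4-pixel "images" and labels below are hypothetical; the slides do not say which metric is used.

```matlab
% Hypothetical training set: 3 flattened 4-pixel images, one per row, with labels.
trainImages = uint8([ 10  20  30  40;
                     200 210 220 230;
                      90 100 110 120]);
trainLabels = [1; 2; 3];
testImage = uint8([12 18 33 41]);

% Convert to double before subtracting: uint8 arithmetic saturates at 0.
diffs = bsxfun(@minus, double(trainImages), double(testImage));

l1 = sum(abs(diffs), 2);          % L1 distance to every training image
l2 = sqrt(sum(diffs.^2, 2));      % L2 (Euclidean) distance, for comparison

[~, nearest] = min(l1);           % index of the closest training image
predictedLabel = trainLabels(nearest);
```

The L1 distances are 8, 756 and 316, so the test image receives the label of the first training image. Both metrics agree here, but they can rank neighbors differently on real images.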