IMAGE CLASSIFICATION
USING CNN AND
TRANSFER LEARNING
STUDENT NAMES:
BHARANIDHARAN THIRUMARAN
HARIKRISHNAN MARIMUTHU
VINOTH RAJENDRAN
SUSAIRAJ ANTHONY
GOKULANAND SRINIVASAN
TAMILSELVAN PALANISAMY
Literature Survey
• Burkapalli, V.C. and Patil, P.C. built their study on the Google Inception-V3 model and added a fully connected layer on top to optimise classification. During model building they found that the convolution layers could learn their own convolution kernels well enough to produce tensor outputs. Segmented features were also concatenated with the model before the classification stage, which improved food categorisation and strengthened the representation of important features.
• Another study, by Z. Zong, D. T. Nguyen, P. Ogunbona, and W. Li, drew on topics including recognition, analysis, and retrieval, as well as multidisciplinary databases such as Scopus.
• Y. He, C. Xu, N. Khanna, C. J. Boushey, and E. J. Delp analysed features and feature combinations for food image analysis, using a vocabulary tree and k-nearest neighbour classification. The system was evaluated on a collection of 1453 images of eating events in 42 food categories, captured by 45 people in real eating settings. According to their test results, combining a variety of features with vocabulary trees improved Top 1 classification accuracy by around 22% and Top 4 classification accuracy by around 10%.
• K. Yanai and Y. Kawano focused on food image recognition, using a five-layer CNN and data expansion techniques to increase the number of training images. This strategy produced a 90% gain.
Image classification using CNN
• Convolutional neural networks (CNNs) are a very popular technique in computer vision for image classification.
• CNNs can automatically learn complex, hierarchical features and patterns from the pixels of raw images, which makes them well suited to image classification.
• A basic CNN model contains convolutional layers, pooling layers, fully connected layers and an output layer.
• The pooling layer reduces the spatial dimensions of the feature maps, which makes the model more robust to variations in the input images. The fully connected layer connects the previous layer to the next one in order to make predictions.
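The convolution, ReLU and pooling steps described above can be illustrated with a minimal pure-Python sketch. The toy 6x6 image and the hand-picked vertical-edge kernel are assumptions made for illustration only; in a real CNN the kernel values are learned during training, and a library such as Keras performs these operations.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size):
    """Non-overlapping max pooling; incomplete windows at the edges are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image (assumption): dark left half (0) and bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked vertical-edge kernel (assumption): a trained CNN learns its kernels.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu(conv2d_valid(image, kernel))  # 6x6 input -> 4x4 feature map
pooled = max_pool(features, 2)                # 4x4 feature map -> 2x2

print(features)
print(pooled)
```

The feature map responds only where the dark-to-bright edge lies, and pooling keeps that response while shrinking the map.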
CNN model used in image classification
• The model starts with an input layer followed by two convolutional layers: the first has 200 filters and the second has 150 filters. Both use 3x3 kernels with ReLU activation.
• Next comes a max pooling layer with a 4x4 pool size, which downsamples the feature maps and reduces their spatial dimensions.
• After the pooling layer there are three more convolutional layers with 120, 80 and 50 filters, each with ReLU activation. A flatten layer then converts the 2D feature maps into a 1D feature vector, which feeds three dense layers that act as the fully connected layers. A dropout layer is used to reduce overfitting, and the final output layer has 6 units with softmax activation.
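To see how the layer stack described above changes the feature-map size, the following sketch traces the shapes layer by layer. Several details are assumptions, since the slides do not state them: a 150x150 RGB input, no padding, stride 1 for every convolution, 3x3 kernels for the last three convolutions as well, and a pooling stride equal to the pool size.

```python
def conv_out(size, kernel=3):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=4):
    """Output width/height of non-overlapping pooling."""
    return size // pool


def trace_shapes(input_size=150, in_channels=3):
    """Return per-layer output shapes and the length of the flattened vector."""
    # Filter counts and the 4x4 pool come from the slide; everything else is assumed.
    layers = [("conv", 200), ("conv", 150), ("pool", None),
              ("conv", 120), ("conv", 80), ("conv", 50)]
    size, channels = input_size, in_channels
    shapes = []
    for kind, filters in layers:
        if kind == "conv":
            size, channels = conv_out(size), filters
        else:
            size = pool_out(size)
        shapes.append((kind, size, size, channels))
    return shapes, size * size * channels


shapes, flat_length = trace_shapes()
for shape in shapes:
    print(shape)
print("flattened vector length:", flat_length)
```

Under these assumptions the flatten layer hands a 45000-element vector to the dense layers; a different input size or padding choice would change that number.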
Image classification using transfer learning
• Transfer learning is a common deep learning technique, used mostly for image classification tasks.
• Transfer learning reuses the knowledge a model gained while being trained on a different but related task.
• For image classification, a pre-trained CNN model is used to extract features and is then fine-tuned on a new dataset.
• The most common pre-trained models include VGG16, ResNet and Inception. In this research task we used the VGG16 pre-trained model. The fully connected layers are removed from the pre-trained model so that it can be used for feature extraction.
• During fine-tuning, some of the layers are unfrozen and trained together with the newly added layers on the new dataset.
VGG16 pre-trained model
• VGG16 is a deep CNN model used for image classification. It has 16 weight layers: 13 convolutional layers and 3 fully connected layers.
• It has an input layer with a fixed size of 224 x 224 pixels. The 13 convolutional layers are arranged in blocks, and each block is followed by a max pooling layer for downsampling. The convolutional layers use 3x3 kernels with a stride of 1.
• The max pooling layers are 2x2 with a stride of 2.
• It has two fully connected layers with 4096 units each, followed by a final fully connected layer with a softmax activation for classification.
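The layer counts above can be checked with a short calculation. This sketch assumes the standard VGG16 configuration (five blocks of 2, 2, 3, 3 and 3 convolutions with 64 to 512 filters) and the original 1000-class ImageNet output layer; the function and variable names are chosen here for illustration.

```python
# (convolutions per block, filters per convolution) for the five VGG16 blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
# Two 4096-unit layers plus the 1000-class ImageNet output layer
FC_UNITS = [4096, 4096, 1000]


def summarize_vgg16(input_size=224, in_channels=3):
    """Count weight layers and parameters, and track the feature-map size."""
    size, channels = input_size, in_channels
    conv_layers = 0
    params = 0
    for n_convs, filters in VGG16_BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel weights plus one bias per filter
            params += 3 * 3 * channels * filters + filters
            channels = filters
            conv_layers += 1
        size //= 2  # 2x2 max pooling with stride 2 halves width and height
    features = size * size * channels
    for units in FC_UNITS:
        params += features * units + units
        features = units
    return {
        "conv_layers": conv_layers,
        "weight_layers": conv_layers + len(FC_UNITS),
        "final_feature_map": (size, size, channels),
        "parameters": params,
    }


summary = summarize_vgg16()
print(summary)
```

The five pooling steps shrink the 224 x 224 input to a 7 x 7 x 512 feature map, which is what the first fully connected layer receives.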
VGG16 transfer learning for image classification
• In this task we used the pre-trained VGG16 CNN model for image classification. Once the pre-trained model is loaded, some of its layers are frozen so that they are not trained, which is necessary to retain the features the pre-trained model has already learned. The original fully connected layers are removed and replaced with new layers suited to the image classification task.
• We fine-tuned the model by adding extra layers on top of the pre-trained base model. These new layers are then trained, which adapts the pre-trained model to the specific image classification task.
• Keras is used to load the pre-trained model through its built-in libraries and functions.
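The steps above might look roughly like the following Keras sketch. It is not the exact code used in this work: the 256-unit dense layer, the 0.5 dropout rate, the Adam optimiser, the one-hot label format implied by the loss, and the helper name `build_transfer_model` are all assumptions made for illustration.

```python
import tensorflow as tf


def build_transfer_model(num_classes=6, weights="imagenet", n_unfrozen=0):
    """Build a VGG16-based classifier with a new, trainable head (illustrative sketch)."""
    # Load VGG16 without its original fully connected layers.
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3))

    # Freeze every base layer so the pre-trained features are kept.
    for layer in base.layers:
        layer.trainable = False
    # Optionally unfreeze the last n_unfrozen base layers for fine-tuning.
    for layer in base.layers[len(base.layers) - n_unfrozen:]:
        layer.trainable = True

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # head size is an assumption
    x = tf.keras.layers.Dropout(0.5)(x)                   # dropout rate is an assumption
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    # Assumes one-hot encoded labels; the optimiser choice is an assumption.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_transfer_model()` downloads the ImageNet weights on first use, while `build_transfer_model(weights=None)` builds the same architecture with random weights, which is useful only for checking shapes. Passing `n_unfrozen=4` would leave the last convolutional block of the base trainable for fine-tuning.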
Evaluation results
Evaluation metric    CNN model    VGG16 CNN model
Test Loss            0.9721       21.9762
Test Accuracy        60.86%       77.16%
• The VGG16 CNN model has a much higher test loss (21.9762) than the basic CNN model (0.9721). If the loss is cross-entropy, as is usual with a softmax output, a higher loss alongside a higher accuracy suggests that the VGG16 model makes its wrong predictions with high confidence, because confident errors are penalised heavily. A lower loss on its own therefore does not show that the basic CNN classifies better.
• The test accuracy of the VGG16 model is about 77.2%, which means more test images are correctly classified than with the basic CNN model, whose accuracy is about 60.9%.
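One possible reading of the loss and accuracy figures above is that a model can classify more images correctly and still have a higher loss when its few mistakes are made with very high confidence. The sketch below shows this effect on made-up predictions for a two-class problem, assuming a cross-entropy loss; none of the numbers come from the actual experiments.

```python
import math


def evaluate(probabilities, labels):
    """Return (mean cross-entropy loss, accuracy) for predicted class probabilities."""
    eps = 1e-12  # guards against log(0)
    loss = -sum(math.log(max(p[y], eps))
                for p, y in zip(probabilities, labels)) / len(labels)
    correct = sum(max(range(len(p)), key=p.__getitem__) == y
                  for p, y in zip(probabilities, labels))
    return loss, correct / len(labels)


labels = [0, 0, 1, 1]

# Made-up "cautious" model: right half the time, never very sure of itself.
cautious = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.4, 0.6]]
# Made-up "confident" model: right three times, but its one error is near-certain.
confident = [[0.99, 0.01], [0.99, 0.01], [0.01, 0.99], [0.999999, 0.000001]]

cautious_loss, cautious_acc = evaluate(cautious, labels)
confident_loss, confident_acc = evaluate(confident, labels)

print(f"cautious:  loss={cautious_loss:.3f}  accuracy={cautious_acc:.2f}")
print(f"confident: loss={confident_loss:.3f}  accuracy={confident_acc:.2f}")
```

In this toy case the confident model has both the higher accuracy and the higher loss, so the two metrics should be read together rather than treating a lower loss as proof of better classification.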
Classified images of CNN model
• The images classified by the CNN model show that some are wrongly classified: for example, buildings are classified as glaciers and sea is classified as glacier. This shows that the model is less accurate and unable to classify many of the images correctly.
Classified images of VGG16 CNN model
• The images classified by the VGG16 CNN model show that most are correctly classified.
• All 18 of the images displayed after testing the model are correctly classified.
• This shows that the VGG16 model is more accurate than the basic CNN model.
References
• Burkapalli, V. C., & Patil, P. C. (n.d.). Transfer learning: Inception-V3 based custom classification
• He, Y., Xu, C., Khanna, N., Boushey, C. J., & Delp, E. J. (n.d.). Analysis of food images: Features and classification. IEEE International Conference on Image Processing (ICIP), pp. 2744–2748.
• Yanai, K., & Kawano, Y. (2015). Food image recognition using deep convolutional network with pre-training and fine-tuning. IEEE International Conference on Multimedia & Expo Workshops (ICMEW).
• Zong, Z., Nguyen, D. T., Ogunbona, P., & Li, W. (n.d.). On the combination of local texture and global structure for food classification. IEEE International Symposium on Multimedia.
Thank you