Real-Time Facial Emotion Detection System

This document presents a project on real-time facial emotion recognition using a convolutional neural network (CNN) architecture. The system utilizes the FER2013 dataset to detect seven emotions from video input, employing tools like OpenCV and TensorFlow for implementation. The results indicate a promising approach to enhancing human-computer interaction through emotion detection.

2021 2nd Global Conference for Advancement in Technology (GCAT)

Bangalore, India. Oct 1-3, 2021

Real-Time Facial Emotion Recognition


Devanshu Shah
Synapse, Department of Computer Engineering
Dwarkadas J. Sanghvi College of Engineering
Mumbai, India.
devanshushah@[Link]

Khushi Chavan
Synapse, Department of Computer Engineering
Dwarkadas J. Sanghvi College of Engineering
Mumbai, India.
chavankhushi07@[Link]

Sanket Shah
Synapse, Department of Computer Engineering
Dwarkadas J. Sanghvi College of Engineering
Mumbai, India.
sanketyou8@[Link]

Pratik Kanani
Synapse, Department of Computer Engineering
Dwarkadas J. Sanghvi College of Engineering
Mumbai, India.
pratikkanani1232@[Link]

978-1-6654-1836-2/21/$31.00 ©2021 IEEE | DOI: 10.1109/GCAT52182.2021.9587707

Abstract - An important aspect of human beings is their ability to display emotions. Emotions form the basis of intimate communication between people. Recognition of emotions displayed via facial expressions with the help of a computer could prove to be a very powerful tool. In recent years, an increasing number of intelligent systems have used facial emotion recognition to improve human interaction. These systems continually adapt their operation based on the emotions of the humans they interact with. In this paper, we propose an architecture based on the convolutional neural network (CNN) for the recognition of emotions from facial expressions. In the implementation, we use the Facial Expression Recognition 2013 dataset (FER-2013). Through behavioral analysis, we show how different emotions seem to be sensitive to different parts of the face.

Keywords - Haar Cascade Classifier, Convolutional Neural Network, Softmax Classification, Real-Time Detection

I. INTRODUCTION
Recognizing facial emotions has become a major issue in many applications today, and research on facial emotion recognition has gained a lot of momentum over the past few years. Facial emotion recognition identifies the state of human emotions (e.g. neutral, happy, sad, surprise, fear, anger, disgust, contempt) from the stream of images fed in by a video. There has been growing interest in making machines act as closely as possible to actual human beings, so that their actions replicate those of humans and carry a touch of human feeling. It has been argued that for proper human-computer interaction, the computer has to interact in a natural way, similar to the way two humans interact. Other fields where emotion recognition can prove useful include online teaching, product marketing, and the health industry, where it can help determine whether certain things are liked and how different people react to different stimuli. Humans express their thoughts and feelings in multiple ways, the most important being speech, followed by their display of emotions. Emotions typically manifest themselves through multiple channels, be it touch, visual, or physiological. The most appropriate method of detecting human emotions would be to determine them from the visual cues a person displays. It is widely accepted that the slightest change in emotion becomes visible on the human face; hence, a system that detects emotion from an individual's face could prove to be an invaluable tool.

In this project we propose a system for real-time emotion detection using a video stream from the user's webcam. We first detect the face in the video using Haar cascade classifiers. We then use convolutional neural networks to determine the emotion. The dataset used in this project is FER2013. The dataset originally came in the form of raw pixel values, which we converted into actual images and used for training our model. This greatly helped in improving the performance of our model. The software used in this project is OpenCV and TensorFlow: OpenCV operates the webcam and performs face detection, while TensorFlow is used to train the CNN for emotion detection.

II. LITERATURE REVIEW
A leading research group on emotion recognition is Affectiva. They have employed various deep learning methodologies, as they have a very large dataset of over 3 million facial videos. This dataset has been gathered from over 75 countries, which helps ensure that geographical factors do not affect their research [1]. The classifier they developed can classify the emotions of joy, disgust, contempt, and surprise with a ROC (receiver operating characteristic) score of 0.8.

Work done by Kotsia et al. [2] threw a spotlight on the effect of occlusion on the classification of emotion expressions. To obtain satisfactory results, multiple classification models and feature engineering techniques were used. Among them was the Gabor filter, a linear filter used primarily for texture analysis, which tests whether an image contains specific frequency content in specific directions within a localized region. They used the support vector machine (SVM) and the multi-layer perceptron (MLP) to classify these features.

Wang and Yin studied how the magnitude of facial expressions changed with the distortion of the detected face region, and how this eventually changed the outcome of their model. They used topographic context expression descriptors to perform feature extraction. Every image in this research was treated as a 3-D model of the face, and pixels were labelled taking into account the terrain features of each image.



Apart from facial emotion detection, research has also been done on detecting emotion using biological factors. The paper by Nakasone et al. [3] proposes a method for recognizing emotions from bio-signals. They used galvanic skin response (GSR) to measure skin conductance and electromyography (EMG) to measure muscle activity.

As can be seen from the literature review, the detection of emotions is a very complicated task, and there have been several different approaches towards solving this problem.

III. METHODOLOGY
A. Tools and Platforms used
The models are developed using Python 3 with the assistance of its libraries and frameworks. OpenCV 3.3.0 was used for computer vision and image processing. We leveraged TensorFlow to implement the convolutional neural network models and Keras for machine learning. Mathematical computations are performed using NumPy and Pandas. The models are built on Google Colab using the GPUs it provides.

B. Dataset used
There are several datasets available on facial expression. We used the Facial Expression Recognition dataset (FER-2013). It was introduced in 2013 as part of a workshop challenge at the International Conference on Machine Learning (ICML). It is an open-source dataset, created by Pierre-Luc Carrier and Aaron Courville and shared publicly for the Kaggle competition. The dataset contains a training set of 28709 images and a test set of 7178 images. The data consists of 48x48 pixel grayscale images, each labelled with one of 7 facial expressions (0: Angry, 1: Disgusted, 2: Fearful, 3: Happy, 4: Neutral, 5: Sad, and 6: Surprised). A sample of images from the FER2013 dataset is shown in Fig. 1.

Fig. 1. Images of the FER-2013 dataset.

C. Image Preprocessing
The aim of pre-processing is to improve the quality of the image. First, we rescale the image to transform every pixel value from the range [0,255] to the range [0,1], which is also referred to as normalization. This treats images with high and low pixel values in the same manner (i.e., scales every image to the same range), so that each contributes more evenly to the total loss. The images are then converted into 48x48 pixel grayscale images to avoid unnecessarily large inputs to the neural network.

D. Layers and Activation functions used
1) 2D Convolutional Layer: The 2D convolutional layer is a common type of convolution and is written as conv2d. The most important parameters in this layer are the number and size of the kernels. It performs feature extraction and transforms a 2D matrix of features into a different 2D matrix of features depending upon the given parameters. This layer thus does the image processing work of the network.

Fig. 2. Example of 2D Convolutional operation

2) Max Pooling Layer: The max pooling layer calculates the maximum value in each patch of each feature map. The layer downsamples the input in such a way that it highlights the most prominent feature in the patch. It reduces the dimensions of the feature map and the computational cost of the network. In practice this works well for computer vision tasks.

Fig. 3. Example of Max Pooling operation

3) Dropout Layer: Dropout randomly sets a fraction of a layer's activations to zero during training, which discourages units from becoming redundant copies of one another. The dropout layer is thus used in a neural network to tackle overfitting. To avoid overfitting, we can also use early stopping in the training phase.

Fig. 4. Example of Dropout operation

4) Dense Layer: The dense layer is a frequently used layer. Since every neuron in this layer is connected to every neuron in the adjacent layer, it is also called the fully connected layer. It performs a matrix-vector multiplication, which changes the dimension of the vector and can express transformations such as rotation, translation, and scaling. The example in Fig. 5 depicts a 3-unit input layer connected to a 5-unit dense layer.
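
The convolution, max pooling, and dense operations described in this subsection can be illustrated with a minimal sketch in plain Python. This is not the code used in the project, which relies on TensorFlow and Keras. The function names (`conv2d_valid`, `max_pool`, `dense`), the 4x4 input, the 2x2 kernel, and the weights are all hypothetical values chosen only to make the arithmetic easy to follow.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size patch."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def dense(vector, weights, bias):
    """Fully connected layer: one weighted sum (plus bias) per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical 4x4 "image" and 2x2 kernel, chosen only for illustration.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, 1],
]

features = conv2d_valid(image, kernel)  # 4x4 input, 2x2 kernel -> 3x3 feature map
pooled = max_pool(image, size=2)        # 4x4 input -> 2x2 output

# A 3-unit input connected to a 5-unit dense layer, as in the text.
weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [1, -1, 0]]
bias = [0, 0, 0, 0, 0]
dense_out = dense([1, 2, 3], weights, bias)
```

The convolution shrinks the 4x4 input to 3x3 because no padding is applied, while 2x2 pooling halves each dimension. The dense layer maps the 3-element input to 5 outputs, one per row of the weight matrix.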

Fig. 5. Example of Dense operation

5) Flatten Layer: The flatten layer transforms a multi-dimensional input tensor into a 1-D tensor. This means that the pooled feature map is converted into a single column. After flattening, each value has its own input neuron in the next layer, so every value is taken into account. For example, if the previous layer outputs a (4,4,128) tensor, flattening it gives a (2048,1) tensor (4 x 4 x 128 = 2048) as input to the next layer.

Fig. 6. Example of Flatten operation

6) ReLU Activation: ReLU (Rectified Linear Unit) is commonly used as an activation function in neural networks. It adds non-linearity to the network. It returns 0 if it receives a negative value, and for a positive value it returns the same value. Because it is cheap to compute, it helps decrease training time.

Fig. 7. ReLU Activation function

7) Softmax Activation: The softmax function is commonly used with cross-entropy loss in multi-class classification problems. It is used to predict the multinomial probability distribution of the model. The function returns a probability for every input such that the sum of all probabilities is equal to 1. If an input is negative or small, the function gives it a small probability, and if an input is large, it gives it a large probability, but these probabilities always lie between 0 and 1.

Fig. 8. Softmax Activation function

8) Adam Optimizer: Adam is an adaptive learning-rate optimization method that computes individual learning rates for different parameters. It combines ideas from Stochastic Gradient Descent with momentum and from RMSprop. Like momentum, it uses a moving average of the gradient. Like RMSprop, it uses a moving average of squared gradients to scale the learning rate dynamically.

9) Cross-Entropy Loss Function: Cross-entropy is one of the most widely used loss functions for classification. It is used to optimize binary and multi-class classification models by minimizing the loss. It measures the difference between the probabilities output by the network and the labels, and returns the total loss of the network. The smaller the loss, the better the model. A model with a cross-entropy loss of 0 predicts its labels perfectly. Minimizing this loss tends to improve generalization and give faster training.

E. Convolutional Neural Network Architecture
The convolutional and fully connected layers use the ReLU (Rectified Linear Unit) activation function, and the output layer uses softmax. ReLU is used for its fast computation and softmax for its suitability to multi-class classification. The max pooling layer is used to highlight the most prominent features in the image. The dropout layer is a simple way to prevent overfitting and leads to better generalization. Since there are 7 classes of emotions, we use categorical cross-entropy loss. The flatten layer is used to change the shape of the data into the correct format for the dense layer to interpret. The dense layer is used for the final classification.

Fig. 9. Summary of CNN Model

F. Network Training
The training is conducted on the CNN model. For training the network, we found the best optimizer to be Adam, using categorical cross-entropy loss. Adam was applied with a learning rate of 1e-4, which it then adapts per parameter, so that the network trains more efficiently and in less time. Batches of 64 images are processed in succession, each starting from the weights updated by the previous batch. Training runs for 50 epochs with 448 steps per epoch (28709 training images divided into batches of 64, rounded down). The training takes about 6 hours to complete.
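
The activation functions, loss, optimizer step, and training schedule described above can be checked with a short sketch in plain Python. It is an illustration under stated assumptions rather than the training code used in the project. The function names are hypothetical, the Adam hyperparameters `beta1`, `beta2`, and `eps` are commonly used defaults that the text does not specify, and the scores, starting parameter, and gradient are made up.

```python
import math


def relu(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0.0, x)


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index, eps=1e-12):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_index] + eps)


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad  # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Values taken from the text.
train_images, batch_size, epochs = 28709, 64, 50
steps_per_epoch = train_images // batch_size  # full batches per epoch
total_steps = steps_per_epoch * epochs

# Flattening a (4, 4, 128) tensor, as in the flatten example.
flat_length = 4 * 4 * 128

# Hypothetical scores for the 7 emotion classes; index 3 ("Happy") is the label.
logits = [0.1, -1.2, 0.3, 2.5, 0.0, -0.4, 0.8]
probs = softmax(logits)
loss = categorical_cross_entropy(probs, true_index=3)

# First Adam step from a made-up parameter 0.5 with a made-up gradient 2.0.
new_param, m, v = adam_step(0.5, 2.0, m=0.0, v=0.0, t=1)
```

The integer division reproduces the 448 steps per epoch reported in the text, which suggests the final partial batch of each epoch is dropped. On the first Adam step the bias-corrected ratio is close to 1 in magnitude, so the parameter moves by roughly the learning rate regardless of the size of the gradient.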

Fig. 10. Block Diagram of Proposed System Architecture

G. Real-Time Testing
Following the development of the proposed CNN architecture, the trained model underwent real-time testing. Using the webcam's video and the OpenCV implementation of the Haar Cascade library, we detect a face region. We extract the face region, convert it to grayscale, and resize it to 48x48. Then, we use the Convolutional Neural Network model to predict the probability distribution over emotions and display the respective emotion above the face region.

Fig. 11. Working of the model

IV. RESULT AND DISCUSSION
In this paper, we noted the significant interest of researchers in FER via deep learning over recent years. The automatic FER task goes through different steps: data processing, the proposed model architecture, and finally emotion recognition.

This implementation automatically senses emotions on each face in the webcam stream. Using a simple 4-layer CNN trained for 50 epochs, the model reached 85% accuracy on the training set and 63.2% accuracy on the test set.

Fig. 12.

V. CONCLUSION
This paper considered the development of a low-cost and effective solution for the challenging task of recognizing human emotions based on facial expressions. The seven emotions that the system is able to detect are Angry, Disgust, Fear, Happy, Neutral, Sad, and Surprised. This was achieved by implementing a Convolutional Neural Network trained on the FER-2013 dataset, which reached 63.2% accuracy on the test set. The insignificant pixels (pixels outside the face region) were reduced with the help of the Haar Cascade library present in the OpenCV module. The proposed technology can detect, in real time, the emotion of the faces present in the video frame. The assessment of facial expressions was based on the perspectives of several scientists in this domain. It is hoped that this paper will have an impact on the way behavioral analysis is carried out. The proper implementation of this technology could change the way we do many things today.

REFERENCES
[1] Vandal, T., McDuff, D., & El Kaliouby, R. Event detection: Ultra large-scale clustering of facial expressions. In Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and Workshops on, Vol. 1, pages 1–8, 2015.
[2] Irene Kotsia, Ioan Buciu, and Ioannis Pitas. An analysis of facial expression recognition under partial facial image occlusion. Image and Vision Computing, 26(7):1052–1067, 2008.
[3] Nakasone, Arturo, Helmut Prendinger, and Mitsuru Ishizuka. "Emotion recognition from electromyography and skin conductance." Proc. of the 5th international workshop on biosignal interpretation. 2005.
[4] Mehendale, N. Facial emotion recognition using convolutional neural networks (FERC). SN Appl. Sci. 2, 446 (2020). [Link]
[5] Li, Chieh-En James and Zhao, Lanqing, "Emotion Recognition using Convolutional Neural Networks" (2019). Purdue Undergraduate Research Conference. 63. [Link]
[6] M. A. Ozdemir, B. Elagoz, A. Alaybeyoglu, R. Sadighzadeh and A. Akan, "Real Time Emotion Recognition from Facial Expressions Using CNN Architecture," 2019 Medical Technologies Congress (TIPTEKNO), 2019, pp. 1-4, doi: 10.1109/TIPTEKNO.2019.8895215.
[7] Tanner Gilligan, Baris Akis. "Emotion AI, Real-Time Emotion Detection using CNN." [Link] prev_projects_2016/[Link]
[8] R. Pathar, A. Adivarekar, A. Mishra and A. Deshmukh, "Human Emotion Recognition using Convolutional Neural Network in Real Time," 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), 2019, pp. 1-7, doi: 10.1109/ICIICT1.2019.8741491.
[9] Huang, Y.; Chen, F.; Lv, S.; Wang, X. Facial Expression Recognition: A Survey. Symmetry 2019, 11, 1189. [Link]
[10] Saravanan, Akash & Perichetla, Gurudutt & K.s, Gayathri. (2019). Facial Emotion Recognition using Convolutional Neural Networks.
[11] G. A. R. Kumar, R. K. Kumar and G. Sanyal, "Facial emotion analysis using deep convolution neural network," 2017 International Conference on Signal Processing and Communication (ICSPC), 2017, pp. 369-374, doi: 10.1109/CSPC.2017.8305872.
[12] Dachapally, Prudhvi. (2017). Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units.
[13] Helaly, Rabie & Hajjaji, Mohamed & Faouzi, M'Sahli & Abdellatif, Mtibaa. (2020). Deep Convolution Neural Network Implementation for Emotion Recognition System. 261-265. 10.1109/STA50679.2020.9329302.
[14] Ghaffar, Faisal. (2020). Facial Emotions Recognition using Convolutional Neural Net.
