Lipreading Using a Comparative Machine Learning Approach

Ziad Thabet, Faculty of Computer Science, Misr International University, Cairo, Egypt, Ziad1407174@[Link]
Amr Nabih, Faculty of Computer Science, Misr International University, Cairo, Egypt, Amr1410718@[Link]
Karim Azmi, Faculty of Computer Science, Misr International University, Cairo, Egypt, Karim1405338@[Link]
Youssef Samy, Faculty of Computer Science, Misr International University, Cairo, Egypt, Youssef1410209@[Link]
Ghada Khoriba, Faculty of Computers and Information, Helwan University, Cairo, Egypt, ghada khoriba@[Link]
Mai Elshehaly, School of Computing, University of Leeds, Leeds, UK, [Link]@[Link]

Abstract—Lipreading is the process of interpreting spoken words by observing lip movement. It plays a vital role in human communication and speech understanding, especially for hearing-impaired individuals. Automated lipreading approaches have recently been used in such applications as biometric identification, silent dictation, forensic analysis of surveillance camera capture, and communication with autonomous vehicles. However, lipreading is a difficult process that poses several challenges to human- and machine-based approaches alike. This is due to the large number of phonemes in human language that are visually represented by a smaller number of lip movements (visemes). Consequently, the same viseme may be used to represent several phonemes, which confuses any lipreader. In this paper, we present a detailed study of the machine learning approach for the real-time visual recognition of spoken words. Our focus on real-time performance is motivated by the recent trend of using lipreading in autonomous vehicles. Machine learning approaches are applied to lipreading, and nine different classifiers have been implemented and tested, reporting their confusion matrices among different groups of words. Of the classifiers tested, the three that achieved the best results are Gradient Boosting, Support Vector Machine (SVM) and Logistic Regression, with accuracies of 64.7%, 63.5% and 59.4% respectively.

Index Terms—Lipreading, Classification, Autonomous Vehicles, Speech Recognition.

I. INTRODUCTION

Lipreading, widely known as visual speech recognition (VSR), is a process that aims to interpret and understand spoken words by using only the visual signal produced by lip movement. Lipreading plays a crucial role in both human-human and human-computer interaction. For example, people use lipreading in their daily conversations to understand one another in noisy environments and in situations where the audio speech signal is not readily comprehensible. Therefore, the skill of lipreading has long been mastered by individuals with hearing impairment. It enables them to understand speech and maintain social activities without relying on the perception of sounds.

The recent advent of novel machine learning and signal processing approaches has increased researchers' interest in automating the process of lipreading. This attention is motivated by the promising results of lipreading in application areas such as human-computer interaction, forensic analysis of surveillance camera capture, biometric identification, silent dictation, and autonomous vehicles [1].

However, the recognition of lip motion presents several challenges to linear classifiers, mainly because the features used in the classification are calculated from a sequence of shapes that the lip takes, also known as "visemes". The number of visemes that the lip can take is between 10 and 14 [2], whereas the number of phonemes (i.e. acoustic sounds) that can be produced by these visemes exceeds 50. This mismatch between visual and audio signals creates new horizons in machine learning research. It motivates the quest for improved visual features and classifiers to bridge the gap between what has been spoken and what is visually perceived.

In this paper, we present LipDrive: a novel system for visual speech recognition that targets autonomous vehicles as an application. The focus here is on the application area of autonomous vehicles due to its thriving nature and the possibilities that lipreading can offer. A human-computer interaction approach is taken to characterize the challenges and opportunities of lipreading in facilitating the communication between humans and autonomous vehicles, especially in noisy car environments. Furthermore, a comparative analysis of nine different linear classifiers that we tested in LipDrive is presented. Their performance in lipreading was studied using raw visual features as well as a preprocessed feature set. By presenting our experimental results, we aim to propose a set of guidelines for researchers working in the area of lipreading that can steer their choice of classification method and preprocessing steps.

The main contributions of this paper can be summarized as follows:

978-1-5386-5083-7/18/$31.00 ©2018 IEEE

Authorized licensed use limited to: Chaitanya Bharathi Institute of Tech - HYDERABAD. Downloaded on August 22,2024 at 08:20:54 UTC from IEEE Xplore. Restrictions apply.
• A novel lipreading system called LipDrive that is to be deployed in an autonomous vehicle setting
• A comparative analysis of nine classifiers for lipreading
• Experimental results using preprocessed and raw visual features for classification
• A set of design guidelines for visual speech recognition

The rest of the paper is organized as follows. Section II describes the state of the art in lipreading research. Section III describes the LipDrive system. The experimental approach is outlined in Section IV, and the results of our comparative analysis are presented in Section V. Finally, Section VI offers our concluding remarks and lays the foundation for our future work.

II. RELATED WORK

Assael et al. [3] showed that lip movements can be extracted during speech and converted into written text, and that the conversion can operate at the sentence level instead of the word level. The researchers cited several problems encountered during their experiments, such as designing and learning the facial features and predicting the sentence itself. They worked with different deep learning approaches to extract the lip movements and to classify the spoken words.

Chung et al. [4] showed that lip recognition systems can understand spoken words using only visual features, and that such systems could help recognize the spoken words in corrupted videos that lack audio. The researchers aimed to build a system that reads lips independently. They collected a large dataset from TV broadcasts and built deep learning architectures that effectively learn and recognize hundreds of words.

Garg et al. [5] discussed different methods for predicting words and phrases from videos without audio. They also argued that visual lipreading is important in human-computer interaction and can replace audio speech recognition, which may be difficult in noisy environments and when inputs vary because different people speak with different accents. The researchers concatenated a fixed number of images as input to the pre-trained VGGNet model, used nearest-neighbor interpolation to normalize the number of images per sequence, and fed the features extracted by the VGGNet model to LSTM and RNN models to classify the word.

Rathee [6] defined lipreading as the recognition of lip movement patterns during speech. The author added that visual speech recognition has motivated researchers towards lipreading, that speech recognition systems face a major problem in noisy environments, and that lipreading can help people with hearing or speech impairments to communicate normally with other people. An automated lipreading algorithm is proposed that consists of two main steps: feature extraction and word classification. Feature extraction consists of five steps: video acquisition, face and mouth detection, intensity equalization, keypoint extraction, and geometric feature extraction. Word classification is performed using a Learning Vector Quantization neural network.

Lesani et al. [7] introduced lip authentication as a new method for mobile phone security. This method could be used in mobile banking applications to secure customers' accounts. The mobile phone camera captures the lip movements and sends them to lipreading algorithms that classify the security word, such as a password.

III. SYSTEM OVERVIEW

To reach high accuracy with real-time recognition of spoken words, the LipDrive system consists of six data processing stages. Figure 1 depicts these six stages in the form of a pipeline, and each stage is described in detail in this section.

Fig. 1. System Overview

A. Image Acquisition

The image acquisition stage receives a raw video as input. This video captures a spoken word within a specific environment. This stage aims to create a sequence of frames or images from the captured video and to reduce the effect of environmental factors on the quality of the frames in the sequence.

The captured video is first sliced into individual frames using the OpenCV Python library. Next, the acquired frames are converted to grayscale, also using OpenCV. The resulting frames are then passed along to the next stage for feature extraction.

B. Feature Extraction

The goal of the feature extraction stage is to reduce the size of the images that are received from the acquisition stage. This step is motivated by the well-known curse of dimensionality in machine learning [8]. Namely, if we were to use the original images as input to the classifier, each pixel would represent a feature. This would harm the reliability and efficiency of reading the lips from the received images, because the number of pixels is typically large and varies with image resolution and camera quality. Furthermore, the majority of the captured pixels are irrelevant to the classifier.
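To make the dimensionality argument concrete, the following minimal sketch compares the number of features per clip when every pixel is a feature with the number obtained from a small set of lip landmarks. The 256 × 256 frame resolution is a hypothetical value chosen only for illustration. The 29 frames per clip and the 20 lip points per frame are the figures this paper reports for the dataset and for the lip landmarks. The function names are illustrative and are not part of LipDrive.

```python
def raw_pixel_features(width: int, height: int, frames: int) -> int:
    """Feature count when every grayscale pixel of every frame is a feature."""
    return width * height * frames


def landmark_features(points_per_frame: int, frames: int) -> int:
    """Feature count when each frame is reduced to one value per lip point."""
    return points_per_frame * frames


# Hypothetical 256x256 grayscale frames (assumption), 29 frames per clip.
raw = raw_pixel_features(256, 256, 29)
# 20 lip points per frame, each reduced to a single value.
reduced = landmark_features(20, 29)

print(raw)                   # 1900544
print(reduced)               # 580
print(round(raw / reduced))  # 3277
```

Under these assumptions, the landmark representation is roughly three orders of magnitude smaller than the raw pixel representation.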

Therefore, instead of passing videos and pictures, we extract only the needed features from the videos. This is realized by passing the grayscale images to the feature extraction stage, which uses "DLib", a modern C++ library that implements a multitude of machine learning algorithms [9]. The "Shape Predictor 68 Face Landmarks" model is used to detect the human face in images and to extract the 68 landmarks of the face. These landmarks represent points on the mouth, nose, eyes and so forth, as shown in Figure 2.

Furthermore, the number of landmarks is reduced to twenty points per frame, which represent the features of the lips, as shown in Figure 3. The points of each frame are then translated to Z-order by calculating a z-value, which maps a 2D point (x, y) to a one-dimensional value. This value is calculated by interleaving the binary representations of the point's coordinate values.

Fig. 2. Face Detection using DLib

Fig. 3. Lip Feature Extraction

C. Cropping

Taking into consideration the various positions of the user in front of the camera, which yield different positions of the speaker's facial landmarks, we crop the image to the mouth level to reduce the environmental variability in the extracted features. In addition, the distance between the speaker's face and the camera can vary from one speaker to another, which makes the extracted features ambiguous at times. To unify them, all images are normalized to the same width, with the height scaled according to the aspect ratio. Equation 1 defines the calculation of a normalized point $(p^*_x, p^*_y)$ from each point $(p_x, p_y)$ on the lips, in which a normalization scale is defined as $\theta$.

$$(p^*_x, p^*_y) = \left(p_x \times R,\; p_y \times \frac{H \times R}{H}\right) \quad (1)$$

where

$$R = \frac{\theta}{X_{Max} - p_x}, \qquad H = Y_{Max} - p_y$$

D. Concatenation

Individual frames are passed through the face detection, feature extraction, cropping and normalization processes described above. However, since a word can rarely be classified from an individual frame, we concatenate the frames back to form a sequence of feature vectors (Figure 4). This process creates a training dataset that has the sequence of feature vectors as input and the spoken word as class label. For example, if the word "ABOUT" is captured in 10 frames, each of which contains 20 features, this yields 200 features that form the sequence for that word.

Fig. 4. Frames Concatenation

E. Training and Validation

In this stage, sequences of feature vectors are taken for a number of words. We strive to create a large enough training dataset, so for each word we consider a number of videos that capture the same word as spoken by different individuals. The feature vector sequences for a set of words are then fed into the classifier for training and model generation. Next, a different dataset is fed to the classifier for validation.

F. Classification

Once a classifier model is built and validated in the previous stage, we reach the point of real-time lipreading. In this stage, we extract the same features from the user's face. We note here that videos are captured via a portable device and streamed continuously to the server for feature extraction. The extracted features are then fed to the classifier model, which predicts the spoken word. The predicted word is then translated to a command. Finally, the server responds by executing the intended command.

IV. DESIGN OF EXPERIMENTS

The six-stage approach is tested with nine different classifiers. The purpose of our experiments is to gain a deeper understanding of the strengths and weaknesses of the classifiers while attempting to classify different words with varying visual and phonetic similarities.
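The feature construction of Sections III-B and III-D can be illustrated with a short sketch: each lip point (x, y) is mapped to a single z-value by interleaving the bits of its coordinates, and the per-frame values are concatenated into one feature vector per word. This is a minimal sketch under stated assumptions, not the LipDrive implementation. The bit order (x in the even bit positions, y in the odd ones), the 16-bit coordinate width, the use of non-negative integer pixel coordinates, and all function names are assumptions, since the paper does not specify them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (Morton / Z-order code).

    Assumption: x occupies the even bit positions and y the odd ones.
    """
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative integers")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bit i of x -> bit 2i
        z |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y -> bit 2i + 1
    return z


def frame_features(lip_points: Sequence[Point]) -> List[int]:
    """Reduce the lip points of one frame to one z-value per point."""
    return [z_value(x, y) for x, y in lip_points]


def word_features(frames: Sequence[Sequence[Point]]) -> List[int]:
    """Concatenate the per-frame features into one vector for the word."""
    vector: List[int] = []
    for lip_points in frames:
        vector.extend(frame_features(lip_points))
    return vector


# Example: x = 3 (binary 011), y = 5 (binary 101) interleave to 100111 = 39.
print(z_value(3, 5))  # 39

# 10 frames of 20 hypothetical lip points each give 10 * 20 = 200 features.
frames = [[(x, x + f) for x in range(20)] for f in range(10)]
print(len(word_features(frames)))  # 200
```

Because a z-value keeps one number per point, a frame with 20 lip points contributes 20 features, which matches the 10 × 20 = 200 example given for the word "ABOUT" in Section III-D.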

A. Dataset

To cover a breadth of training words and to have a large dictionary, we need a large-scale dataset. In our experiments, we used a benchmark dataset that consists of about one million instances of 500 different words spoken by different speakers [10]. Each word has 1000 training videos, 50-100 validation videos and 50-100 testing videos. All videos are 29 frames long with a duration of 1.16 seconds, and the spoken word occurs in the middle of the video. In addition, the speakers' positions are not fixed, meaning that their faces are not always looking directly at the camera. Some speakers are talking while facing someone next to them, or their faces are far from the camera, which makes recognition more challenging.

B. Procedure

Using our large-scale dataset, composed of sequences of vectors and labels, we tested different classifiers in order to fit the data. After extracting the lip features, we proceeded through several steps. First, we used 5 words from our training dataset to train each classifier: "ABOUT", "AROUND", "ATTACK", "BENEFITS" and "BETWEEN". These words were chosen due to their visual and phonetic similarity. For example, "ABOUT" and "AROUND" have almost the same lip movements, which makes it more challenging for the classifiers to distinguish them. Second, we predicted labels for different sequences of the same 5 words taken from our testing dataset. Third, we calculated the accuracy of each classifier using predefined scoring functions. Fourth, we visualized each classifier's confusion matrix.

V. EXPERIMENTAL RESULTS

In this section, we report the accuracy achieved by each classifier, visualize the confusion matrix resulting from each, and discuss some general guidelines based on our findings.

1) Experiment 1 - Naive Bayes (NB): We used a Naive Bayes classifier, which is based on Bayes' theorem, and obtained an accuracy of 26.6%. Figure 5 depicts the confusion matrix for this experiment. As can be seen, Naive Bayes is very weak when dealing with a large number of features.

Fig. 5. Naive Bayes' Confusion Matrix

2) Experiment 2 - Quadratic Discriminant Analysis (QDA): We used Quadratic Discriminant Analysis, which fits class densities to the data and classifies based on Bayes' theorem, and obtained an accuracy of 32.3%. Figure 6 depicts the confusion matrix for this experiment.

Fig. 6. Quadratic Discriminant Analysis's Confusion Matrix

3) Experiment 3 - SGDClassifier: We used SGDClassifier and obtained an accuracy of 45.9%. Figure 7 depicts the confusion matrix for this experiment.

Fig. 7. SGDClassifier's Confusion Matrix

4) Experiment 4 - Multi-Layer Perceptron Classifier (MLP): We used a Multi-Layer Perceptron, a neural network classifier that optimizes the log-loss function using LBFGS, and obtained an accuracy of 48.3%. Figure 8 depicts the confusion matrix for this experiment.

Fig. 8. Multi-Layer Perceptron Classifier's Confusion Matrix

5) Experiment 5 - AdaBoost Classifier: We used an AdaBoost classifier, which first fits a model on the training dataset and then fits additional copies of that model on the same dataset, with the weights of misclassified instances adjusted. We obtained an accuracy of 54.5%. Figure 9 depicts the confusion matrix for this experiment.

Fig. 9. AdaBoost Classifier's Confusion Matrix

6) Experiment 6 - Linear Discriminant Analysis Classifier (LDA): We used a Linear Discriminant Analysis classifier, which fits class densities to the data and classifies based on Bayes' theorem, and obtained an accuracy of 56.1%. Figure 10 depicts the confusion matrix for this experiment.

Fig. 10. Linear Discriminant Analysis Classifier's Confusion Matrix

7) Experiment 7 - Logistic Regression Classifier (LR): We used a Logistic Regression classifier, which analyzes independent variables to determine an outcome, and obtained an accuracy of 59.4%. Figure 11 depicts the confusion matrix for this experiment.

Fig. 11. Logistic Regression Classifier's Confusion Matrix

8) Experiment 8 - Support Vector Machine Classifier (SVM): We used a Support Vector Machine classifier, which analyzes data for classification and regression analysis, and obtained an accuracy of 63.5%. Figure 12 depicts the confusion matrix for this experiment.

Fig. 12. Support Vector Machine's Confusion Matrix

9) Experiment 9 - Gradient Boosting Classifier: We used a Gradient Boosting classifier, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees, and obtained an accuracy of 64.7%. Figure 13 depicts the confusion matrix for this experiment.

Fig. 13. Gradient Boosting Classifier's Confusion Matrix

A. Discussion of Results

The results show that most of the classifiers confuse the words "ABOUT" and "BETWEEN", and that the accuracies of the classifiers are fairly close to one another. This is due to the small number of words used in training. However, as the number of words to be trained increases, the accuracy of the linear classifiers decreases in proportion. Thus, using neural network classifiers is essential for large-scale datasets, which became clear when we started to use this large dataset with the MLP classifier. Meanwhile, the use of CNNs is recommended in order to obtain promising results. In addition, to ensure high accuracy with real-time processing, we recommend using RNN and LSTM classifiers.

VI. CONCLUSION AND FUTURE WORK

Lipreading is a new way of enhancing speech recognition. However, there are some constraints on reaching high accuracy. One of these constraints is the varying lighting conditions that the camera may face, since lipreading is mainly conducted under ideal lighting conditions. In addition, the position of the speaker's face relative to the camera matters: the speaker has to look directly at the camera to ensure clear detection. Beyond the position of the speaker, the distance between the speaker and the camera must be small enough to detect the lips clearly, so the speaker should not be far from the camera when delivering a command. Furthermore, working at the phoneme level would widen the range of words that can be detected and would make detection easier. We also believe that using audio-visual methods would increase the accuracy, meaning that we can depend on both sound and lip recognition to ensure high accuracy. Using new feature sets, such as features extracted using DWT, may give a large boost to the accuracy of the classifiers, since it reduces the number of dimensions being processed.

REFERENCES

[1] A. Hassanat, "Visual speech recognition," arXiv preprint arXiv:1409.1411, 2014.
[2] Y. Lan, B.-J. Theobald, R. Harvey, E.-J. Ong, and R. Bowden, "Improving visual features for lip-reading," in Auditory-Visual Speech Processing 2010, 2010.
[3] Y. M. Assael, B. Shillingford, S. Whiteson, and N. de Freitas, "Lipnet: end-to-end sentence-level lipreading," 2016.
[4] J. S. Chung and A. Zisserman, "Lip reading in the wild," in Asian Conference on Computer Vision, pp. 87–103, Springer, 2016.
[5] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, "Towards better analysis of deep convolutional neural networks," IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 91–100, 2017.
[6] N. Rathee, "A novel approach for lip reading based on neural network," in Computational Techniques in Information and Communication Technologies (ICCTICT), 2016 International Conference on, pp. 421–426, IEEE, 2016.
[7] F. S. Lesani, F. F. Ghazvini, and R. Dianat, "Mobile phone security using automatic lip reading," in e-Commerce in Developing Countries: With focus on e-Business (ECDC), 2015 9th International Conference on, pp. 1–5, IEEE, 2015.
[8] P. Domingos, "A few useful things to know about machine learning," Communications of the ACM, vol. 55, no. 10, pp. 78–87, 2012.
[9] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, 2002.
[10] J. S. Chung and A. Zisserman, "Lip reading in the wild," in Asian Conference on Computer Vision, 2016.
