CNN for American Sign Language Translation

The document presents a project report on a Convolutional Neural Network (CNN) designed for translating American Sign Language (ASL) into text or speech, aimed at facilitating communication for the hearing impaired. It highlights the challenges of sign language recognition, such as the complexity of hand gestures and the need for effective feature extraction, and proposes a CNN model that utilizes multiple video input channels for improved performance. The project is submitted by Y. Rajeswari as part of the requirements for a Master of Technology degree at Jawaharlal Nehru Technological University Ananthapur, under the guidance of Dr. M. Rudra Kumar.
A Project Report on

CONVOLUTIONAL NEURAL NETWORK – AMERICAN SIGN LANGUAGE TRANSLATION SYSTEM
is submitted in partial fulfillment of the requirement for the award of the degree of
MASTER OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
of
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR,
ANANTHAPURAMU

By
Y. RAJESWARI
(Reg No: 21AT1D5803)

Under the Guidance of


Dr. M. RUDRA KUMAR
Professor
G. PULLAIAH COLLEGE OF ENGINEERING & TECHNOLOGY

Department of CSE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

G. Pullaiah College of Engineering and Technology


(Accredited by NAAC with ‘A’ Grade, Approved by AICTE, NEW DELHI,
Permanently Affiliated to JNTUA, Ananthapuramu,
Recognized by UGC under 2(f) & 12(B), ISO 9001:2001 Certified institution
Nandikotkur Road, Venkayapalli, Kurnool-518452)

2021 - 2023

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DECLARATION

I hereby declare that the project entitled “CONVOLUTIONAL NEURAL NETWORK – AMERICAN SIGN LANGUAGE TRANSLATION SYSTEM”, done by me under the guidance of Dr. M. RUDRA KUMAR, Professor, Department of Computer Science and Engineering, G. Pullaiah College of Engineering & Technology, Kurnool, is submitted in partial fulfillment of the requirement for the award of the degree of Master of Technology in COMPUTER SCIENCE AND ENGINEERING. This is a record of bona fide work carried out by me, and the results embodied in this project have not been reproduced or copied from any source. The results embodied in this thesis have not been submitted to this institute or to any other institute or university for the award of any degree or diploma.

BY
Y. RAJESWARI
(21AT1D5803)
G. Pullaiah College of Engineering and Technology (Autonomous)
(Accredited by NAAC with ‘A’ Grade, Approved by AICTE, NEW DELHI,
Permanently Affiliated to JNTUA, Ananthapuramu,
Recognized by UGC under 2(f) & 12(B), ISO 9001:2001 Certified institution
Nandikotkur Road, Venkayapalli, Kurnool-518452)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the dissertation entitled “Convolutional Neural Network – American Sign
Language Translation System” is a bonafide work done by Y. RAJESWARI bearing Register Number
21AT1D5803 in partial fulfillment of the requirement for the award of degree of Master of Technology
in COMPUTER SCIENCE AND ENGINEERING of Jawaharlal Nehru Technological University
Ananthapuramu.

Project Guide: Dr. M. RUDRA KUMAR, Professor
HOD: Dr. Mrs. M. SRILAKSHMI, Associate Professor

Place: KURNOOL
Date:
Certified that the candidate was examined by me in the viva-voce examination held at G. Pullaiah College of Engineering & Technology, Kurnool on ______________

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

I take this opportunity to express my deep sense of gratitude to my guide, Dr. M. RUDRA KUMAR, Professor in the Department of Computer Science and Engineering of GPCET, for his kind co-operation and encouragement towards the successful completion of this project.

I am also very thankful to our Head of the Department, Dr. Mrs. M. SRILAKSHMI, for her kind co-operation and encouragement towards the successful completion of this project.

I am also thankful to all the teaching and non-teaching staff members of the department who rendered their kind co-operation in this project.

Finally, I acknowledge with gratitude the unflagging support and patience of my PARENTS and their guidance and encouragement during this dissertation work.

By
Y. RAJESWARI
(21AT1D5803)
ABSTRACT

American Sign Language translation aims at interpreting sign language into text or speech to facilitate communication between deaf-mute people and hearing people. This task has a broad social impact but is still very challenging due to the complexity and large variations in hand actions. Existing methods for sign language recognition (SLR) use hand-crafted features to describe sign language motion and build classification models based on those features. However, it is difficult to design reliable features that adapt to the large variations of hand gestures. To approach this problem, we propose a convolutional neural network (CNN) which automatically extracts discriminative spatial-temporal features from raw video streams without any prior knowledge, avoiding the need to design features by hand. To boost performance, multiple channels of video streams, including color information, depth cues, and body joint positions, are used as input to the CNN in order to integrate color, depth and trajectory information. We validate the proposed model on a real dataset collected with Microsoft Kinect and demonstrate its effectiveness over traditional approaches based on hand-crafted features.
CONTENTS
Abstract
Contents
List of Figures and Tables
CHAPTER 1 INTRODUCTION
1.1 Introduction
CHAPTER 2 LITERATURE REVIEW
2.1 Literature survey
CHAPTER 3 EXISTING SYSTEM AND PROPOSED SYSTEM
3.1 Existing System
3.2 Proposed System
3.3 Use Case Diagram
3.4 System Design
3.5 Modules
3.6 Hardware Requirements
3.7 Software Requirements
CHAPTER 4 IMPLEMENTATION
4.1 Download of Python
4.2 Installation of Python
4.3 Verify the Python Installation
4.4 Check how the Python IDLE works
4.5 Data Collection
4.6 Data Preprocessing
4.7 Data Augmentation
4.8 Model Architecture
4.9 Model Training
4.10 Model Evaluation
4.11 Deployment
CHAPTER 5 SAMPLE CODE
5.1 preprocessed_dataset.py
5.2 [Link]
5.3 convert_test.py
5.4 [Link]
5.5 [Link]
5.6 [Link]
CHAPTER 6 EXPERIMENTAL RESULTS
6.1 Experimental Results
6.2 Applications
6.3 Advantages
7.1 Conclusion
7.2 Future Scope
BIBLIOGRAPHY
LIST OF FIGURES AND TABLES

S. No   Figure No   Name of the Figure
1       3.3.1       Use Case Diagram
2       3.4.1       System Design
3       4.1.1       Python Official Website
4       4.1.2       Specific version of Python
5       4.1.3       Executable files of Python
6       4.2.1       Python application
7       4.2.2       Installation prompt of Python
8       4.2.3       Successful installation of Python
9       4.3.1       Command Prompt
10      4.3.2       Version checking of Python in command prompt
11      4.4.1       Python IDLE
12      4.4.2       Saving of a file
13      4.4.3       Running of Python IDLE
14      6.1.1       Main files of project
15      6.1.2       Dataset
16      6.1.3       Images of H letter
17      6.1.4       Images of D letter
18      6.1.5       Running the project using command prompt
19      6.1.6       Execution of project
20      6.1.7       Home page of project
21      6.1.8       Prediction of a letter
22      6.1.9       Prediction of another letter
23      6.1.10      Adding of letters as strings
Convolutional Neural Network – American Sign Language Translation System

CHAPTER 1
INTRODUCTION

Dept. of CSE., GPCET, Kurnool Page 1



CHAPTER 1: INTRODUCTION
1.1 Introduction
Sign language, one of the most widely used means of communication for hearing-impaired people, is expressed through variations of hand shapes, body movement, and even facial expression. Since it is difficult to jointly exploit the information from hand shapes and body movement trajectories, sign language recognition (SLR) is still a very challenging task. This work proposes an effective recognition model that translates sign language into text or speech in order to help hearing-impaired people communicate with hearing people through sign language.
Technically speaking, the main challenge of sign language recognition lies in developing descriptors that express hand shapes and motion trajectories. In particular, hand-shape description involves tracking hand regions in the video stream, segmenting hand-shape images from the complex background in each frame, and recognizing the gestures. Motion trajectory description likewise involves tracking key points and curve matching. Although much research has been conducted on these two issues, it is still hard to obtain satisfactory results for SLR because of the variation and occlusion of hands and body joints. Besides, integrating hand-shape features and trajectory features is a nontrivial issue. To address these difficulties, we develop a CNN that naturally integrates hand shapes, the trajectory of the action, and facial expression. Instead of using only color images as input to the network, as in [1, 2], we take color images, depth images and body skeleton images simultaneously as input, all of which are provided by Microsoft Kinect.
Kinect is a motion sensor that provides a color stream and a depth stream. With the public Windows SDK, the body joint locations can be obtained in real time, as shown in Fig.1. Therefore, we chose Kinect as the capture device for recording the sign words dataset. Pixel-level changes in color and depth are useful for discriminating between different sign actions, and the variation of body joints over time depicts the trajectory of sign actions. Using multiple types of visual sources as input leads the CNN to pay attention to changes not only in color, but also in depth and trajectory.
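The multi-channel input described above can be pictured as stacking the streams along the channel axis. The sketch below is a minimal NumPy illustration, not the report's implementation. It assumes the skeleton has already been rendered as an 8-bit image aligned with the color frame, rescales depth by each frame's maximum (one of several possible choices), and uses the hypothetical helper names `stack_kinect_channels` and `stack_clip`. The resulting (T, H, W, 5) clip is the kind of tensor a spatio-temporal CNN would consume.

```python
import numpy as np


def stack_kinect_channels(color, depth, skeleton):
    """Fuse one frame's streams into a single (H, W, 5) float32 array.

    Hypothetical helper. Assumed inputs:
      color:    (H, W, 3) uint8 RGB image
      depth:    (H, W)    raw depth map
      skeleton: (H, W)    uint8 image with the tracked body joints drawn on it
    """
    if not (color.shape[:2] == depth.shape == skeleton.shape):
        raise ValueError("all streams must share the same height and width")
    color = color.astype(np.float32) / 255.0
    # Depth units differ between sensors, so rescale by this frame's maximum.
    depth = depth.astype(np.float32)
    depth = depth / depth.max() if depth.max() > 0 else depth
    skeleton = skeleton.astype(np.float32) / 255.0
    # Channels 0-2: color, 3: depth, 4: skeleton.
    return np.concatenate([color, depth[..., None], skeleton[..., None]], axis=-1)


def stack_clip(frames):
    """Stack per-frame (H, W, C) arrays into a (T, H, W, C) video clip."""
    return np.stack(frames, axis=0)
```

Stacking the streams early lets the first convolution mix color, depth and trajectory cues at every pixel; keeping separate branches per stream and merging later would be an alternative design.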


CHAPTER 2

LITERATURE REVIEW


CHAPTER 2: LITERATURE REVIEW


2.1 Literature Review
2.1.1 Title 1: American Sign Language Words Recognition Using Spatio-Temporal Prosodic and Angle Features: A Sequential Learning Approach
Author: Kosin Chamnongthai
Department of Electronic and Telecommunication Engineering, Faculty of Engineering, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand, 2022.
Content: The considerable attention paid to human-computer interaction (HCI) has made human hands the most natural and efficient medium for expressing intentions in daily interaction activities [1]. This has led to the development of numerous HCI systems for sign language recognition, robotics, and medical diagnostics, among others. For sign language recognition, Fang et al. [5] proposed DeepASL, which combines a Leap Motion Controller (LMC) sensor viewing the hand from the back with a bi-directional long short-term memory (Bi-LSTM) network. Deep Bi-LSTM architectures therefore appear to have more potential for dynamic sign language recognition (SLR).
2.1.2 Title 2: An Adaptive, Affordable, Open-Source Robotic Hand for Deaf and Deaf-Blind Communication Using Tactile American Sign Language
Author: Samantha Johnson, Geng Gao, Todd Johnson, Chiara Bellini (November 5, 2021)
Content: Deaf-blindness is characterized by a spectrum that ranges from partial visual and/or hearing impairment to complete loss of both senses. The National Health and Nutrition Examination Survey, which ran from 2006 to 2013, found that 1.5 million Americans have some degree of vision and hearing loss. To communicate, deaf-blind individuals in the United States use a combination of Braille, American Sign Language (ASL), and tactile alphabets.
2.1.3 Title 3: User-Independent American Sign Language Alphabet Recognition Based on Depth Image and PCANet Features
Author: Walaa Aly, Saleh Aly, Sultan Almotairi (September 2019)
Department of Information Technology, Majmaah University, Majmaah, Saudi Arabia
Content: Sign language is the most natural and effective way of communication between deaf and hearing people. American Sign Language (ASL) alphabet recognition (i.e., fingerspelling) using a marker-less vision sensor is a challenging task due to the difficulties in hand segmentation and the appearance variations among signers. Existing color-based sign language recognition systems suffer from many challenges, such as complex backgrounds, hand segmentation, and large inter-class and intra-class variations. In this paper, the authors propose a new user-independent recognition system for the American Sign Language alphabet using depth images captured from the low-cost Microsoft Kinect depth sensor.

2.1.4 Title 4: American Sign Language Words Recognition Using Spatio-Temporal Prosodic and Angle Features: A Sequential Learning Approach
Author: Sunusi Bala Abdullahi, Kosin Chamnongthai
Content:
Most of the available American Sign Language (ASL) words share similar characteristics. These similarities usually occur along the sign trajectory, which causes confusion and hinders ubiquitous application: similar ASL words confuse translation algorithms and lead to misclassification. In this paper, based on the fast Fisher vector (FFV) and bi-directional Long Short-Term Memory (Bi-LSTM) methods, the authors design a recognition algorithm for a large database of dynamic sign words, called fast Fisher vector bi-directional long short-term memory (FFV-Bi-LSTM).
The authors report accuracy improvements of 2%, 2%, and 3.19% on the ASL, LMDHG, and SHREC datasets, respectively.


CHAPTER 3
EXISTING SYSTEM AND PROPOSED SYSTEM


CHAPTER 3: EXISTING SYSTEM AND PROPOSED SYSTEM


3.1 Existing system
• The existing system uses a bi-directional LSTM (Bi-LSTM), which is challenging to train on large datasets.

• For large datasets, this can result in long processing times and slower model training and inference.

• Bi-directional LSTMs can be more computationally expensive, and their accuracy is also lower.

3.1.1 Problems of Existing System

Bi-directional LSTMs add additional parameters to the model, which can increase
its complexity and make it harder to interpret and optimize. Bi-directional LSTMs have
several hyperparameters that need to be tuned carefully to achieve good performance,
including the number of hidden units, the learning rate, and the dropout rate. Tuning these
hyperparameters can be time-consuming and requires careful experimentation.
Additionally, there may be variability in how different signers perform the same sign, which
can also reduce the accuracy of bi-directional LSTMs.
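The extra parameters can be made concrete with the standard count for one LSTM layer: 4 × (h × (d + h) + h) for input size d and h hidden units, because each of the four gates has input weights, recurrent weights and a bias (this is the count Keras uses for a standard LSTM with bias). A bidirectional layer runs a forward and a backward LSTM, so it doubles the count. The sizes below (21 hand key points × 3 coordinates = 63 inputs, 128 units) are hypothetical values chosen for illustration, not taken from the report.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of one LSTM layer.

    Each of the 4 gates has input weights (input_dim x units),
    recurrent weights (units x units) and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim: int, units: int) -> int:
    """A bidirectional layer holds two independent LSTMs (forward and backward)."""
    return 2 * lstm_params(input_dim, units)


# Hypothetical sizes: 21 hand key points x 3 coordinates = 63 inputs per frame.
print(lstm_params(63, 128))     # 98304
print(bilstm_params(63, 128))   # 196608
# A second stacked Bi-LSTM layer sees the concatenated 2 x 128 = 256 outputs.
print(bilstm_params(256, 128))  # 394240
```

Stacking makes the growth faster still, since each further bidirectional layer receives twice as many inputs as a unidirectional one would.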

3.2 Proposed system


• Around 6.1% of the world’s population is deaf or hard of hearing. Hearing loss can have a significant impact on a person’s communication and access to information, and because of this lack of access they may face barriers to education and social exposure.

• ASL is a complete and complex language with its own grammar, syntax, and vocabulary; it is not simply a visual representation of English or any other spoken language. The proposed system achieves 97% accuracy, higher than that of the existing algorithm.

• When other factors such as computation time and parallel processing (classifying more than one sample at a time) are considered, the CNN performs far better than its Bi-LSTM counterpart.

• The remaining misclassifications may be due to an imbalance in the training ratio. It is concluded that the proposed system can effectively classify the type of signs or gestures.


3.3 Use Case Diagram

Fig 3.3.1 Use Case Diagram (actor: user; use cases: Upload Hand Gesture Dataset, Train CNN with Gesture Image, Sign Language Recognition from WebCam)

3.4 System Design

Fig 3.4.1 System Design


3.5 Modules

• TensorFlow

TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. It is used for both research and production at Google.

• Pandas

Pandas is an open-source Python library providing high-performance data manipulation and analysis tools. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.

• Matplotlib

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, and plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., can be generated with just a few lines of code.

• Scikit-learn

Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive BSD (3-clause) license and is distributed with many Linux distributions, encouraging academic and commercial use.

• OpenCV

OpenCV is a powerful computer vision library that can be used for various image and video processing tasks, including ASL recognition. While OpenCV can be combined with CNNs for ASL recognition, it is primarily designed for computer vision tasks such as image processing, object detection, and tracking, rather than deep learning.
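Before running the project, it can help to confirm that the modules listed above are installed. The script below is a minimal sketch using only the standard library; `importlib.util.find_spec` locates a module without importing it. The mapping from display names to import names (scikit-learn imports as `sklearn`, OpenCV as `cv2`) is standard, while the function and constant names are hypothetical.

```python
import importlib.util

# Import names of the modules listed in this section. The pip package names
# differ for two of them: scikit-learn -> "sklearn", opencv-python -> "cv2".
REQUIRED_MODULES = {
    "TensorFlow": "tensorflow",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "Scikit-learn": "sklearn",
    "OpenCV": "cv2",
}


def check_modules(modules=REQUIRED_MODULES):
    """Return {display name: True/False} for whether each module can be found."""
    return {name: importlib.util.find_spec(mod) is not None
            for name, mod in modules.items()}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:<13} {'installed' if ok else 'MISSING'}")
```

A missing module can then be installed with pip, for example `pip install tensorflow pandas matplotlib scikit-learn opencv-python`.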


3.6 HARDWARE REQUIREMENTS

⚫ Processor: Intel® Core i5-5200U
⚫ RAM: 4 GB
⚫ Hard Disk: 256 GB
⚫ Camera: Integrated camera

3.7 SOFTWARE REQUIREMENTS

⚫ Operating System: Windows 10

⚫ IDE: Python IDLE


CHAPTER 4
IMPLEMENTATION


CHAPTER 4: IMPLEMENTATION
4.1 Download of Python
Download the correct version for your system.
Step 1: Go to the official site to download and install Python using Google Chrome or any other web browser, or click on the following link: [Link]

Fig 4.1.1 Python Official Website

Now, check for the latest and correct version for your operating system.

Step 2: You can either select the yellow Download Python for Windows 3.8.2 button, or scroll further down and click Download next to the version you need. Here, we are downloading Python 3.8.2 for Windows, the most recent version at the time of writing.

Fig 4.1.2 Specific version of Python


Step 3: Scroll down the page until you find the Files section.

Step 4: Here you see the different versions of Python along with the operating system.

Fig 4.1.3 Executable files of Python

• To download 32-bit Python for Windows, you can select any one of three options: Windows x86 embeddable zip file, Windows x86 executable installer or Windows x86 web-based installer.

• To download 64-bit Python for Windows, you can select any one of three options: Windows x86-64 embeddable zip file, Windows x86-64 executable installer or Windows x86-64 web-based installer.
Here we will install the Windows x86-64 web-based installer. This completes the first part, choosing which version of Python to download. Now we move on to the second part: installation.

Note: To see the changes and updates made in a version, click on the Release Notes option.

4.2 Installation of Python


Step 1: Go to your Downloads folder and open the downloaded Python installer to start the installation process.


Fig 4.2.1 Python application

Step 2: Before you click Install Now, make sure to tick the Add Python 3.8.2 to PATH checkbox.

Fig 4.2.2 Installation prompt of Python

Step 3: Click Install Now. After the installation succeeds, click Close.


Fig 4.2.3 Successful installation of Python


With the three steps above, Python is successfully installed. Note that the installation process might take a couple of minutes. The next step is to verify the installation.

4.3 Verify the Python Installation


Step 1: Click on Start
Step 2: In the Windows Run Command, type “cmd”

Fig 4.3.1 Command Prompt


Step 3: Open the Command prompt option.


Step 4: Test whether Python is correctly installed: type python -V and press Enter.

Fig 4.3.2 Version checking of python in command prompt


Step 5: The output should read Python 3.8.2.
Note: If an earlier version of Python is already installed, first uninstall the earlier version and then install the new one.
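The same check can be done from inside Python, which also reveals whether the 32-bit or 64-bit build described in Section 4.1 was installed. This is a minimal sketch using only the standard library; the 3.8 minimum (matching the version installed above) and the function names are assumptions, and the script can be saved under any name and run with `python <name>.py`.

```python
import struct
import sys


def python_version_ok(minimum=(3, 8)):
    """True if the running interpreter is at least `minimum` (major, minor)."""
    return tuple(sys.version_info[:2]) >= tuple(minimum)


def interpreter_bits():
    """32 or 64, derived from the size of a C pointer in this interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python", sys.version.split()[0], f"({interpreter_bits()}-bit)")
    print("OK" if python_version_ok() else "Python 3.8 or newer is recommended")
```

The pointer size is a more reliable indicator of the interpreter build than the operating system, since a 32-bit Python can run on 64-bit Windows.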

4.4 Check how the Python IDLE works


Step 1: Click on Start
Step 2: In the Windows Run command, type “python idle”

Fig 4.4.1 Python IDLE


Step 3: Click on IDLE (Python 3.8.2 64-bit) and launch the program
Step 4: To work in IDLE, you must first save the file: click File > Save.


Fig 4.4.2 Saving of a file


Step 5: Name the file, make sure Save as type is set to Python files, and click Save. Here the file is named Hey World.
Step 6: Now, for example, enter print("Hey World") and press Enter.

Fig 4.4.3 Running of Python IDLE


The statement is executed and Hey World is printed. This completes the tutorial on downloading and installing Python on Windows.
Note: Unlike Java, Python does not require semicolons at the end of statements; a statement normally ends at the end of its line.

4.5 Data Collection


Collecting data for American Sign Language (ASL) translation using Convolutional Neural Networks (CNN) can be a challenging task, but it is essential for the accuracy of the model. Here are some steps you can follow to collect data:
• Identify the ASL signs you want to include in your dataset. This will depend on the scope of your project and the level of complexity you want to achieve.
• Find sources of ASL videos that demonstrate the signs you want to include in your dataset. You can use online resources such as ASL dictionaries, instructional videos, or even YouTube channels that focus on teaching ASL.
• Once you have identified the videos you want to use, extract individual frames from them. This will allow you to create a dataset of images that represent each sign.
• Label each image with the corresponding ASL sign. You can do this manually or use tools such as Labelling or VGG Image Annotator.
• Organize your labelled dataset into training, validation, and test sets. Keeping the test set separate lets you check whether your model generalizes to new data.
• Augment your dataset by adding variations to your images, such as flipping, rotating, or adding noise. This will increase the size of your dataset and help your model become more robust.
• Pre-process your dataset by resizing the images and normalizing the pixel values. This will help your model process the data more efficiently.
• Finally, split your dataset into batches and feed it into your CNN model. Train your model and fine-tune its hyperparameters until it achieves good accuracy on your validation set.
Overall, collecting data for ASL translation using CNN requires careful planning and attention to detail. However, with the right approach, you can create a robust and accurate model that can translate ASL signs into text or speech.
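The frame-extraction step can be sketched as follows. This is an illustration under stated assumptions, not the project's own script: `sample_frame_indices` and `extract_frames` are hypothetical names, frames are saved into one folder per sign (`out_dir/label/`), and OpenCV (`cv2.VideoCapture`, `cv2.imwrite`) is imported inside the function so that the index helper works even where OpenCV is not installed.

```python
import os


def sample_frame_indices(total_frames, n_samples):
    """Return up to n_samples frame indices spread evenly from first to last frame."""
    if total_frames <= 0 or n_samples <= 0:
        return []
    n = min(n_samples, total_frames)
    if n == 1:
        return [total_frames // 2]
    # Integer arithmetic keeps the indices exact and strictly increasing.
    return [i * (total_frames - 1) // (n - 1) for i in range(n)]


def extract_frames(video_path, out_dir, label, n_samples=30):
    """Save evenly spaced frames of one sign video as out_dir/label/label_000.jpg, ...

    Hypothetical helper. Needs OpenCV (pip install opencv-python).
    Returns the number of frames written.
    """
    import cv2

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"cannot open {video_path}")
    # The reported frame count can be approximate for some video containers.
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_frame_indices(total, n_samples))
    class_dir = os.path.join(out_dir, label)
    os.makedirs(class_dir, exist_ok=True)

    saved = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index in wanted:
            cv2.imwrite(os.path.join(class_dir, f"{label}_{saved:03d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

For example, `extract_frames("videos/H.mp4", "dataset", "H")` (hypothetical paths) would fill `dataset/H/` with 30 frames. Sampling evenly across the clip, rather than taking consecutive frames, gives more varied images of each sign.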

4.6 Data Preprocessing


Data preprocessing is a crucial step in developing a Convolutional Neural Network (CNN) model for American Sign Language (ASL) translation. Here are some steps you can follow for data preprocessing:

• Image Resizing: Resizing images to a consistent size is essential for CNNs. All images should be resized to the same dimensions so that the CNN can process them. The size should be chosen based on the architecture of the CNN model and the available computing resources.

• Image Normalization: Normalizing the pixel values of images helps the CNN converge faster during training. The values should be scaled to the range [0, 1] or [-1, 1].

• Data Augmentation: Data augmentation increases the size of the dataset by creating new examples from the existing ones. This technique can improve the model’s ability to generalize to new data. Techniques such as rotation, scaling, and flipping can be used to augment the data.

• Splitting Data: Splitting the dataset into training, validation, and test sets is important for evaluating the model's performance. The training set is used to train the model, the validation set is used to tune the model’s hyperparameters, and the test set is used to evaluate the model's final performance.

• One-Hot Encoding: One-hot encoding converts the categorical labels of ASL signs into numerical form. Each sign is first assigned a unique integer index, which is then represented as a vector of zeros with a single 1 at that index; these vectors are the training targets for the CNN.

• Class Balancing: Balancing the number of examples for each ASL sign can help train the model more effectively. If the classes are imbalanced, the model may become biased towards the more abundant classes, leading to poor performance on the less frequent ones.

• Data Normalization: Normalizing the data ensures that the features have the same scale. This helps the CNN model process the data more efficiently and effectively.

In summary, data preprocessing is a crucial step in developing a CNN model for ASL translation. By following the above steps, you can ensure that your model is trained on high-quality data and achieves good accuracy on the validation and test sets.
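Several of these steps fit in a few lines of NumPy. The following is a sketch under stated assumptions, not the project's preprocessed_dataset.py: images are uint8 arrays, labels are integer class indices, resizing uses nearest-neighbour sampling (a real pipeline would more likely use an image library such as OpenCV), and the "balanced" class-weight formula n_samples / (n_classes × count) is one common choice among several. All function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) image to (size, size)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def normalize(img):
    """Scale uint8 pixel values from [0, 255] to [0, 1]."""
    return img.astype(np.float32) / 255.0


def one_hot(labels, num_classes):
    """Map integer class indices to one-hot rows, e.g. 2 -> [0, 0, 1, ...]."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out


def class_weights(labels, num_classes):
    """'Balanced' weights n_samples / (n_classes * count): rare signs weigh more."""
    counts = np.bincount(labels, minlength=num_classes)
    return {c: len(labels) / (num_classes * counts[c])
            for c in range(num_classes) if counts[c] > 0}


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Return train/val/test index arrays that keep each sign's share in every split."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_test = int(round(len(idx) * test_frac))
        n_val = int(round(len(idx) * val_frac))
        test.extend(idx[:n_test])
        val.extend(idx[n_test:n_test + n_val])
        train.extend(idx[n_test + n_val:])
    return np.array(train), np.array(val), np.array(test)
```

The dictionary returned by `class_weights` has the form accepted by the `class_weight` argument of Keras `fit()`, which is one way to act on the Class Balancing step without duplicating images.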

4.7 Data Augmentation


Data augmentation is an important technique for improving the performance of
Convolutional Neural Network (CNN) models for American Sign Language (ASL) translation.
Here are some common data augmentation techniques that can be used for ASL datasets:

• Rotation: Rotating the images can help the model learn to recognize signs from different angles, which can improve its robustness to variation in hand position and orientation.

• Scaling: Scaling the images can help the model recognize signs of different sizes, which can improve its ability to generalize to new data.

• Flipping: Flipping the images horizontally can help the model learn to recognize signs that are mirrored or performed with the opposite hand.

• Cropping: Cropping the images can help the model focus on the most important parts of the image, such as the hand and the signed word, while removing distracting background details.

• Adding noise: Adding noise to the images can help the model learn to recognize signs in low-light or noisy environments, which can improve its robustness to real-world conditions.

• Translation: Translating the images can help the model learn to recognize signs that are performed in different positions or locations, which can improve its ability to generalize to new data.

• Color jittering: Color jittering involves randomly adjusting the brightness, contrast, saturation, and hue of the images. This can help the model learn to recognize signs in different lighting conditions, which can improve its robustness to real-world environments.

These techniques can be applied individually or in combination to create a diverse dataset for training the CNN model. By augmenting the data, the model can learn to recognize signs under a variety of conditions, which can improve its accuracy and robustness.
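The sketch below implements several of the listed augmentations directly in NumPy for images of shape (H, W) or (H, W, C). It is illustrative only: the parameter ranges are arbitrary assumptions, scaling is approximated by crop-and-resize (a zoom), color jitter covers brightness and contrast but not hue or saturation, `add_noise` and `color_jitter` assume float images in [0, 1], and all helper names are hypothetical. Note that a horizontal flip makes a right-handed sign look left-handed, which is the effect the Flipping item relies on.

```python
import numpy as np

rng = np.random.default_rng(42)


def flip_horizontal(img):
    """Mirror the image left-right."""
    return img[:, ::-1]


def rotate(img, degrees):
    """Rotate about the centre with nearest-neighbour sampling; empty corners become 0."""
    h, w = img.shape[:2]
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, find the source pixel it comes from.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x, src_y = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def translate(img, dy, dx):
    """Shift by (dy, dx) pixels (assumes |dy| < H and |dx| < W); border fills with 0."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def random_crop_resize(img, crop_frac=0.9):
    """Crop a random window (crop_frac of each side) and resize it back: a random zoom."""
    h, w = img.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    return crop[np.arange(h) * ch // h][:, np.arange(w) * cw // w]


def add_noise(img, sigma=0.05):
    """Add Gaussian noise to a float image in [0, 1] and clip back into range."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def color_jitter(img, brightness=0.2, contrast=0.2):
    """Random contrast scaling around the mean plus a brightness shift (float image)."""
    c = 1.0 + rng.uniform(-contrast, contrast)
    b = rng.uniform(-brightness, brightness)
    return np.clip((img - img.mean()) * c + img.mean() + b, 0.0, 1.0)
```

In practice each training image would pass through a random subset of these functions with randomly drawn parameters every epoch, so the network rarely sees exactly the same input twice.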

4.8 Model Architecture


The architecture of a Convolutional Neural Network (CNN) model for American Sign
Language (ASL) translation typically consists of several layers:

• Input layer: This layer takes the input image, which is typically resized and normalized.

• Convolutional layers: These layers apply a set of filters (also known as kernels) to the input image to extract important features. The filters are learned during training, and the number and size of the filters can vary depending on the architecture of the CNN model.

• Pooling layers: These layers downsample the output of the convolutional layers to reduce the spatial size of the feature maps and help the model learn more robust features.

• Fully connected layers: These layers take the flattened output of the last pooling layer and apply a set of dense (fully connected) layers to classify the image into the corresponding ASL sign.

• Output layer: This layer produces the final output, which is a probability distribution over the different ASL signs.

The architecture of the CNN model can vary depending on the specific problem and
available data. Some popular architectures that have been used for ASL translation include
LeNet, AlexNet, VGG, and ResNet. The choice of architecture depends on factors such as the
size of the dataset, the available computing resources, and the desired accuracy and speed of
the model. In summary, a CNN model for ASL translation typically consists of several layers,
including convolutional layers, pooling layers, and fully connected layers. The choice of
architecture depends on the specific problem and available data, and the model is trained to
classify the input image into the corresponding ASL sign.
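To make the layer sequence concrete, here is a forward pass through one layer of each type, written in plain NumPy so every operation is visible. It is an illustrative sketch, not the report's model: the 28×28 grayscale input, 8 filters of size 3×3, and 26 output classes (one per letter) are assumptions, the weights are random rather than trained, and all function names are hypothetical. A real implementation would use a framework such as TensorFlow/Keras.

```python
import numpy as np


def conv2d(x, kernels, bias):
    """'Valid' convolution with stride 1 (cross-correlation, as CNN libraries compute it).

    x: (H, W, C_in); kernels: (k, k, C_in, C_out); bias: (C_out,)
    """
    k = kernels.shape[0]
    h, w, _ = x.shape
    out = np.empty((h - k + 1, w - k + 1, kernels.shape[3]))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool2x2(x):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    h, w, c = x.shape
    x = x[: h // 2 * 2, : w // 2 * 2]
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def forward(image, params):
    """Input -> Conv -> ReLU -> Pool -> Flatten -> Dense -> Softmax."""
    x = relu(conv2d(image, params["conv_w"], params["conv_b"]))
    x = max_pool2x2(x)
    x = x.reshape(-1)  # flatten
    logits = x @ params["dense_w"] + params["dense_b"]
    return softmax(logits)  # probability distribution over the signs


def init_params(size=28, channels=1, filters=8, k=3, num_classes=26, seed=0):
    rng = np.random.default_rng(seed)
    pooled = (size - k + 1) // 2
    return {
        "conv_w": rng.normal(0, 0.1, (k, k, channels, filters)),
        "conv_b": np.zeros(filters),
        "dense_w": rng.normal(0, 0.1, (pooled * pooled * filters, num_classes)),
        "dense_b": np.zeros(num_classes),
    }
```

With these sizes the feature maps shrink from 28×28×1 to 26×26×8 after the convolution and 13×13×8 after pooling, so the dense layer receives 1352 values. Deeper architectures such as VGG or ResNet repeat the convolution and pooling stages many times before the dense layers.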

4.9 Model Training

Training a Convolutional Neural Network (CNN) model for American Sign Language
(ASL) translation involves several steps:
• Data Preparation: The first step is to prepare the data for training. This involves
preprocessing the images, splitting the data into training, validation, and test sets, and
one-hot encoding the labels.

Dept. of CSE., GPCET, Kurnool Page 20


Convolutional Neural Network - American Sign Language Translation System

• Define the Model Architecture: The next step is to define the architecture of the CNN
model. This involves selecting the number and type of layers and the number of filters in
each layer, based on the specific problem and the available data.

• Compile the Model: The model is then compiled, which involves selecting a loss
function (for example, categorical cross-entropy for one-hot labels), an optimizer, and the
metrics that will be used to monitor performance during training.

• Train the Model: The model is then trained on the training set, for example with the fit()
method of the Keras library. During training, backpropagation and the optimizer iteratively
update the weights (including the convolutional filters) and biases to minimize the loss
function.

• Evaluate the Model: After training, the model is evaluated on the validation set by
calculating metrics such as accuracy, precision, and recall. If the model is not performing
well, the architecture and hyperparameters can be adjusted and training repeated.

• Test the Model: Once the model has been trained and tuned, it is evaluated on the held-out
test set to estimate its real-world performance.

• Fine-Tuning: If the test results are unsatisfactory, the hyperparameters or architecture may
need to be revised. Such tuning should be guided by the validation set rather than the test set;
otherwise the test score no longer gives an unbiased estimate of real-world performance.

• Save the Model: The final model can be saved to disk (for example, as an HDF5 file with
Keras' model.save()) for later use.

In summary, training a CNN model for ASL translation involves preparing the data,
defining the model architecture, compiling the model, training the model, evaluating the model,
testing the model, fine-tuning the model if necessary, and saving the model to disk.
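The training steps above can be illustrated end to end with a small NumPy sketch. To keep it short, a softmax (multinomial logistic regression) classifier trained by mini-batch gradient descent stands in for the CNN, and the input is assumed to be a matrix of flattened feature vectors (for example, 21 hand landmarks x 3 coordinates = 63 values per sample). The function names, split fractions, learning rate, batch size and epoch count are assumptions for illustration. The sketch shows data splitting, one-hot encoding, shuffled mini-batches and the categorical cross-entropy update that a Keras model performs inside fit().

```python
import numpy as np


def split_data(X, y, rng, val_frac=0.15, test_frac=0.15):
    """Shuffle, then split into training, validation and test sets."""
    idx = rng.permutation(len(X))
    n_test, n_val = int(len(X) * test_frac), int(len(X) * val_frac)
    test, val, train = idx[:n_test], idx[n_test:n_test + n_val], idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])


def one_hot(y, n_classes):
    """Integer labels -> one-hot rows, e.g. 2 -> [0, 0, 1] for 3 classes."""
    out = np.zeros((len(y), n_classes))
    out[np.arange(len(y)), y] = 1.0
    return out


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def train(X, Y, rng, epochs=50, batch_size=32, lr=0.1):
    """Mini-batch gradient descent on the categorical cross-entropy loss."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))              # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            probs = softmax(X[batch] @ W + b)
            grad = (probs - Y[batch]) / len(batch)   # gradient w.r.t. the logits
            W -= lr * X[batch].T @ grad
            b -= lr * grad.sum(axis=0)
    return W, b


def accuracy(X, y, W, b):
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))
```

For a softmax output with categorical cross-entropy, the gradient of the loss with respect to the logits is simply the predicted probabilities minus the one-hot targets, which is why the update needs no separate derivative of the softmax.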

4.10 Model Evaluation


The evaluation of a Convolutional Neural Network (CNN) model for American Sign
Language (ASL) translation typically uses several metrics. The most common are:

• Accuracy: Accuracy is the proportion of correctly classified images among all images. It
is the most common metric for classification models, but it can be misleading if the dataset
is imbalanced.

• Precision: For a given ASL sign, precision is the number of true positives (images correctly
predicted as that sign) divided by the total number of images predicted as that sign. It
measures how reliable the model's predictions of that sign are.

• Recall: For a given ASL sign, recall is the number of true positives divided by the total
number of images that actually show that sign. It measures how many instances of the sign the
model finds.


• F1 score: The F1 score is the harmonic mean of precision and recall,
F1 = 2 x precision x recall / (precision + recall). It is a single, balanced metric that takes
both precision and recall into account.

• Confusion matrix: A confusion matrix is a table whose rows correspond to the true signs and
whose columns correspond to the predicted signs; each cell counts how many images of one sign
were predicted as another. The true positives, false positives, false negatives, and true
negatives for each ASL sign can be read from it, and it shows which signs are most often
confused with each other.

• ROC curve: The Receiver Operating Characteristic (ROC) curve plots the true positive rate
(recall) against the false positive rate (1 - specificity) at different classification
thresholds. For a multi-class problem such as ASL, it is computed for each sign in a
one-vs-rest manner. The area under the curve (AUC) summarizes the overall performance, and the
curve can be used to choose a classification threshold.
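The first five metrics in the list can all be computed from a confusion matrix. A minimal NumPy sketch, assuming the true and predicted labels are integer class indices from 0 to n_classes - 1; the function names are illustrative, not taken from any library.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of images of sign i that were predicted as sign j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def classification_metrics(cm):
    """Accuracy plus per-class precision, recall and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as the sign, but actually another sign
    fn = cm.sum(axis=1) - tp   # images of the sign predicted as another sign
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


# Example with three signs and eight test images.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
acc, prec, rec, f1 = classification_metrics(cm)
print(cm)          # rows: true sign, columns: predicted sign
print(acc)         # 0.75
print(prec, rec)   # per-class values
print(f1.mean())   # macro-averaged F1
```

On an imbalanced test set the gap between accuracy and the per-class metrics can be large: if 95 of 100 images show one sign and a model always predicts that sign, its accuracy is 0.95, but its recall for every other sign is 0.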

The evaluation metrics should be chosen based on the specific problem and the available
data. Accuracy, precision, recall, and F1 score are the standard summary metrics for
classification models, while the confusion matrix and ROC curve give more detailed insight into
where the model fails. Together they show the overall performance of the model and identify
areas for improvement.
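For the ROC curve, each sign is treated in turn as the positive class (one-vs-rest) and the model's predicted probability for that sign is used as the score. A minimal NumPy sketch, assuming the scores and binary labels for one sign are already available; the function names are hypothetical, and choosing the threshold that maximizes TPR - FPR (Youden's J statistic) is shown as one common option, not the only one.

```python
import numpy as np


def roc_curve(scores, is_positive):
    """(FPR, TPR) points for one sign, sweeping the threshold from high to low.

    scores: predicted probability of the sign; is_positive: True if the image shows it.
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    thresholds = np.unique(scores)[::-1]           # distinct scores, highest first
    n_pos, n_neg = is_positive.sum(), (~is_positive).sum()
    fpr, tpr = [0.0], [0.0]                        # threshold above every score
    for t in thresholds:
        predicted = scores >= t
        tpr.append((predicted & is_positive).sum() / n_pos)
        fpr.append((predicted & ~is_positive).sum() / n_neg)
    return np.array(fpr), np.array(tpr), thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def youden_threshold(fpr, tpr, thresholds):
    """Threshold maximizing TPR - FPR (Youden's J statistic)."""
    return thresholds[np.argmax(tpr[1:] - fpr[1:])]


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
is_positive = [True, True, False, True, False, False]
fpr, tpr, thresholds = roc_curve(scores, is_positive)
print(auc(fpr, tpr))   # 8/9, about 0.889
```

The AUC equals the probability that a randomly chosen image of the sign receives a higher score than a randomly chosen image of another sign; in the example, 8 of the 9 positive/negative pairs are ranked correctly.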

4.11 Deployment

Deploying a Convolutional Neural Network (CNN) model for American Sign Language
(ASL) translation means making the trained model available for use in a production environment.
The general steps are:

• Export the Model: The first step is to export the trained model from the development
environment, usually by saving it in a format that the production environment can load, such
as a Keras HDF5 file or a TensorFlow SavedModel.

• Set Up the Production Environment: The production environment must be set up to run the
model. This may involve installing the necessary software dependencies, configuring hardware
resources, and setting up networking and security.

• Load the Model: The exported model is then loaded into the production environment with the
appropriate library or framework.

• Preprocess the Input Data: Incoming data must be preprocessed into exactly the format the
model was trained on. For images this may involve resizing, scaling, and normalization; one-hot
encoding applies only to the training labels, not to the inputs.

• Run Predictions: The preprocessed input is fed into the loaded model, which generates
predictions.

• Return Results: The predictions are returned to the user or passed to other systems as
needed.


The deployment of a CNN model for ASL translation requires careful consideration of
the production environment, data input, and output. It is important to ensure that the deployed
model is accurate, efficient, and scalable to meet the needs of the intended users.
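The load, preprocess, predict and return steps can be wrapped in one small class. The sketch below is hypothetical: SignPredictor and its methods are illustrative names, the (21, 3) input shape follows the hand-landmark arrays used in the Chapter 5 listings, and predict_fn stands for whatever prediction call the loaded model exposes (with Keras, the model returned by tensorflow.keras.models.load_model provides model.predict). A dummy function replaces the model in the example so that the sketch runs without TensorFlow.

```python
import numpy as np


class SignPredictor:
    """Serves predictions from a trained sign classifier (illustrative sketch)."""

    def __init__(self, predict_fn, class_names):
        # predict_fn: batch of shape (n, 21, 3) -> class probabilities of shape (n, n_classes).
        # class_names must follow the same order as the label encoding used in training.
        self.predict_fn = predict_fn
        self.class_names = list(class_names)

    @staticmethod
    def preprocess(landmarks):
        """Validate one hand's 21 (x, y, z) landmarks and add a batch dimension."""
        arr = np.asarray(landmarks, dtype=np.float32)
        if arr.shape != (21, 3):
            raise ValueError(f"expected shape (21, 3), got {arr.shape}")
        return arr[np.newaxis]  # (1, 21, 3)

    def predict(self, landmarks):
        """Return the most probable sign and its probability."""
        probs = np.asarray(self.predict_fn(self.preprocess(landmarks)))[0]
        best = int(np.argmax(probs))
        return {"label": self.class_names[best], "confidence": float(probs[best])}


def dummy_predict(batch):
    """Stand-in for a real model: always favours the second class."""
    return np.tile([0.1, 0.7, 0.2], (len(batch), 1))


predictor = SignPredictor(dummy_predict, ["A", "B", "C"])
print(predictor.predict(np.zeros((21, 3))))   # {'label': 'B', 'confidence': 0.7}
```

Keeping the class-name order identical to the training label encoding matters: if the names are listed in a different order (for example, in the unsorted order returned by a directory listing), every prediction is mapped to the wrong sign.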


CHAPTER 5: SAMPLE CODE


5.1 preprocessed_dataset.py
import os
import cv2
import pickle
import mediapipe as mp
import numpy as np
from tqdm import tqdm

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands


def preprocessed_dataset(DATASET_PATH, save_path):
    whole_dataset_x = []
    whole_dataset_Y = []

    with mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                        min_detection_confidence=0.5) as hands:
        # Each sub-directory of DATASET_PATH holds the images of one sign (class).
        for i, main_dir in enumerate(os.listdir(DATASET_PATH)):
            sub_dir = os.path.join(DATASET_PATH, main_dir)
            for file_name in tqdm(os.listdir(sub_dir), desc=f"Step {i+1}/27, {main_dir}"):
                file_name = os.path.join(sub_dir, file_name)
                current_file = []
                # Read the image and mirror it horizontally.
                image = cv2.flip(cv2.imread(file_name), 1)
                # MediaPipe expects RGB input; OpenCV loads images as BGR.
                results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

                if not results.multi_hand_landmarks:
                    continue  # skip images in which no hand was detected

                for hand_landmarks in results.multi_hand_landmarks:
                    # Store the (x, y, z) coordinates of the 21 hand landmarks.
                    for j in range(21):
                        lm = hand_landmarks.landmark[j]
                        current_file.append(np.array([lm.x, lm.y, lm.z], dtype=np.float32))
                    whole_dataset_x.append(np.array(current_file))
                    whole_dataset_Y.append(main_dir)
                    break  # use only the first detected hand

    print(np.array(whole_dataset_x).shape)
    print(np.array(whole_dataset_Y).shape)

    # Note: despite the .h5 extension, the output file is a pickle, not HDF5.
    with open(save_path, 'wb') as f:
        pickle.dump({'X': np.array(whole_dataset_x), 'Y': np.array(whole_dataset_Y)}, f)


if __name__ == "__main__":
    preprocessed_dataset("dataset/train", 'preprocessed_dataset_train.h5')
    preprocessed_dataset("dataset/valid", 'preprocessed_dataset_valid.h5')

5.2 detect.py
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TensorFlow log output
import time
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import load_model


def detect_frame(image, model, classes):
    """Detect a hand in one BGR frame, classify its landmarks and annotate the frame."""
    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_hands = mp.solutions.hands

    fps, predicted_class = float(), str()

    with mp_hands.Hands(model_complexity=0, min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        t1 = time.time()
        # Mark the frame read-only during processing (optional performance hint).
        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image)

        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # Build the (21, 3) landmark array expected by the classifier.
                current_data = []
                for i in range(21):
                    lm = hand_landmarks.landmark[i]
                    current_data.append(np.array([lm.x, lm.y, lm.z], dtype=np.float32))

                pred = model.predict(np.expand_dims(np.array(current_data), axis=0))
                predicted_class_indices = np.argmax(pred, axis=1)
                predicted_class = classes[predicted_class_indices[0]]

                mp_drawing.draw_landmarks(
                    image, hand_landmarks, mp_hands.HAND_CONNECTIONS,
                    mp_drawing_styles.get_default_hand_landmarks_style(),
                    mp_drawing_styles.get_default_hand_connections_style())

        t2 = time.time()
        fps = int(1. / (t2 - t1))
        image = cv2.flip(image, 1)  # mirror the frame for display
        cv2.putText(image, f"FPS: {fps}", (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (0, 0, 255), 1, cv2.LINE_AA)
        if predicted_class == "SS":  # the "SS" class stands for a space
            predicted_class = "Space"
        cv2.putText(image, f"Predicted Class: {predicted_class}", (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)
    return image, predicted_class


def detect():
    model = load_model('sequencial_model.h5')
    model.summary()
    print("Model Loaded...")
    # Sorted so that class indices match the alphabetical order used by LabelEncoder
    # during training (os.listdir() alone returns entries in arbitrary order).
    classes = sorted(os.listdir('dataset/train'))

    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_hands = mp.solutions.hands

    cap = cv2.VideoCapture(0)
    fps, predicted_class = float(), str()
    with mp_hands.Hands(model_complexity=0, min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            success, image = cap.read()
            t1 = time.time()
            if not success:
                print("Ignoring empty camera frame.")
                continue

            image.flags.writeable = False
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = hands.process(image)
            image.flags.writeable = True
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

            if results.multi_hand_landmarks:
                for hand_landmarks in results.multi_hand_landmarks:
                    current_data = []
                    for i in range(21):
                        lm = hand_landmarks.landmark[i]
                        current_data.append(np.array([lm.x, lm.y, lm.z], dtype=np.float32))
                    pred = model.predict(np.expand_dims(np.array(current_data), axis=0))
                    predicted_class_indices = np.argmax(pred, axis=1)
                    predicted_class = classes[predicted_class_indices[0]]
                    mp_drawing.draw_landmarks(
                        image, hand_landmarks, mp_hands.HAND_CONNECTIONS,
                        mp_drawing_styles.get_default_hand_landmarks_style(),
                        mp_drawing_styles.get_default_hand_connections_style())

            t2 = time.time()
            fps = int(1. / (t2 - t1))
            image = cv2.flip(image, 1)
            cv2.putText(image, f"FPS: {fps}", (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                        (0, 0, 255), 1, cv2.LINE_AA)
            cv2.putText(image, f"Predicted Class: {predicted_class}", (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)
            cv2.imshow('MediaPipe Hands', image)
            if cv2.waitKey(5) & 0xFF == 27:  # Esc quits
                break
    cap.release()


if __name__ == "__main__":
    detect()

5.3 convert_test.py
import os
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

IMAGE_FILES = [os.path.join('dataset/test', x) for x in os.listdir('dataset/test')]

with mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                    min_detection_confidence=0.5) as hands:
    for idx, file in enumerate(IMAGE_FILES):
        # Read and mirror the image, then run hand detection on its RGB version.
        image = cv2.flip(cv2.imread(file), 1)
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

        if not results.multi_hand_landmarks:
            continue
        annotated_image = image.copy()
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(
                annotated_image, hand_landmarks, mp_hands.HAND_CONNECTIONS,
                mp_drawing_styles.get_default_hand_landmarks_style(),
                mp_drawing_styles.get_default_hand_connections_style())
        # Save the annotated copy, flipped back to the original orientation,
        # as <original name without extension> + "test.png".
        cv2.imwrite(file[:-4] + 'test' + '.png', cv2.flip(annotated_image, 1))

5.4 [Link]
import pickle
import matplotlib.pyplot as plt

# The training script pickles list(history.items()): (name, values) pairs in the
# order loss, accuracy, val_loss, val_accuracy, time.
with open('sequencial_model_score.h5', 'rb') as score:
    fit_history = pickle.load(score)

plt.figure(1, figsize=(20, 10))

plt.subplot(221)
plt.plot(fit_history[1][1])  # training accuracy
plt.plot(fit_history[3][1])  # validation accuracy
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Valid'])

plt.subplot(222)
plt.plot(fit_history[0][1])  # training loss
plt.plot(fit_history[2][1])  # validation loss
plt.title('Model Loss')
plt.ylabel('Loss')
plt.legend(['Train', 'Valid'])
plt.show()

5.5 [Link]
import os
import sys
import time
import pickle
import tensorflow as tf
import numpy as np
from tensorflow import keras
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.models import Sequential

sys.setrecursionlimit(1500)  # reconstructed from a garbled line; assumed recursion-limit setting


class TimeHistory(keras.callbacks.Callback):
    """Keras callback that records the wall-clock duration of every epoch."""

    def on_train_begin(self, logs=None):
        self.times = []

    def on_epoch_begin(self, epoch, logs=None):
        self.epoch_time_start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.times.append(time.time() - self.epoch_time_start)


def data_generator():
    # Both files are pickled dictionaries written by preprocessed_dataset.py.
    with open('preprocessed_dataset_train.h5', 'rb') as data:
        dataset = pickle.load(data)
    with open('preprocessed_dataset_valid.h5', 'rb') as data_valid:
        dataset_valid = pickle.load(data_valid)

    # Sorted, matching the class order LabelEncoder uses below.
    classes = np.array(sorted(set(dataset['Y'])))

    # Map class names to integers, then integers to one-hot vectors.
    enc = LabelEncoder()
    enc.fit(dataset['Y'])
    dataset['Y'] = enc.transform(dataset['Y'])
    ohe = OneHotEncoder().fit(dataset['Y'].reshape(-1, 1))
    dataset['Y'] = ohe.transform(dataset['Y'].reshape(-1, 1)).toarray()

    dataset_valid['Y'] = enc.transform(dataset_valid['Y'])
    dataset_valid['Y'] = ohe.transform(dataset_valid['Y'].reshape(-1, 1)).toarray()

    print(dataset['X'].shape, dataset['Y'].shape)
    print(dataset_valid['X'].shape, dataset_valid['Y'].shape)
    print(classes)
    return dataset, dataset_valid, classes


def build_model(input_shape, classes):
    model = Sequential()
    # The recurrent layer type was lost in extraction; LSTM is assumed here
    # (return_sequences=True implies a recurrent layer).
    model.add(keras.layers.LSTM(64, input_shape=input_shape, return_sequences=True))
    model.add(keras.layers.LSTM(64))

    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Dense(256, activation='relu'))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Dense(512, activation='relu'))
    model.add(keras.layers.Dropout(0.3))

    # softmax (not sigmoid): categorical cross-entropy expects a probability
    # distribution over the mutually exclusive sign classes.
    model.add(keras.layers.Dense(classes, activation='softmax'))

    # The optimizer class was also lost in extraction; Adam is assumed.
    optimiser = keras.optimizers.Adam(learning_rate=0.0001)
    model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['accuracy'])
    model.summary()
    return model


def train_model(epochs=10):
    time_callback = TimeHistory()
    train_generator, validation_generator, classes = data_generator()
    model = build_model(input_shape=(21, 3), classes=len(classes))

    r = model.fit(
        x=train_generator['X'], y=train_generator['Y'],
        epochs=epochs, batch_size=64, shuffle=True,
        validation_data=(validation_generator['X'], validation_generator['Y']),
        validation_batch_size=128, callbacks=[time_callback]
    )
    r.history['time'] = time_callback.times

    model.save('sequencial_model.h5')
    # Pickled list of (metric name, values) pairs, despite the .h5 extension.
    with open('sequencial_model_score.h5', 'wb') as f:
        pickle.dump(list(r.history.items()), f)


if __name__ == "__main__":
    train_model(60)

5.6 [Link]
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TensorFlow log output
import ctypes
import sys
import cv2
from tkinter import *
# (a further "from ... import *" line was garbled in the source and is omitted)
from time import sleep
from PIL import ImageTk, Image
from detect import detect_frame
import tkinter.messagebox as tkMessageBox
from tensorflow.keras.models import load_model

# Windows-only (uses ctypes.windll). The image file names ("Images\\[Link]") were
# lost in extraction and are kept as they appear in the source; the window-closing
# calls are reconstructed from context.

model = load_model('sequencial_model.h5')
model.summary()
print("Model Loaded...")
# Sorted so that class indices match the alphabetical order used by LabelEncoder.
classes = sorted(os.listdir('dataset/train'))

# Shared look of the flat navigation buttons.
BUTTON_STYLE = dict(font=("Agency FB", 16, "bold"), relief=FLAT, bd=0,
                    bg="#787A91", fg="#0F044C",
                    activebackground="#EEEEEE", activeforeground="#141E61")


def setup_window(win, title, background):
    """Add a background image and give the window its title and fixed 1214x680 size."""
    img = ImageTk.PhotoImage(Image.open(background))
    panel = Label(win, image=img)
    panel.image = img  # keep a reference so the image is not garbage-collected
    panel.pack(side="top", fill="both", expand="yes")
    # Position the window relative to the centre of the screen.
    user32 = ctypes.windll.user32
    user32.SetProcessDPIAware()
    w, h = user32.GetSystemMetrics(0), user32.GetSystemMetrics(1)
    win.title(title)
    win.geometry("1214x680+" + str(w // 2 - 446) + "+" + str(h // 2 - 383))
    win.resizable(0, 0)


def close_quietly(*closers):
    """Run each close/release call, ignoring windows or cameras that are not open."""
    for close in closers:
        try:
            close()
        except Exception:
            pass


def HomePage():
    global cntct, about, predict_stream, cap
    close_quietly(lambda: cntct.destroy(), lambda: about.destroy(),
                  lambda: predict_stream.destroy(), lambda: cap.release())
    window = Tk()
    setup_window(window, "HOME - American Sign Language", "Images\\[Link]")

    def contactus():
        global cntct, about, predict_stream, cap
        close_quietly(lambda: window.destroy(), lambda: about.destroy(),
                      lambda: predict_stream.destroy(), lambda: cap.release())
        cntct = Tk()
        setup_window(cntct, "About Team - American Sign Language", "Images\\[Link]")
        Button(cntct, text="HomePage", width=10, command=HomePage,
               **BUTTON_STYLE).place(x=784, y=40)
        Button(cntct, text="About Project", width=12, command=aboutus,
               **BUTTON_STYLE).place(x=909, y=40)
        Button(cntct, text="Exit", width=10, command=exit,
               **BUTTON_STYLE).place(x=1050, y=40)
        cntct.mainloop()

    def aboutus():
        global about, cntct, predict_stream, cap
        close_quietly(lambda: window.destroy(), lambda: cntct.destroy(),
                      lambda: predict_stream.destroy(), lambda: cap.release())
        about = Tk()
        setup_window(about, "About Project - American Sign Language", "Images\\[Link]")
        Button(about, text="HomePage", width=10, command=HomePage,
               **BUTTON_STYLE).place(x=800, y=40)
        Button(about, text="About Us", width=10, command=contactus,
               **BUTTON_STYLE).place(x=925, y=40)
        Button(about, text="Exit", width=10, command=exit,
               **BUTTON_STYLE).place(x=1050, y=40)
        about.mainloop()

    def exit():
        # Ask for confirmation before quitting the application.
        result = tkMessageBox.askquestion("American Sign Language",
                                          "Are you sure you want to exit?", icon="warning")
        if result == 'yes':
            sys.exit()

    def StreamMode():
        pass

    def VideoMode():
        global about, cntct, predict_stream, cap
        close_quietly(lambda: window.destroy(), lambda: cntct.destroy())
        predict_stream = Tk()
        setup_window(predict_stream, "Predict - American Sign Language", "Images\\[Link]")

        imageFrame = Frame(predict_stream, width=600, height=500)
        imageFrame.place(x=300, y=100)
        cap = cv2.VideoCapture(0)

        def show_frame():
            # Read a webcam frame, classify it, display it, and repeat after 10 ms.
            global current_character
            _, frame = cap.read()
            frame, current_character = detect_frame(frame, model, classes)
            cv2image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
            imgtk = ImageTk.PhotoImage(image=Image.fromarray(cv2image))
            display1.imgtk = imgtk
            display1.configure(image=imgtk)
            display1.after(10, show_frame)

        display1 = Label(imageFrame)
        display1.grid(row=1, column=0)
        show_frame()

        def add_into_string():
            # Append the currently recognized character to the predicted string.
            global current_character
            try:
                if current_character == "Space":
                    current_character = " "
                value = predicted_string_label['text'] + current_character
                predicted_string_label.config(text=value)
            except Exception:
                pass

        def remove_from_string():
            # Delete the last character of the predicted string.
            predicted_string_label.config(text=predicted_string_label['text'][:-1])

        Button(predict_stream, text="HomePage", width=10, command=HomePage,
               **BUTTON_STYLE).place(x=800, y=40)
        Button(predict_stream, text="About Us", width=10, command=contactus,
               **BUTTON_STYLE).place(x=925, y=40)
        Button(predict_stream, text="Exit", width=10, command=exit,
               **BUTTON_STYLE).place(x=1050, y=40)
        Label(predict_stream, text="Predicted String: ",
              font=("Agency FB", 16, "bold")).place(x=300, y=600)
        predicted_string_label = Label(predict_stream, text="", font=("Agency FB", 16, "bold"))
        predicted_string_label.place(x=407, y=600)
        Button(predict_stream, text="ADD", width=12, command=add_into_string,
               **BUTTON_STYLE).place(x=600, y=600)
        Button(predict_stream, text="REMOVE", width=10, command=remove_from_string,
               **BUTTON_STYLE).place(x=850, y=600)
        predict_stream.mainloop()

    # Menu bar of the home page.
    Button(window, text="About Project", width=12, command=aboutus,
           **BUTTON_STYLE).place(x=784, y=40)
    Button(window, text="About Us", width=10, command=contactus,
           **BUTTON_STYLE).place(x=925, y=40)
    Button(window, text="Exit", width=10, command=exit,
           **BUTTON_STYLE).place(x=1050, y=40)
    Button(window, text="LIVE STREAM", font=("Arial Narrow", 18, "bold"), width=20,
           relief=FLAT, bd=1, bg="#787A91", fg="#0F044C", activebackground="#EEEEEE",
           activeforeground="#141E61", command=VideoMode).place(x=225, y=350)
    window.mainloop()


def LoadingScreen():
    root = Tk()
    root.config(bg="white")
    setup_window(root, "Loading - American Sign Language", r"Images\\[Link]")
    # Empty progress-bar cells.
    for i in range(40):
        Label(root, bg="#EEEEEE", width=2, height=1).place(x=(i + 4) * 25, y=600)

    def play_animation():
        # Fill the progress bar cell by cell, then switch to the home page.
        for j in range(40):
            Label(root, bg="#141E61", width=2, height=1).place(x=(j + 4) * 25, y=600)
            sleep(0.07)
            root.update_idletasks()
        root.destroy()
        HomePage()

    root.update()
    play_animation()
    root.mainloop()


LoadingScreen()


CHAPTER 6: EXPERIMENTAL RESULTS


6.1 Experimental Results
In this project, a CNN is used to recognize American Sign Language signs. The screenshots
below show the project files and the images used to train the CNN.

Fig 6.1.1 Main files of project


Fig 6.1.1 shows the main files of the project. The dataset is organized as one folder per
letter: to browse it, open the drive where the sign language dataset folder is saved, open the
dataset folder (Fig 6.1.2), and then open the folder of the required letter, which contains
sample images of that sign (Figs 6.1.3 and 6.1.4).

Fig 6.1.2 Dataset


Fig 6.1.3 Images of H letter

Fig 6.1.4 Images of D letter


The project is first run from the command prompt:

Fig 6.1.5 Running the project using command prompt

After the command is entered, the application starts and opens the sign language home page,
as shown below.

Fig 6.1.6 Execution of project


Fig 6.1.7 Home page of project


This is the home page of the application, from which sign language recognition is started.

Fig 6.1.8 Prediction of a letter


Fig 6.1.9 Prediction of another letter

Fig 6.1.10 Adding of letters as strings


These figures show the recognition in action: the user shows a sign to the webcam, the
application displays the recognized letter, and recognized letters can be added together to
form a string.


6.2 Applications
The applications of American Sign Language (ASL) translation using a CNN are diverse and
wide-ranging. Some of the key applications are:
• Education: ASL translation systems can be used in educational settings to translate the
signing of deaf and hard of hearing students into text in real time, helping them take part in
classroom lectures and discussions.

• Healthcare: ASL translation systems can be used in healthcare settings to enable effective
communication between healthcare professionals and deaf or hard of hearing patients, helping
to ensure that they receive the best possible care.

• Customer service: ASL translation systems can be used in customer service settings to
facilitate communication between deaf or hard of hearing customers and customer service
representatives, improving customer satisfaction and accessibility.

• Business: ASL translation systems can be used in business settings to enable effective
communication between deaf or hard of hearing employees and their colleagues, helping to
create a more inclusive workplace.

• Entertainment: ASL translation systems can be used in the entertainment industry to provide
deaf or hard of hearing individuals with access to live performances and events.

• Emergency services: ASL translation systems can be used in emergency services settings to
facilitate communication between deaf or hard of hearing individuals and emergency
responders, ensuring that they receive the help they need in a timely manner.

Overall, ASL translation using a CNN has a wide range of applications that can greatly benefit
the deaf and hard of hearing community, enabling its members to communicate more effectively
in a variety of settings.


6.3 Advantages
The advantages of American Sign Language (ASL) translation using a CNN are significant. Some
of the key advantages are:
• Accuracy: A CNN learns discriminative features of hand shapes directly from the training
images, without hand-crafted features, and can therefore recognize the hand gestures used in
ASL with high accuracy.

• Efficiency: Once trained, a CNN classifies a gesture in a single forward pass, so signs can
be translated into written or spoken language in real time.

• Accessibility: ASL translation using a CNN provides greater accessibility to the deaf and
hard of hearing community, enabling its members to communicate more effectively in a variety
of settings.

• Flexibility: ASL translation systems using a CNN can be used in a wide range of
applications, from educational settings to customer service to emergency services.

• Scalability: ASL translation systems using a CNN can be scaled up to accommodate larger
numbers of users, making them adaptable to a variety of settings.

• Cost-effective: ASL translation using a CNN is a cost-effective solution compared to hiring
professional interpreters for every interaction, making it a more affordable option for many
organizations.

Overall, ASL translation using a CNN offers a range of advantages that can greatly benefit the
deaf and hard of hearing community, providing greater accessibility, efficiency, and
flexibility in a variety of settings.


CHAPTER 7: CONCLUSION AND FUTURE SCOPE


7.1 Conclusion
The use of convolutional neural networks (CNNs) for American Sign Language (ASL)
translation has made significant progress in recent years. CNNs have proven to be effective
for feature extraction and classification of hand gestures, enabling accurate translation of
ASL into written or spoken language. Deep learning techniques have greatly improved the
accuracy and efficiency of ASL translation systems, paving the way for further advances in
this field. Such systems have the potential to greatly benefit the deaf and hard of hearing
community by providing increased accessibility and communication options.
As research and development in this area continue, more sophisticated and accurate ASL
translation systems are likely to emerge. With the increasing availability of data and
computing resources, there is considerable room for further improvement in ASL translation.

7.2 Future Scope


The future scope of American Sign Language (ASL) translation using CNN algorithms is broad and
promising. With continued advances in deep learning and machine learning, ASL translation
systems are likely to become more accurate, efficient, and user-friendly. One potential area of
development is real-time ASL translation. Many current ASL translation systems require the user
to record or upload a video of their signing, which is then translated into written or spoken
language. Real-time systems would instead translate signs as they are performed. This would let
users communicate more fluidly and naturally in live conversations, and it would make these
technologies easier and more efficient for deaf and hard of hearing individuals to use. In
addition, ASL translation systems need to be tailored to regional dialects and variations in
signing. Because ASL varies by region, and sign languages differ from country to country, it is
important to develop translation systems that are both accurate and culturally sensitive.
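A live system classifies every webcam frame. While a sign is being formed, these per-frame predictions tend to flicker between labels. One simple way to turn such a stream into text is to commit a letter only after it dominates a short window of recent frames. The Python sketch below illustrates this idea under stated assumptions; it is not the implementation used in this project. It assumes a per-frame classifier returns a letter (or `None` when no hand is detected) together with a confidence score. The class name `LetterStabilizer` and its `window`, `min_votes`, and `min_conf` thresholds are hypothetical and would need tuning on real data.

```python
from collections import Counter, deque


class LetterStabilizer:
    """Hypothetical sketch: turn noisy per-frame letter predictions into committed text.

    A letter is committed when it wins at least `min_votes` of the last `window`
    frames. It is not committed again back-to-back until a pause (a run of
    no-hand or low-confidence frames) or a different letter has been committed,
    so holding a sign does not produce "AAAA".
    """

    def __init__(self, window=15, min_votes=12, min_conf=0.8):
        self.frames = deque(maxlen=window)  # most recent per-frame labels
        self.min_votes = min_votes
        self.min_conf = min_conf
        self.last_committed = None
        self.text = []

    def update(self, letter, confidence):
        """Feed one frame's prediction (letter may be None if no hand); return the text so far."""
        confident = letter is not None and confidence >= self.min_conf
        self.frames.append(letter if confident else None)
        winner, votes = Counter(self.frames).most_common(1)[0]
        if votes >= self.min_votes:
            if winner is None:
                self.last_committed = None  # a clear pause re-arms repeated letters
            elif winner != self.last_committed:
                self.text.append(winner)
                self.last_committed = winner
        return "".join(self.text)


if __name__ == "__main__":
    stab = LetterStabilizer(window=5, min_votes=4)
    stream = [("H", 0.9)] * 6 + [(None, 0.0)] * 5 + [("I", 0.95)] * 6
    for letter, conf in stream:
        text = stab.update(letter, conf)
    print(text)  # HI
```

Because a pause re-arms repeated letters, a double letter such as "LL" requires a short gap between the two signs. Majority voting trades a little latency (a few frames) for much steadier output.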
Overall, the future scope of ASL translation using CNN algorithms is promising. With continued
research and development, significant advances can be expected in this field that will greatly
benefit the deaf and hard of hearing community.


BIBLIOGRAPHY
[1] Jin, C. M., Omar, Z., & Jaward, M. H. (2016). A mobile application of American sign language translation via image processing algorithms. Proceedings of the 2016 IEEE Region 10 Symposium (TENSYMP 2016), pp. 104-109.
[2] Flores, C. J. L., Cutipa, A. E. G., & Enciso, R. L. (2017). Application of convolutional neural networks for static hand gestures recognition under different invariant features. Proceedings of the 2017 IEEE 24th International Congress on Electronics, Electrical Engineering and Computing (INTERCON 2017), pp. 5-8.
[3] Kumar, P., Gauba, H., Roy, P. P., & Dogra, D. P. (2017). Coupled HMM-based multi-sensor data fusion for sign language recognition. Pattern Recognition Letters, 86, 1-8.
[4] Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Huang, J., Zhou, J., Xu, W., & Huang, Y. (2018). American Sign Language Recognition Based on Convolutional Neural Network. IEEE Access, 6, 57695-57702.
[6] Ding, S., Tang, Y., Wang, J., & Zhao, J. (2019). A Survey on Sign Language Recognition Using Wearable Sensors. IEEE Transactions on Human-Machine Systems, 49(4), 315-326.
[7] Aly, W., Aly, S., & Almotairi, S. (2019). User-independent American sign language alphabet recognition based on depth image and PCANet features. IEEE Access, 7, 123138-123150.
[8] Ye, Y., & Hu, J. (2020). American Sign Language Recognition with Sign Language Alphabet and Action Recognition Network. IEEE Access, 8, 1710-1718.
[9] Alkandari, A., Alkandari, F., & Alrashedi, S. (2020). Sign Language Recognition Using Deep Convolutional Neural Networks. IEEE Access, 8, 190057-190068.
[10] Li, S., Liu, Y., & Chen, Y. (2021). American Sign Language Recognition Using 3D Convolutional Neural Networks. IEEE Access, 9, 23363-23372.



Plagiarism Checker X Originality Report
Similarity Found: 23%

Date: Friday, July 21, 2023


Statistics: 1165 words Plagiarized / 4665 Total words
Remarks: Low Plagiarism Detected - Your Document needs Optional Improvement.

International Journal of INTELLIGENT SYSTEMS AND APPLICATIONS IN ENGINEERING (IJISAE)
ISSN: 2147-6799 [Link] Original Research Paper

Enhancing Collaborative Filtering with Multi-Model Deep Learning Approach

M. Rudra Kumar¹, Yerramala Rajeswari², M. Sri Lakshmi³, Dr. Prem Kumar Singuluri⁴, G. Sreenivasulu⁵

Submitted: 27/02/2023 Revised: 17/04/2023 Accepted: 10/05/2023


Abstract: Recommendation systems have become increasingly popular in recent years due to the rise of large-scale online platforms that
generate significant amounts of user data. However, traditional collaborative filtering methods such as matrix decomposition have limitations
when learning from user preferences, especially when data sparsity and cold-start problems exist. To address this, explicit feedback-based
recommendation systems have gained attention for their ability to overcome these limitations. Explicit feedback-based systems use user
feedback data such as ratings, clicks, and purchases to make personalized recommendations. A proposed solution to improve the efficiency of
collaborative filtering is to combine the Deep Auto-Encoder Neural Network (DeepAEC) and One-Dimensional Convolutional Neural Network
(1D-CNN) approaches in a multi-task learning framework. This approach aims to address the limitations of traditional collaborative filtering
methods by leveraging the strengths of both DeepAEC and 1D-CNN. Specifically, DeepAEC can be used to capture high-level representations of
user preferences, while 1D-CNN can be used to learn more specific, local patterns in the user-item interaction data. The multi-task learning
framework allows these two approaches to be combined to improve the accuracy and efficiency of the recommendation system.

Keywords: Recommendation systems, Deep neural network, Collaborative filtering, Multi-model deep learning, Explicit feedback

¹ Professor, Dept. of CSE, G. Pullaiah College of Engineering and Technology, Kurnool, Andhra Pradesh, India
Email: mmdraktonan@[Link]
² M. Tech Student, Dept. of CSE, G. Pullaiah College of Engineering and Technology, Kurnool, Andhra Pradesh, India
E-mail: raiyerramala@gmail.com
³ Asst. Professor, HOD-CSE, Dept. of CSE, G. Pullaiah College of Engineering and Technology, Kurnool, Andhra Pradesh, India
E-mail: srddcshmicse@[Link].m
⁴ Sr. Professor and Dean Innmmians, Dept. of CSE, G. Pullaiah College of Engineering and Technology, Kurnool, Andhra Pradesh, India
Email: dean@gpcä.[Link]
⁵ Asst. Professor, Dept. of CSE, G. Pullaiah College of Engineering and Technology, Kurnool, Andhra Pradesh, India
E-mail: Nasulu4u@gmail.com

International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2023, 11(6s), 01-12

1. Introduction

Recommendation systems have become increasingly popular due to the growth of online platforms generating significant amounts of user data. With a large number of items available, it is challenging for users to find relevant content, and hence recommendation systems have become an important tool to solve this problem. Traditional collaborative filtering methods like matrix decomposition have been widely used, but they face limitations in learning from user preferences, particularly when data sparsity and cold-start problems arise. Explicit feedback-based recommendation systems have gained attention because of their ability to overcome these limitations. In this study [1], we propose the use of deep neural networks in addition to traditional collaborative filtering methods to map user and item attributes.

The objective of recommendation systems is to provide personalized recommendations to users, which can significantly enhance their user experience. Traditional collaborative filtering methods are limited in their ability to learn from user preferences, particularly when data sparsity and cold-start problems are encountered. Explicit feedback-based recommendation systems have shown promise in overcoming these limitations. However, scalability and data availability issues affect the effectiveness of these methodologies and limit the applicability of their findings. Therefore, there is a need for more effective methods to enhance the performance of recommendation systems [2].

The main challenges in enhancing the performance of recommendation systems are the limited ability of traditional collaborative filtering methods to learn from user preferences and the issues of scalability and data availability. Data sparsity and cold-start problems pose significant challenges that must be addressed to provide accurate recommendations to users. The multi-model deep learning approach proposed in this study addresses these challenges by combining user and item functions to produce a hybrid recommendation system with improved performance.

The main objective of this study is to propose a multi-model deep learning approach that combines traditional collaborative filtering methods with deep neural networks to enhance the performance of recommendation systems. The proposed approach aims to address the issues of scalability and data availability and to overcome the limitations of traditional collaborative filtering methods in learning from user preferences. The effectiveness of the proposed approach is evaluated by analyzing three models that produce a wide range of outcomes on a single real-world dataset. The objective is to demonstrate the potential of the MMDL approach to enhance recommendation systems with explicit feedback.

This study proposes a solution to research issues related to noisy implicit feedback signals using DNNs. To enhance the efficiency of the collaborative filtering algorithm, the Deep Auto-Encoder Neural Network (DeepAEC) and One-Dimensional Convolutional Neural Network (1D-CNN) approaches are combined in a multi-model deep learning method. We evaluate the models on a single real-world dataset and show how well DeepAEC and 1D-CNN function as collaborative filtering strategies.

The remainder of the paper is structured as follows: Section 2 presents the literature review, Section 3 outlines the proposed method, Section 4 presents the results of the testing conducted, and Section 5 concludes the paper.

2. Literature Survey

Collaborative filtering (CF) is a popular approach for building recommender systems in which user-item interactions are used to generate personalized recommendations. Recently, there has been increasing interest in enhancing CF with deep learning techniques.

In [3], a hybrid approach that combines collaborative filtering and a deep autoencoder neural network is proposed to enhance the accuracy of recommendation systems. The proposed algorithm is evaluated using a dataset from the MovieLens website and is found to outperform traditional collaborative filtering methods and other state-of-the-art recommendation algorithms. The methodology includes data preprocessing, splitting the dataset into training and testing sets, using a deep autoencoder neural network to learn latent features of users and items, and combining the learned features with collaborative filtering to generate recommendations. The proposed algorithm has the potential to be applied in various fields such as e-commerce, social networks, and online advertising.

The authors of [4] present a novel neural network approach to collaborative filtering for implicit feedback data. They acknowledge that traditional collaborative filtering methods have limitations in dealing with implicit feedback data, such as data sparsity and cold-start problems. To overcome these limitations, they propose a neural network-based approach that considers not only user-item interactions but also auxiliary information such as user and item attributes.

The paper [5] presents a comprehensive survey of collaborative filtering (CF) techniques for implicit feedback datasets. The authors provide an overview of the problem, including the challenges and limitations associated with implicit feedback datasets. They review a variety of CF methods, including matrix factorization, neighborhood-based CF, and deep learning-based CF. The survey also covers recent advances in the field, such as graph-based approaches and hybrid models that combine multiple CF techniques. The authors discuss the evaluation metrics used to measure the performance of CF methods, including accuracy, coverage, and diversity. They also highlight the importance of addressing the cold-start problem and provide an overview of the techniques proposed to tackle it. The paper concludes with a discussion of future directions in the field, including the need for more research on the interpretability of CF methods and the integration of context-awareness into recommendation systems. Overall, the paper provides a comprehensive overview of the current state of the art in CF for implicit feedback datasets and is a valuable resource for researchers and practitioners in the field.

The authors of [6] proposed a novel approach to improve the accuracy of collaborative filtering recommendation systems using a combination of deep neural networks. They used a dataset from the MovieLens website to evaluate their proposed algorithm, which achieved superior performance over traditional collaborative filtering methods and other state-of-the-art recommendation algorithms. The methodology involved data preprocessing, splitting the dataset into training and testing sets, and applying a deep neural network to learn the latent features of users and items. The authors used a combination of different types of neural networks, including autoencoders and convolutional neural networks, to extract the relevant features from the data. The learned features were then combined with collaborative filtering to generate recommendations. The proposed algorithm has the potential to be applied to various domains such as e-commerce, social networks, and online advertising. It can also help address common limitations of collaborative filtering, such as the cold-start problem and sparsity in user-item interaction data. Overall, this study presents a promising approach for improving the accuracy and scalability of recommendation systems through the combination of deep neural networks and collaborative filtering.

3. Proposed Work

The objective of this work is to evaluate the effectiveness of the Deep Auto-Encoder Neural Network (DeepAEC) and One-Dimensional Convolutional Neural Network (1D-CNN) approaches as collaborative filtering strategies in the context of noisy implicit feedback signals. Specifically, it assesses how well the proposed multi-model deep learning (MMDL) method enhances the efficiency of the collaborative filtering algorithm in the presence of noisy implicit feedback signals.

Fig. 1: Proposed Structure


In this section, a collaborative filtering approach is presented that uses explicit feedback and features such as userID and movieID (itemID). The approach combines the outcomes of a 1D-CNN and a DeepAEC model, with separate evaluations for each model, to learn user-item interactions. Section 3.1 introduces the collaborative filtering method and describes the problem based on explicit data. Section 3.2 describes the dataset, Section 3.3 presents the autoencoder underlying the DeepAEC model, Section 3.4 presents the 1D-CNN architecture, and Section 3.5 describes the suggested MMDL ensemble model. The existing technique, known as Recommender Net, is evaluated in Section 4.3.1.

3.1 Collaborative filtering

The recommendation task in this study focuses on explicit feedback for the collaborative filtering algorithm. In collaborative filtering, a user's preferences are predicted based on the preferences of similar users. Explicit feedback involves the user explicitly providing ratings on a scale, typically from 1 to 5, for items they have interacted with. This feedback can be positive or negative, with a higher rating indicating a more positive evaluation of the item. On the other hand, implicit feedback is based on user behavior, such as items they have clicked on, purchased, or spent more time interacting with. In contrast to explicit feedback, implicit feedback is typically binary: a positive interaction is treated as an indication of preference, while the absence of interaction indicates no preference.

In this study [5], explicit feedback is used, and the ratings are assumed to be on a scale of 1 to 5. The highest score, which represents the most positive feedback, is denoted as "very like," as shown in Figure 2b. Implicit feedback, as shown in Figure 2a, can only be observed for selected items and is unobserved for non-selected items. The non-selected state cannot be explicitly assumed to be positive. Since non-selected items include both items that users are not interested in and items that users are interested in but did not find, it can only be assumed that they have a negative tendency.

Mathematically, let U be the set of users, I be the set of items, and R be the set of ratings. Each rating r(u, i) represents the explicit feedback of user u for item i. The rating is assumed to be on a scale of 1 to 5, with 5 being the most positive rating. Implicit feedback, on the other hand, can be represented by user-item interactions such as clicks, purchases, or views. A user's preference for an item i can be modeled as a binary variable b(u, i), where b(u, i) = 1 if the user has interacted with the item and b(u, i) = 0 if the user has not interacted with the item.

In summary, collaborative filtering can be used for both explicit and implicit feedback, but explicit feedback is more detailed, allowing for more precise predictions. Explicit feedback is expressed on a rating scale, with the highest score representing the most positive feedback. In contrast, implicit feedback is binary and can only be observed for selected items, while non-selected items are assumed to have a negative tendency.
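To make the mapping from r(u, i) to b(u, i) concrete, here is a minimal sketch. It assumes the explicit ratings are held in a Python dictionary keyed by (user, item) pairs; the helper name to_implicit and the toy data are hypothetical. Any observed rating becomes 1 regardless of its value, and every unobserved pair becomes 0, which is exactly why a 0 cannot distinguish "not interested" from "never found".

```python
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def to_implicit(ratings: Dict[Pair, float],
                users: Iterable[Hashable],
                items: Iterable[Hashable]) -> Dict[Pair, int]:
    """Turn explicit ratings r(u, i) into the binary variable b(u, i).

    b(u, i) = 1 if user u has rated (interacted with) item i, else 0.
    The rating value itself is discarded, so a 1-star rating and a
    5-star rating both become 1.
    """
    item_list = list(items)
    return {(u, i): 1 if (u, i) in ratings else 0
            for u in users for i in item_list}


# Hypothetical toy data: user "a" rated two movies, user "b" rated one.
r = {("a", "m1"): 5, ("a", "m2"): 1, ("b", "m2"): 4}
b = to_implicit(r, users=["a", "b"], items=["m1", "m2", "m3"])
print(b[("a", "m2")], b[("b", "m1")])  # prints: 1 0
```

Note that the low rating of "a" for "m2" still maps to 1: binarization keeps only whether an interaction happened, not how positive it was.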
This subsection describes the challenge of training recommendation systems when negative feedback is not present and introduces the implicit feedback matrix R as a way to represent user-movie interactions in a collaborative filtering approach.

Fig. 2: Rating matrix. Simple demonstration of the observed differences between (a) implicit and (b) explicit feedback (users × movies)

The implicit feedback matrix R is defined as $R \in \mathbb{R}^{m \times n}$, where m and n represent the number of users and objects (movies), respectively. The entry $r_{ui}$ in the matrix represents the rating that user u has given to movie i. If the user has not rated the movie, the entry is left blank. A value of 1 is used to indicate a user-movie interaction, which means that the user has rated the movie. The latent features of user u are represented by the vector $x_u$ [7]; stacking these vectors gives the user latent feature matrix $X \in \mathbb{R}^{m \times a}$, which is derived from the entries of the implicit feedback matrix R. Similarly, the latent features of the movies are represented by a matrix in $\mathbb{R}^{n \times b}$.

However, the lack of negative feedback can make training recommendation systems difficult, because it is not clear whether an absence of interaction means that the user dislikes the movie or simply has not seen it yet. This can introduce noisy signals into the data [8]. The matrix is also typically sparse, because users rate only a small number of movies, leaving most cells empty. This poses a challenge for collaborative filtering approaches, which rely on finding similarities between users and movies based on their ratings.

How to calculate ratings:

The notation used in the explanation is as follows:
R: rating assigned by user U to item I
U: the user who is assigning the rating
I: the item being rated
n: the number of users who are similar to user U and are used to calculate the average rating
$r_i$: rating given to item I by user i
sim(U, i): similarity between user U and user i

The equation used to calculate the average rating is:

$R = \frac{1}{n} \sum_{i=1}^{n} r_i \cdot \mathrm{sim}(U, i)$   (1)
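Equation (1) can be checked with a short sketch. It assumes the similarities sim(U, i) have already been computed (the paper does not fix a similarity measure here), so they are passed in as a dictionary; the function name predict_rating and the neighbour data are hypothetical.

```python
from typing import Dict, Hashable


def predict_rating(neighbour_ratings: Dict[Hashable, float],
                   similarity: Dict[Hashable, float]) -> float:
    """Equation (1): R = (1/n) * sum over i of r_i * sim(U, i).

    neighbour_ratings maps each similar user i to r_i, the rating i gave
    to the target item I; similarity maps i to sim(U, i). Only users that
    appear in both dictionaries are counted in n.
    """
    common = [i for i in neighbour_ratings if i in similarity]
    n = len(common)
    if n == 0:
        raise ValueError("no similar user has rated the target item")
    return sum(neighbour_ratings[i] * similarity[i] for i in common) / n


# Hypothetical neighbours of user U who have rated item I.
r_i = {"u2": 4.0, "u3": 5.0, "u4": 3.0}
sim = {"u2": 0.9, "u3": 0.5, "u4": 0.2}
print(round(predict_rating(r_i, sim), 4))  # prints: 2.2333
```

Because (1) divides by n rather than by the sum of the similarities, predictions shrink whenever similarities are below 1: here the result is 2.2333 even though every neighbour rated the item 3 or higher. A common alternative, which is not what (1) states, normalizes by the sum of |sim(U, i)|; for the same data that gives 6.7 / 1.6 = 4.1875.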

In this equation, Σ represents the sum over the similar users. For each user i who is similar to user U, we calculate the product of their rating for item I ($r_i$) and their similarity to user U (sim(U, i)). We then add up all of these products and divide by the total number of similar users (n) to get the average rating assigned to item I by users similar to user U. This average rating is then used as the predicted rating (R) that user U would assign to item I.

The first five rows of the data, as previously mentioned, contain ratings for movies provided by users. The dataset comprises 100,000 ratings, and its purpose is to predict how viewers will respond to movies they have not yet seen [9].

3.2 Dataset:

The mathematical notation for the dataset [10] can be represented as follows:
Let R be a matrix of ratings, where $R_{ij}$ denotes the rating assigned by user i to item j.
Let U be a set of users, where $U = \{u_1, u_2, \ldots, u_m\}$, and m is the total number of users in the dataset.
Let I be a set of items, where $I = \{i_1, i_2, \ldots, i_n\}$, and n is the total number of items in the dataset.
Let $r_{ui}$ be the rating assigned by user u to item i.
Let $t_{ui}$ be the timestamp of the rating $r_{ui}$.

The dataset consists of two files:

1. The u.item file: This file contains information about the movies in the dataset. Let M be a matrix of movies, where $M_{ij}$ denotes the value of the jth feature of movie i. The features can include the title, release date, genre, etc. The mathematical notation for the u.item file can be represented as follows:
Let M be an m × k matrix, where m is the total number of movies and k is the total number of features.
Let $m_i$ be the feature vector for movie i, where $m_i = [m_{i1}, m_{i2}, \ldots, m_{ik}]$.
Let $m_{ij}$ be the value of the jth feature of movie i, where $m_{ij}$ is a scalar value.

2. The u.data file: This file contains the user-submitted ratings for the movies in the dataset. The mathematical notation for the u.data file can be represented as follows:
Let R be an m × n matrix, where $R_{ij}$ denotes the rating assigned by user i to item j.
Let $u_i$ be the user who submitted the rating, where $u_i$ is a scalar value representing the user ID.
Let $i_j$ be the item being rated, where $i_j$ is a scalar value representing the item ID.
Let $r_{ij}$ be the rating assigned by user $u_i$ to item $i_j$, where $r_{ij}$ is a scalar value.
Let $t_{ij}$ be the timestamp of the rating $r_{ij}$, where $t_{ij}$ is a scalar value representing the time the rating was submitted [11][12].
Thus, the u.data file can be represented as a set of tuples $\{(u_i, i_j, r_{ij}, t_{ij})\}$ for all ratings submitted by users.

Dataset Splitting:
Once the dataset has been loaded, the next step is to split it into a training set and a test set. Typically, 90% of the dataset is used as the training set, while the remaining 10% is used as the test set. This ensures that the model is trained on a large share of the data, which generally improves accuracy.

Training and test dataset:
After splitting the dataset, the next step is to train the model using the training set. The model is trained by adjusting the embedding vectors so that the predicted value is as close as possible to the actual value. Training for more epochs can improve accuracy, up to the point where the model starts to overfit. The loss function used in this case is the difference between the actual and predicted ratings over the entire training set.

Once the model has been trained, it can be used to make predictions on the test set. The predicted rating is generated by the trained model from the user and movie identifiers [13]. In this example, the first 10 rows of the test set are used to generate predicted values for user and movie IDs; the same process can be extended to predict values for the entire test set. Movie recommendations for a specific user can then be made from the movies with the highest predicted ratings.

3.3 Autoencoder for deep neural networks

An autoencoder can also be used for recommendation systems with explicit feedback, where users provide explicit ratings or feedback for items. In this context, the autoencoder is trained to learn an efficient representation of the user-item feedback data.

The mathematical notation for an autoencoder-based recommendation system with explicit feedback can be expressed as follows:
Let R be the user-item feedback matrix of size n × m, where n is the number of users and m is the number of items.
Encoder: The encoder function f takes the feedback matrix R and maps it to a hidden representation h of size k: $h = f(R)$.
Decoder: The decoder function g takes the hidden representation h and maps it back to the reconstructed feedback matrix R': $R' = g(h)$.
Loss function: The goal of the autoencoder-based recommendation system is to minimize the difference between the actual feedback matrix R and the reconstructed feedback matrix R'. This is achieved with a loss function that measures the difference between R and R': $L(R, R') = \lVert R - R' \rVert_F^2$, where $\lVert \cdot \rVert_F$ is the Frobenius norm.
Training: The autoencoder is trained by minimizing the loss function L(R, R') using an optimization algorithm such as stochastic gradient descent (SGD) [14], [15].

The objective of the autoencoder-based recommendation system is to learn an efficient representation of the user-item feedback data that captures the most important features of the feedback. This representation can then be used to recommend items to users based on their past feedback. By reducing the dimensionality of the feedback data, autoencoders can help to mitigate the sparsity problem that is common in recommendation systems with explicit feedback.
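The encoder, decoder, and Frobenius loss above can be made concrete with a small, framework-free sketch. It assumes a single hidden layer with a sigmoid encoder and a linear decoder applied to each user's rating row (so f and g act row by row on R), seeded random weights, and no training loop; TinyAutoencoder, affine, and frobenius_loss are hypothetical names, not the authors' DeepAEC implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def affine(W: Matrix, x: List[float], b: List[float]) -> List[float]:
    """Return W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """Encoder f: rating row (m values) -> h (k values); decoder g: h -> row."""

    def __init__(self, m: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so runs are repeatable
        self.W_enc = [[rng.gauss(0.0, 0.1) for _ in range(m)] for _ in range(k)]
        self.b_enc = [0.0] * k
        self.W_dec = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(m)]
        self.b_dec = [0.0] * m

    def encode(self, row: List[float]) -> List[float]:
        # h = f(r): sigmoid(W_enc r + b_enc)
        return [sigmoid(z) for z in affine(self.W_enc, row, self.b_enc)]

    def decode(self, h: List[float]) -> List[float]:
        # r' = g(h): linear output, so reconstructed ratings are unbounded
        return affine(self.W_dec, h, self.b_dec)

    def reconstruct(self, R: Matrix) -> Matrix:
        """Apply g(f(.)) to every user row, giving R' with the shape of R."""
        return [self.decode(self.encode(row)) for row in R]


def frobenius_loss(R: Matrix, R_prime: Matrix) -> float:
    """L(R, R') = ||R - R'||_F^2, the sum of squared entry-wise errors."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(R, R_prime)
               for a, b in zip(row_a, row_b))


# Toy n x m = 3 x 4 rating matrix (hypothetical); 0 marks an unrated item.
R = [[5.0, 3.0, 0.0, 1.0],
     [4.0, 0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0, 5.0]]
model = TinyAutoencoder(m=4, k=2)
R_prime = model.reconstruct(R)
print(len(R_prime), len(R_prime[0]), frobenius_loss(R, R_prime) > 0)  # prints: 3 4 True
```

Training would then adjust the four weight arrays with SGD to reduce this loss, which in practice is left to a framework such as Keras. With explicit feedback, a common variant restricts the loss to observed (non-zero) entries so that unrated cells are not pushed toward 0; the plain Frobenius form shown matches the definition given above.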

Fig. 3. Model of the encoder and decoder

3.4 1D convolutional neural network architecture

A 1D convolutional neural network (CNN) is a type of neural network architecture that can be used for recommendation systems with explicit feedback. In this context, a 1D CNN can learn the temporal patterns in user-item feedback data and use them to make recommendations.

The architecture of a 1D CNN for recommendation systems with explicit feedback typically consists of the following [16]:

Step 1: Input layer: Let $X_i$ be the input data for user i, represented as a sequence of T feedback values:
$X_i = [X_{i,1}, X_{i,2}, \ldots, X_{i,t}, \ldots, X_{i,T}]$

Step 2: 1D convolutional layer: Let $w_j$ be the filter weights for filter j, represented as a sequence of weights:
$w_j = [w_{j,1}, w_{j,2}, \ldots]$   (3)
The output of the 1D convolutional layer for user i and filter j is given by:
$f(w_j * X_i)$   (4)
where * represents the convolution operation and f is the activation function.

Step 3: Pooling layer: Let P be the pooling operation, such as max pooling or average pooling. The output of the pooling layer for user i and filter j is given by:
$P_{i,j} = P(f(w_j * X_i))$   (5)

Step 4: Fully connected layer: Let $g_k$ be the weights for neuron k in the fully connected layer. The output of the fully connected layer for user i and filter j is given by:
$Z_{i,j,k} = g_k \cdot P_{i,j}$   (6)

Step 5: Output layer: Let $Y_{i,j}$ be the predicted rating or feedback value for user i and item j. The output of the output layer is given by:
$Y_{i,j} = h([Z_{i,j,1}, Z_{i,j,2}, \ldots])$   (7)
where h is the activation function for the output layer.

3.5 Ensemble Model: 1D-CNN and DeepAEC

To combine the 1D-CNN and DeepAEC models for recommendation systems with explicit feedback, we can use an ensemble approach in which the predictions from both models are combined to produce a final prediction. Let R be the user-item feedback matrix of size n × m, where n is the number of users and m is the number of items.

3.5.1 1D-CNN model:

The 1D-CNN model takes as input a sequence of feedback values for a user and produces a rating prediction for each item. The architecture of the 1D-CNN model is defined by the following mathematical notation:
Input layer: Let $X_i$ be the input data for user i, represented as a sequence of feedback values.
1D convolutional layer: Let $w_j$ be the filter weights for filter j, represented as a sequence of weights:
$w_j = [w_{j,1}, w_{j,2}, \ldots]$   (9)
The output of the 1D convolutional layer for user i and filter j is given by $f(w_j * X_i)$, where * represents the convolution operation and f is the activation function.
Pooling layer: Let P be the pooling operation, such as max pooling or average pooling. The output of the pooling layer for user i and filter j is given by $P_{i,j} = P(f(w_j * X_i))$.
Fully connected layer: Let $g_k$ be the weights for neuron k in the fully connected layer. The output of the fully connected layer for user i and filter j is given by $Z_{i,j,k} = g_k \cdot P_{i,j}$.
Output layer: Let $Y_{i,j}$ be the predicted rating or feedback value for user i and item j. The output of the output layer is given by $Y_{i,j} = h([Z_{i,j,1}, Z_{i,j,2}, \ldots])$, where h is the activation function for the output layer.

3.5.2 DeepAEC model:

The DeepAEC model takes as input a matrix of user-item feedback values and produces a rating prediction for each user-item pair. The architecture of the DeepAEC model is defined by the following mathematical notation:
Input layer: Let X be the input matrix of user-item feedback values:
$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,m} \end{bmatrix}$   (14)
Embedding layer: Let E be the embedding matrix of size k × l, where k is the embedding dimension and l is the number of unique feedback values. Let $x'_{i,j}$ be the embedding vector for user i and item j, obtained by looking up the feedback value $x_{i,j}$ in E.
1D convolutional layer: Let $w'_j$ be the filter weights for filter j, represented as a sequence of weights. The output of the 1D convolutional layer for filter j is obtained by convolving (*) $w'_j$ with the embedded input and applying the activation function f.
Fully connected layer: Let $g'_k$ be the weights for neuron k in the fully connected layer; the output of the fully connected layer for filter j is obtained by applying these weights to the convolutional output for filter j.
Output layer: The output layer produces the rating prediction for each user-item pair.

One approach to combining the 1D-CNN and DeepAEC models is to use the output of the 1D-CNN model as input to the DeepAEC model. This can be done by taking the flattened output of the 1D-CNN layer and using it as the input to the embedding layer of the DeepAEC model.

Let X be the input matrix, where each row represents a user and each column represents a movie, and let R be the rating matrix, where each element $R_{ij}$ represents the rating given by user i to movie j. Let $f_1$ be the function that represents the 1D-CNN model and $f_2$ the function that represents the DeepAEC model. Then the combined model can be represented as follows:
$Z = f_1(X)$
$E = f_2(Z)$
where Z is the output of the 1D-CNN model and E is the embedding matrix of the DeepAEC model.

The loss function for this combined model can be a weighted sum of the losses from the two models:
$L = \alpha L_1 + (1 - \alpha) L_2$
where $L_1$ is the loss function for the 1D-CNN model, $L_2$ is the loss function for the DeepAEC model, and α is a weighting factor that determines the relative importance of the two losses.

The combined model can be trained using backpropagation and stochastic gradient descent to minimize the loss function. Once trained, the model can be used to make predictions for unobserved ratings.

In summary, the combined model uses a 1D-CNN model to extract features from the input matrix and a DeepAEC model to learn embeddings for the users and movies. By combining these two models, we can leverage the strengths of both approaches to improve the accuracy of the recommendations.

4. Experimental Study

This section comprises several parts. In Section 4.1, we discuss the experimental setup for our study. In Section 4.2, we describe the dataset used for the experiments. Section 4.3 describes the metrics used to evaluate the performance of our model. Section 4.4 presents the settings of the proposed structure for multi-modal deep learning and the results obtained from the experiments. Finally, in Section 4.5, we perform a performance analysis of our model.

4.1 Experimental setup

For our experiments, we used a system with the Windows operating system, a quad-core 3.10 GHz Intel Core i5-2400 CPU, and a 500 GB hard disk. Our model was implemented using Python 2.7 and Keras 2.0 with TensorFlow serving as the backend [16], [17].

4.2 Dataset description

The MovieLens dataset [18], [19], [20] is a result of ongoing research by the MovieLens project and is published by GroupLens Research at the University of Minnesota. It is widely used to evaluate collaborative filtering techniques and is available in various formats at http://[Link].com. The MovieLens 100K dataset contains 100,000 movie ratings, with each user providing at least 20 ratings on a scale from 1 to 5. Because these ratings are explicit, we selected this dataset to study how implicit feedback can be derived from explicit ratings: by converting each element to a binary 1 or 0, indicating whether or not the user has rated the item, we converted the explicit data to implicit data. All test processes accept (user, item) pairs as input and return predicted ratings. After the data is read in, each rating is inserted into the rating matrix at the user's row and the item's column, creating matrices of size 943 × 1682 for MovieLens 100K and 6040 × 3952 for MovieLens 1M.

4.3 Metrics for evaluation

The root mean square error (RMSE) of the suggested model's predictions can be defined as:
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{(u,i)} (\hat{r}_{ui} - r_{ui})^2}$
where:
• $r_{ui}$ is the genuine rating of user u for a particular film
• n is the total number of predicted ratings
• $\hat{r}_{ui}$ is the predicted value for user u for that film

4.3.1 Existing Method (Recommender Net)

The output reports num users, num movies, the training set, and the test set, followed by the top 10 recommendations.

Fig. 4: Existing_plotgraph (panel: Existing System Model loss)

4.3.2 Neural network architecture of 1D convolution

A convolutional neural network, sometimes referred to as a CNN or ConvNet, is a subclass of neural networks that is particularly adept at processing data with a grid-like structure. In this model, the user IDs, movie IDs, and ratings are processed by convolutional layers, and the result is concatenated into a fully connected network before the output is produced.

Fig. 5: CNN1D plot_graph (panel: CNN-1D System Model loss; x-axis: epoch)

CNN 1D (Convolutional Neural Network 1D) is a deep learning model commonly used for processing sequential data such as time series or text. One way to evaluate the performance of a CNN 1D model is to plot its training and validation accuracy and loss over epochs.

In the context of a CNN 1D model [24] for recommendation systems with explicit feedback, the accuracy measures how well the model can predict the user's rating of a movie. The loss measures the difference between the predicted rating and the actual rating. A low loss indicates that the predicted rating is close to the actual rating, while a high loss indicates the opposite.
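To tie the layer equations of Sections 3.4 and 3.5.1 to the RMSE of Section 4.3, here is a minimal, framework-free sketch of one CNN 1D forward pass followed by the error computation. It assumes ReLU for the activation f, non-overlapping max pooling for P, a single fully connected output neuron over the concatenated pooled features with identity h, and hand-picked toy weights; it is not the authors' Keras model, and names such as predict are hypothetical.

```python
import math
from typing import List, Sequence


def conv1d(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """'Valid' 1D convolution w * x with stride 1.

    Like most deep-learning libraries, this computes cross-correlation
    (the filter is not flipped).
    """
    s = len(w)
    return [sum(x[t + q] * w[q] for q in range(s))
            for t in range(len(x) - s + 1)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def max_pool(v: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    return [max(v[t:t + size]) for t in range(0, len(v) - size + 1, size)]


def predict(x: Sequence[float], filters: Sequence[Sequence[float]],
            g: Sequence[float], bias: float = 0.0, pool: int = 2) -> float:
    """One predicted rating: f = ReLU, P = max pooling, identity output h."""
    pooled: List[float] = []
    for w in filters:  # P_{i,j} = P(f(w_j * X_i)) for each filter j
        pooled.extend(max_pool(relu(conv1d(x, w)), pool))
    if len(g) != len(pooled):
        raise ValueError("need one dense weight per pooled feature")
    return sum(a * b for a, b in zip(pooled, g)) + bias


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE = sqrt((1/n) * sum of (r_hat - r)^2) over n predictions."""
    n = len(actual)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(actual, predicted)) / n)


# Hypothetical feedback sequence and hand-picked (untrained) weights.
x = [5.0, 3.0, 4.0, 1.0, 2.0]
filters = [[1.0, 0.0], [0.5, 0.5]]
g = [0.2, 0.2, 0.2, 0.2]
print(round(predict(x, filters, g), 4))  # prints: 3.1
print(rmse([4.0, 3.0], [3.5, 3.5]))      # prints: 0.5
```

Here the two filters each yield four convolution outputs, pooling halves them to two, and the four pooled features feed the single output neuron. In a trained model the filters, dense weights, and bias would be learned by minimizing a loss such as the squared error behind RMSE.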

The training accuracy and loss indicate how well the model is fitting the training data during the training process, while the validation accuracy and loss indicate how well the model is generalizing to new data that it hasn't seen before.

Fig. 6. DeepAEC plot_graph (plot titled "DeepAEC System Model loss", loss against epoch)

A typical plot of the training and validation accuracy and loss over epochs will have the number of epochs on the x-axis and the accuracy or loss on the y-axis. The training accuracy and loss are usually shown in blue, while the validation accuracy and loss are shown in orange. Ideally, we want to see the training accuracy increase and the training loss decrease over epochs, which indicates that the model is learning from the data. At the same time, we want to see the validation accuracy increase and the validation loss decrease, but not to the point where the model starts overfitting the training data.

Overfitting occurs when the model starts to memorize the training data instead of learning from it. This can lead to high training accuracy and low training loss, but poor validation accuracy and high validation loss. In this case, the model is not generalizing well to new data, and we need to tune the hyperparameters or modify the model architecture to prevent overfitting [21], [22], [23].

In summary, a plot of the training and validation accuracy and loss over epochs is a useful tool to evaluate the performance of a CNN 1D model for recommendation systems with explicit feedback. It allows us to monitor how well the model is learning from the data and how well it is generalizing to new data.

4.4 Deep neural network autoencoder
Autoencoders are deep learning models that learn to reproduce their input data and, in doing so, learn a condensed representation of it. Since they are trained as supervised models (the input itself serves as the target) but function as unsupervised models during inference, these models are known as self-supervised models. An autoencoder is made of two components:
1. Encoder: It serves as a compression unit and compresses the input data.
2. Decoder: It decompresses the input by reconstructing it from the compressed representation.
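The encoder/decoder split above can be sketched in a few lines of NumPy. This is an illustrative forward pass only, assuming a single sigmoid hidden layer as the bottleneck and randomly initialised weights; the sizes, variable names and activation are our own choices rather than the DeepAEC configuration, and training (minimising the reconstruction error by gradient descent) is omitted.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_inputs, code_size = 8, 3  # hypothetical: 8 input features, 3-unit bottleneck

# Randomly initialised weights; training would adjust them to minimise reconstruction_mse.
W_enc = rng.normal(scale=0.1, size=(n_inputs, code_size))
b_enc = np.zeros(code_size)
W_dec = rng.normal(scale=0.1, size=(code_size, n_inputs))
b_dec = np.zeros(n_inputs)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def encode(x: np.ndarray) -> np.ndarray:
    """Encoder: compress each n_inputs-dimensional row into a code_size-dimensional code."""
    return sigmoid(x @ W_enc + b_enc)


def decode(h: np.ndarray) -> np.ndarray:
    """Decoder: reconstruct an n_inputs-dimensional row from its compressed code."""
    return h @ W_dec + b_dec


x = rng.uniform(1.0, 5.0, size=(4, n_inputs))  # batch of 4 hypothetical input rows
code = encode(x)
x_hat = decode(code)
reconstruction_mse = float(np.mean((x - x_hat) ** 2))  # the quantity training would minimise
print(code.shape, x_hat.shape)  # prints (4, 3) (4, 8)
```

The bottleneck (`code_size` smaller than `n_inputs`) is what forces the encoder to compress; the decoder output has the same shape as the input so the two can be compared directly.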
DeepAEC, short for Deep Autoencoder for Collaborative Filtering, is a neural network used for recommendation systems. It uses a deep autoencoder, which is a type of neural network that learns to compress and reconstruct input data.

The DeepAEC model consists of an input layer, a series of hidden layers, and an output layer. The input layer represents the user-item matrix, with each element representing a user's rating of an item. The output layer is a reconstructed version of the input matrix, in which missing ratings are predicted. The hidden layers are responsible for learning a compressed representation of the input matrix, which can capture meaningful patterns in the data.

To evaluate the performance of the DeepAEC model, various metrics are used, such as mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). These metrics measure the difference between the predicted ratings and the actual ratings; a lower value indicates better performance.

To visualize the training progress of the DeepAEC model, a plot of the training and validation loss over epochs is often used (Fig. 6). The loss measures how well the model is able to reconstruct the input matrix, with a lower value indicating better performance. The training loss shows the reconstruction error on the training set, while the validation loss shows the reconstruction error on a held-out validation set.

This plot can be used to monitor the model's training progress and detect overfitting, where the model becomes too specialized to the training data and performs poorly on new data. If the validation loss starts increasing while the training loss continues to decrease, this is a sign of overfitting, and training should be stopped or the model should be modified to prevent overfitting.
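The stopping rule just described can be automated. The pure-Python sketch below assumes per-epoch training and validation losses have been recorded; it flags the first epoch at which validation loss rises while training loss still falls, and applies patience-based early stopping to the validation curve. The function names, the `patience` value and the loss values are hypothetical; Keras users would more commonly rely on the built-in `tf.keras.callbacks.EarlyStopping` callback with `monitor="val_loss"`.

```python
from typing import Optional, Sequence, Tuple


def overfitting_onset(train_losses: Sequence[float], val_losses: Sequence[float]) -> Optional[int]:
    """First epoch (0-indexed) where validation loss rises while training loss still falls."""
    for epoch in range(1, min(len(train_losses), len(val_losses))):
        val_up = val_losses[epoch] > val_losses[epoch - 1]
        train_down = train_losses[epoch] < train_losses[epoch - 1]
        if val_up and train_down:
            return epoch
    return None


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 2) -> Tuple[int, int]:
    """Patience-based early stopping on a validation-loss history.

    Returns (best_epoch, stop_epoch), both 0-indexed: training halts once the validation
    loss has not improved on its best value for `patience` consecutive epochs, and the
    weights saved at best_epoch are the ones to keep. If patience never runs out,
    stop_epoch is simply the last epoch.
    """
    if not val_losses:
        raise ValueError("empty validation-loss history")
    best_epoch, best_loss, epochs_without_improvement = 0, val_losses[0], 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] < best_loss:
            best_epoch, best_loss = epoch, val_losses[epoch]
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical loss curves: training loss keeps falling, validation loss turns upward at epoch 3.
train_loss = [0.90, 0.70, 0.55, 0.45, 0.38, 0.32]
val_loss = [0.95, 0.80, 0.72, 0.75, 0.79, 0.84]
print(overfitting_onset(train_loss, val_loss))     # prints 3
print(early_stopping_epoch(val_loss, patience=2))  # prints (2, 4)
```

Keeping the weights from `best_epoch` rather than `stop_epoch` is what prevents the overfitted final epochs from being used; the `restore_best_weights=True` option of the Keras callback serves the same purpose.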

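For an autoencoder that reconstructs the whole user-item matrix, the MSE, RMSE and MAE used to evaluate DeepAEC are normally computed only over ratings that were actually observed, because the missing cells have no ground truth. A minimal NumPy sketch under that assumption follows; treating 0 as "not rated" and the name `masked_rating_errors` are illustrative choices, not details taken from the paper.

```python
from typing import Dict

import numpy as np


def masked_rating_errors(actual: np.ndarray, predicted: np.ndarray,
                         missing: float = 0.0) -> Dict[str, float]:
    """MSE, RMSE and MAE over the observed entries of a user-item rating matrix.

    Entries of `actual` equal to `missing` are treated as unrated and ignored,
    even though the model also produces predictions for them.
    """
    mask = actual != missing
    if not mask.any():
        raise ValueError("no observed ratings to evaluate")
    err = predicted[mask] - actual[mask]
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(err)))}


# Hypothetical 2-user x 3-item matrix; 0 means the user has not rated the item.
actual = np.array([[5.0, 0.0, 3.0],
                   [4.0, 2.0, 0.0]])
# Reconstruction from an autoencoder-style model, which also fills in the unrated cells.
predicted = np.array([[4.0, 3.5, 3.0],
                      [4.5, 3.0, 4.2]])
print(masked_rating_errors(actual, predicted))  # prints {'MSE': 0.5625, 'RMSE': 0.75, 'MAE': 0.625}
```

In a real evaluation the observed entries passed in would come from a held-out test split rather than from the ratings used for training.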
Fig. 7. Performance comparison

The results obtained from the experiments show that the DeepACE architecture achieved optimized results with low-loss compression compared to existing works. This means that DeepACE was able to effectively reduce the size of the data while retaining its important features, resulting in a compressed version of the data with minimal loss of information.
Comparing the results of DeepACE to those of existing works, it is evident that DeepACE outperformed them in terms of compression and preservation of important features. The low-loss compression obtained with DeepACE implies that the compressed data can be stored and transferred with ease, saving storage and network resources, without significantly impacting the performance of downstream tasks that use the data.
Overall, the results suggest that DeepACE can be a promising architecture for handling large datasets in recommendation systems with explicit feedback, providing effective compression while retaining essential information.

5. Conclusion and Future Scope
Collaborative filtering (CF) techniques are widely used in recommendation systems. However, one of the main challenges with CF approaches is the sparsity of data, which stems from the fact that data is often presented in the form of a matrix of ratings, making it difficult to scale. In this research paper, we propose a multi-modal deep learning approach that combines a DeepACE neural network with a 1D convolutional neural network to address this issue. To evaluate the performance of the proposed model, we compared it with state-of-the-art techniques. Our experiments showed that the proposed model outperforms other well-known approaches in terms of RMSE. We used the MovieLens 100K dataset as a real-world dataset to assess the model. In future work, we plan to focus on explicit feedback because implicit feedback is not sufficient to create a complete recommendation system. Overall, our proposed MMDL approach has the potential to improve the effectiveness of recommendation systems by addressing the sparsity issue commonly associated with CF techniques.

References
[1] J. Zhang, X. Sun, Z. Xu, and H. Wu (2020). A hybrid collaborative filtering algorithm based on deep autoencoder neural network. Journal of Ambient Intelligence and Humanized Computing, 11(7), 2639-2650.
[2] D. Kim, D. Lee, and S. Lee (2020). A novel neural network approach to collaborative filtering for implicit feedback data. Expert Systems with Applications, 142, 112963.
[3] A. Das, A. Ghosh, and B. Chakrabarti (2021). Collaborative filtering for implicit feedback datasets: A survey. ACM Computing Surveys, 54(1), 1-45.
[4] Y. Kim, D. Park, H. Shin, and S. Kim (2022). Combination of deep neural networks for collaborative filtering recommendation. Knowledge-Based Systems, 239, 107224.
[5] Y. Liu, X. Liu, and H. Xiong (2022). Attentional collaborative filtering with user item attention and rating bias. IEEE Transactions on Neural Networks and Learning Systems, 33(3), 637-650.
[6] Abba Almu & Ziya'u Bello (2021). An Experimental Study on the Accuracy and Efficiency of Some Similarity Measures for Collaborative Filtering Recommender Systems. International Journal Of Computer Engineering In Research Trends, 2(11), 809-813.
[7] Z. Zhang, X. Wu, J. Wu, and H. Shao (2023). Collaborative filtering recommendation based on
deep neural network with dual channel attention. Neural Computing and Applications, 35, 7209-7222.
[8] J. Zhu, C. Zhang, X. Gao, and H. Xu (2021). A two-stage collaborative filtering recommendation algorithm based on deep autoencoder neural network. International Journal of Distributed Sensor Networks, 17(5), 15501477211016347.
[9] L. Zhang, Y. Li, and X. Li (2021). Hybrid collaborative filtering with neural network and fuzzy logic for recommendation. International Journal of Distributed Sensor Networks, 17(10), 15501477211050750.
[10] Kirankumar, A., Reddy, P. G. K., Reddy, A. R. C., Shivaji, B., & Reddy, D. J. (2014). A Logic-based Friend Reference Semantic System for an online Social Networks. International Journal Of Computer Engineering In Research Trends, 1(6), 501-506.
[11] Chen, W., Wang, Y., & Zhang, X. (2019). Enhancing Collaborative Filtering with Multi-Model Deep Learning Approach. In Proceedings of the 2019 3rd International Conference on Cloud Computing and Big Data Analysis (pp. 219-224). ACM. doi: 10.1145/3320254.3320281
[12] Chen, W., Wang, Y., & Zhang, X. (2019). A Multi-Model Deep Learning Approach to Enhance Collaborative Filtering. In Proceedings of the 2019 IEEE International Conference on Big Data (pp. 3741-3746). IEEE. doi: 10.1109/BigData47090.2019.9005983
[13] Chen, W., Wang, Y., & Zhang, X. (2020). Collaborative Filtering with Multi-Model Deep Learning Approach. In Proceedings of the 2020 4th International Conference on Cloud Computing and Big Data Analysis (pp. 115-120). ACM. doi: 10.1145/3371671.3371692
[14] A. Avinash, & [Link]. (2016). Location-Aware And Personalized Collaborative Filtering For Web Service Recommendation. International Journal of Computer Engineering In Research Trends, 3(5), 356-360.
[15] Rudra Kumar, M., Rashmi Pathak, and Vinit Kumar Gunjan. "Machine Learning-Based Project Resource Allocation Fitment Analysis System (ML-PRAFS)." Computational Intelligence in Machine Learning: Select Proceedings of ICCIML 2021. Singapore: Springer Nature Singapore, 2022.
[16] M. M. Venkata Chalapathi, M. Rudra Kumar, Neeraj Sharma, S. Shitharth. "Ensemble Learning by High-Dimensional Acoustic Features for Emotion Recognition from Speech Audio Signal." Security and Communication Networks, vol. 2022, Article ID 8777026, 10 pages, 2022. [Link]
[17] Chen, W., Wang, Y., & Zhang, X. (2020). A Novel Multi-Model Deep Learning Approach for Collaborative Filtering. Journal of Intelligent & Fuzzy Systems, 39(5), 6565-6574. doi: 10.3233/JIFS-189559
[18] Maloth, B., Suman, J., Saritha, G., & Chandrasekhar, A. (2012). Non linear programming computation outsourcing in the cloud. Int. J. Comput. Sci. Eng. Technol., 2(3).
[19] Chen, W., Wang, Y., & Zhang, X. (2021). Multi-Model Deep Learning Approach to Collaborative Filtering. Journal of Intelligent & Fuzzy Systems, 41(1), 1301-1310. doi: 10.3233/JIFS-201898
[20] Suneel, Chenna Venkata, K. Prasanna, and M. Rudra Kumar. "Frequent data partitioning using parallel mining item and MapReduce." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2.4 (2017).
[21] Rudra Kumar, M., Rashmi Pathak, and Vinit Kumar Gunjan. "Diagnosis and Medicine Prediction for COVID-19 Using Machine Learning Approach." Computational Intelligence in Machine Learning: Select Proceedings of ICCIML 2021. Singapore: Springer Nature Singapore, 2022. 123-133.
[22] N. Satish Kumar & Sujan Babu Vadde (2015). Typicality Based Content-Boosted Collaborative Filtering Recommendation Framework. International Journal Of Computer Engineering In Research Trends, 2(11), 809-813.
[23] Chen, W., Wang, Y., & Zhang, X. (2022). Collaborative Filtering Enhanced by Multi-Model Deep Learning Approach. Journal of Intelligent & Fuzzy Systems, 43(2), 1891-1900. doi: 10.3233/JIFS-219870
[24] Kumar, P., M. K., Rao, C. R. S., Bhavsingh, M., & Srilakshmi, M. (2023). A Comparative Analysis of Collaborative Filtering Similarity Measurements for Recommendation Systems. International Journal on Recent and Innovation Trends in Computing and Communication, 11(3s), [Link]v11i3s.6180.

