
Geez Digit Recognition via CNN Deep Learning

This research article presents a deep learning model for recognizing handwritten Geez digits, utilizing a convolutional neural network (CNN) trained on a dataset of 51,952 images from 524 individuals. The model achieved an accuracy of 96.21%, significantly improving upon previous attempts in the field. The study highlights the importance of digitizing historical documents written in the Geez script, which is essential for preserving Ethiopia's cultural heritage.


Hindawi

Applied Computational Intelligence and Soft Computing


Volume 2022, Article ID 8515810, 12 pages
[Link]

Research Article
Handwritten Geez Digit Recognition Using Deep Learning

Mukerem Ali Nur, Mesfin Abebe, and Rajesh Sharma Rajendran


Adama Science and Technology University, Adama, Ethiopia

Correspondence should be addressed to Rajesh Sharma Rajendran; sharmaphd10@[Link]

Received 29 June 2022; Revised 26 September 2022; Accepted 5 October 2022; Published 8 November 2022

Academic Editor: Abidhan Bardhan

Copyright © 2022 Mukerem Ali Nur et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.

Amharic is the second most spoken language in the Semitic family after Arabic. In Ethiopia and neighboring countries, more than 100 million people speak Amharic. There are many historical documents written in the Geez script. Digitizing historical handwritten documents and recognizing handwritten characters is essential to preserving these valuable documents. Handwritten digit recognition is one of the tasks involved in digitizing handwritten documents from different sources. Currently, there is very little research on handwritten Geez digit recognition, and no organized dataset is publicly available to researchers. A convolutional neural network (CNN) is well suited to pattern recognition tasks such as handwritten document recognition because it extracts features from different styles of writing. In this work, we propose a CNN model to recognize Geez digits. Deep neural networks, which have recently shown exceptional performance in numerous pattern recognition and machine learning applications, are used here to recognize handwritten Geez digits, which has not previously been attempted for Ethiopic scripts. Our dataset, which contains 51,952 images of handwritten Geez digits collected from 524 individuals, is used to train and evaluate the CNN model. The CNN significantly improves on the performance of several machine-learning classification methods. Our proposed CNN model has an accuracy of 96.21% and a loss of 0.2013. Compared with earlier research on Geez handwritten digit recognition, the study attained higher recognition accuracy using the developed CNN model.

1. Introduction

Amharic is the only African language with its own alphabet and writing system, while most other African languages use the Latin or Arabic alphabets for their writing systems [1]. The Federal Democratic Republic of Ethiopia and other regional states use Amharic as their official working language. It is the mother language of over 50 million people and the second language of over 100 million people in Ethiopia [1]. Arabic is the only Semitic language spoken by more people than Amharic. Amharic is also spoken by some people in neighboring countries such as Eritrea, Djibouti, and Somalia. Many historical documents written in the Geez script are found in Ethiopia. Around 80 different languages are spoken in Ethiopia, with up to 200 dialects. The Geez alphabet is used as the writing system for some of these languages. Amharic, Geez, and Tigrinya are the most spoken languages in Ethiopia that use the Geez alphabet [1].

The Geez script consists of 265 characters, including 27 labialized characters (characters representing 2 sounds), 20 symbols for numerals, and 8 punctuation marks [2]. Our research focuses only on the Geez digits. Geez numerals have been used in Ethiopian calendars, Geez Bibles, and historical documents. Geez numbers use twenty different symbols to represent numerical values. Unlike Latin numbers, 0 is not represented by any symbol. Twenty numbers are represented by independent symbols: 1–9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, and 10000, as shown in Figure 1. Other numbers are represented by combinations of those twenty symbols. Each digit symbol has a dash (horizontal line) above and below the digit character.

Handwritten character and digit recognition has been studied for different languages to make the digitization of historical and handwritten documents more efficient [4]. Digit recognition is a well-known problem that has been applied to document indexing using dates such as the document date, birth date, marriage date, and death date [5].

Figure 1: Geez number representations [3].
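
The twenty independent numeral symbols shown in Figure 1 also have code points in the Unicode Ethiopic block, which gives a convenient way to enumerate the twenty class labels. The following is a minimal sketch, not part of the original study: the names `GEEZ_SYMBOLS` and `geez_below_100` are hypothetical, and the composition helper only covers values from 1 to 99, where a tens symbol is simply followed by a ones symbol.

```python
# Unicode code points for the 20 Ethiopic (Geez) numeral symbols:
# U+1369..U+1371 are 1-9, U+1372..U+137A are 10-90,
# U+137B is 100 and U+137C is 10000.
GEEZ_SYMBOLS = {}
for i in range(9):
    GEEZ_SYMBOLS[i + 1] = chr(0x1369 + i)         # 1..9
    GEEZ_SYMBOLS[(i + 1) * 10] = chr(0x1372 + i)  # 10..90
GEEZ_SYMBOLS[100] = chr(0x137B)
GEEZ_SYMBOLS[10000] = chr(0x137C)


def geez_below_100(n: int) -> str:
    """Compose a value in 1..99 from a tens symbol and a ones symbol.

    Illustrative helper only; larger numbers need the 100 and 10000
    symbols and are outside the scope of this sketch.
    """
    if n not in range(1, 100):
        raise ValueError("this sketch only covers 1..99")
    tens, ones = divmod(n, 10)
    text = ""
    if tens:
        text += GEEZ_SYMBOLS[tens * 10]
    if ones:
        text += GEEZ_SYMBOLS[ones]
    return text


if __name__ == "__main__":
    for value in sorted(GEEZ_SYMBOLS):
        print(value, GEEZ_SYMBOLS[value])
    print(25, geez_below_100(25))
```

Because there is no symbol for 0, a classifier over these symbols has exactly twenty output classes.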

Digit recognition and detection have been used in a variety of applications, including the automated reading of numbers on bank cheques, postal numbers and codes, tax forms, and document indexing based on dates [6]. There are two strategies for handwritten digit string recognition: detection-free and segmentation-based recognition [7]. In a segmentation-based system, the numerical string, which may contain multiple digits, is detected first, and the digits are then split apart to isolate each one before recognition [8, 9]. The detection-free approach, however, recognizes each digit without any splitting or detection preprocessing [10].

Random Forest, SVM, KNN, and other machine learning techniques have been developed to recognize handwritten digits. Deep learning methods such as the CNN achieve the highest accuracy when compared with the most commonly used machine learning algorithms for handwritten digit recognition [11, 12]. CNNs are used for both pattern recognition and large-scale image classification. Handwritten character recognition is a research field within computer vision, artificial intelligence, and pattern recognition [1]. A computer application that performs handwriting recognition can acquire and recognize characters in photographs, paper documents, and other sources and convert them to electronic or machine-encoded form. Deep learning is a popular field of machine learning that uses hierarchical structures to learn high-level abstractions from data. According to references [13, 14], the availability of hardware (CPUs, GPUs, and hard drives, among other things), of machine learning algorithms, and of large datasets such as the MNIST handwritten digit dataset and ImageNet are all factors in deep learning’s success. Handwritten digit recognition, facial recognition, computer vision, audio and visual signal analysis, voice recognition, disaster recognition, and automated language processing are all areas where deep learning is applied [15].

Deep learning is becoming a popular technique for learning, extracting, and recognizing patterns, including deep patterns, from a given dataset. Diverse libraries are available for extracting patterns from images, recognizing them, and classifying them. Among deep learning algorithms, the CNN is efficient and performs well at image classification, image recognition, pattern recognition, and feature extraction.

2. Related Works

Kusetogullari et al. [5] introduced a deep learning architecture known as DIGITNET to detect and recognize English handwritten digits found in historical documents in Sweden. The authors also created a large-scale public handwritten digit dataset known as DIDA. The data were collected from Swedish handwritten historical documents written by different priests in the nineteenth century. The dataset consists of 100,000 handwritten digit images. DIGITNET consists of two different architectures, one to detect digits and one to recognize them. The first architecture, DIGITNET-dect, detects digit strings in handwritten documents, and the second, DIGITNET-rec, recognizes the handwritten digits. The authors used a deep learning approach to train both models and used regression-based deep CNN methods to detect the digits. The authors adopted YOLOv3 to detect and classify a digit in an image. In the recognition phase, the authors proposed three different CNN architectures. Each proposed model includes convolutional, batch normalization, max-pooling, fully connected, and SoftMax layers. The work still has limitations: some of the image data have high resolution, which increases the computational cost of training the model, and some digits are not labeled because of their poor appearance. Low digit detection accuracy caused by negative sampling is a further limitation of the research work.

Chen et al. [16] compared five machine learning classification models for offline handwritten digit recognition. The authors compared the performance of KNN, a neural network, random forest, decision tree, and bagging with gradient boost. 70,000 digit images were used to develop the classifier models. KNN and the neural network showed better accuracy than the other classifiers, and KNN ran 10 times faster than the neural network model. Preprocessing is the crucial stage of a handwritten recognition system. The authors used several preprocessing techniques to enhance the data. They used normalization to give equal weight to each attribute. Then, they used a median filter for noise reduction. Image sharpening and image attribute reduction were the other preprocessing steps. The work still has some limitations: the bewilder tool is not effective for preprocessing handwritten image data; the authors did not find a threshold value for binarization and therefore skipped the binarization technique; and the image is blurred after the median filter and sharpening steps.

Beyene [3] proposed a multilayered feed-forward propagation ANN for offline handwritten and machine-printed Amharic (Geez) number recognition. The author collected only 560 samples for the model, using 460 for training and 100 for testing. The author collected the data manually because there is no public data for Geez handwritten digits. The overall classification accuracy is 89.88%, which is poor because he used a very small amount of data to develop his model [3]. Research has been conducted in the specific area of handwritten digits of the ancient Semitic language (Geez). Some other researchers

have worked on recognition of all Geez characters, but the author of [3] focused specifically on Geez digits. That work still has limitations: only a small amount of data was used to train the algorithm, no information is given about the preprocessing technique, and the recognition accuracy of the proposed model is low. Hossain and Ali [17] proposed handwritten digit recognition using a CNN on the MNIST handwritten dataset. The authors used MatConvNet to speed up building the proposed model. MatConvNet is a MATLAB toolbox that supports efficient computation on CPU and GPU, allowing complex models to be trained on large datasets such as ImageNet ILSVRC. However, the research gives no information about the preprocessing technique, and the number of hidden convolution layers in the proposed model is small.

Demilew and Sekeroglu [1] proposed an ancient Geez script recognition model using deep learning. The authors developed a deep CNN model to recognize ancient Ethiopian Geez characters found in historical documents. They proposed an architecture that recognizes only Geez characters, not words or full sentences. The dataset totals 22,913 images collected from libraries, private books, and the Ethiopian Orthodox Tewahedo Church. They also developed a recognition system that recognizes twenty-six base characters only. In the Geez script, there are around 265 characters and 34 base characters, but the authors classified each character into its base character class, not as its specific character. Each base class contains 7 characters, including the base character. One of the challenges in recognizing handwritten Geez script is the similarity between characters in the same base class. By classifying all seven characters of a class into one base class, the authors avoided this difficult task in their model, which is one limitation of the work. The work also suffers from low image quality and an unbalanced number of instances per character. In addition, it does not mention the methods used for character detection.

Gondere et al. [2] designed a handwritten Geez character recognition system using a CNN. The authors used multitask learning to enhance the model by exploiting the relationships among the characters. They ran the experiment with the following CNN hyperparameters: a batch size of 100, a dropout keep probability of 0.3, a learning rate of 0.0001, and an L2 regularization of 0.01. They organized a dataset from different previous research works. The work still has some problems: the authors used a unique handwritten dataset that affected the performance of the models, and the work does not mention the preprocessing technique. Ali et al. [18] proposed a model to recognize handwritten digits. The authors used a CNN algorithm to develop the model, implemented with deeplearning4j. The CNN performs two main tasks. The first task is feature extraction: each layer takes the output of the previous layer as input and forwards its own output to the next layer. The second task of the CNN architecture is feature classification. This unit generates or classifies the predicted output. The authors used the MNIST dataset for their work, with 60,000 handwritten digit images used for training and testing the model. The work has some limitations: the proposed model uses a large kernel size in the convolution layer and therefore needs a longer training time, and the work gives no detailed information about the preprocessing technique.

Most researchers have worked on recognition of English numbers. They achieved high performance using different methods to recognize handwritten digits. For English handwritten digits, many resources and datasets are ready to be used by the research community, which encourages researchers to focus on that area. For Geez handwritten digits, however, no organized public data are available for researchers to work on. Some researchers have worked on Geez character recognition for machine-printed and handwritten characters, but they did not focus on digits, especially handwritten ones. The author of [3] is the first researcher to work on recognizing handwritten Geez digits, but the dataset he used was very small and the resulting performance was low.

3. Data Collection Method

For this study, handwritten data were collected from a variety of people with various writing styles. Instead of manual feature extraction, which is difficult for humans, deep learning models are used because they extract features efficiently and with high accuracy. A data-gathering sheet was created for this purpose and designed to make preprocessing easier. The A4 sheet contains a box showing the symbols of all 20 Geez numbers in 2 rows and 10 columns, together with same-sized empty boxes, repeated 5 times, as shown in Figure 2. This means each individual handwrites 100 digit instances. The data were collected from 524 different individuals, each contributing 100 digit instances, which gives 524 × 100 = 52,400 instances by calculation. People from many demographic groups participated in the data collection. The data were gathered from elementary pupils, high school students, high school staff members, university students, and university academic staff (lecturers). Most of the data came from university students, roughly 250 of them, at Adama Science and Technology University.

The data collection at the university was conducted successfully with the help of members of the Computer Science and Engineering Club ASTU (CSEC-ASTU). The club had 100 members at the time of data collection; the data were gathered from them and through their connections on campus. As mentioned earlier, data were obtained from 250 university students, 150 of whom are male and 100 of whom are female. After collection, the data must be converted from paper to digital format before they can be processed. The documents were scanned using a TECNO mobile phone with a 50-megapixel camera and the CamScanner app. The advantage of using

Figure 2: Sample handwritten Geez digit data.
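
The collection sheet is organized as boxes of 2 rows by 10 columns, so extracting individual digits from one scanned box amounts to cutting it into 20 cells. The following is a minimal sketch of that idea under stated assumptions: the cells are taken to be equally sized with no margins or border lines, and the function names `cell_boxes` and `crop` are hypothetical, not the authors' extraction program.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def cell_boxes(width: int, height: int, rows: int = 2, cols: int = 10) -> List[Box]:
    """Split one partition of the given pixel size into rows x cols cells.

    Assumes equally sized cells with no margins, which is an
    illustrative simplification of the real sheet layout.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut one cell out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    boxes = cell_boxes(width=1000, height=200)
    print(len(boxes), boxes[0], boxes[-1])
    # 5 handwritten partitions per sheet, 20 digits each, 524 writers
    print(5 * len(boxes) * 524)
```

With an image loaded as a NumPy array, the same cut would typically be written as an array slice such as `img[top:bottom, left:right]`.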

CamScanner is that it detects the paper and returns only the paper region in digital (image) format, after removing the background and reducing noise.

Python’s OpenCV library was used for data extraction during preprocessing. The extraction program takes one partition as input and outputs the extracted data. Once it has been prepared for one partition, the same procedure applies to the others.

4. Data Preprocessing

The second phase of the proposed model is the preprocessing phase, which takes place after the digital image has been produced. The digitized image is first checked for skew and then preprocessed to reduce noise. Preprocessing is necessary to create data that handwritten digit recognition systems can recognize easily; the goal is to reduce background noise, enhance the image’s region of interest, and produce a clear distinction between foreground and background. The study uses the Python OpenCV library for preprocessing.

4.1. Resize Image. Because the data come in a range of sizes, they must be resized to fit the network’s input size. All images are resized to 32 × 32 pixels in this work. This scaling is important for reducing computational complexity and for concentrating on the region of interest by cropping it.

4.2. RGB to Grayscale Conversion. The simplest color model is grayscale, which specifies colors using only one component: lightness. A value ranging from 0 (black) to 255 (white) defines the amount of brightness. All the original images in the dataset are in RGB color format. Converting RGB to grayscale reduces the number of color channels, which reduces the computational complexity compared with RGB color images. The input images of our proposed model are grayscale, so the original images are converted to the grayscale color format.

4.3. Color Inversion. The dominant color of the original image is white, which has a value of 255. For grayscale image datasets, changing the dominant color to black is preferable because it reduces the complexity of the mathematical operations. Because black has a value of 0, a convolution operation in which the dominant part of the image is 0 reduces the computational complexity of the model. Figure 3 shows the preprocessing techniques used on our dataset. As shown in Figure 3(d), the dominant part of the image is the background. In the color inversion technique, the background is converted from white to black, as shown in Figure 4.

5. Proposed Model

The convolutional neural network (CNN) is the proposed model for Geez handwritten digit recognition. To recognize the digits, a CNN-based digit classifier is used. Six different CNN-based handwritten digit classifiers are built from layers such as convolutional, max-pooling, dropout, flatten, fully connected, and SoftMax layers to achieve high recognition

Figure 3: Geez handwritten digit image after some preprocessing techniques. (a) Original image. (b) Resized image. (c) Grayscale image. (d) Black background image.
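
The three preprocessing steps illustrated in Figure 3 (resize to 32 × 32, RGB to grayscale, color inversion) can be sketched without any dependencies as follows. This is an illustrative sketch rather than the authors' code: images are plain nested lists, the resize uses nearest-neighbour sampling (the interpolation method is not stated in the text), and the grayscale weights 0.299, 0.587, and 0.114 are the standard luminance weights, assumed here. In an OpenCV workflow the corresponding calls would likely be `cv2.resize`, `cv2.cvtColor`, and `cv2.bitwise_not`.

```python
def resize_nearest(image, new_w, new_h):
    """Nearest-neighbour resize of an image stored as a list of rows."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of 0..255 gray values."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in image
    ]


def invert(image):
    """Swap black and white so the white paper background becomes 0."""
    return [[255 - value for value in row] for row in image]


def preprocess(image, size=32):
    """Resize, convert to grayscale, then invert, in the order of Figure 3."""
    return invert(to_grayscale(resize_nearest(image, size, size)))


if __name__ == "__main__":
    white_page = [[(255, 255, 255)] * 64 for _ in range(48)]
    result = preprocess(white_page)
    print(len(result), len(result[0]), result[0][0])
```

After inversion the paper background is 0 and the ink is bright, which matches the black-background images of Figure 3(d).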

Figure 4: Sample images of the Geez handwritten digit dataset.

accuracy. Furthermore, training was performed using backpropagation with stochastic gradient descent.

Finally, the best model for recognizing digit strings is chosen based on the evaluation metric. Each classifier is constructed with a different number of convolutional layers, kernel sizes, and filters. The parameters applied in all six classifiers are summarized in Table 3. Model 6, for example, shown in Figure 5, has 8 convolutional layers, 4 max-pooling layers, 3 dropout layers, 2 fully connected layers, and an output layer with 20 classes. The kernel size, stride, and number of filters in the first convolutional layer are 3 × 3, 1, and 32 (3 × 3@1@32), respectively. The second and third convolution layers are similar to the first. After these three convolution layers, a max-pooling layer (2 × 2@2@32) is applied. A convolutional layer (3 × 3@1@64) is used as the fifth layer; it consists of 64 filters with a kernel size of 3 × 3 and a stride of 1. The following two layers are convolution layers with the same hyperparameters as the fifth layer. A max-pooling layer (2 × 2@2@64) is applied as the eighth layer, followed by dropout. A convolutional layer (3 × 3@1@64) is applied next, which consists of 64 filters with a kernel size of 3 × 3 and a stride of 1. A max-pooling layer (2 × 2@2@64) is the next hidden layer.

After this max-pooling layer, dropout is applied. A convolutional layer (3 × 3@1@128) is applied next, which consists of 128 filters with a kernel size of 3 × 3 and a stride of 1. A max-pooling layer with dropout is used before the fully connected layers. The first fully connected layer consists of 128 nodes. ReLU is used as the activation function in the convolutional and fully connected layers. SoftMax is used as the last layer to compute the probabilities of the output classes. The class with the highest probability is the predicted result. The model is trained for 30 epochs with a batch size of 32. The other five classifiers have varying numbers of convolutional and fully connected layers, as well as different layer organizations. The first fully connected layer contains 128 neurons and the second contains 20 neurons in all cases.

6. Result and Discussion

The CNN variants are compared to observe the differences in accuracy among the handwritten Geez digit models. Training and validation accuracy were measured over 30 epochs for various combinations of convolution and hidden layers, using a batch size of 32 in all cases. Figures 6, 7, 8, 9, 10, and 11 illustrate the accuracy of the CNN, and Figures 12, 13, 14, 15, 16, and 17 show the loss of the CNN for the various convolution and hidden layer combinations. Table 1 shows the maximum and minimum training and validation accuracies of the CNN obtained experimentally for six different cases with different hidden layers, and Table 2 shows the maximum and minimum training and validation loss of the CNN in those cases for the recognition of Geez handwritten digits.

Table 3 describes the CNN configuration and parameters for the six cases. The models have varying numbers of convolutional and fully connected layers, as well as different layer organizations. The first fully connected layer contains 128 neurons and the second contains 20 neurons in all cases.

The first hidden layer in the first case, presented in Figures 6 and 12, is convolutional layer 1, which is used for feature extraction. It has 32 filters with a kernel size of 3 × 3 pixels, and it uses ReLU as an activation function. The next hidden layer is convolutional layer 2, which consists of 32

Table 1: Performance of the CNN for the six different cases for various hidden layers.

Case  Hidden  Batch  Min. training    Min. validation  Max. training    Max. validation  Overall test
      layers  size   accuracy         accuracy         accuracy         accuracy         accuracy (%)
                     Epoch  Acc. (%)  Epoch  Acc. (%)  Epoch  Acc. (%)  Epoch  Acc. (%)
1     9       32     1      88.15     1      91.75     28     99.23     18     95.82     95.65
2     14      32     1      85.01     1      89.00     28     98.74     20     94.99     94.71
3     13      32     1      85.96     1      89.63     26     98.63     20     95.28     94.98
4     10      32     1      88.88     1      91.89     30     99.36     27     95.64     95.42
5     8       32     1      87.41     1      89.90     29     99.77     27     94.84     94.42
6     12      32     1      88.77     1      91.94     30     98.44     12     96.15     96.21
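
The model selection step (choosing the best classifier based on the evaluation metric) can be illustrated directly with the values of Table 1. This is a small sketch, assuming overall test accuracy is the selection metric; the names `TABLE_1` and `best_case` are hypothetical.

```python
# (case, hidden layers, max validation accuracy %, overall test accuracy %)
# copied from Table 1.
TABLE_1 = [
    (1, 9, 95.82, 95.65),
    (2, 14, 94.99, 94.71),
    (3, 13, 95.28, 94.98),
    (4, 10, 95.64, 95.42),
    (5, 8, 94.84, 94.42),
    (6, 12, 96.15, 96.21),
]


def best_case(rows, metric_index=3):
    """Return the row with the highest value in the chosen metric column."""
    return max(rows, key=lambda row: row[metric_index])


if __name__ == "__main__":
    case, layers, val_acc, test_acc = best_case(TABLE_1)
    print(f"best case: {case} ({layers} hidden layers, test accuracy {test_acc}%)")
```

Selecting by maximum validation accuracy (`metric_index=2`) picks the same case here.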

[Diagram: input (32, 32, 1); convolution layers 1 to 3 with 32 kernels (3×3) each; convolution layers 4 to 7 with 64 kernels (3×3) each; convolution layer 8 with 128 kernels (3×3); one max pooling stage and three max pooling + dropout stages; flatten; fully connected layer 1 (128 neurons, ReLU); fully connected layer 2 as the output layer (20 classes, SoftMax).]

Figure 5: The convolutional neural network architecture of the proposed system.
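
To make the Model 6 architecture of Figure 5 concrete, the sketch below walks through its layers and tracks the feature map shape and the number of trainable parameters. It is an illustration under assumptions, not the authors' implementation: convolutions are assumed to use "same" padding (the text does not state the padding, but four 2 × 2 pooling stages on a 32 × 32 input would not fit otherwise), and the names `MODEL_6`, `DENSE_UNITS`, and `trace` are hypothetical.

```python
# Model 6 as described in the text: ("conv", filters) is a 3x3 convolution
# with stride 1; ("pool", None) is 2x2 max pooling with stride 2.
# Dropout changes neither shapes nor parameter counts, so it is omitted.
MODEL_6 = [
    ("conv", 32), ("conv", 32), ("conv", 32), ("pool", None),
    ("conv", 64), ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 64), ("pool", None),
    ("conv", 128), ("pool", None),
]
DENSE_UNITS = [128, 20]  # fully connected layer 1 and the 20-class output


def trace(layers, dense_units, height=32, width=32, channels=1, kernel=3):
    """Return (flattened feature count, total trainable parameters).

    Assumes 'same' padding for every convolution (an assumption, see above).
    """
    params = 0
    for kind, filters in layers:
        if kind == "conv":
            # kernel weights plus one bias per filter
            params += (kernel * kernel * channels + 1) * filters
            channels = filters  # spatial size unchanged with 'same' padding
        else:
            height, width = height // 2, width // 2  # 2x2 pooling, stride 2
        print(kind.ljust(4), f"{height}x{width}x{channels}")
    features = height * width * channels  # size after the flatten layer
    fan_in = features
    for units in dense_units:
        params += (fan_in + 1) * units
        fan_in = units
    return features, params


if __name__ == "__main__":
    features, params = trace(MODEL_6, DENSE_UNITS)
    print("flattened features:", features)
    print("trainable parameters:", params)
```

Under these assumptions the feature maps shrink from 32 × 32 to 2 × 2 across the four pooling stages before the flatten layer.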

[Plot: accuracy vs epoch. x-axis: Epoch; y-axis: Accuracy; curves: Validation Accuracy and Training Accuracy.]

Figure 6: Observed accuracy for case 1.

filters with a kernel size of 3 × 3 pixels and ReLU. To minimize the spatial size of the output of a convolution layer, pooling layer 1 is defined, with max-pooling and a pool size of 2 × 2 pixels. The next layers are two convolutional layers with 64 filters each, a kernel size of 3 × 3 pixels, and the ReLU activation function. Max-pooling layer 2 is applied after convolution layer 4. After pooling layer 2, a dropout regularization layer is used to reduce the

Table 2: Loss of the CNN for the six different cases for various hidden layers.

Case  Hidden  Batch  Min. training    Min. validation  Max. training    Max. validation  Overall
      layers  size   loss             loss             loss             loss             test loss
                     Epoch  Loss      Epoch  Loss      Epoch  Loss      Epoch  Loss
1     9       32     30     0.0232    8      0.2412    1      0.4515    1      0.3306    0.2946
2     14      32     28     0.0456    10     0.2571    1      0.5546    1      0.4149    0.2928
3     13      32     30     0.0423    14     0.2363    1      0.5214    1      0.4035    0.2908
4     10      32     27     0.0187    11     0.2371    1      0.4271    1      0.3266    0.3032
5     8       32     30     0.0068    8      0.3267    1      0.4705    28     0.4841    0.5504
6     12      32     24     0.0574    7      0.2077    1      0.4399    1      0.3213    0.2013
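
Table 2 reports loss values, but the text does not name the loss function. The sketch below assumes categorical cross-entropy computed on the SoftMax output, a common pairing for multi-class classification, purely to show how a per-sample loss and a predicted class relate to the 20 output probabilities. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class (assumed loss)."""
    return -math.log(probs[true_class])


def predict(probs):
    """Index of the class with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    uniform = softmax([0.0] * 20)
    print("loss of an untrained 20-class guess:", cross_entropy(uniform, 0))
    confident = softmax([5.0] + [0.0] * 19)
    print("predicted class:", predict(confident))
    print("loss when class 0 is correct:", cross_entropy(confident, 0))
```

Under this assumption, a uniform guess over 20 classes has a loss of ln 20, roughly 3.0, which gives a reference point for reading the sub-1.0 losses in Table 2.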

overfitting of the model by randomly eliminating 25% of the neurons in the layer. Convolution layer 5, with a channel size of 64 and a filter size of 3 × 3, is applied after dropout. Convolutional layer 6 is the next hidden layer, made up of 128 filters with a kernel size of 3 × 3 pixels and ReLU. Max-pooling layer 3 with dropout is applied after convolution layer 6. A flatten layer turns the 2D filter matrix into a 1D feature vector before it enters the fully connected layers. After the flatten layer, fully connected layer 1 is used, which comprises 128 neurons and ReLU. Finally, fully connected layer 2, the output layer that determines the digits, has 20 neurons for 20 classes.

To output digits, the output layer has a SoftMax activation function. With a batch size of 32, the CNN is trained over 30 epochs. The overall test accuracy is 95.65%. The minimum validation accuracy is 91.75% at epoch 1, while the minimum training accuracy is 88.15% at epoch 1. The highest training accuracy is 99.23% at epoch 28, whereas the highest validation accuracy is 95.82% at epoch 18. The overall model loss in this case is estimated to be around 0.2946. The training loss decreases exponentially as the iterations proceed. The validation loss decreases from the peak value to the optimum value and then increases up to the 17th epoch. After the 19th epoch, the validation loss remains constant.

Figures 7 and 13 show case 2, where the first hidden layer is convolutional layer 1, which is used for feature extraction. It has 32 filters with a kernel size of 3 × 3 pixels, and it uses ReLU as an activation function. The next hidden layer is convolutional layer 2, which consists of 32

Table 3: The CNN models for Geez handwritten digit recognition.

| Model | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Layer 5 | Layer 6 | Layer 7 | Layer 8 | Layer 9 | Layer 10 | Layer 11 | Layer 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Max-pooling 2 × 2@2@32 | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Max-pooling 2 × 2@2@64 and dropout | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@128 and ReLU | Max-pooling 2 × 2@2@128 and dropout | | | |
| 2 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Max-pooling 2 × 2@2@32 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Max-pooling 2 × 2@2@32 | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Max-pooling 2 × 2@2@64 and dropout | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | 3 additional layers such as pooling, conv, pooling |
| 3 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Max-pooling 2 × 2@2@32 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Max-pooling 2 × 2@2@32 and dropout | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Max-pooling 2 × 2@2@64 | Conv 3 × 3@1@64 and ReLU | Convolution layer with pooling and dropout layer |
| 4 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Max-pooling 2 × 2@2@32 | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Max-pooling 2 × 2@2@64 and dropout | Conv 3 × 3@1@128 and ReLU | Max-pooling 2 × 2@2@128 and dropout | | |
| 5 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@64 and ReLU | Max-pooling 2 × 2@2@64 and dropout | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@128 and ReLU | Max-pooling 2 × 2@2@128 and dropout | | | | |
| 6 | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Conv 3 × 3@1@32 and ReLU | Max-pooling 2 × 2@2@32 | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Conv 3 × 3@1@64 and ReLU | Max-pooling 2 × 2@2@64 and dropout | Conv 3 × 3@1@64 and ReLU | Max-pooling 2 × 2@2@64 and dropout | Conv 3 × 3@1@128 and ReLU | Max-pooling 2 × 2@2@128 and dropout |
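As a reading aid for Table 3, the sketch below traces how the feature-map size would change through the Model 6 row. It is a hypothetical illustration, not the authors' code: it assumes a single-channel 32 × 32 input (the image size stated in the conclusion), "same" padding for every 3 × 3 convolution, and 2 × 2 max-pooling with stride 2. The padding actually used is not stated in this section, and the names `MODEL_6` and `trace_shapes` are made up for the example.

```python
# Model 6 as read from Table 3: (layer kind, output channels).
# Dropout is omitted because it does not change the tensor shape.
MODEL_6 = [
    ("conv", 32), ("conv", 32), ("conv", 32), ("pool", 32),
    ("conv", 64), ("conv", 64), ("conv", 64), ("pool", 64),
    ("conv", 64), ("pool", 64),
    ("conv", 128), ("pool", 128),
]


def trace_shapes(layers, size=32, channels=1):
    """Return the (height, width, channels) shape after every layer.

    Assumptions (not confirmed by the paper): 3x3 convolutions use
    'same' padding with stride 1, so they keep the spatial size;
    2x2 max-pooling uses stride 2, so it halves the spatial size.
    """
    shapes = []
    for kind, out_channels in layers:
        if kind == "conv":
            channels = out_channels  # spatial size unchanged
        elif kind == "pool":
            size //= 2  # 2x2 window, stride 2
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(MODEL_6)
flattened_length = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

Under these assumptions the four pooling layers reduce 32 × 32 to 2 × 2, so the flatten layer would receive 2 × 2 × 128 = 512 values.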

Figure 7: Observed accuracy for case 2 (training and validation accuracy vs. epoch).

filters with a kernel size of 3 × 3 pixels and ReLU activation. To reduce the spatial size of the convolution output, pooling layer 1 is defined with max-pooling and a pool size of 2 × 2 pixels. The next layers are two convolutional layers of 32 filters each, with a kernel size of 3 × 3 pixels and the ReLU activation function. Max-pooling layer 2 is applied after convolution layer 4. The next two hidden layers are convolution layers made up of 64 filters with a kernel size of 3 × 3 pixels. Max-pooling and dropout layers are applied after these convolution layers. The next two layers are convolution layers with a channel size of 64, followed by a max-pooling layer. The next hidden layer is convolution layer 9, with 128 filters of kernel size 3 × 3. A max-pooling layer with dropout is applied after this convolution layer. Rectified Linear Units (ReLU) are used as the activation function in all convolution layers. The dimensions and hyperparameters used in this and the following cases are the same as those used in case 1. The overall test accuracy is 94.71%. The minimum training and validation accuracies occur at epoch 1: the training accuracy is 85.01%, and the validation accuracy is 89.00%. Epoch 28 has the highest training accuracy, while epoch 20 has the highest validation accuracy. The maximum training and validation accuracies are 98.74% and 94.99%, respectively. The total model loss is approximately 0.2928.

In case 3, shown in Figures 8 and 14, two convolution layers with a kernel size of 3 × 3 and 32 filters are applied one after the other, followed by a max-pooling layer. Two further convolution layers with the same parameters as the first two are applied before a max-pooling layer and a dropout layer. The next layers are three consecutive convolution layers with 64 filter channels and a 3 × 3 kernel size, followed by a max-pooling layer. Before the flatten layer, two convolutional layers, a max-pooling layer, and a dropout layer are applied. The two convolution layers have 64 and 128 kernel channels, respectively. Both layers have the same kernel size of 3 × 3. The flatten layer is followed by two fully connected layers. The overall test accuracy is 94.98%. At epoch 1, the minimum training accuracy is 85.96%, whereas the minimum validation accuracy is 89.63%. The maximum training and validation accuracies are 98.63% and 95.28%, found at epochs 26 and 20, respectively. The total model loss is approximately 0.2908.

Figure 8: Observed accuracy for case 3 (training and validation accuracy vs. epoch).

For case 4, shown in Figures 9 and 15, three consecutive convolution layers are applied one after the other. The number of channels is 32 and the kernel size is 3 × 3. A max-pooling layer is applied after the three convolutional layers. It is followed by three convolution layers with 64 kernel channels and a 3 × 3 kernel size, which are followed by a max-pooling layer with a
dropout. The next layer is convolution layer 7, followed by a max-pooling layer and dropout. After the flatten layer, there are two fully connected layers with no dropout. The overall test accuracy is 95.42%. At epoch 1, the minimum training and validation accuracies were 88.88% and 91.89%, respectively. The maximum training accuracy of 99.36% is found at epoch 30, and the maximum validation accuracy of 95.64% is found at epoch 27. The total model loss is 0.3032.

Figure 9: Observed accuracy for case 4 (training and validation accuracy vs. epoch).
Figure 10: Observed accuracy for case 5 (training and validation accuracy vs. epoch).

Case 5 is shown in Figures 10 and 16. In this case, three consecutive convolution layers are applied one after the other. The number of kernel channels is 32 and the kernel size is 3 × 3. A max-pooling layer is applied after the three convolutional layers. After the pooling layer, a dropout regularization layer is used to reduce overfitting by randomly eliminating 20% of the neurons in the layer. The next layers are three convolution layers followed by a max-pooling layer and a dropout layer. The flatten layer is followed by the two fully connected layers.
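The dropout step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `dropout` is hypothetical, and rescaling the surviving activations by 1/(1 − rate) (the "inverted dropout" convention) is an assumption. The paper only states that 20% of the neurons are randomly eliminated.

```python
import random


def dropout(activations, rate=0.2, training=True, rng=None):
    """Zero each activation with probability `rate` during training.

    Surviving values are scaled by 1 / (1 - rate) so the expected sum
    stays the same (inverted dropout; an assumption, see lead-in).
    At inference time the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0
            for a in activations]


# Example: roughly 20% of 10,000 unit activations are dropped.
dropped = dropout([1.0] * 10_000, rate=0.2, rng=random.Random(0))
zero_fraction = dropped.count(0.0) / len(dropped)
```

Because the mask is resampled on every training step, no single neuron can be relied on, which is the regularizing effect the text refers to.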
The overall test accuracy was 94.42%. At epoch 1, the minimum training accuracy is 87.41%, while the minimum validation accuracy is 89.90%. Epoch 29 has the highest training accuracy, while epoch 27 has the highest validation accuracy. The maximum training and validation accuracies are 99.77% and 94.84%, respectively. The total test loss of the model is 0.5504. The validation loss of the model increases as training proceeds, which shows that the model overfits the training data. Case 5 has the highest model loss and the lowest model accuracy of all six cases. This shows that overfitted models give a high model loss and low accuracy on a new test dataset.

Figure 11: Observed accuracy for case 6 (training and validation accuracy vs. epoch).

Finally, in case 6 (Figures 11 and 17), three convolution layers are applied one after the other, followed by a pooling layer. The three convolution layers have 32 kernel channels. Three convolution layers with 64 kernel channels are next, followed by a max-pooling layer. After pooling layer 2, a dropout regularization layer is applied to reduce overfitting by randomly eliminating 20% of the neurons in the layer. Convolutional layer 7, which has 64 kernel channels, is the next hidden layer, followed by a max-pooling layer and a dropout layer. The next layer is convolution layer 8, with 128 channels and a kernel size of 3 × 3. All convolution layers have the same filter size. Max-pooling layer 4 with dropout is applied after convolution layer 8. The flatten layer, followed by two fully connected layers, is then applied. The overall test accuracy was 96.21%. At epoch 1, the minimum training and validation accuracies were 88.77% and 91.94%, respectively. Epoch 30 has the highest training accuracy, while epoch 12 has the highest validation accuracy. The maximum training and validation accuracies are 98.44% and 96.15%, respectively. The total model loss is approximately 0.2013. The training
loss decreases as the number of epochs increases, but the validation loss fluctuates for 10 epochs and then remains constant for the remaining epochs.

Figure 12: Observed loss for case 1 (training and validation loss vs. epoch).
Figure 13: Observed loss for case 2 (training and validation loss vs. epoch).
Figure 14: Observed loss for case 3 (training and validation loss vs. epoch).
Figure 15: Observed loss for case 4 (training and validation loss vs. epoch).

By varying the hidden layers, the changes in accuracy for handwritten digits were observed over 30 epochs in the experiment. Accuracy curves for the six cases for each parameter were generated using the handwritten Geez digit dataset. The six cases behave differently due to the different combinations of hidden layers. The maximum and minimum accuracies for the hidden layer variations were recorded using a batch size of 32. As shown in Figure 18, the highest test accuracy among all the observations was 96.21% for 30 epochs, in case 6 (Conv1, Conv2, Conv3, pool1, Conv4, Conv5, Conv6, pool2 with dropout, Conv7, pool3 with dropout, Conv8, pool4 with dropout, flatten layer, 2 fully connected layers).

This higher accuracy will help the machine recognize Geez handwritten digits more efficiently. In case 5, however, the lowest accuracy among all observations was found to be 94.42% (Conv1, Conv2, Conv3, pool1, Conv4, Conv5, Conv6, pool2, flatten layer, and 2 fully connected layers). Furthermore, the highest total model loss, in case 5, is 0.5504, while the lowest total model loss, in case 6 with dropout, is around 0.2013 (Figure 19). With this minimal loss, the CNN will be able to achieve greater image quality and noise processing. From the observed results, the study chooses as the best model the one with the highest test accuracy and the lowest test loss among the six cases. Therefore, the case 6 model, with the highest accuracy of 96.21% and the lowest loss of 0.2013, is the proposed model for this research work.

Figure 16: Observed loss for case 5 (training and validation loss vs. epoch).
Figure 19: Loss of the different models (loss vs. model, models 1 to 6).

The previous work on Geez handwritten digit recognition was done by the author of [3], who achieved 89.88% accuracy using an ANN model. This study evaluates CNN models with different layers and different hyperparameters. Compared with the previous work, the study improves the recognition accuracy from 89.88% to 96.21% by using a CNN, increasing the dataset size, and enhancing the image quality by applying pre-processing techniques to the dataset.
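The selection rule used in this section (highest test accuracy, lowest loss) can be sketched as follows. The numbers are the test accuracies and model losses reported in this section for cases 2 to 6; case 1 is left out because its figures are not restated here. The dictionary `RESULTS` and the function `select_best` are hypothetical names, and breaking ties on the lower loss is an assumption.

```python
# (test accuracy in %, model loss) as reported for each case.
RESULTS = {
    "case 2": (94.71, 0.2928),
    "case 3": (94.98, 0.2908),
    "case 4": (95.42, 0.3032),
    "case 5": (94.42, 0.5504),
    "case 6": (96.21, 0.2013),
}

PREVIOUS_ANN_ACCURACY = 89.88  # accuracy reported for the ANN model in [3]


def select_best(results):
    """Pick the case with the highest accuracy; ties go to the lower loss."""
    return max(results, key=lambda name: (results[name][0], -results[name][1]))


best = select_best(RESULTS)
best_accuracy, best_loss = RESULTS[best]
lowest_loss_case = min(RESULTS, key=lambda name: RESULTS[name][1])
gain = round(best_accuracy - PREVIOUS_ANN_ACCURACY, 2)  # percentage points
```

With these values both criteria agree on case 6, and the gain over the earlier ANN result is 6.33 percentage points.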

Figure 17: Observed loss for case 6 (training and validation loss vs. epoch).
Figure 18: Accuracy of the different models (accuracy vs. model, models 1 to 6).

7. Conclusion and Future Scope

In this research work, convolutional neural networks were used to recognize Geez handwritten digits with 20 digit classes. CNNs are the current state-of-the-art algorithm for classifying image data and are widely used. A large number of Geez handwritten digits were collected from individuals' handwriting on a prepared data collection form. The handwritten documents were scanned and preprocessed to get 32 × 32-pixel digit images. The study offers a new public Geez handwritten digit dataset, which is open to all researchers. Among the deep learning approaches, a CNN architecture was used to develop a Geez handwritten digit recognition system. Extensive trial-and-error tuning of the network configuration was used to find the best-fitting CNN-based architecture. In comparison to earlier research on Geez handwritten digit recognition, the study was able to achieve higher recognition accuracy using the developed CNN model. The proposed model achieved an accuracy of 96.21% and a model loss of 0.2013.

Although much work has been done on recognizing handwritten digits in the English language, only a small amount of work has been done in the Amharic language. Due to the lack of research in the area, obtaining datasets for the Amharic language is a big challenge. The collected data are sufficient to train the model, but the dataset is not large, and students dominate the respondents. Because most of the respondents are students, the model performs well for students but not as well for other groups of individuals. The dataset does not include historical document and manuscript images. The collected data come only from individuals and not from other sources. In this research, a dataset was developed that can be used by other researchers in the future. In the future, the dataset will also include historical data, and the
current work supports only a single handwritten Ge’ez digit, but support for multi-digit recognition will be added in the future.

Data Availability

The data used to support the findings of this study are available at [Link] mLQ5Blg_lYAJng1K3LtGS/view?usp=sharing.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The study was funded by the university.

References

[1] F. A. Demilew and B. Sekeroglu, “Ancient Geez script recognition using deep learning,” SN Applied Sciences, vol. 1, no. 11, pp. 1315–1317, 2019.
[2] M. S. Gondere, L. Schmidt-thieme, A. S. Boltena, and H. S. Jomaa, “Handwritten Amharic Character Recognition Using a Convolutional Neural Network,” 2019, [Link] abs/1909.12943#:%7E:text=Tis%20research%20work%20designs%20for,was%20applied%20for%20machine%20learning.
[3] E. G. Beyene, “Handwritten and machine printed ocr for geez numbers using artificial neural network,” 2019, [Link] org/abs/1911.06845.
[4] E. Granell, E. Chammas, L. Likforman-Sulem, C. D. Martínez-Hinarejos, C. Mokbel, and B. I. Cirstea, “Transcription of Spanish historical handwritten documents with deep neural networks,” Journal of Imaging, vol. 4, no. 1, pp. 15–22, 2018.
[5] H. Kusetogullari, A. Yavariabdi, J. Hall, and N. Lavesson, “Digitnet: a deep handwritten digit detection and recognition method using a new historical handwritten digit dataset,” Big Data Research, vol. 23, Article ID 100182, 2021.
[6] O. Elitez, “Handwritten digit string segmentation and recognition using deep learning,” Master’s Thesis, Middle East Technical University, Ankara, Turkey, 2015.
[7] F. C. Ribas, L. S. Oliveira, A. S. Britto Jr, and R. Sabourin, “Handwritten digit segmentation: a comparative study,” International Journal on Document Analysis and Recognition, vol. 16, no. 2, pp. 127–137, 2013.
[8] R. Saabni, “Recognizing handwritten single digits and digit strings using deep architecture of neural networks,” in Proceedings of the International Conference on Artificial Intelligence and Pattern Recognition, pp. 1–6, IEEE, Lodz, Poland, 2016.
[9] Z. Shi and F. Date, “Detecting date regions on handwritten document images based on positional expectancy,” Master’s Thesis, University of Groningen, Groningen, Netherlands, 2016.
[10] D. Ciresan, “Avoiding segmentation in multi-digit numeral string recognition by combining single and two-digit classifiers trained without negative examples,” in Proceedings of the International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp. 225–230, IEEE, Timisoara, Romania, 2008.
[11] A. Dutt and D. Aashi, “Handwritten digit recognition using deep learning,” International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 6, no. 7, pp. 990–997, 2017.
[12] F. Siddique, S. Sakib, and M. A. B. Siddique, “Recognition of handwritten digit using convolutional neural network in python with tensorflow and comparison of performance for various hidden layers,” in Proceedings of the 2019 5th International Conference on Advances In Electrical Engineering (ICAEE), pp. 541–546, IEEE, Dhaka, Bangladesh, September 2019.
[13] I. S. Krizhevsky and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
[14] Y. LeCun, “LeNet-5, convolutional neural networks,” 2015, [Link]
[15] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[16] S. Chen, R. Almamlook, Y. Gu, and L. Wells, “Offline handwritten digits recognition using machine learning,” in Proceedings of the International Conference on Industrial Engineering and Operations Management, pp. 274–286, Washington, DC, USA, September 2018.
[17] M. A. Hossain and M. M. Ali, “Recognition of handwritten digit using convolutional neural network (CNN),” Global Journal of Computer Science and Technology, vol. 19, no. 2, pp. 27–33, 2019.
[18] S. Ali, Z. Shaukat, M. Azeem, Z. Sakhawat, T. Mahmood, and K. Ur Rehman, “An efficient and improved scheme for handwritten digit recognition based on convolutional neural network,” SN Applied Sciences, vol. 1, no. 9, pp. 1125–1129, 2019.
