OCR with ANN: A Comprehensive Guide

The document outlines an experiment on Optical Character Recognition (OCR) using Artificial Neural Networks (ANN) at Wadia College of Engineering, Pune. It discusses the objectives, outcomes, and theory behind OCR, including its challenges and various open-source tools like Tesseract, OCRopus, and Ocular. Additionally, it provides installation instructions for Tesseract and details on its functionality and usage in Python.

Uploaded by

AYUSH

MES’S WADIA COLLEGE OF ENGINEERING, PUNE

Honors* in Artificial Intelligence and Machine Learning Fourth year of Engineering

410302: Machine learning Laboratory

NAME OF STUDENT: CLASS: BE

SEMESTER/YEAR: VII ROLL NO:
DATE OF PERFORMANCE: DATE OF SUBMISSION:
EXAMINED BY: EXPERIMENT NO: 02

TITLE: OCR using ANN.

PROBLEM STATEMENT: Recognize optical characters using an ANN.

OBJECTIVES:
● To understand the different graphical and Hidden Markov models of learning

OUTCOMES:
● Select the appropriate type of neural network architecture and draw the
inferences from the different graphical and hidden Markov models.

PRE-REQUISITES:

1. Knowledge of Python programming

2. Basic knowledge of machine learning and neural networks

THEORY:
OCR stands for Optical Character Recognition. An OCR system transforms a two-dimensional
image of text, which may be machine-printed or handwritten, from its image representation
into machine-readable text. OCR as a process generally consists of several sub-processes,
each of which must perform as accurately as possible. The sub-processes are:
● Preprocessing of the image
● Text localization
● Character segmentation
● Character recognition
● Post-processing
The sub-processes in the list above can of course differ, but these are roughly the steps
needed for automatic character recognition. The main aim of OCR software is to identify
and capture all the unique words, in whatever languages are used, from the written text
characters.
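The first three sub-processes can be illustrated on a toy image. The sketch below is only an illustration under simple assumptions (dark ink on a light background, characters separated by blank columns); the function names and the threshold of 128 are hypothetical choices, not part of any OCR library.

```python
# Toy illustration of three OCR sub-processes on a tiny grayscale image.
# All names and the threshold value are illustrative assumptions.

def binarize(gray, threshold=128):
    """Preprocessing: map each pixel to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def locate_text_rows(binary):
    """Text localization: return (top, bottom) of the rows that contain ink."""
    inked = [i for i, row in enumerate(binary) if any(row)]
    return (inked[0], inked[-1]) if inked else None

def segment_characters(binary):
    """Character segmentation: split on blank columns (vertical projection)."""
    width = len(binary[0])
    column_has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(column_has_ink):
        if has_ink and start is None:
            start = x                      # a character begins
        elif not has_ink and start is not None:
            spans.append((start, x - 1))   # a character ends
            start = None
    if start is not None:
        spans.append((start, width - 1))
    return spans

# 4 x 7 image: dark pixels (0) are ink, light pixels (255) are background.
image = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255,   0, 255, 255],
    [255,   0,   0, 255,   0, 255, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
binary = binarize(image)
print(locate_text_rows(binary))    # (1, 2)
print(segment_characters(binary))  # [(1, 2), (4, 4)]
```

Each column span returned by segment_characters would then be cropped and passed to a character recognizer. Real documents usually need more robust preprocessing and segmentation than this.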
For almost two decades, optical character recognition systems have been widely used
to provide automated text entry into computerized systems. Yet in all this time, conventional
OCR systems (like zonal OCR) have never overcome their inability to read more than a
handful of type fonts and page formats. Proportionally spaced type (which includes virtually
all typeset copy), laser printer fonts, and even many non-proportional typewriter fonts, have
remained beyond the reach of these systems. And as a result, conventional OCR has never
achieved more than a marginal impact on the total number of documents needing conversion
into digital form.

Optical Character Recognition remains a challenging problem when text occurs in
unconstrained environments, like natural scenes, due to geometrical distortions, complex
backgrounds, and diverse fonts. The technology still holds immense potential because of the
many use cases of deep-learning-based OCR, such as:

● building license plate readers

● digitizing invoices
● digitizing menus
● digitizing ID cards

Open Source OCR Tools

Tesseract - Tesseract is an open-source OCR engine that has gained popularity among OCR
developers. Even though it can sometimes be painful to implement and modify, for a long time
there were few free and powerful OCR alternatives. Tesseract began as a Ph.D. research
project at HP Labs, Bristol, and was developed by HP between 1984 and 1994. In 2005 HP
released Tesseract as open-source software. From 2006 until 2018 it was developed by Google.

Fig: Google Trends comparison for different open-source OCR tools

OCRopus - OCRopus is an open-source OCR system that allows easy evaluation and reuse of
its OCR components by both researchers and companies. It is a collection of document analysis
programs, not a turn-key OCR system. To apply it to your documents, you may need to do
some image preprocessing and possibly also train new models. In addition to the recognition
scripts themselves, there are several scripts for ground-truth editing and correction, measuring
error rates, and determining confusion matrices, all of which are easy to use and edit.

Ocular - Ocular works best on documents printed using a hand press, including those written
in multiple languages. It operates using the command line. It is a state-of-the-art historical
OCR system. Its primary features are:
● Unsupervised learning of unknown fonts: requires only document images and a
corpus of text.
● Ability to handle noisy documents: inconsistent inking, spacing, vertical alignment
● Support for multilingual documents, including those that have considerable word-
level code-switching.
● Unsupervised learning of orthographic variation patterns including archaic spellings
and printer shorthand.
● Simultaneous, joint transcription into both diplomatic (literal) and normalized forms.

SwiftOCR - An OCR engine written in Swift is also worth mentioning, since considerable
work is going into advancing Swift as a programming language for deep learning.
SwiftOCR is a fast and simple OCR library that uses neural networks for image recognition.
Its authors claim that the engine outperforms the well-known Tesseract library.

Tesseract OCR
Tesseract is an open-source text recognition (OCR) engine, available under the Apache 2.0
license. It can be used directly or, for programmers, through an API to extract printed text
from images. It supports a wide variety of languages. Tesseract does not have a built-in GUI,
but several are listed on the project's 3rdParty page. Tesseract is compatible with many
programming languages and frameworks through wrappers. It can be used with its existing
layout analysis to recognize text within a large document, or it can be used in conjunction
with an external text detector to recognize text from an image of a single text line.
Fig: OCR Process Flow to build API with Tesseract from a blog post

Tesseract 4.00 includes a new neural network subsystem configured as a text line recognizer.
It has its origins in OCRopus' Python-based LSTM implementation but has been redesigned
for Tesseract in C++. The neural network system in Tesseract pre-dates TensorFlow but is
compatible with it, as there is a network description language called Variable Graph
Specification Language (VGSL), that is also available for TensorFlow.

To recognize an image containing a single character, we typically use a Convolutional Neural
Network (CNN). Text of arbitrary length is a sequence of characters, and such problems are
solved using Recurrent Neural Networks (RNNs). The Long Short-Term Memory (LSTM) network
is a popular form of RNN.
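To make the idea of recognizing a single character with a neural network concrete, the sketch below trains the simplest kind of ANN, a single-layer multiclass perceptron, on three hand-drawn 5x3 glyphs. This is a deliberately small stand-in and not the CNN mentioned above, nor anything Tesseract does; the glyph bitmaps and all function names are assumptions made for the illustration.

```python
# Multiclass perceptron on 5x3 glyph bitmaps (illustrative only).

GLYPHS = {
    "I": [".#.", ".#.", ".#.", ".#.", ".#."],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}
LABELS = list(GLYPHS)  # one output unit per character class

def to_vector(rows):
    """Flatten a bitmap to 0/1 features and append a constant bias input."""
    return [1 if ch == "#" else 0 for row in rows for ch in row] + [1]

def scores(weights, x):
    """Weighted sum of the inputs for every output unit."""
    return [sum(w * v for w, v in zip(unit, x)) for unit in weights]

def predict(weights, rows):
    s = scores(weights, to_vector(rows))
    return LABELS[s.index(max(s))]

def train(max_epochs=100):
    n_inputs = len(to_vector(GLYPHS["I"]))
    weights = [[0] * n_inputs for _ in LABELS]
    for _ in range(max_epochs):
        mistakes = 0
        for target, label in enumerate(LABELS):
            x = to_vector(GLYPHS[label])
            s = scores(weights, x)
            guess = s.index(max(s))
            if guess != target:
                mistakes += 1
                # Perceptron rule: pull the correct unit toward the input
                # and push the wrongly chosen unit away from it.
                for i, v in enumerate(x):
                    weights[target][i] += v
                    weights[guess][i] -= v
        if mistakes == 0:  # every training glyph is classified correctly
            break
    return weights

weights = train()
print([predict(weights, GLYPHS[label]) for label in LABELS])  # ['I', 'L', 'T']
```

The network only memorizes three clean glyphs, so it says nothing about accuracy on real scans. A practical recognizer would use many samples per class and a deeper architecture such as a CNN.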
Technology - How it works
LSTMs are great at learning sequences but slow down a lot when the number of states is too
large. There are empirical results that suggest it is better to ask an LSTM to learn a long
sequence than a short sequence of many classes. Tesseract's LSTM engine developed from the
OCRopus model in Python, which was a fork of an LSTM implementation in C++ called CLSTM.
CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the
Eigen library for numerical computations.

Fig: Tesseract 3 OCR process from paper

Legacy Tesseract 3.x depended on a multi-stage process in which we can distinguish the
following steps:
● Word finding
● Line finding
● Character classification
Word finding was done by organizing text lines into blobs, and the lines and regions were
analyzed for fixed-pitch or proportional text. Text lines were broken into words differently
according to the kind of character spacing. Recognition then proceeded as a two-pass process.
In the first pass, an attempt was made to recognize each word in turn. Each word that was
satisfactory was passed to an adaptive classifier as training data. The adaptive classifier then
got a chance to more accurately recognize text lower down the page. In the second pass, words
that were not recognized well enough the first time were recognized again.
The modernization of Tesseract was an effort to clean up the code and add a new LSTM model.
The input image is processed in boxes (rectangles) line by line; each line is fed into the LSTM
model, which produces the output text. The image below visualizes how it works.

Fig: How Tesseract uses LSTM model presentation
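The text above does not say how the per-position outputs of a line recognizer become a string. One widely used scheme for LSTM line recognizers is greedy CTC (Connectionist Temporal Classification) decoding: take the best label at each horizontal position, merge consecutive repeats, and drop a special blank label. The sketch below shows only that decoding step, under the assumption that such a scheme is used; the blank symbol "-" and the function name are hypothetical.

```python
# Greedy CTC-style decoding of per-position labels (illustrative assumption).

BLANK = "-"  # hypothetical symbol meaning "no character at this position"

def greedy_ctc_decode(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)

# A blank between the two runs of "l" keeps the double letter.
print(greedy_ctc_decode(list("hh-e-ll-ll-oo")))  # hello
```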

After a new training tool was added and the model was trained with a lot of data and fonts,
Tesseract achieved better performance. It is still not good enough to work on handwritten
text and unusual fonts. It is possible to fine-tune or retrain the top layers for experimentation.

Installing Tesseract
Installing Tesseract on Windows is easy with the precompiled binaries. Do not forget to edit
the PATH environment variable and add the Tesseract installation path. On Linux or macOS
it can be installed with a few commands.
After the installation, verify that everything is working by typing this command in the
terminal or cmd:

$ tesseract --version
You should see output similar to:
tesseract 4.0.0
leptonica-1.76.0
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.8
Found AVX2
Found AVX
Found SSE
After this, you can install the Python wrapper for Tesseract using pip.
$ pip install pytesseract

The Tesseract library ships with a handy command-line tool called tesseract. We can use this
tool to perform OCR on images, with the output stored in a text file. If we want to integrate
Tesseract into our C++ or Python code, we use Tesseract's API.

Running Tesseract with CLI

Call the Tesseract engine on the image at image_path and convert the image to text, printed
line by line in the command prompt, by typing the following:
$ tesseract image_path stdout
To write the output text to a file, pass an output base name. Tesseract appends the .txt
extension itself, so this command writes text_result.txt:
$ tesseract image_path text_result
To specify the language model, write the language code after the -l flag. By default
Tesseract uses English:
$ tesseract image_path text_result -l eng
By default, Tesseract expects a page of text when it segments an image. If you are only seeking
to OCR a small region, try a different segmentation mode using the --psm argument. There
are 14 modes available (numbered 0 to 13), which can be listed with tesseract --help-psm.
By default (mode 3), Tesseract fully automates the page segmentation but does not perform
orientation and script detection. To specify the mode, for example mode 6, which assumes a
single uniform block of text, type the following:
$ tesseract image_path text_result -l eng --psm 6

There is also one more important argument, the OCR engine mode (oem). Tesseract 4 has two
OCR engines: the Legacy Tesseract engine and the LSTM engine. There are four modes of
operation, chosen using the --oem option:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.
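The command-line options described above can also be assembled from Python. The following is a minimal sketch using only the standard library; the helper names build_tesseract_command and run_tesseract and the file name sample.png are hypothetical, while the flags (-l, --psm, --oem) are the ones described in this section.

```python
import shutil
import subprocess

def build_tesseract_command(image_path, output_base="stdout",
                            lang="eng", psm=3, oem=3):
    """Return the argument list for one tesseract CLI call."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    return ["tesseract", image_path, output_base,
            "-l", lang, "--psm", str(psm), "--oem", str(oem)]

def run_tesseract(image_path, **options):
    """Run tesseract and return the recognized text from standard output."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, "stdout", **options)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

print(build_tesseract_command("sample.png", psm=6, oem=1))
# ['tesseract', 'sample.png', 'stdout', '-l', 'eng', '--psm', '6', '--oem', '1']
```

Calling run_tesseract("sample.png", psm=6) would print nothing by itself but return the recognized text, provided Tesseract is installed and the image exists.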

OCR with Pytesseract and OpenCV

Pytesseract is a Python wrapper for the Tesseract-OCR engine. It is also useful as a
stand-alone invocation script for tesseract, as it can read all image types supported by the
Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others.
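A minimal sketch of how Pytesseract and OpenCV might be combined is shown below. It assumes that the opencv-python and pytesseract packages and the Tesseract engine are installed; the function names, the file name sample.png, and the choice of Otsu thresholding as the preprocessing step are assumptions made for this illustration, not steps prescribed by the experiment.

```python
def build_config(psm=3, oem=3):
    """Build the extra options string passed on to the tesseract engine."""
    return f"--oem {oem} --psm {psm}"

def ocr_image(image_path, lang="eng", psm=3, oem=3):
    """Read an image, binarize it with Otsu's threshold, and run OCR on it."""
    # Imported here so the helper above works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, lang=lang,
                                       config=build_config(psm, oem))

print(build_config(psm=6))  # --oem 3 --psm 6
# Example call (requires an image file named sample.png):
# text = ocr_image("sample.png", psm=6)
```

Binarization is only one possible preprocessing step. Depending on the image, resizing, denoising, or deskewing may matter more.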

QUESTIONS:
1. What are popular OCR applications in the real world?
2. Summarize text recognition with Tesseract OCR in your own words.
