Spelling Error Detection in NLP

The document discusses various applications of Natural Language Processing (NLP), focusing on Information Extraction (IE), Named Entity Recognition (NER), and spell correction techniques. It outlines the goals and methods of IE, the importance of NER in classifying entities in text, and the challenges of correcting real-word spelling errors using contextual information and machine learning. Additionally, it highlights the use of language models and semantic relationships in improving spelling correction systems.

Uploaded by

janiita786

NLP Applications
(Information Extraction, Named Entity Recognition and Spell Correction)
Lecture # 5
Information Extraction (IE)
IE systems extract clear, factual information
• E.g., gathering earnings, profits, board members, headquarters, etc. from company reports
• The headquarters of BHP Billiton Limited, and the global headquarters of the combined BHP Billiton Group, are located in Melbourne, Australia.
• headquarters(“BHP Billiton Limited”, “Melbourne, Australia”)
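The mapping from the sentence to the headquarters(...) relation can be sketched with a hand-written pattern. This is only an illustration under strong assumptions: the regular expression `HQ_PATTERN` and the function `extract_headquarters` are hypothetical names, and the pattern covers just this one sentence shape. Real IE systems use far more robust methods.

```python
import re

# Hypothetical pattern for one sentence shape:
# "The headquarters of <ORG> ... located in <PLACE>."
HQ_PATTERN = re.compile(
    r"headquarters of (?P<org>[A-Z][\w ]*?)(?:,| are| is)"
    r".*?located in (?P<place>[A-Z][\w ,]*?)\.",
    re.DOTALL,
)

def extract_headquarters(text):
    """Return (relation, organization, place) tuples found in the text."""
    return [
        ("headquarters", m.group("org"), m.group("place"))
        for m in HQ_PATTERN.finditer(text)
    ]

sentence = (
    "The headquarters of BHP Billiton Limited, and the global headquarters "
    "of the combined BHP Billiton Group, are located in Melbourne, Australia."
)
print(extract_headquarters(sentence))
```

The output is a list of structured tuples, which is the kind of record that could be stored in a database and queried.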
Information Extraction (IE)
The goal of IE is to map a document corpus to a structured database or format.
Benefit: complex searches, e.g.
• Find me all teaching jobs in Taxila paying at least Rs. 50K
• How the nature and requirements of jobs have changed over the years
Information Extraction with Natural Language Understanding
Named Entity Recognition (NER)
• A very important sub-task: find and classify names in text, for example:
• The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply
• Entity classes: Person, Date, Location, Organization
Free writing
Free writing is a task in which the writer writes without stopping to fix grammatical or spelling mistakes.

Does ChatGPT support free writing?

Spell Correction Applications Today
Spell correction is used in many of our day-to-day activities, such as:
◦ word prediction while sending a text
◦ spell checking while writing a Word document
◦ query prediction in search engines
Spelling Tasks
• Spelling Error Detection
• Spelling Error Correction:
◦ Autocorrect, e.g. hte ---> the
◦ Suggest a correction
◦ Suggestion lists
Spelling checker in Word Processors
Nearly all word processors have a built-in spelling checker that flags spelling mistakes.
It also offers a way to correct each mistake by letting the user choose an alternative from a suggestion list.
To identify spelling mistakes, most spellcheckers check each word of the text separately against the words stored in a dictionary.
If the word is found in the dictionary, it is considered correct regardless of its context.
This approach is efficient for identifying non-word spelling mistakes, but it cannot identify real-word mistakes, where the typed word is itself a valid dictionary word.
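The dictionary-lookup approach can be sketched in a few lines. The tiny `DICTIONARY` set and the function name `find_non_words` are assumptions made for illustration; a real checker would load a large word list.

```python
import re

# Toy dictionary for illustration only; a real checker uses a large word list.
DICTIONARY = {"i", "want", "to", "eat", "a", "piece", "peace", "of", "cake", "the"}

def find_non_words(text, dictionary=DICTIONARY):
    """Flag every token that is not found in the dictionary, ignoring context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in dictionary]

# A non-word error is flagged ...
print(find_non_words("I want to eat hte cake"))
# ... but a real-word error passes, because "peace" is a valid dictionary word.
print(find_non_words("I want to eat a peace of cake"))
```

The second call returns an empty list, which shows why context-free lookup misses real-word errors.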
Types of spelling errors
• Non-word Errors
◦ graffe ---> giraffe
• Real-word Errors
◦ Typographical errors: three ---> there
◦ Cognitive Errors (homophones): piece ---> peace, too ---> two
Non-word spelling errors
• Non-word spelling error detection:
◦ Any word not in a dictionary is an error
◦ The larger the dictionary the better
• Non-word spelling error correction:
◦ Edit Distance Algorithms (already covered in assignment # 1)
◦ Language Models (N-Gram) (in upcoming lectures)
◦ Spell-checking libraries in Python (e.g., NLTK)
◦ Contextual Correction (used in models such as ChatGPT and BERT)
◦ Machine Learning Approaches
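As a reminder of the edit-distance option, here is a minimal sketch that ranks dictionary words by Levenshtein distance to a misspelling. It assumes unit costs for insertion, deletion and substitution (the variant used in the assignment may differ), and the function names and the small word list are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit costs, computed row by row."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]

def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for distance, w in scored if distance <= max_distance]

words = ["giraffe", "grape", "graph", "the"]
print(edit_distance("graffe", "giraffe"))
print(suggest("graffe", words))
```

Because transposition is not a single operation in this variant, "hte" is at distance 2 from "the"; a Damerau-Levenshtein variant would count it as 1.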
N-gram similarity measures
Models that assign probabilities to sequences of words are called language models or LMs.
The simplest model that assigns probabilities to sentences and sequences of words is the n-gram.
An n-gram is a sequence of n words: a 2-gram (which we’ll call a bigram) is a two-word sequence of words like “please turn”, “turn your”, or “your homework”, and a 3-gram (a trigram) is a three-word sequence of words like “please turn your”, or “turn your homework”.
N-grams will be discussed in detail in upcoming lectures.
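The bigrams and trigrams listed above can be produced with a short helper. The sketch below also shows a maximum-likelihood bigram estimate without smoothing; the function names are hypothetical, and the one-sentence "corpus" is only for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-word sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bigram_probability(previous_word, word, tokens):
    """Unsmoothed estimate: count(previous_word, word) / count(previous_word)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    if unigram_counts[previous_word] == 0:
        return 0.0
    return bigram_counts[(previous_word, word)] / unigram_counts[previous_word]

tokens = "please turn your homework".split()
print(ngrams(tokens, 2))
print(ngrams(tokens, 3))
print(bigram_probability("please", "turn", tokens))
```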
Real word spelling errors
Solving real-word errors in Natural Language Processing (NLP) involves addressing issues related to
◦ Homophones
◦ Misspellings
◦ other errors that commonly occur in text
Real-word errors can affect the accuracy and comprehensibility of NLP systems, so it's essential to handle them effectively.
Cont.
Real-word spelling mistakes are words that are correctly spelled but are not the words the user intended.
Mistakes in this category go unrecognized by most spellcheckers, because they only check each word against the dictionary word list, which catches non-word mistakes alone.
To identify real-word spelling mistakes, we need to use the contextual information surrounding the target word.
For example, in the sentence “I want to eat a piece of cake” the confusion set is (piece, peace). To decide that ‘peace’ cannot be used here, we use the neighboring context word ‘cake’, which supports ‘piece’.
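The piece/peace example can be sketched as a lookup over a confusion set scored by context words. Everything here is an assumption for illustration: the confusion sets, the co-occurrence counts (which would normally be collected from a corpus) and the function name `best_in_context` are all made up.

```python
CONFUSION_SETS = [{"piece", "peace"}, {"three", "there"}]

# Hypothetical (word, context word) co-occurrence counts; not real corpus data.
COOCCURRENCE = {
    ("piece", "cake"): 50,
    ("piece", "eat"): 20,
    ("peace", "war"): 40,
    ("peace", "treaty"): 30,
}

def best_in_context(target, context, confusion_sets=CONFUSION_SETS,
                    cooccurrence=COOCCURRENCE):
    """Pick the confusion-set member best supported by the context words."""
    candidates = next((s for s in confusion_sets if target in s), {target})

    def score(candidate):
        return sum(cooccurrence.get((candidate, w), 0) for w in context)

    best = max(sorted(candidates), key=score)
    # Keep the original word unless an alternative is strictly better supported.
    return best if score(best) > score(target) else target

tokens = "i want to eat a peace of cake".split()
context = [t for t in tokens if t != "peace"]
print(best_in_context("peace", context))
```

Keeping the original word on ties is a deliberate choice here, so that a correctly used word is not replaced without evidence.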
Need A Solution …
Correcting real-word errors
Machine Learning
◦ Relies on a training set of features and on learning from annotated examples

Semantic Information
◦ Real-word error checking is based on contextual semantic relations, assuming that the right word has a strong semantic connection to its context, while a real-word error has no such semantic association.
N-Gram Statistical Language Model
◦ This approach relies on a huge N-gram statistical model; capturing longer-range semantic relations is difficult.
Solution?
An automatic spelling correction system identifies real-word errors by semantic analysis of the surrounding context.
More complex error-detection systems may be used to detect words that are correctly spelled but do not fit the syntactic or semantic context.
Neural networks can be trained for this on large, already available text corpora.
Cont.
Correction using Trigrams
There are two ways to create a confusion set: it can be generated in advance or at runtime.
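One way trigram-based correction could work with a confusion set generated in advance is sketched below. The slides do not give the exact procedure, so this is an assumption: each member of the confusion set is tried in place, and the one whose surrounding trigrams are most frequent in a reference corpus is kept. The function names and the tiny corpus are hypothetical.

```python
from collections import Counter

def trigrams(tokens):
    """Return all contiguous three-word sequences as tuples."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def correct_with_trigrams(tokens, confusion_sets, trigram_counts):
    """Replace a word by a confusion-set member with better trigram support."""
    corrected = list(tokens)
    for i, word in enumerate(tokens):
        candidates = next((s for s in confusion_sets if word in s), None)
        if candidates is None:
            continue

        def score(candidate):
            trial = corrected[:i] + [candidate] + corrected[i + 1:]
            # Only trigrams that overlap position i can change the score.
            window = trial[max(0, i - 2):i + 3]
            return sum(trigram_counts.get(t, 0) for t in trigrams(window))

        best = max(sorted(candidates), key=score)
        if score(best) > score(word):
            corrected[i] = best
    return corrected

# Toy reference corpus and a confusion set built in advance (illustrative only).
corpus = "i want a piece of cake . they signed a peace treaty .".split()
counts = Counter(trigrams(corpus))
confusion_sets = [{"piece", "peace"}]

print(correct_with_trigrams("i want a peace of cake".split(), confusion_sets, counts))
```

Generating the confusion set at runtime would instead mean computing candidates per word on the fly, for example from words within a small edit distance.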
Correction using Machine Learning Techniques
Machine learning is one of the most widely used approaches to NLP tasks; for example, part-of-speech tagging is used to correct real-word spelling errors. In this method, lexical ambiguity is considered the main obstacle, and it is resolved using confusion sets.
Correction using Semantic Relationships
The semantic relationship method is a correction method in which the meaning of a word is analyzed with respect to the sentence.
The method is based on the observation that the meaning of each word should be consistent with the surrounding words of the sentence.
Some real-word errors cannot be solved easily, such as malapropisms, which disrupt the coherence of the text.
For malapropism errors, the spell checker works in two stages. In the first stage it finds the usual suspects: a word that does not seem to be related to the other words is considered a suspect.
In the second stage, the words in the suspect group are set aside and the most likely intended words are identified from the remaining candidates.
To represent the semantic relationships between the words used in the text, the noun portion of a corpus or lexicon of the particular language is used.
Spell-checking library in Python

PySpellChecker
SpaCy
SymSpell
TextBlob