Unit 2 NLP

The document discusses regular expressions (RE) as powerful tools for searching and matching strings in text, detailing operations like concatenation, union, and Kleene star. It also covers morphological parsing in NLP, explaining how words are analyzed into morphemes, and introduces finite state automata (FSA) for pattern recognition. Additionally, it addresses spelling error detection and correction techniques in NLP, providing methods and examples for improving text quality.

Uploaded by

anikethbhosale11

Unit-2

Regular Expressions
Regular expressions (RE) are specialized patterns for searching and matching strings within
text. They provide a concise and powerful language to specify text search criteria. REs include
operations such as:
 Concatenation (e.g., a.b means character a followed by b)
 Union or alternation (e.g., a+b means a or b)
 Kleene star (e.g., a* meaning zero or more repetitions of a)
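Mathematically, the basic regular expressions include the empty string (ε) and the empty language (φ), and
they can be combined using union (+), concatenation (.), and Kleene star (*).
Note that + denotes union only in this formal notation. In practical regex syntax, used later in this unit,
| denotes union and + means one or more repetitions.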
In NLP, regular expressions are widely used for:
 Text searching and pattern matching: Quickly find substrings matching specific syntactic
patterns.
 Tokenization: Define rules to split running text into words, sentences, or other tokens.
 Information extraction: Detect structured elements such as dates, numbers, emails, or
domain-specific entities.
 Preprocessing and normalization: Identify and transform text snippets (e.g., removing
punctuation or matching variants).
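The uses listed above can be illustrated with Python's built-in re module. This is a minimal sketch: the sample sentence, the email address, and the patterns are illustrative assumptions, and real tokenizers and extractors need far more careful patterns.

```python
import re

# Illustrative sample text (an assumption, not taken from the notes).
text = "Dr. Rao emailed nlp.lab@example.org on 12/03/2024; 3 replies arrived by 14/03/2024."

# Tokenization: a word (optionally with internal hyphens or apostrophes)
# or a single punctuation mark.
tokens = re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", text)

# Information extraction: dates written as dd/mm/yyyy and simple email addresses.
dates = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", text)
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

# Normalization: lowercase the text and strip punctuation.
normalized = re.sub(r"[^\w\s]", "", "Hello, World!".lower())

print(tokens[:3])   # ['Dr', '.', 'Rao']
print(dates)        # ['12/03/2024', '14/03/2024']
print(emails)       # ['nlp.lab@example.org']
print(normalized)   # hello world
```

The tokenization pattern deliberately splits "nlp.lab@example.org" into several tokens; a practical tokenizer would usually try the email and date patterns first.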

Regular set to represent the following patterns

(i) (a+b): union of characters 'a' and 'b'
(ii) a.b : concatenation of characters 'a' followed by 'b'
(iii) a* : Kleene star representing zero or more repetitions of character 'a'

The regular sets corresponding to each given pattern:

(i) (a + b)
Regular set: {a, b}
This represents the union of the single-character strings a and b.
(ii) a.b
Regular set: {ab}
This represents the concatenation of 'a' followed by 'b', forming the string ab.
(iii) a*
Regular set: {ε, a, aa, aaa, …}
This represents zero or more repetitions of the character 'a', including the empty string ε.
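The regular sets can be checked by enumeration. The sketch below lists every string over {a, b} up to a small length that fully matches a pattern. The helper name regular_set and the length cutoff are assumptions made for illustration. Python writes union as | and concatenation by juxtaposition, so (a+b) becomes a|b and a.b becomes ab.

```python
import re
from itertools import product

def regular_set(pattern, alphabet="ab", max_len=3):
    """Return all strings over `alphabet` of length <= max_len that fully match `pattern`."""
    matches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                matches.append(s)
    return matches

print(regular_set("a|b"))  # union         -> ['a', 'b']
print(regular_set("ab"))   # concatenation -> ['ab']
print(regular_set("a*"))   # Kleene star   -> ['', 'a', 'aa', 'aaa']
```

The set for a* is infinite, so the enumeration only shows its members up to max_len; the empty string '' plays the role of ε.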

Write the regular expression for the following languages

(i) The language of all strings containing any number of a's and b's
(ii) The language of all strings that start with 1 and end with 0
(iii) The language of all strings that start with "a" and contain no consecutive b's

Regular Expressions for the Given Languages

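(i) Language accepting all strings containing any number of a's and b's:
Regular Expression: (a|b)*
This matches every string over {a, b}, including the empty string. In character-class syntax it can
also be written [ab]*.
(ii) Language accepting all strings starting with 1 and ending with 0:
 Strings start with one or more '1's
 End with one or more '0's
 Middle may be any sequence of '0's and '1's

Regular Expression: 1+[01]*0+

Where:
1+ means one or more '1's, [01]* means zero or more '0' or '1', and 0+ means one or
more '0's. Here + is the "one or more" operator, not union. Because [01]* already absorbs
extra leading 1's and trailing 0's, the shorter expression 1[01]*0 describes the same language.
(iii) Language starting with "a" but not having consecutive 'b's:
Regular Expression: (a|ab)+
Every 'b' must be immediately preceded by an 'a', so each string is a sequence of one or more
blocks, each of which is either a or ab. The similar-looking expression a(a|ba)* is not sufficient,
because it rejects valid strings that end in 'b', such as ab.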
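Candidate expressions for these languages can be tested by brute force: enumerate every string up to some length and compare the regex verdict with a direct check of the language description. The sketch below does this. The helper names and the length bound of 8 are assumptions, and agreement up to a bound is evidence rather than a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def agrees(pattern, predicate, alphabet, max_len=8):
    """True if `pattern` accepts exactly the strings for which `predicate` holds."""
    return all(bool(re.fullmatch(pattern, s)) == predicate(s)
               for s in strings(alphabet, max_len))

def any_ab(s):
    return True  # (i): every string over {a, b} belongs to the language

def starts1_ends0(s):
    return s.startswith("1") and s.endswith("0")  # (ii)

def starts_a_no_bb(s):
    return s.startswith("a") and "bb" not in s  # (iii)

print(agrees(r"[ab]*", any_ab, "ab"))              # True
print(agrees(r"1+[01]*0+", starts1_ends0, "01"))   # True
print(agrees(r"1[01]*0", starts1_ends0, "01"))     # True
print(agrees(r"(a|ab)+", starts_a_no_bb, "ab"))    # True
print(agrees(r"a(a|ba)*", starts_a_no_bb, "ab"))   # False: it rejects "ab"
```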

Morphological parsing in the context of natural language processing: prefixes and suffixes
Morphological parsing in Natural Language Processing (NLP) is the computational process of
analyzing the structure of words to identify their smallest meaningful components called
morphemes. These components include roots (base forms), prefixes (affixes added to the
beginning of a word), and suffixes (affixes added to the end of a word).
The goal of morphological parsing is to break down words into these constituent morphemes
to understand their formation and meaning. For example, the word “unhappiness” can be parsed
into:
Prefix: "un-"
Root: "happy"
Suffix: "-ness"
This analysis enables NLP systems to recognize the relationship between words, such as
identifying that "happy" is related to "unhappy" and "happiness" despite different surface
forms.
Morphological parsing helps in several NLP tasks including text normalization, spelling
correction, machine translation, and information retrieval.

How Morphological Parsing Works

 It uses morphological rules that describe how morphemes combine in a particular language.
 It is often implemented with finite-state transducers (FSTs), state machines that map
surface words to their morphemes.
 It handles orthographic rules, such as replacing a final consonant + "y" with "ies" when
pluralizing English nouns (e.g., "country" to "countries").
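An orthographic rule of this kind can be written as an ordered list of spelling checks. The function below is a minimal sketch for regular English nouns only. It uses plain regex tests rather than an FST, and the function name and the exact rule set are assumptions chosen for illustration.

```python
import re

def pluralize(noun):
    """Attach the plural suffix -s, applying a few English spelling rules first."""
    if re.search(r"[^aeiou]y$", noun):       # consonant + y: country -> countries
        return noun[:-1] + "ies"
    if re.search(r"(s|x|z|ch|sh)$", noun):   # sibilant ending: fox -> foxes
        return noun + "es"
    return noun + "s"                        # default: cat -> cats

for word in ["country", "boy", "fox", "cat"]:
    print(word, "->", pluralize(word))
```

A finite-state transducer would encode the same rules as transitions between states, which also lets the mapping run in reverse (from "countries" back to "country" plus a plural feature).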

Role of Prefixes and Suffixes

Prefixes attach to the front of a root word and modify the meaning, e.g., "un-" in "undo" and
"pre-" in "preview."
Suffixes attach to the end of the root, often indicating grammatical properties, e.g., "-ed" for
past tense (walked), "-s" for plural (cats), "-ness" for nouns (happiness).
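Prefix and suffix analysis can be sketched as affix stripping against a small lexicon. The prefix list, suffix list, and lexicon below are toy assumptions built from the examples in these notes. A realistic parser would use a full lexicon and finite-state rules.

```python
# Toy resources (assumptions for illustration only).
PREFIXES = ["un", "pre", "re"]
SUFFIXES = ["ness", "ed", "s"]
LEXICON = {"happy", "do", "view", "walk", "cat"}

def restore_spelling(stem):
    """Undo the y -> i change that occurs before some suffixes (happi -> happy)."""
    if stem.endswith("i") and stem[:-1] + "y" in LEXICON:
        return stem[:-1] + "y"
    return stem

def parse(word):
    """Return (prefix, root, suffix) if the word decomposes over the toy lexicon, else None."""
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            root = restore_spelling(stem)
            if root in LEXICON:
                return (prefix, root, suffix)
    return None

print(parse("unhappiness"))  # ('un', 'happy', 'ness')
print(parse("walked"))       # ('', 'walk', 'ed')
print(parse("preview"))      # ('pre', 'view', '')
```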

Finite State Automaton (FSA)

A Finite State Automaton (FSA), also known as a finite-state machine or finite automaton, is
a mathematical computational model used to recognize patterns and accept or reject input
strings based on a finite number of states.
Components of FSA
 States: Represent possible conditions or configurations the automaton can be in, depicted as
circles.
 Input Alphabet: Symbols the automaton reads from input sequentially.
 Start State: The state where computation begins, indicated by an arrow with no origin.
 Transition Function: Defines how the FSA moves between states on input symbols.
 Final/Accepting State(s): If the FSA ends in one after consuming all input, the input string is
accepted.

Constructing FSA for Regular Expression /abb|acb/
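One possible deterministic automaton for /abb|acb/ is sketched below as a transition table in Python. The state names q0 to q3 are assumptions. Because both alternatives share the first symbol 'a' and the last symbol 'b', the two branches can be merged into a single middle state.

```python
# Transition table: TRANSITIONS[state][symbol] -> next state.
# A missing entry means there is no transition, so the input is rejected.
TRANSITIONS = {
    "q0": {"a": "q1"},             # start state: must read 'a' first
    "q1": {"b": "q2", "c": "q2"},  # second symbol is 'b' (abb) or 'c' (acb)
    "q2": {"b": "q3"},             # third symbol must be 'b'
    "q3": {},                      # accepting state: no further input allowed
}
START = "q0"
ACCEPTING = {"q3"}

def accepts(string):
    """Run the automaton over `string` and report whether it ends in an accepting state."""
    state = START
    for symbol in string:
        state = TRANSITIONS[state].get(symbol)
        if state is None:
            return False
    return state in ACCEPTING

for s in ["abb", "acb", "ab", "abbb", "acc"]:
    print(s, accepts(s))
```

For input abb the automaton moves q0 → q1 → q2 → q3 and accepts. For input acc it reaches q2 and then finds no transition on 'c', so it rejects.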

Spelling Error Detection and Correction in NLP

Spelling error detection identifies misspelled or incorrectly typed words in text, while spelling
correction suggests possible correct versions. Together, these processes improve text quality,
essential for downstream NLP tasks.
Common Techniques for Spelling Error Detection
 Dictionary Lookup: Compare each word against a known dictionary; words not found
are flagged as errors.
 N-gram Analysis: Use character n-grams to detect unlikely or rare sequences
suggestive of errors.
 Statistical Models: Identify words with low probability or unusual context based on
language corpora.
Common Techniques for Spelling Error Correction
 Edit Distance Based Methods: Calculate minimum edits (insertions, deletions,
substitutions, transpositions) needed to transform a misspelled word into dictionary
words. Words with minimal edit distance are correction candidates.
 Noisy Channel Model: Treat the observed misspelled word as a distorted form of a
true word. Use Bayesian inference to find the most probable original word given the
typo.
 Probabilistic Models: Estimate likelihood of corrections based on language and error
statistics.
 Neural Networks: Use deep learning models (e.g., sequence-to-sequence, contextual
models) to learn error patterns and contextual corrections.

Example Approach: Peter Norvig’s Spell Correction Algorithm
1. Generate candidate corrections by applying edits at an edit distance of one or two:
 Deletions (remove a character),
 Transpositions (swap adjacent characters),
 Replacements (replace a character),
 Insertions (add a character).
2. Retain candidates that appear in a dictionary.
3. Choose the candidate with the highest occurrence probability from a corpus.
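These steps can be sketched in Python along the lines of Norvig's well-known corrector. The tiny CORPUS string below is a stand-in assumption; the original approach estimates word frequencies from a large text collection, and the frequency counts here are only meaningful for this toy example.

```python
import re
from collections import Counter

# Toy corpus (an assumption); a real system would count words in a large text file.
CORPUS = ("spelling errors are common and spelling correction helps the reader "
          "read the text the way the writer meant")
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))

def edits1(word):
    """All strings one edit away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two edits away from `word`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(words):
    """Keep only the candidates that occur in the corpus vocabulary."""
    return {w for w in words if w in WORDS}

def correct(word):
    """Prefer the word itself, then distance-1 candidates, then distance-2 candidates."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORDS[w])

print(correct("speling"))    # spelling
print(correct("teh"))        # the
print(correct("corection"))  # correction
```

Ranking by raw corpus frequency ignores context and the likelihood of each kind of typo. The noisy channel model addresses the latter by weighting candidates with an error model as well.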

Question Bank
1. Explain the role of regular expressions in Natural Language Processing (NLP). Write
the regular expression for the following languages
(i) The language of all strings containing any number of a's and b's
(ii) The language of all strings that start with 1 and end with 0
(iii) The language of all strings that start with "a" and contain no consecutive b's
2. Write the regular sets that represent the following patterns
(i) (a+b): union of characters 'a' and 'b'
(ii) a.b : concatenation of characters 'a' followed by 'b'
(iii) a* : Kleene star representing zero or more repetitions of character 'a'.
3. With suitable examples, explain morphological parsing in the context of natural
language processing, covering prefixes and suffixes.
4. Define a Finite State Automaton (FSA) and explain its components in detail. Construct
the transition diagram and develop the state transition table for the regular expression
/abb|acb/. Illustrate how the automaton processes input strings according to this
expression.
5. Construct the transition diagram and develop the state transition table for the regular
expression /(a|b)*baa$/. Include details on states, transitions, and final state
conditions.
6. Define spelling error detection and correction in Natural Language Processing. Explain
the common techniques used for detecting spelling errors and providing corrections.
Illustrate your answer with an example algorithm or approach commonly used in NLP
for this task.
7. Explain two-step morphological parsing with suitable examples.
