Naive Bayes for Word Sense Disambiguation

Uploaded by

Kranti Gajmal

WSD SUPERVISED LEARNING

Naive Bayes ("Naive Biased") algorithm in NLP

• A supervised Naive Bayes classifier for Word Sense Disambiguation (WSD) selects the
meaning of a word in a given context by estimating the probability of each possible
sense and choosing the most probable one.
• It relies on Bayes' theorem and the "naive" assumption that the features used for
classification are conditionally independent of each other given the sense.

• "Naive biased algorithm" is not a standard term in Natural Language Processing (NLP);
it is most likely a misspelling of "Naive Bayes", and the notes below treat it as such.

• The "naive" assumption is that all contextual features (surrounding words) are
conditionally independent given the sense. The closest thing to a "bias" in the model is
the prior probability of each sense, which favors senses that are frequent in the
training data, much like the most frequent sense (MFS) heuristic.

How Naive Bayes works for WSD

The core task of WSD is to assign the correct meaning, or sense, to an ambiguous
word in a given context. Naive Bayes accomplishes this by treating it as a
classification problem, where the features are the words surrounding the ambiguous
word and the classes are the possible senses.

The process involves these steps:

• Feature Extraction:
For a target ambiguous word, features are extracted from its surrounding context.
These features can include:
o Collocation features: Words, their part-of-speech (POS) tags, or other lexical
information at specific positions relative to the target word (e.g., the word
immediately to the left, the POS tag of the word two positions to the right).
o Bag-of-words features: The presence or frequency of words within a defined window
around the target word, without regard to their specific position.

• Training Phase (Supervised WSD):
A sense-tagged corpus is used, in which instances of ambiguous words are manually
labeled with their correct sense.
For each sense of a target word, the Naive Bayes model learns the prior probability of
the sense and the conditional probability of observing each feature given that sense.
Both are estimated from how often the senses and features occur in the corpus.

• Disambiguation Phase:
When presented with a new instance of an ambiguous word in a sentence, the
algorithm extracts the relevant features.
It then uses Bayes' theorem to calculate the posterior probability of each possible
sense given the observed features and selects the sense with the highest posterior.
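
Written out in its standard form, the decision rule looks like the block below. The symbols are
notational assumptions, not something fixed by the text above: S is the set of candidate senses
of the target word and f_1, ..., f_n are the extracted context features. The denominator is the
same for every sense, so it can be dropped when comparing senses, and the last step applies the
naive independence assumption.

```latex
\hat{s}
  = \arg\max_{s \in S} P(s \mid f_1, \dots, f_n)
  = \arg\max_{s \in S} \frac{P(s)\, P(f_1, \dots, f_n \mid s)}{P(f_1, \dots, f_n)}
  = \arg\max_{s \in S} P(s) \prod_{i=1}^{n} P(f_i \mid s)
```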

Advantages and disadvantages of Naive Bayes for WSD

Example
Using the word "bank" as an example, here is how a Naive Bayes algorithm would perform
Word Sense Disambiguation (WSD).
1. Training phase
The algorithm must first be trained on a corpus of text where the ambiguous word's senses have
already been manually tagged. Suppose the word "bank" has two senses:

• Sense 1: Financial Institution (FI)
• Sense 2: River Bank (RB)

The training data would look like this:

• "I went to the bank to deposit money." (Sense: FI)
• "The fisherman sat on the river bank." (Sense: RB)
• "My savings account is at the local bank." (Sense: FI)
• "The boat ran aground on the muddy bank." (Sense: RB)

From this data, the algorithm learns the following probabilities:

• Prior probabilities: The overall likelihood of each sense.
o P(FI) = (2 financial examples) / (4 total examples) = 0.5
o P(RB) = (2 river examples) / (4 total examples) = 0.5
• Likelihoods: The probability of each context word appearing with each sense. A "Bag of
Words" model is used, which means the position of the words is ignored.

For each surrounding word and each sense, the algorithm estimates the conditional probability
P(word | sense) as the number of times the word occurs in that sense's training sentences,
divided by the total number of context words observed for that sense.
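
The counting described above can be sketched in a few lines of Python. This is a minimal
illustration on the four training sentences, not a reference implementation: the helper names
(`context_words`, `train`) are made up for this example, tokenization is a plain whitespace
split, no stop words are removed, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Toy sense-tagged corpus from the example above.
TRAINING = [
    ("I went to the bank to deposit money.", "FI"),
    ("The fisherman sat on the river bank.", "RB"),
    ("My savings account is at the local bank.", "FI"),
    ("The boat ran aground on the muddy bank.", "RB"),
]


def context_words(sentence, target="bank"):
    """Lowercase, strip punctuation, and drop the target word itself."""
    tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
    return [t for t in tokens if t and t != target]


def train(examples):
    """Return (priors, likelihoods) as unsmoothed relative frequencies."""
    sense_counts = Counter()            # number of examples per sense
    word_counts = defaultdict(Counter)  # word_counts[sense][word]
    for sentence, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(context_words(sentence))
    total = sum(sense_counts.values())
    priors = {s: c / total for s, c in sense_counts.items()}
    likelihoods = {}
    for sense, counts in word_counts.items():
        n_words = sum(counts.values())  # all context words seen with this sense
        likelihoods[sense] = {w: c / n_words for w, c in counts.items()}
    return priors, likelihoods


priors, likelihoods = train(TRAINING)
print(priors)
print(likelihoods["FI"].get("money", 0.0))  # "money" occurs once among the FI context words
print(likelihoods["RB"].get("money", 0.0))  # never seen with RB, so the estimate is 0.0
```

Because nothing is smoothed here, any word that was never seen with a sense gets probability
zero for that sense, which matters in the classification phase.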

2. Disambiguation (classification) phase

Now, consider a new, unseen sentence: "The money was deposited in the bank."
The algorithm needs to determine if "bank" in this sentence means "financial institution" or
"river bank".

Step 1: Identify context words

The context words (features) surrounding "bank" are "money," "deposited," and "in."

Step 2: Calculate posterior probability for each sense

Using Bayes' theorem, the algorithm calculates the probability of each sense given the context
words. Under the "naive" assumption of feature independence, the likelihood of the whole
context is simply the product of the probabilities of the individual context words.
Step 3: Apply the probabilities

Because the prior probabilities are equal (as they were in the training data), the decision
depends only on the likelihoods:

o FI: P(money | FI) x P(deposited | FI) x P(in | FI)
o RB: P(money | RB) x P(deposited | RB) x P(in | RB)

Step 4: Compare and classify

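In the four training sentences, "money" occurs only with FI, while "deposited" and "in" occur
with neither sense (only the related form "deposit" appears, again with FI). Taken literally,
one unseen word would force both products to zero, so implementations typically smooth the
counts or skip words never seen in training. On the evidence that remains:

• Probability (FI) > 0
• Probability (RB) = 0, since none of the observed context words occurs with RB

The algorithm will choose Sense 1 (Financial Institution) as the correct sense because it has
the highest probability.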
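
The classification step can be sketched as follows. Several things here are assumptions made
for illustration rather than part of the example above: add-one (Laplace) smoothing with one
extra vocabulary slot for unseen words, log-probabilities to avoid underflow, and the function
names. The context words are passed in exactly as listed in Step 1.

```python
import math
from collections import Counter, defaultdict

TRAINING = [
    ("I went to the bank to deposit money.", "FI"),
    ("The fisherman sat on the river bank.", "RB"),
    ("My savings account is at the local bank.", "FI"),
    ("The boat ran aground on the muddy bank.", "RB"),
]


def context_words(sentence, target="bank"):
    tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
    return [t for t in tokens if t and t != target]


def train(examples):
    """Collect the raw counts the classifier needs."""
    sense_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for sentence, sense in examples:
        words = context_words(sentence)
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def classify(features, model, alpha=1.0):
    """Return (best_sense, scores) where scores are smoothed log-posteriors
    up to a constant: log P(sense) + sum of log P(word | sense)."""
    sense_counts, word_counts, vocab = model
    total_examples = sum(sense_counts.values())
    scores = {}
    for sense, n_examples in sense_counts.items():
        n_words = sum(word_counts[sense].values())
        # Assumed smoothing: add alpha to every count, with one extra
        # vocabulary slot reserved for words never seen in training.
        denom = n_words + alpha * (len(vocab) + 1)
        score = math.log(n_examples / total_examples)
        for word in features:
            score += math.log((word_counts[sense][word] + alpha) / denom)
        scores[sense] = score
    return max(scores, key=scores.get), scores


model = train(TRAINING)
# Context words from Step 1 of "The money was deposited in the bank."
best, scores = classify(["money", "deposited", "in"], model)
print(best)
print(scores)
```

With smoothing, neither sense receives a zero score; FI still wins because "money" was
observed with FI and never with RB.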
DECISION LIST

Supervised word sense disambiguation (WSD) using decision lists is a machine learning
approach that creates an ordered list of weighted "if-then-else" rules to determine the correct
meaning of a word in context. It is particularly effective for words with a limited number of
distinct meanings.

• Word Sense Disambiguation (WSD): A task in natural language processing (NLP) that
identifies the correct meaning (or sense) of a word in a specific context. For
example, determining if the word "bank" in "river bank" refers to a financial institution
or the side of a river.
• Supervised Learning: This approach requires a pre-labeled training dataset, known as
a sense-tagged corpus, where each instance of an ambiguous word is manually tagged
with its correct sense. The algorithm learns a classifier from this data.
• Decision Lists: A decision list is a sequence of rules, each of which has a
corresponding classification (or sense) and a confidence score. The rules are ordered
from most to least reliable. To classify a new instance, the system iterates through the
list and applies the first rule that matches the features of the input.
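
How the confidence score is computed is not specified above. One common choice, used in
Yarowsky-style decision lists, is the absolute log-likelihood ratio of the two senses given a
feature. The sketch below applies that idea to the "bank" sentences from the Naive Bayes
example; the smoothing constant `alpha`, the function names, and the restriction to two
senses and word-presence features are assumptions made for illustration.

```python
import math
from collections import Counter

TRAINING = [
    ("I went to the bank to deposit money.", "FI"),
    ("The fisherman sat on the river bank.", "RB"),
    ("My savings account is at the local bank.", "FI"),
    ("The boat ran aground on the muddy bank.", "RB"),
]


def context_words(sentence, target="bank"):
    tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
    return [t for t in tokens if t and t != target]


def learn_decision_list(examples, senses=("FI", "RB"), alpha=0.1):
    """Return rules (score, word, sense), sorted from most to least reliable."""
    a, b = senses
    counts = {a: Counter(), b: Counter()}
    for sentence, sense in examples:
        # Count each word once per sentence (presence, not frequency).
        counts[sense].update(set(context_words(sentence)))
    rules = []
    for word in set(counts[a]) | set(counts[b]):
        # Smoothed log-likelihood ratio; alpha avoids division by zero.
        ratio = math.log((counts[a][word] + alpha) / (counts[b][word] + alpha))
        if ratio == 0:
            continue  # equally frequent with both senses: no evidence
        rules.append((abs(ratio), word, a if ratio > 0 else b))
    rules.sort(key=lambda r: (-r[0], r[1]))  # strongest first, ties by word
    return rules


rules = learn_decision_list(TRAINING)
for score, word, sense in rules[:5]:
    print(f'{score:.2f}  IF "{word}" in context THEN {sense}')
```

On a corpus this small the ranking mostly reflects chance co-occurrences (here "on" ends up as
the strongest rule), which is why realistic decision lists need far more tagged data.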

How decision lists work for WSD

1. Feature extraction: Before training, features are extracted from the sense-tagged
training data. These features represent the context surrounding the ambiguous word and
can include:
o Collocation features: The specific words that appear immediately to the left or
right of the target word.
o Bag-of-words features: The words that appear within a fixed-size window
around the target word, without regard to their position.
o Syntactic features: Part-of-speech (POS) tags of the surrounding words.
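
A minimal sketch of the first two feature types is shown below. The feature names
(`word[-1]`, `bow(...)`), the `<pad>` marker, and the window size are arbitrary choices for
this example; POS-tag features are left out because they would need a tagger.

```python
def extract_features(tokens, target_index, window=3):
    """Return collocation and bag-of-words features for tokens[target_index]."""
    features = {}
    # Collocation features: the word at each fixed offset from the target.
    for offset in (-2, -1, 1, 2):
        i = target_index + offset
        inside = 0 <= i < len(tokens)
        features[f"word[{offset:+d}]"] = tokens[i].lower() if inside else "<pad>"
    # Bag-of-words features: which words occur within the window, position ignored.
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index:
            features[f"bow({tokens[i].lower()})"] = True
    return features


tokens = "He plays the bass guitar in a band".split()
feats = extract_features(tokens, tokens.index("bass"))
for name, value in feats.items():
    print(name, value)
```

Note that "band" lies four positions from "bass", so it falls outside a window of three; a
wider window would capture it at the cost of more noise.
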
Example: Disambiguating the word "bass"
Consider the ambiguous word "bass," which can refer to a type of fish or the
sound/instrument.

• Sense 1: fish
• Sense 2: musical instrument/sound

A sense-tagged corpus would contain sentences like:

• "I caught a large bass." (Sense 1)
• "He plays the bass guitar in a band." (Sense 2)

Example rules

• Rule 1 (high confidence): IF "guitar" is in the context THEN assign Sense 2
• Rule 2 (medium confidence): IF "caught" is in the context THEN assign Sense 1
• Rule 3 (medium confidence): IF "band" is in the context THEN assign Sense 2
• Rule 4 (low confidence): IF the word immediately to the left is "large" THEN
assign Sense 1
• Default rule (fallback): ASSIGN Most Frequent Sense

Applying the decision list:

• Input sentence: "The bass sounded terrible."
• The system checks Rule 1 for "guitar." No match.
• It checks Rule 2 for "caught." No match.
• It checks Rule 3 for "band." No match.
• It checks Rule 4 for "large" in the required position. No match.
• No rule fires, so the default rule assigns the most frequent sense, which is not
guaranteed to be the right one here.

In practice, a more powerful feature might be "sounded," which would lead to a rule like IF
"sounded" is in the context THEN assign Sense 2, resulting in a more accurate
classification.
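
The walk-through above can be reproduced with a short sketch. The rule encoding (a
`(kind, word, sense)` tuple) and the function name are invented for this example. Rule 4 is
encoded as a test on the word immediately left of the target, which is where "large" sits in
"I caught a large bass." Which sense is the most frequent is not stated, so the default is
left as a placeholder string.

```python
# Rules in priority order: (kind, word, sense). "context" matches anywhere in
# the sentence; "left" matches the word immediately before the target.
RULES = [
    ("context", "guitar", "Sense 2"),
    ("context", "caught", "Sense 1"),
    ("context", "band", "Sense 2"),
    ("left", "large", "Sense 1"),
]


def classify(sentence, rules=RULES, target="bass", default="most frequent sense"):
    """Apply the first matching rule; fall back to the default otherwise."""
    tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
    idx = tokens.index(target)
    context = set(tokens[:idx] + tokens[idx + 1:])
    left = tokens[idx - 1] if idx > 0 else None
    for kind, word, sense in rules:
        if kind == "context" and word in context:
            return sense, f'"{word}" in context'
        if kind == "left" and left == word:
            return sense, f'"{word}" to the left'
    return default, "default rule"


print(classify("The bass sounded terrible."))
print(classify("He plays the bass guitar in a band."))

# Adding the rule suggested above lets the first sentence be resolved.
extended = RULES + [("context", "sounded", "Sense 2")]
print(classify("The bass sounded terrible.", rules=extended))
```

Because only the first matching rule is used, "I caught a large bass." is decided by Rule 2
and Rule 4 is never consulted for it.
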
Advantages:

• Simplicity and interpretability: The rule-based nature of decision lists makes them
easy to understand and debug, a key advantage over more complex "black box" models.
• Effectiveness with specific features: For many WSD problems, a few very strong
contextual features can provide accurate disambiguation. Decision lists are optimized to
exploit these features by placing the most confident rules at the top.
• Scalability: Learning a decision list is computationally efficient and can handle a large
number of features. Because only the single strongest matching rule decides, it is less
exposed to data sparseness than methods that multiply many probability estimates, such
as Naive Bayes.

Disadvantages:

• Knowledge acquisition bottleneck: Creating the large sense-tagged corpora required
for training supervised models is time-consuming and expensive.
• Limited expressiveness: The sequential "if-then" structure may not capture complex
interactions between multiple features as effectively as other machine learning models.
• Manual feature engineering: The quality of the model heavily depends on the manual
selection and engineering of relevant contextual features.
