Understanding Natural Language Processing

Natural Language Processing (NLP) is a subfield of Artificial Intelligence focused on enabling computers to understand and generate human languages. Language modeling, including techniques like n-grams and smoothing methods, plays a crucial role in applications such as speech recognition and POS tagging. The document also discusses the use of Hidden Markov Models for automating part-of-speech tagging through statistical methods.

Uploaded by

62adsrtyuiop


What is NLP?

• NLP stands for Natural Language Processing. Natural languages are those spoken by people.

• NLP encompasses everything a computer needs to understand natural language (typed or spoken) and to generate it.

• Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics devoted to making computers "understand" statements written in human languages.

Language modeling:
Language modeling is the task of determining the probability of any sequence of words. It is used in a wide variety of applications such as speech recognition, spam filtering, information extraction and prompt generation. Language modeling (the "LM" in LLM, large language model) is the key aim behind many state-of-the-art Natural Language Processing models.

N-grams are contiguous sequences of n items collected from a text or speech corpus, or from almost any other type of sequential data. The n in n-gram specifies the number of items to consider: unigram for n = 1, bigram for n = 2, trigram for n = 3, and so on. N-grams and n-gram models are widely used in probability, communication theory, computational linguistics (for example, statistical natural language processing) and computational biology. A unigram model, which ignores word order, is sometimes described as a "bag of words" model.
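
The definition above can be sketched in a few lines of Python. This is only an illustration: the helper name ngrams and the sample sentence are assumptions, not part of the original slides, and tokens are obtained by a plain whitespace split.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n items in `tokens` as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative sentence, tokenized by whitespace (an assumption).
tokens = "I read a book".split()

unigrams = ngrams(tokens, 1)  # n = 1
bigrams = ngrams(tokens, 2)   # n = 2
trigrams = ngrams(tokens, 3)  # n = 3

print(bigrams)
```

A sequence of k tokens yields k - n + 1 n-grams, so the four-word sentence here gives three bigrams and two trigrams.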

Methods of Language Modeling:

Two types of language modeling:

Statistical Language Modeling: Statistical language modeling is the development of probabilistic models that predict the next word in a sequence given the words that precede it. N-gram language modeling is an example.

Neural Language Modeling: Neural network methods achieve better results than classical methods, both as standalone language models and when incorporated into larger models for challenging tasks such as speech recognition and machine translation. One way of building a neural language model is through word embeddings.

N-Grams
The probability of a sentence can be calculated from the probabilities of the sequence of words occurring in it.

We can use the Markov assumption: the probability of a word in a sentence depends only on the word occurring just before it.

Such a model is called a first-order Markov model or bigram model:

P(w(n) | w(1) w(2) ... w(n-1)) ≈ P(w(n) | w(n-1))

Here, w(n) refers to the word token corresponding to the nth word in a sequence.

A combination of words forms a sentence. However, such a formation is meaningful only when the words are arranged in some order.
Ex: Sit I car in the.

Such a sentence is not grammatically acceptable. However, some perfectly grammatical sentences can be nonsensical too!
Eg: Colorless green ideas sleep furiously.

One easy way to handle such unacceptable sentences is to assign probabilities to strings of words, i.e., how likely the sentence is in that particular form.

Probability of a sentence
If we treat each word occurring at its position in the sentence as an event, the probability of the sentence is the joint probability of those events:

Sentence structure: P(w(1), w(2), ..., w(n-1), w(n))

Using the chain rule: = P(w(1)) * P(w(2) | w(1)) * P(w(3) | w(1)w(2)) * ... * P(w(n) | w(1)w(2) ... w(n-1))

Bigrams

We can avoid this very long calculation by approximating that the probability of a given word depends only on the previous word.

This assumption is called the Markov assumption, and such a model is called a Markov model (here, a bigram model).

Bigrams can be generalized to n-grams, which look at (n-1) words in the past.

A bigram model is a first-order Markov model.

Therefore, P(w(1), w(2), ..., w(n-1), w(n)) ≈ P(w(1)) * P(w(2)|w(1)) * P(w(3)|w(2)) * ... * P(w(n)|w(n-1))

We use the (eos) tag to mark the beginning and end of a sentence, so the first factor can be written P(w(1)|(eos)).
A bigram table for a given corpus can be generated and used as a lookup table for calculating the probability of sentences.

Eg: Corpus - (eos) You book a flight (eos) I read a book (eos) You read (eos)
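
As a minimal sketch of the lookup-table idea, the Python below builds bigram counts from the example corpus and multiplies the estimates C(previous word, word) / C(previous word) along a sentence. The names bigram_prob and sentence_prob are illustrative and not from the original slides; treating (eos) as an ordinary token and dividing by its plain unigram count is an assumption of this sketch.

```python
from collections import Counter

corpus = "(eos) You book a flight (eos) I read a book (eos) You read (eos)"
tokens = corpus.split()

unigram_counts = Counter(tokens)
# Count each adjacent pair (previous word, word) in the corpus.
bigram_counts = Counter(zip(tokens, tokens[1:]))


def bigram_prob(prev, word):
    """Unsmoothed estimate P(word | prev) = C(prev word) / C(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


def sentence_prob(sentence):
    """Multiply bigram estimates over a sentence framed by (eos) markers."""
    words = ["(eos)"] + sentence.split() + ["(eos)"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob


print(sentence_prob("You read a book"))
print(sentence_prob("I book a flight"))
```

The second sentence is grammatical, but the bigram "I book" never occurs in the corpus, so its unsmoothed probability is zero.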

N-Grams Smoothing
One major problem with standard N-gram models is that they must be trained from some corpus, and because any particular training corpus is finite, some perfectly acceptable N-grams are bound to be missing from it.

The bigram matrix for any given training corpus is therefore sparse: it contains a large number of zero-probability bigrams that should really have some non-zero probability.

As a result, the model tends to underestimate the probability of word sequences whose words happen not to have occurred next to each other in the training corpus.

The task of re-evaluating some of the zero-probability and low-probability N-grams, and assigning them non-zero values, is called smoothing.

Some of the techniques are:

1. Add-One Smoothing,

2. Witten-Bell Discounting,

3. Good-Turing Discounting.

Add-One Smoothing
In add-one smoothing, we add one to all the bigram counts before normalizing them into probabilities.

Application on unigrams
The unsmoothed maximum likelihood estimate of the unigram probability is computed by dividing the count of the word by the total number of word tokens N:

P(w(x)) = c(w(x)) / sum_i c(w(i)) = c(w(x)) / N

Add-one smoothing defines an adjusted count c*:

c*(w(i)) = (c(w(i)) + 1) * N / (N + V)

where V is the total number of word types in the language.

Smoothed probabilities are then obtained by normalizing the adjusted counts by N:

p*(w(i)) = c*(w(i)) / N = (c(w(i)) + 1) / (N + V)
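
The unigram formulas can be checked numerically with a short Python sketch. The token list reuses the words of the earlier example corpus without the (eos) markers, and V is taken to be the number of word types seen in that list; both choices are assumptions made for illustration, as are the function names.

```python
from collections import Counter

# Words of the example corpus without (eos) markers (an illustrative choice).
tokens = "You book a flight I read a book You read".split()

counts = Counter(tokens)
N = sum(counts.values())  # total number of word tokens
V = len(counts)           # number of word types (assumed: types seen here)


def mle_prob(word):
    """Unsmoothed estimate c(w) / N."""
    return counts[word] / N


def adjusted_count(word):
    """Add-one adjusted count c*(w) = (c(w) + 1) * N / (N + V)."""
    return (counts[word] + 1) * N / (N + V)


def add_one_prob(word):
    """Add-one smoothed estimate (c(w) + 1) / (N + V)."""
    return (counts[word] + 1) / (N + V)


print(N, V)
print(mle_prob("flight"), add_one_prob("flight"))
print(adjusted_count("book") / N, add_one_prob("book"))
```

Dividing the adjusted count by N gives the same value as the smoothed probability, which is the relationship between the two formulas above.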
Application on bigrams

Normal bigram probabilities are computed by normalizing each row of counts by the unigram count:

P(w(n) | w(n-1)) = C(w(n-1) w(n)) / C(w(n-1))

For add-one smoothed bigram counts, we need to augment the unigram count by the total number of word types in the vocabulary, V:

p*(w(n) | w(n-1)) = (C(w(n-1) w(n)) + 1) / (C(w(n-1)) + V)
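
A small Python sketch of the smoothed bigram estimate, applied to the example corpus used earlier for the bigram table. Counting (eos) as one of the V word types is an assumption of this sketch, and the function name is illustrative.

```python
from collections import Counter

tokens = "(eos) You book a flight (eos) I read a book (eos) You read (eos)".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(unigram_counts)  # word types, with (eos) counted as a type (assumption)


def smoothed_bigram_prob(prev, word):
    """Add-one estimate (C(prev word) + 1) / (C(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)


print(smoothed_bigram_prob("I", "read"))  # bigram seen once in the corpus
print(smoothed_bigram_prob("I", "book"))  # unseen bigram, now non-zero
```

The unseen bigram "I book" receives a small non-zero probability, at the cost of lowering the estimate for the seen bigram "I read".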

POS Tagging - Hidden Markov Model

POS tagging, or part-of-speech tagging, is the procedure of assigning a grammatical category such as noun, verb or adjective to a word.
In this process both the lexical information and the context play an important role, as the same lexical form can behave differently in a different context.

For example, the word "park" can have two different lexical categories depending on the context.

The boy is playing in the park. ('park' is a noun)

Park the car. ('Park' is a verb)

Assigning parts of speech to words by hand is a common exercise in an elementary grammar class.

Here, however, we wish to build an automated tool that can assign the appropriate part-of-speech tag to each word of a given sentence.
One could hand-craft rules by observing patterns in the language, but this would limit the system's performance to the quality and number of patterns identified by the rule crafter.

Thus, this approach is not adopted in practice for building a POS tagger. Instead, a large corpus annotated with the correct POS tag for each word is given to the computer, and algorithms learn the patterns automatically from the data and store them in the form of a trained model.

Later, this model can be used to POS tag new sentences.

A Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. In a regular Markov model the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible.

A Hidden Markov Model has two important components:

1) Transition probabilities: The one-step transition probability is the probability of transitioning from one state to another in a single step.

2) Emission probabilities: The output probabilities for an observation from a state. Emission probabilities B = { b_{i,k} = b_i(o_k) = P(o_k | q_i) }, where o_k is an observation. Informally, an entry of B is the probability that the output is o_k given that the current state is q_i.

For POS tagging, it is assumed that POS tags are generated by a random process, and each tag randomly generates a word.
Hence, the transition matrix gives the probability of moving from one POS tag to another, and the emission matrix gives the probability that a given POS tag generates a particular word. Words act as the observations.

Calculating the Probabilities

Consider the given corpus:

EOS/eos They/pronoun cut/verb the/determiner paper/noun
EOS/eos He/pronoun asked/verb for/preposition his/pronoun cut/noun
EOS/eos Put/verb the/determiner paper/noun in/preposition the/determiner cut/noun EOS/eos

Calculating Emission Probability Matrix
Count the number of times a specific word occurs with a specific POS tag in the corpus.
Here, say for "cut":

count(cut,verb)=1
count(cut,noun)=2
count(cut,determiner)=0
and so on, zero for the other tags too.

Since the emission probability is P(word|tag), each count is normalized by the total count of the tag:

count(verb) = total count of tag 'verb' = 3
count(determiner) = total count of tag 'determiner' = 3

Probability to be filled in the matrix cell at the intersection of cut and verb:

P(cut|verb)=count(cut,verb)/count(verb)=1/3=0.33

Similarly, the probability to be filled in the cell at the intersection of cut and determiner:

P(cut|determiner)=count(cut,determiner)/count(determiner)=0/3=0

Calculate P(cut|noun).
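
The emission counts can be reproduced with a short Python sketch over the tagged corpus. It normalizes each word/tag count by the total count of the tag, in line with the definition of emission probabilities as P(o_k | q_i); for "cut" with the tag verb, the word count and the tag count are both 3, so the value is 1/3 either way. Lowercasing words, dropping the sentence-final period and the function name emission_prob are assumptions of this sketch.

```python
from collections import Counter

# Tagged corpus from the slides, written as word/tag items.
tagged = (
    "EOS/eos They/pronoun cut/verb the/determiner paper/noun "
    "EOS/eos He/pronoun asked/verb for/preposition his/pronoun cut/noun "
    "EOS/eos Put/verb the/determiner paper/noun "
    "in/preposition the/determiner cut/noun EOS/eos"
)

# Split each item at its last "/" and lowercase the word (assumption).
pairs = []
for item in tagged.split():
    word, tag = item.rsplit("/", 1)
    pairs.append((word.lower(), tag))

pair_counts = Counter(pairs)                  # count(word, tag)
tag_counts = Counter(tag for _, tag in pairs)  # count(tag)


def emission_prob(word, tag):
    """Estimate P(word | tag) = count(word, tag) / count(tag)."""
    if tag_counts[tag] == 0:
        return 0.0
    return pair_counts[(word.lower(), tag)] / tag_counts[tag]


for tag in ("verb", "noun", "determiner"):
    print(tag, emission_prob("cut", tag))
```

With this normalization, the emission probabilities of all words for a given tag sum to 1.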

Calculating Transition Probability Matrix
Count the number of times a specific tag comes after each other POS tag in the corpus.
Here, say for "determiner":

count(verb,determiner)=2
count(preposition,determiner)=1
count(determiner,determiner)=0
count(eos,determiner)=0
count(noun,determiner)=0

and so on, zero for the other tags too.

Since the transition probability is P(tag|previous tag), each count is normalized by the total count of the previous tag:

count(verb) = total count of tag 'verb' = 3
count(noun) = total count of tag 'noun' = 4

Probability to be filled in the cell at the intersection of determiner (in the column) and verb (in the row):

P(determiner|verb)=count(verb,determiner)/count(verb)=2/3=0.67

Similarly, the probability to be filled in the cell at the intersection of determiner (in the column) and noun (in the row):

P(determiner|noun)=count(noun,determiner)/count(noun)=0/4=0

Repeat the same for all the tags.
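
The transition counts can be tallied the same way. This Python sketch works on the tag sequence of the corpus alone and normalizes each tag-pair count by the number of times the previous tag occurs as a predecessor, so that every row of the matrix sums to 1. The function name transition_prob and the choice to treat the whole corpus as one tag sequence are assumptions made for illustration.

```python
from collections import Counter

# Tag sequence of the corpus, in order, with eos between sentences.
tags = (
    "eos pronoun verb determiner noun "
    "eos pronoun verb preposition pronoun noun "
    "eos verb determiner noun preposition determiner noun eos"
).split()

# count(previous tag, tag) for each adjacent pair of tags.
transition_counts = Counter(zip(tags, tags[1:]))
# Occurrences of each tag as a predecessor (the final eos has no successor).
prev_counts = Counter(tags[:-1])


def transition_prob(prev_tag, tag):
    """Estimate P(tag | prev_tag) = count(prev_tag, tag) / count(prev_tag)."""
    if prev_counts[prev_tag] == 0:
        return 0.0
    return transition_counts[(prev_tag, tag)] / prev_counts[prev_tag]


for prev_tag in ("verb", "preposition", "noun"):
    print(prev_tag, "-> determiner:", transition_prob(prev_tag, "determiner"))
```

Looping over every pair of tags with transition_prob fills in the full transition matrix.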

