0% found this document useful (0 votes)
13 views41 pages

Probabilistic Language Modeling Basics

The document discusses probabilistic language models, focusing on assigning probabilities to sentences for applications like machine translation, spell correction, and speech recognition. It explains how to compute these probabilities using the Chain Rule and introduces the Markov Assumption to simplify calculations. Additionally, it covers estimating N-gram probabilities, specifically bigrams and trigrams, with examples of calculating sentence probabilities.

Uploaded by

spv12344321
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views41 pages

Probabilistic Language Modeling Basics

The document discusses probabilistic language models, focusing on assigning probabilities to sentences for applications like machine translation, spell correction, and speech recognition. It explains how to compute these probabilities using the Chain Rule and introduces the Markov Assumption to simplify calculations. Additionally, it covers estimating N-gram probabilities, specifically bigrams and trigrams, with examples of calculating sentence probabilities.

Uploaded by

spv12344321
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Language Model

Probabilistic Language Models


• Today’s goal: assign a probability to a sentence
• Machine Translation:
• P(high winds tonite) > P(large winds tonite)
• Spell Correction
Why? • The office is about fifteen minuets from my house
• P(about fifteen minutes from) > P(about fifteen minuets from)
• Speech Recognition
• P(I saw a van) >> P(eyes awe of an)
• + Summarization, question-answering, etc., etc.!!
Probabilistic Language Modeling
• Goal: compute the probability of a sentence or sequence
of words:
P(W) = P(w1,w2,w3,w4,w5…wn)

• Related task: probability of an upcoming word:


P(w5|w1,w2,w3,w4)
• A model that computes either of these:
P(W) or P(wn|w1,w2…wn-1) is called a language model.
• Better: the grammar But language model or LM is standard
How to compute P(W)
• How to compute this joint probability:

• P(its, water, is, so, transparent, that)


• Intuition: let’s rely on the Chain Rule of Probability
Reminder: The Chain Rule
• Recall the definition of conditional probabilities
p(B|A) = P(A,B)/P(A) Rewriting: P(A,B) = P(A)P(B|A)

• More variables:
P(A,B,C,D) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)
• The Chain Rule in General
P(x1,x2,x3,…,xn) = P(x1)P(x2|x1)P(x3|x1,x2)…P(xn|x1,…,xn-1)
The Chain Rule applied to compute joint
probability of words in sentence

P(w1w2 … wn ) = Õ P(wi | w1w2 … wi-1 )


i

P(“its water is so transparent”) =


P(its) × P(water|its) × P(is|its water)
× P(so|its water is) × P(transparent|its water is
so)
How to estimate these probabilities
• Could we just count and divide?

P(the | its water is so transparent that) =


Count(its water is so transparent that the)
Count(its water is so transparent that)
• No! Too many possible sentences!
• We’ll never see enough data for estimating these
Markov Assumption

•Simplifying assumption:
Andrei Markov

P(the | its water is so transparent that) » P(the | that)


•Or maybe

P(the | its water is so transparent that) » P(the | transparent that)


Markov Assumption

P(w1w2 … wn ) » Õ P(wi | wi-k … wi-1 )


i
•In other words, we approximate each
component in the product

P(wi | w1w2 … wi-1) » P(wi | wi-k … wi-1)


Language
Modeling
Introduction to N-grams
Language
Modeling
Estimating N-gram
Probabilities
Estimating bigram probabilities
• The Maximum Likelihood Estimate

count(wi-1,wi )
P(wi | w i-1) =
count(w i-1 )

c(wi-1,wi )
P(wi | w i-1 ) =
c(wi-1)
Build the Bigram Model:
Assume our training corpus contains the following
sentences:

"I am happy"
"I am sad"
"I am excited"
"You are happy"
Bigrams: Unigrams:

("I", "am"): 3 "I": 3


("am", "happy"): 2 "am": 3
("am", "sad"): 1 "happy": 2
("am", "excited"): 1 "sad": 1
("You", "are"): 1 "excited": 1
("are", "happy"): 1 "You": 1
"are": 1
Calculate Bigram Probabilities:
Calculate the Probability of the Full Sentence:
When some type the words I am, the probability of next word
is Happy or Sad or Excited
Given the following trigram probabilities derived from a training corpus,
calculate the probability of the sentence "She is feeling very good."

Trigrams:
Unigrams:

("She", "is", "feeling"): 2 "She": 2


("is", "feeling", "very"): 1 "is": 2
("feeling", "very", "good"): 2 "feeling": 2
"very": 2
Bigrams: "good": 2
•("She", "is"): 2
•("is", "feeling"): 1
•("feeling", "very"): 1
•("very", "good"): 2
P("She is feeling very good")=P("feeling"∣"She is")×P("very"∣"is feeling")
×P("good"∣"feeling very")

=1.0×1.0×2.0 =2.0

The probability of the sentence "She is feeling very good" using the
trigram model is 2.0.
Problem Statement

Given the partial sentence "She is feeling", predict the most likely next word using the trigram model.
Trigram Model Data

From the training corpus, we have the following trigrams and their counts:

Trigrams:
("She", "is", "feeling"): 2
("is", "feeling", "very"): 1
("is", "feeling", "good"): 1
("feeling", "very", "good"): 2
("feeling", "good", "happy"): 1

Bigrams:
("She", "is"): 2
("is", "feeling"): 1
("feeling", "very"): 1
("feeling", "good"): 1
("good", "happy"): 1
𝐶𝑜𝑢𝑛𝑡(“𝑖𝑠 𝑓𝑒𝑒𝑙𝑖𝑛𝑔 𝑔𝑜𝑜𝑑”)
P(“good”|”is feeling”)= =1
𝐶𝑜𝑢𝑛𝑡(“𝑖𝑠 𝑓𝑒𝑒𝑙𝑖𝑛𝑔 ”)
Example Sentence: The Cat

You might also like