Module #1
CSE 243:
Natural Language Processing
Recap from the Previous Lecture
• Named Entity Recognition
Contents
• Statistical Machine Translation
• Project Updates
Machine Translation
• Automatic conversion of text from one language to another by computer.
• Example: Translation from English to French.
• The input language is called the source language, and the output (translated) language is called the target language.
Examples of MT
• Conversion from English to French
• SRC: The three rabbits of Grenoble
• TGT: Les trois lapins de Grenoble
• Conversion from English to Simple English
• SRC: Students should not procrastinate their assignment submissions.
• TGT: Students should not delay submitting their assignments.
• Conversion from spoken-style English to written English
• SRC: I bought a car for Rupees five lakhs.
• TGT: I bought a car for Rs. 5,00,000.
• Code mixing: translating text that mixes two languages within a sentence
Machine Translation Paradigms
• Rule-based MT: Uses linguistic rules to perform translation.
• Example: English plurals typically end in “s”. Hence, “Hudugaru” (the plural of “Huduga”, “boy” in Kannada) = “Boys” in English.
• Example-based MT: Translation by analogy with previously translated examples.
• Statistical MT: Learns alignments from a parallel corpus of source-target sentence pairs.
• Neural MT: Uses an encoder-decoder architecture that encodes the source sentence into a learned representation and decodes it into the target language.
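A minimal sketch of the rule-based idea, using the Kannada plural example above. The lexicon, the function `translate_noun` and the rule "a trailing ru on a known stem marks the plural" are hypothetical simplifications chosen only to reproduce this one example; they are not a real Kannada grammar.

```python
# Toy rule-based translation of a single Kannada noun.
# HYPOTHETICAL: the one-entry lexicon and the suffix rule are illustrative only.

LEXICON = {"huduga": "boy"}  # hypothetical Kannada -> English noun lexicon
PLURAL_SUFFIX = "ru"         # hypothetical rule: known stem + "ru" = plural


def translate_noun(word: str) -> str:
    """Translate one noun with a lexicon lookup plus a plural rule."""
    word = word.lower()
    if word in LEXICON:                  # singular form found directly
        return LEXICON[word]
    if word.endswith(PLURAL_SUFFIX):
        stem = word[: -len(PLURAL_SUFFIX)]
        if stem in LEXICON:              # English plural rule: append "s"
            return LEXICON[stem] + "s"
    return word                          # no rule applies: pass the word through


print(translate_noun("Huduga"))    # boy
print(translate_noun("Hudugaru"))  # boys
```

Every irregular case (English "children", for instance) would need yet another hand-written rule, which is exactly the maintenance and scaling burden of rule-based systems.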
Challenges of MT
• Ambiguity
• Same word, multiple meanings
• Same meaning, multiple words
• Word Order
• SOV to SVO? (e.g., Hindi is SOV, English is SVO)
• Morphological Richness
• Each word has many surface forms, so counts become sparse: challenging for SMT systems!
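To make the word-order challenge concrete, here is a minimal sketch of a single SOV-to-SVO reordering rule. It assumes every word already carries a hand-assigned role tag (S, O or V); the function name `sov_to_svo` and the glossed example clause are hypothetical. Obtaining those roles automatically is where the real difficulty lies.

```python
# Reorder a role-tagged clause from SOV to SVO order.
# ASSUMPTION: roles are given by hand; a real system would need a parser.

from typing import List, Tuple

SVO_RANK = {"S": 0, "V": 1, "O": 2}


def sov_to_svo(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the words of a tagged clause rearranged into S-V-O order."""
    # sorted() is stable, so words sharing a role keep their relative order.
    ordered = sorted(tagged, key=lambda pair: SVO_RANK[pair[1]])
    return [word for word, _ in ordered]


# English glosses of a hypothetical SOV clause ("Ram an-apple ate").
clause = [("Ram", "S"), ("an apple", "O"), ("ate", "V")]
print(" ".join(sov_to_svo(clause)))  # Ram ate an apple
```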
Problems with Rule-based MT
• Requires linguistic knowledge of both languages.
• Maintaining the system is challenging.
• Difficult to handle ambiguity.
• Scaling to new language pairs and domains is difficult!
Statistical MT
• Models translation with a probabilistic model.
• Gives a measure of confidence in each translation.
• Models the uncertainty in translation.
• Picks the most probable translation using argmax:
• E* = argmax_e P(e|f)
• E* = best translation
• e = target language text
• f = source language text
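The argmax decision rule can be sketched in a few lines. The candidate translations and their probabilities below are hypothetical; a real SMT system scores a huge space of candidates with learned models rather than a hand-written dictionary. The returned probability doubles as the confidence measure mentioned in the bullets.

```python
# E* = argmax_e P(e|f) over a small, HYPOTHETICAL set of candidates.

from typing import Dict, Tuple


def best_translation(p_e_given_f: Dict[str, float]) -> Tuple[str, float]:
    """Return (E*, P(E*|f)): the most probable candidate and its probability."""
    e_star = max(p_e_given_f, key=p_e_given_f.get)
    return e_star, p_e_given_f[e_star]


f = "The three rabbits of Grenoble"  # source sentence
p_e_given_f = {                      # made-up P(e|f) values for illustration
    "Les trois lapins de Grenoble": 0.70,
    "Trois lapins de Grenoble": 0.20,
    "Les lapins trois de Grenoble": 0.10,
}

e_star, confidence = best_translation(p_e_given_f)
print(f"{f} -> {e_star} (P = {confidence:.2f})")
```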
Word Alignment
• Given a parallel corpus, we find word-level alignments
• Example:
• English: Narendra Modi is the Prime Minister of India
• Hindi: Bhaarat ke Pradhan Mantri, Narendra Modi hain.
• Alignments:
• Narendra Modi (English) -> Narendra Modi (Hindi)
• Prime Minister of India -> Bhaarat ke Pradhan Mantri
• is (English) -> hain
• Prime Minister (English) -> Pradhan Mantri
• of India (English) -> Bhaarat ke
• ………
• ………
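Alignments are commonly stored as a set of (source index, target index) links. The sketch below does this for the Modi sentence pair and applies the consistency criterion commonly used for phrase extraction: a phrase pair is kept only if no link connects a word inside it to a word outside it. The word-level links (e.g. Prime -> Pradhan) are an assumed split of the phrase-level alignments listed above, and `is_consistent` and `phrase` are hypothetical helper names. The check also rejects the two questionable pairs raised on the next slide.

```python
# Word alignment as (English index, Hindi index) links, plus a phrase-pair
# consistency check. ASSUMPTION: word-level links split the slide's phrase links.

from typing import List, Set, Tuple

english = "Narendra Modi is the Prime Minister of India".split()
hindi = "Bhaarat ke Pradhan Mantri Narendra Modi hain".split()  # punctuation dropped

# (i, j) means english[i] is aligned to hindi[j]; "the" is left unaligned.
links: Set[Tuple[int, int]] = {
    (0, 4), (1, 5),  # Narendra Modi -> Narendra Modi
    (2, 6),          # is -> hain
    (4, 2), (5, 3),  # Prime Minister -> Pradhan Mantri
    (6, 1), (7, 0),  # of India -> Bhaarat ke
}


def is_consistent(e_span: Tuple[int, int], h_span: Tuple[int, int]) -> bool:
    """True if at least one link lies inside the spans and none crosses out."""
    (e1, e2), (h1, h2) = e_span, h_span
    inside = [(i, j) for i, j in links if e1 <= i <= e2 and h1 <= j <= h2]
    # A link "crosses out" when exactly one of its ends falls inside the spans.
    crossing = [(i, j) for i, j in links if (e1 <= i <= e2) != (h1 <= j <= h2)]
    return bool(inside) and not crossing


def phrase(words: List[str], span: Tuple[int, int]) -> str:
    """Join the words of an inclusive index span."""
    return " ".join(words[span[0]: span[1] + 1])


for e_span, h_span in [((4, 7), (0, 3)), ((4, 5), (0, 1)), ((0, 1), (0, 1))]:
    print(f"{phrase(english, e_span)} -> {phrase(hindi, h_span)}: "
          f"{is_consistent(e_span, h_span)}")
```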
Word Alignment
• There can be multiple possible alignments.
• Example: Prime Minister -> Bhaarat ke (?)
• Another example: Narendra Modi -> Bhaarat ke (?)
• With only one sentence pair, we cannot determine alignments reliably!
• We need a parallel corpus, so that alignments can be inferred from how often words co-occur across sentence pairs.
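One simple co-occurrence score is the Dice coefficient, 2 * c(e, f) / (c(e) + c(f)), where c counts the sentence pairs containing a word (or both words). It is swapped in here as an illustration, not necessarily the score used in this course. The sketch runs it on the two-sentence rabbit corpus from the example that follows; the function name `dice` is a hypothetical helper.

```python
# Co-occurrence counts and Dice coefficients over a tiny parallel corpus.

from collections import Counter
from itertools import product
from typing import List, Tuple

corpus: List[Tuple[str, str]] = [
    ("three rabbits", "trois lapins"),
    ("the rabbits of grenoble", "les lapins de grenoble"),
]

e_count: Counter = Counter()     # sentence pairs containing each English word
f_count: Counter = Counter()     # sentence pairs containing each French word
pair_count: Counter = Counter()  # sentence pairs containing both words

for e_sent, f_sent in corpus:
    e_words, f_words = set(e_sent.split()), set(f_sent.split())
    e_count.update(e_words)
    f_count.update(f_words)
    pair_count.update(product(e_words, f_words))


def dice(e: str, f: str) -> float:
    """Dice coefficient: 2 * c(e, f) / (c(e) + c(f))."""
    return 2 * pair_count[(e, f)] / (e_count[e] + f_count[f])


for f in sorted(f_count):
    print(f"rabbits ~ {f}: {dice('rabbits', f):.2f}")
```

"rabbits" scores 1.00 with "lapins" and 0.67 with every other French word, so that alignment stands out. By contrast, "the", "of" and "grenoble" each score 1.00 with all of "les", "de" and "grenoble": two sentences are not enough to separate them.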
Example of Word Alignment
• Consider a parallel corpus with 2 sentences:
• S1: “Three rabbits” = “Trois lapins”
• S2: “The rabbits of Grenoble” = “Les lapins de Grenoble”
• Which words can be aligned?
• What about “The rabbit of Bengaluru”?
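A more principled way to answer these questions is EM training of IBM Model 1, a classic word-alignment model. It is swapped in here for illustration (the slide does not name a model), and the NULL word is omitted for brevity. The function names `ibm_model1` and `best_french` are hypothetical. EM repeatedly shares each French token among the English words of its sentence in proportion to the current t(f|e), then renormalises.

```python
# IBM Model 1 style EM on the two-sentence rabbit corpus (NULL word omitted).

from collections import defaultdict
from typing import Dict, List, Tuple

Table = Dict[Tuple[str, str], float]  # (french_word, english_word) -> t(f|e)
Corpus = List[Tuple[List[str], List[str]]]

corpus: Corpus = [
    ("three rabbits".split(), "trois lapins".split()),
    ("the rabbits of grenoble".split(), "les lapins de grenoble".split()),
]


def ibm_model1(pairs: Corpus, iterations: int = 10) -> Table:
    """Estimate t(f|e) with EM."""
    f_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    # Start uniform: every French word equally likely for every English word.
    t: Table = {(f, e): uniform for es, fs in pairs for f in fs for e in es}

    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: expected alignment counts under the current t(f|e).
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: normalise expected counts per English word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def best_french(t: Table, e: str) -> str:
    """French word with the highest t(f|e) for the English word e."""
    row = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(row, key=row.get)


t = ibm_model1(corpus)
print(best_french(t, "three"), best_french(t, "rabbits"))  # trois lapins
for f in ("les", "lapins", "de", "grenoble"):
    print(f"t({f} | the) = {t[(f, 'the')]:.3f}")
```

EM settles "three -> trois" and "rabbits -> lapins", but t(les | the), t(de | the) and t(grenoble | the) stay exactly tied, since those words always co-occur. "rabbit" and "bengaluru" never appear in the corpus, so the table has no entries for them at all: the morphology and OOV problems in miniature.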
Phrase Table
• A table of phrase translation probabilities
• The phrase table is learned from word alignments.
English phrase | Hindi phrase | Probability
Prime Minister of India | Bhaarat ke Pradhan Mantri | 0.75
Prime Minister of India | Bhaarat ke Bhootpurv Pradhan Mantri | 0.02
Prime Minister of India | Pradhan Mantri | 0.23
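Phrase translation probabilities are usually estimated by relative frequency over the phrase pairs extracted from word-aligned sentences: P(h|e) = count(e, h) / count(e). The sketch below shows that estimate and a lookup over the table above. The extracted pairs are hypothetical counts invented for illustration, and `estimate_phrase_table` and `translate_phrase` are hypothetical names.

```python
# Relative-frequency phrase table estimation and lookup.

from collections import Counter
from typing import Dict, List, Tuple

PhraseTable = Dict[str, Dict[str, float]]

# The rows from the table above: English phrase -> {Hindi phrase: probability}.
PHRASE_TABLE: PhraseTable = {
    "Prime Minister of India": {
        "Bhaarat ke Pradhan Mantri": 0.75,
        "Bhaarat ke Bhootpurv Pradhan Mantri": 0.02,
        "Pradhan Mantri": 0.23,
    }
}


def estimate_phrase_table(pairs: List[Tuple[str, str]]) -> PhraseTable:
    """Relative frequency: P(h|e) = count(e, h) / count(e)."""
    pair_counts = Counter(pairs)
    e_counts = Counter(e for e, _ in pairs)
    table: PhraseTable = {}
    for (e, h), c in pair_counts.items():
        table.setdefault(e, {})[h] = c / e_counts[e]
    return table


def translate_phrase(table: PhraseTable, english_phrase: str) -> str:
    """Return the most probable Hindi phrase for an English phrase."""
    options = table[english_phrase]
    return max(options, key=options.get)


# HYPOTHETICAL phrase pairs, as if extracted from four aligned sentence pairs.
extracted = [("Prime Minister of India", "Bhaarat ke Pradhan Mantri")] * 3 + [
    ("Prime Minister of India", "Pradhan Mantri")
]

print(estimate_phrase_table(extracted))
print(translate_phrase(PHRASE_TABLE, "Prime Minister of India"))  # Bhaarat ke Pradhan Mantri
```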
Challenges in Phrase-Based SMT (PBSMT)
• Divergent Word Order
• Rich morphology
• Named entities and out-of-vocabulary (OOV) words
• To be covered in the next class…