0% found this document useful (0 votes)
334 views8 pages

Dynamic Programming Parsing in NLP

Dynamic programming parsing is an efficient method in NLP and compiler design that avoids redundant computations by storing intermediate results, allowing for polynomial-time parsing of context-free grammars. Shallow parsing identifies main constituents without building complete parse trees, while probabilistic context-free grammars (PCFGs) assign probabilities to parse rules, aiding in ambiguity resolution. Unification of feature structures combines linguistic constraints, ensuring agreement and consistency in grammar frameworks.

Uploaded by

thrimurthimasa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
334 views8 pages

Dynamic Programming Parsing in NLP

Dynamic programming parsing is an efficient method in NLP and compiler design that avoids redundant computations by storing intermediate results, allowing for polynomial-time parsing of context-free grammars. Shallow parsing identifies main constituents without building complete parse trees, while probabilistic context-free grammars (PCFGs) assign probabilities to parse rules, aiding in ambiguity resolution. Unification of feature structures combines linguistic constraints, ensuring agreement and consistency in grammar frameworks.

Uploaded by

thrimurthimasa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Dynamic programming parsing :

• Dynamic Programming (DP) parsing is a method used in natural language processing


(NLP) and compiler design to efficiently parse sentences or strings according to a
grammar, usually a context-free grammar (CFG). Instead of recomputing the parse of
every substring multiple times, DP parsing stores intermediate results and reuses
them—making it much more efficient than naïve recursive parsing.
• Naïve recursive approaches may repeatedly recompute whether the same substring can
be generated from a nonterminal. Dynamic programming avoids this redundancy by
breaking the input into smaller substrings, solving each subproblem once, and storing
results in a table.
• It avoids redundant computations (unlike naïve recursion).
• Handles ambiguity by storing multiple parses for the same span.
• Enables probabilistic parsing by attaching probabilities to entries in the table.
• Dynamic programming parsing (like CKY or Earley) builds up possible parses of
substrings in a table and reuses them to assemble parses of longer spans. This makes
context-free parsing polynomial-time instead of exponential.
• The Dynamic programming parsers are
• 1) Shallow parsing
• 2) Probabilistic CFG

Shallow parsing :
• Shallow parsing is a lightweight form of syntactic analysis that identifies the main
constituents of a sentence (like noun phrases and verb phrases) but does not build a
complete parse tree.
• Instead of analyzing the full hierarchical structure of the sentence (like deep parsing
does), shallow parsing just "chunks" words into meaningful groups.
Example
Sentence:
"The quick brown fox jumps over the lazy dog."
Deep Parsing → full syntax tree:
S
├── NP (The quick brown fox)
└── VP (jumps over the lazy dog)
└── PP (over the lazy dog)
└── NP (the lazy dog)

Shallow Parsing → chunks:


[NP The quick brown fox] [VP jumps] [PP over] [NP the lazy dog]
So shallow parsing doesn’t care about the full hierarchical relations, just about grouping
words into chunks.
Shallow parsing usually combines:
1. POS Tagging → assign part-of-speech tags to each word.
o "The/DT quick/JJ brown/JJ fox/NN jumps/VBZ over/IN the/DT lazy/JJ
dog/NN"
2. Chunking Rules / Models → group tokens into chunks:
o Determiners + adjectives + nouns → NP
o Verbs (possibly with auxiliaries) → VP
o Prepositions → PP
These can be rule-based (using regex over POS tags) or statistical (using machine
learning like CRFs, HMMs, or neural models).

Probabilistic CFG :
• A Probabilistic Context-Free Grammar (PCFG) is just like a regular context-free
grammar (CFG), but each production rule has a probability attached.
• This makes the grammar stochastic: instead of just telling us whether a sentence is
grammatical, it lets us assign probabilities to different possible parses.

Formal Definition
A PCFG is a 5-tuple:
G=(N,Σ,R,S,P)
• N: Set of nonterminals (e.g., S, NP, VP)
• Σ: Set of terminals (words/tokens, e.g., "dog", "runs")
• R: Set of production rules (e.g., S → NP VP)
• S: Start symbol (e.g., S)
• P: Probability distribution over rules (for each nonterminal, rule probabilities sum to 1)

Example
Sentence: "the dog chased the cat"
Grammar:

S → NP VP [1.0]
NP → Det N [0.6]
NP → "the dog" [0.4]
VP → V NP [1.0]
Det → "the" [1.0]
N → "dog" [0.5]
N → "cat" [0.5]
V → "chased" [1.0]

Uses of PCFG
• Ambiguity resolution: If a sentence has multiple parses, PCFG chooses the most
probable one.
• Data-driven: Rule probabilities can be estimated from a treebank (corpus of parsed
sentences).
• Foundation of probabilistic parsing: CKY, Earley, and other DP parsers can be
extended to PCFGs.

Limitations of PCFGs
• Assume independence of rules (not always true in real language).
• Struggle with long-distance dependencies (e.g., subject–verb agreement).
• Often extended with lexicalization (probabilities conditioned on words).

Probabilistic CYK Algorithm :


The CYK algorithm is a classic dynamic programming parser for context-free grammars in
Chomsky Normal Form (CNF).
The probabilistic version (for PCFGs) extends CYK by not just storing whether a substring
can be derived from a nonterminal, but also the probability of the best derivation.

Algorithm Outline
Input:
• A sentence of length nnn.
• A PCFG in CNF (rules of form A → BC or A → a).
Output:
• A parse table storing the maximum probability of each nonterminal spanning
substring [i,j][i, j][i,j].
• The most probable parse tree (if desired).
Example
Grammar (PCFG in CNF):
S → NP VP [1.0]
NP → Det N [0.6]
NP → "dogs" [0.4]
VP → V NP [1.0]
Det → "the" [1.0]
N → "dogs" [0.5]
N → "bones" [0.5]
V → "chase" [1.0]
Probabilistic Lexicalized CFGs:
Probabilistic Lexicalized CFGs are an extension of PCFGs where every phrase is annotated
with a head word, and rule probabilities are conditioned on these heads. This makes the
grammar more sensitive to actual word choices, improving disambiguation and parsing
accuracy.
A lexicalized context-free grammar is a CFG where each phrasal category (like NP, VP,
PP, …) is associated with a head word (the word that determines the phrase’s syntactic and
semantic properties).
Example:
• NP → the big dog (head = "dog")
• VP → chased the cat (head = "chased")

Why Lexicalization?
Standard PCFGs assume rules are chosen independently of the actual words.
• Example: A PCFG might say PP → P NP has probability 1.0, no matter what the
preposition is.
• But in real language:

o "eat pizza with fork"


o "eat pizza with anchovies"

o "eat pizza with telescope" less likely


Lexicalization lets probabilities depend on specific words, capturing stronger
syntactic/semantic preferences.

Probabilistic Lexicalized CFG (PLCFG)


A PLCFG = a CFG where:
1. Every constituent has a head word.
2. Rules expand head-annotated categories.
3. Rule probabilities are conditioned on the head word.

Example
Sentence: "the dog chased the cat"
CFG (lexicalized version):
S(chased) → NP(dog) VP(chased)
VP(chased) → V(chased) NP(cat)
NP(dog) → Det(the) N(dog)
NP(cat) → Det(the) N(cat)
Probabilities:
• P(S(chased) → NP(dog) VP(chased)) = 1.0
• P(VP(chased) → V(chased) NP(cat)) = 0.9
• P(NP(dog) → Det(the) N(dog)) = 0.7
• P(NP(cat) → Det(the) N(cat)) = 0.6
Now probabilities depend not just on the category (NP, VP) but also on the lexical head
("dog", "chased", "cat").
This allows more nuanced probabilities, e.g.:
• VP(eat) → V(eat) NP(pizza) might be very probable.
• VP(eat) → V(eat) NP(idea) might be very improbable.

Unification of feature structures.


Unification of feature structures is the process of combining two sets of linguistic constraints
into one consistent set. If constraints clash, unification fails. This mechanism lets grammar
frameworks enforce agreement and other linguistic conditions in a principled way.

What are Feature Structures?


A feature structure is a set of attribute–value pairs used to represent linguistic information
(syntax, semantics, morphology, agreement, etc.).
Example (NP “the dogs”):
[ CAT: NP
NUM: plural
PERS: 3rd ]
Here:
• CAT = syntactic category
• NUM = number (singular/plural)
• PERS = person

What is Unification?
Unification is the operation of merging two feature structures into one, provided they are
compatible.
• If both have the same feature with the same value → keep it.
• If one has a feature the other lacks → include it.
• If both specify a feature with different values → conflict → unification fails.

Think of it as “combining constraints”: the result must satisfy both.

Example 1: Successful Unification


FS1:
[ CAT: NP
NUM: plural ]
FS2:
[ PERS: 3rd
NUM: plural ]
Unification result:
[ CAT: NP
NUM: plural
PERS: 3rd ]

Works, since both agree on NUM = plural.

Example 2: Failed Unification


FS1:
[ CAT: NP
NUM: singular ]
FS2:
[ NUM: plural
PERS: 3rd ]

Conflict: NUM = singular vs NUM = plural.


Unification fails.

🛠 Why is Unification Useful?


In grammars, rules often carry constraints. Unification enforces these:
• Subject–verb agreement:
o Subject NP: [NUM: plural]
o Verb: [NUM: plural]
o Unification succeeds → sentence grammatical.
o If mismatch → unification fails → sentence rejected.
• Case, gender, tense, semantic roles can all be enforced through unification.

Applications
• Unification grammars (HPSG, LFG, PATR-II)
• Feature-based parsing in NLP
• Constraint solving (agreement, selection restrictions)
• Computational morphology (inflection constraints)

Common questions

Powered by AI

Lexicalization is important because it conditions grammar rules on specific head words rather than just categories like NP or VP. This adjustment makes the grammar more sensitive to actual word choices, capturing stronger syntactic and semantic preferences, and thus significantly improving disambiguation and parsing accuracy. Lexicalized grammars are especially effective in capturing language patterns that are not easily modeled with standard PCFG due to their reliance on specific lexical context .

PCFGs assume rule independence and struggle with long-distance dependencies, which can be critical in natural language. Lexicalized CFGs address these issues by associating each rule with head words, allowing probabilities to be conditioned on specific words, which captures stronger syntactic and semantic preferences. This enables better disambiguation and parsing accuracy, particularly in sentences with nuanced structures .

Shallow parsing supports probabilistic parsing by chunking sentences into meaningful parts using statistical models like Hidden Markov Models (HMMs) or Conditional Random Fields (CRFs). These models can learn from annotated corpora to predict the most likely structure of a sentence, thus providing a statistical basis for assigning probabilities to different parses, which enhances accuracy and efficiency in syntactic analysis .

Feature-based parsing in NLP utilizes the unification of feature structures to solve agreement and selection restrictions by representing linguistic information as sets of attributes and values. During parsing, these feature structures are unified to ensure that all constraints—such as agreement in number, gender, case, and specific selection restrictions—are consistently applied. If any constraints clash, unification fails, flagging an inconsistency in the parse tree, which ensures that only grammatically correct sentences are accepted .

A PCFG consists of nonterminals, terminals, production rules, a start symbol, and a probability distribution over the rules. These components contribute to its probabilistic nature by assigning probabilities to production rules, allowing the grammar to not only determine whether a sentence is grammatical but also to assign probabilities to different possible parses. This stochastic nature aids in resolving ambiguities by selecting the most probable parse tree based on these probabilities .

Dynamic programming parsers handle ambiguity by storing multiple possible parses for the same spans in a table. This means they can efficiently manage cases where substrings of a sentence can be parsed in different ways, allowing for probabilistic parsing where the parser later determines the most likely interpretation by combining the stored parses into complete parse trees .

Dynamic programming parsing improves efficiency by storing intermediate results in a table and reusing them, instead of recomputing the parse of every substring multiple times like naïve recursive parsing. This avoids redundant computations and makes parsing polynomial-time, which is efficient compared to the potential exponential time complexity of naïve recursive methods .

Feature structures in unification grammars represent linguistic information as attribute–value pairs. Unification is the process of merging two feature structures, ensuring that all constraints are satisfied. It plays a critical role in enforcing linguistic constraints such as subject-verb agreement and case, by ensuring consistency across feature structures, and rejecting parse trees where constraints clash .

Shallow parsing differs from deep parsing as it identifies the main constituents of a sentence without constructing a full hierarchical parse tree. It groups words into chunks (e.g., noun phrases, verb phrases) rather than analyzing their full syntactic relationships. This simplicity means shallow parsing is faster and less resource-intensive but may miss more complex syntactic details that deep parsing captures, such as nested structures and detailed syntactic roles .

Probabilistic parsing using the CYK algorithm extends standard CYK parsing by storing not only whether a substring can be derived from a nonterminal, but also the probability of the best derivation for each substring. This allows the parser to choose the most probable parse tree, effectively handling ambiguity in parse structures and enhancing parsing accuracy based on probabilistic models .

You might also like