UNIT - II
Syntax Analysis(Part-A):
• Introduction Context-Free
Grammars
• Writing a Grammar
• Ambiguous Grammars
1
2.1 Introduction to Syntax
Analysis
2
Syntax
Analysis
Syntax Analysis
The parser (syntax analyzer) receives the source
code in the form of tokens from the lexical
analyzer and performs syntax analysis, which
create a tree-like intermediate representation that
depicts the grammatical structure of the token
stream.
Syntax analysis is also called parsing.
A typical representation is a abstract syntax
tree in which
each interior node represents an operation
the children of the node represent the
arguments of the operation
3
Position of Syntax
Analyzer
4
Role of
Parser
Parser
Checks the stream of words and their parts of
speech
(produced by the scanner) for grammatical
correctness
Determines if the input is syntactically well
formed Guides checking at deeper levels
than syntax (static
semantics checking)
Builds an IR representation of the code
5
2.2 Context-Free Grammar
Definition
A context-free grammar (CFG) has four components:
A set of terminal symbols, sometimes referred to as "tokens."
A set of nonterminal symbols. sometimes called "syntactic
variables."
One nonterminal is distinguished as the start symbol.
A set of productions in the form: LHS → RHS
whereLHS (called head, or left side) is a single nonterminal
symbol RHS (called body, or right side) consists of zero
or more terminals and nonterminals.
The terminals are the elementary symbols of the language defined by
the grammar.
Nonterminals impose a hierarchical structure on the language that is
key to
syntax analysis and translation.
Conventionally, the productions for the start symbol are listed first.
6
Benefits Offered by
Grammar
Grammars offer significant benefits for both
language designers and compiler writers:
A grammar gives a precise, yet easy-to-
understand
syntactic specification to a programming
language.
Parsers can automatically be constructed for
certain classes of grammars.
The parser-construction process can reveal
syntactic
ambiguities and trouble spots.
A grammar imparts structure to a language.
The structure is useful for translating source
programs
into correct object code and for detecting
errors. 7
CFG vs. RE
Grammars are a more powerful notation than
regular expressions.
Every construct that can be described by a
regular expression can be described by a
grammar, but not vice-versa.
Every regular language is a context-free
language, but not vice-versa.
8
CFG
Example
A CFG Grammar
9
Another CFG
Example
Grammar for simple arithmetic
expressions
1 expr → expr + term | expr - term | term
2 term → term * factor | term / factor |
3 factor factor → ( expr ) | id
where
expr, term, and factor are
nonterminals
id, +, -, *, /, (, and ) are terminals
expr is the start symbol
10
Notational
Conventions
To avoid confusion between terminals and nonterminals, the following
notational conventions for grammar will be used.
terminal symbols
lowercase letters like a, b, c.
digits, operator and punctuation symbols, such as +, *, (, ), 0, 1, ...,
9.
Boldface strings such as id, or if. Each of which represents
a single terminal symbol.
nonterminal symbols
uppercase letters early in the alphabet like
A, B, C. lowercase italic names such as expr,
or stmt.
Specific symbols begin with a uppercase
letter such as Expr, Stmt.
Unless stated otherwise, the head of the
first production is the start symbol. 11
Notational Conventions
(cont)
To avoid confusion between terminals and nonterminals, the following
notational conventions for grammar will be used.
Grammar symbols (i.e. either terminal or nonterminal)
uppercase letters late in the alphabet, such as X, Y, Z.
lowercase Greek letters, α, β, γ for example, represent strings of
grammar symbols. Thus, a production can be written as: A → α.
12
Derivatio
ns
Derivations
A grammar derives strings by beginning with the start symbol and
repeatedly replacing a nonterminal by the body of a production for
that nonterminal. This sequence of replacements is called
derivation.
Derivation Example
Given the grammar:
1 exp → exp op exp | ( exp ) | number
2 op → + | - | *
The following is a derivation for an expression. At each step the
grammar rule choice used for the replacement is given on the
right.
Context-Free
Language
New Notations: ⇒=∗ and = +
⇒
α 1⇒+=∗ α mn eans α de1 rives α in nzero or more steps.
⇒
α 1 = α n means α 1 derives α n in one or more steps.
Definition
If S =⇒∗ α, where S is the start symbol of grammar G, then α
is called a sentential form of G. A sentential form may contain
both terminals and nonterminals.
A sentence of G is a sentential form with no nonterminals.
The language generated by a grammar G is its set of sentences,
denoted as L(G).
A language that can be generated by a context-free grammar is
said to be a
context-free language.
If two grammars generate the same language, the grammars
are said to be
equivalent.
Leftmost and Rightmost
Derivations
The point of parsing is to construct a derivation.
At each step, we choose a nonterminal to
replace.
Different choices can lead to different
derivations
Two derivations are of interest
Leftmost derivation - replace leftmost
⇒
nonterminal at each lm step, denoted as: .
Rightmost derivation - replace rightmost
⇒
nonterminal at
each step, denoted ras:
m
.
Leftmost and rightmost are the two
systematic derivations. We don’t care
about randomly-ordered derivations!
15
Leftmost and Rightmost
Derivations
Leftmost Derivation of (number - number)*number
Rightmost Derivation of (number - number)*number
16
Parse
Trees
Definition
A parse Tree is a labeled tree representation of a
derivation that filters out the order in which
productions are applied to replace nonterminals.
The interior nodes are labeled by nonterminals
The leaf nodes are labeled by terminals
The children of each internal node A are
labeled, from left to right, by the symbols in
the body of the production by which this A was
replaced during the derivation.
Since a parse tree ignores variations in the order in which
symbols in sentential forms are replaced, there is a many-to-one
relationship between derivations and parse tree.
17
Leftmost and Rightmost
Derivations
The following is a parse tree for these two
derivations discussedhere.
18
2.2 .Writing a Grammar
1 Lexical Versus Syntactic Analysis
2. Eliminating Ambiguity
3. Elimination of Left Recursion
[Link] Factoring
19
2.2.1 Lexical Versus Syntactic
Analysis
20
2.2.2 Ambiguous Grammars
21
Ambiguity
22
Removing the ambiguity
23
24
Deeper Ambiguity
25
2.2.3 Left Recursion
26
Left Recursion
27
Removal of Immediate Left
Recursion
28
General Left Recursion
Bad News!
Can only be removed when there is
no empty-string production and no
cycle in the grammar.
Good News!!!!
Never seen in grammars of any
programming languages
29
2.2.4 Left Factoring
Left factoring is required to eliminate non-
determinism of a grammar.
Suppose a grammar, S -> abS | aSb
Here, S is deriving the same terminal a in
the production rule(two alternative choices
for S), which follows non-determinism. We
can rewrite the production to defer the
decision of S as-
S -> aS'
S' -> bS | Sb
Thus, S' can be replaced for bS or S
30
Left Factoring
Left factor causes non-LL(1)
Given A X Y | X Z. Both A X Y
and A X Z can be chosen when A
is on top of stack and a token in
First(X) is the next token.
AXY|XZ
can be left-factored as
A X A’ and A’ Y | Z
31
Example of Left Factor
ifSt if ( exp ) st else st | if ( exp ) st
can be left-factored as
ifSt if ( exp ) st elsePart
elsePart else st |
seq st ; seq | st
can be left-factored as
seq st seq’
seq’ ; seq |
32
Next topic
Bottom-up parsing
Top-down parsing Shift-reduce parsers
Recursive-descent parsing
LL(1) parsing LR(0) parsing
LL(1) parsing LR(0) items
algorithm Finite automata of items
First and follow LR(0) parsing algorithm
sets LR(0) grammar
Constructing LL(1) SLR(1) parsing
parsing table SLR(1) parsing algorithm
Error recovery SLR(1) grammar
Parsing conflict
33