CHAPTER THREE
COMPILER DESIGN
Prepared by: Mesay Yohannes (MSc.)
November, 2023
Jinka
Syntax Analysis (Parsing)
Outline
3.1. The Role of Parsing
3.2. Parsing
3.2.1. Derivation,
3.2.2. Parse Tree,
3.2.3. Ambiguity, and
3.2.4. Left Recursion
1
3.1. The Role of Parsing
Syntax analyzer is the second phase of a compiler and also known as parser.
As we see in chapter 2, a lexical analyzer can identify tokens with the help of regular
expressions and patterns/rules.
As second phase, a syntax analyzer or parser takes the input from a lexical analyzer in the form
of token streams.
A parser is a program that generates a parse tree for the given string, if the string is generated
from the underlying grammar.
A parser analyzes the source code (token stream) against the production rules to detect any
errors in the code.
2
3.1. The Role of Parsing…(Cont.)
In this way, the parser accomplishes two tasks,
Parsing the code,
Looking for errors and generating a parse tree as the output of the phase.
In compilation, the parser obtains a string of tokens from the lexical analyzer, and
expected to parse the whole code even if some errors exist in the program.
A lexical analyzer cannot check the syntax of a given sentence due to the limitations of
the regular expressions.
Regular expressions cannot check balancing tokens, such as parenthesis.
Therefore, the syntax analysis phase uses context-free grammar (CFG), which is
recognized by push-down automata.
The syntax of a language is specified by a context free grammar (CFG).
3
3.1. The Role of Parsing…(Cont.)
Context Free Grammar (CFG)
CFG is a helpful tool in describing the syntax of programming languages.
CFG is a formal grammar which is used to generate all possible patterns of strings in a given form.
The rules in a CFG are mostly recursive.
A syntax analyzer checks whether a given program satisfy the rules implied by a CFG or not.
If it satisfies, the syntax analyzer creates a parse tree for the given program.
Components or Tuples of Context Free Grammar (CFG)
A context free grammar has four components; G = {V, T, P, S}
Grammar (G): is grammar, which consists of a set of production rules. Which also generate the
string of a language.
1. Non-terminals (V): are a set of syntactic variables that defines sets of strings.
They are represented by upper case letters ([A-Z]) and they can be further derivate.
2. Terminal Symbols (T): are the basic symbols from which strings are formed, also set of
tokens.
Terminals are represented by lower case letters ([a-z], [0-9], (, ), … ) and they do not
have further derivation.
4
3.1. The Role of Parsing…(Cont.)
3. Productions (P): is a set of production rules of a grammar specify the manner in
which the terminals and non-terminals can be combined to form strings.
Each productions consists of three things.
A non-terminal (left side of the production),
An arrow, and
A sequence of tokens with non-terminals (right side of the production),
EE+E
S As
Sb|a
4. Start Symbol (S): is one of the non-terminals from where the production begins.
The strings are derived from the start symbol by repeatedly replacing a non-
terminal.
Initially a non-terminal is start symbol by the right side of a production, for that
non-terminal.
5
3.1. The Role of Parsing…(Cont.)
Example: Let we take the problem of palindrome language; L = {w| w = wR},
This language cannot be described by means of regular expressions, because it is not a
regular language, but it can be described by means of CFG, as illustrated below;
G = (V, Σ, P, S); where: G- is a grammar,
V = {Q, Z, N}, which is set of non-terminals,
Σ = {0, 1}, which is a set of terminals,
P = {Q -> Z | Q -> N | Q -> | Z -> 0Q0 | N -> 1Q1}, which is a set of production
S = {Q}, which is a start symbol
This grammar describes palindrome language, such as: 1001, 11100111, 1010101, 11111, etc.
Check: 1001
QN
Q 1Q1
Q 10Q01
Q 10
Q 1001
6
3.1. The Role of Parsing…(Cont.)
Syntax Analyzer Vs Lexical Analyzer
A program recognized by LA, and it is also by the syntax analyzer.
Both of them do similar things;
But the LA deals with simple non-recursive constructs of the language.
The SA deals with recursive constructs (Recursive is a function which calls itself
again and again) of the language.
Q N + T|T
T T + P|P
Pa|b
The lexical analyzer simplifies the jobs of the syntax analyzer.
The lexical analyzer recognizes the smallest meaningful units (tokens) in a source
program.
7
3.2. Parsing
Parsing is performed at the SA phase where a stream of tokens is taken as input from
LA and the parser produces the parser tree for the tokens against the syntax error.
The concept of parsing is basically address various concepts such as:
Derivation
Parse Tree
Ambiguity
Left Recursion
A. Derivation
A derivation is basically a sequence of production rules, in order to get the input string.
During the production rule, we take two decisions for some sentential form of input.
Deciding the non-terminal which is to be replaced.
Deciding the production rule, by which the non-terminal will be replaced.
8
3.2. Parsing…(Cont.)
To decide which non-terminal to be replaced with production rule, we can have two options;
i. Left-most Derivation
Here, the sentential form of an input is scanned and replaced from left to right.
Sentential form derived from left-most derivation is called a left-sentential form.
Sentential Form: a sentence that contains variables and terminals.
i.e. S 0S1
Variables Combination of terminals
ii. Right-most Derivation
and variables
Here, we scan and replace the input with production rules, from right to left.
Sentential form derived from right-most derivation is called right-sentential form.
9
3.2. Parsing…(Cont.)
Example: consider the following production rules, Obtain left most and right most
derivation for the input string: id + id * id
EE+E
EE*E
E id
The left-most derivation is: The right-most derivation is:
EE*E Since {E E + E} EE+E Since {E E + E}
EE+E*E Since {E E * E} EE+E*E Since {E E * E}
E id + E * E Since {E id} E E + E * id Since {E id}
E id + id * E Since {E id} E E + id * id Since {E id}
E id + id * id Since {E id} E id + id * id Since {E id}
NB; in left-most derivation, the left-most side non-terminal is always processed first;
Whereas,
In right-most derivation, the right-most side non-terminal is always processed first.
10
3.2. Parsing…(Cont.)
Example: consider the following grammar
S 0A | 1B | 0 | 1
A 0S | 1B | 1
B 0A | 1S
Construct LMD and parse tree for the following string.
i) 0101 ii) 1100101
i) 0101
S 0A
S 01B
S 010A
S 0101
11
3.2. Parsing…(Cont.)
Example: consider the following grammar
S 0A | 1B | 0 | 1
A 0S | 1B | 1
B 0A | 1S
Construct LMD and parse tree for the following string.
i) 0101 ii) 1100101
i) 1100101
S 1B
S 11B
S 110A
S 1100S
S 11001B
S 110010A
S 1100101
12
3.2. Parsing…(Cont.)
Example: consider the grammar
S (L)|a
L L,S|S
a) What are the non-terminals, terminals, and start symbols
b) Find LMD, RMD and parse trees for the following
i) (a, a) ii) (a, (a,a))
c) What language does the grammar great
Solution
a) Non-terminals = {S, L}
Terminals = {(, ), a, ,}
Start Symbols = {S}
b) i) (a, a) derivate by using LMD
S (L)
S (L, S)
S (S, S)
S (a, S)
S (a, a)
13
3.2. Parsing…(Cont.)
b(i) (a, a) derivate by using RMD
S (L)
S (L, S)
S (L, a)
S (S, a)
S (a, a)
b(ii) (a, (a, a)) derivate by using RMD
S (L)
S (L, S)
S (S, S)
S (a, S)
S (a, (L))
S (a, (L, S))
S (a, (S, S))
S (a, (a, S))
S (a, (a, a))
14
3.2. Parsing…(Cont.)
B. Parse Tree
A parse tree is a graphical depiction of a derivation.
It is convenient to see how strings are derived from the start symbol.
The start symbol of the derivation becomes the root of the parse tree.
All leaf nodes are terminals, and
All the interior nodes are non-terminals.
Example: let we take the left-most derivation of the given input string; id + id * id and draw the
parse tree. Assume the production rule E E + E| E * E | a| b| c.
Solution: The left-most derivation is a + b * c is:
EE*E
EE+E*E
E id + E * E
E id + id * E
14
E id + id * id
3.2. Parsing…(Cont.)
Example 3: design LMD AND RMD for the string aaabaab.
Grammar is S aS|aSbS|
LMD: aaabaab
S aS
S [Link]
S aaaSbS
S aaabS
S aaabS
S aaabaS
S aaabaaSbS
S aaabaab
S aaabaab
15
3.2. Parsing…(Cont.)
C. Ambiguity
If a context free grammar G has more than one derivation tree for some string w ∈ L(G), it is
called an ambiguous grammar.
There exist multiple right-most or leftmost derivations for some string generated from that
grammar.
Example: Check whether the grammar G with production rules is ambiguous or not.
X → X+X | X*X |X| a
Solution
Let’s find out the derivation tree for the string "a+a*a". It has two leftmost derivations.
Derivation 1:
X → X+X
X → a +X
X → a+ X*X
X → a+a*X
X → a+a*a
16
3.2. Parsing…(Cont.)
Example: Check whether the grammar G with production rules is ambiguous or not.
X → X+X | X*X |X| a
Solution
Let’s find out the derivation tree for the string "a+a*a". It has two leftmost derivations.
Derivation 2:
X → X*X
X → X +X*X
X → a+ X*X
X → a+a*X
X → a+a*a
Since there are two parse trees for a single string "a+a*a", the grammar G is ambiguous.
17
3.2. Parsing…(Cont.)
Class Activity:
Consider the following grammar:
G1: S SbS|a
Design LMD AND RMD for the string ababa
18
3.2. Parsing…(Cont.)
Solution:
G1: S SbS|a
Design LMD AND RMD for the string ababa
Solution: LMD
S SbS
S abS
S abSbS
S ababa
Solution: RMD
S SbS
S Sba
S SbSba
S Sbaba
S ababa
19
3.2. Parsing…(Cont.)
There are three main reasons leads to Ambiguity
Associativity
Rewriting the whole grammar
Precedence.
i) Associativity
It represents the direction in which the expression to be evaluated
Two types of associativity
1. Left to Right
2. Right to Left
If the operation is left to Right associative, then the operand will be taken by the left operator
If the operation is right to Left associative, the right operator will take the operand.
Operations like Addition, Multiplication, Subtraction, and division are left
associative.
For example; if the expression contains: id op id op id
It will be evaluated as; (id op id) op id i.e. (id + id) + id
Operations like exponentiation are right associative.
20
3.2. Parsing…(Cont.)
21
3.2. Parsing…(Cont.)
Example: a + b + c
Two way to evaluate the above expression
(a +b) + c and a + (b + c)
Example: a = b = c
Left Associative Grammar
Example:
EE+E
Ea|b|c
Design for the string a + b + c (The associativity of addition operator is left to right)
(a + b) + c a + (b + c)
EE+E EE+E
EE+E+E EE+E+E
Ea+E+E EE+E+c
Ea+b+E EE+b+c
Ea+b+c Ea+b+c
Left to Right Associative
22
3.2. Parsing…(Cont.)
Example: a = b = c
Two way to evaluate the above expression
a = (b = c) and (a = b) = c
Right Associative Grammar
Example:
EE=E
Ea|b|c
Design for the string a = b = c (The associativity of assignment operator is right to left)
Which means a = (b = c)
A = (b = c)
(a = b) = c
EE=E
EE=E
EE=E=E
EE=E=E
EE=E=c
EE=E=c
EE=b=c
EE=b=c
Ea=b=c
Ea=b=c
Right to Left Associative
23
3.2. Parsing…(Cont.)
24
3.2. Parsing…(Cont.)
ii) Rewrite the Grammar (Left Recursion)
Example:
E E + T|T
EE+E
Rewrite the Grammar Ta|b|c
Ea|b|c
a+b+c
a+b+c
Solution
a+b+c
EE+T
EE+T+E
ET+T+E
Ea+T+E
Ea+b+E
Ea+b+T
Ea+b+c
25
3.2. Parsing…(Cont.)
ii) Rewrite the Grammar (Right Recursion)
Example:
E T = E|T
EE=E
Rewrite the Grammar Ta|b|c
Ea|b|c
a=b= c
a=b=c
Solution
a=b=c
ET=E
EE=E=T
EE=E=c
EE=T=c
EE=b=c
ET=b=c
Ea+b+c
26
3.2. Parsing…(Cont.)
27
3.2. Parsing…(Cont.)
iii) Precedence
When more than one operator presents in an expression which one to be evaluated
first is determined by Operator Precedence.
Operator Precedence Table
28
3.2. Parsing…(Cont.)
iii) Precedence
Consider the Expression 4 + 3 * 6
4 + 3 * 6 = 7 * 6 = 42
4 + 3 * 6 = 4 + 18 = 22
Example: E + E | E * E | id, for the string id + id * id
29
3.2. Parsing…(Cont.)
Re-writing Grammar With Additional Rules
Example: E E + E | E * E | id
For the string id + id * id
Create Levels of Non-Terminals
E – First level of precedence
T – Second Level of precedence
F – Generating basic units (terminal) in expression
E E + T|T
T T * F|F
F id
id + id * id
EE+T
EE+T*F
E E + T * id
E E + F * id
E E + id * id
E id + id * id 30
U!
YO
NK
H A
T