CS 2201: Formal Language & Automate Theory
Bottom-up Parsing
1
Bottom-Up Parsing
• Bottom-up parser:
– parse tree created from the given input starting from leaves towards the root
– tries to find the right-most derivation of the given input in the reverse order
S ... (the right-most derivation of )
(the bottom-up parser finds the right-most derivation in the reverse
order)
• Bottom-up parsing: also known as shift-reduce parsing because its two main
actions are shift and reduce
– At each shift action, the current symbol in the input string is pushed to a stack
– At each reduction step, the symbols at the top of the stack (this symbol sequence is
the right side of a production) replaced by the non-terminal at the left side of that
production
– Two more actions: accept and error
Shift-Reduce Parsing
• Shift-reduce parser tries to reduce the given input string into the starting symbol
a string the starting symbol
reduced to
• At each reduction step, a substring of the input matching to the right side of a
production rule is replaced by the non-terminal at the left side of that production rule
• If the substring is chosen correctly, the right most derivation of that string is created in
the reverse order
rm
*
Rightmost Derivation: S
rm rm
Shift-Reduce Parser finds: ... S
Shift-Reduce Parsing -- Example
S aABb input string: aaabb
A aA | a aaAbb
B bB | b aAbb reduction
aABb
S
aABb
S rm aaAbb
rm aAbb rm rm aaabb
Right Sentential Forms
• How do we know which substring to be replaced at each reduction step?
Handle
• Handle of a string is a substring that matches the right side of a
production rule
– not every substring that matches the right side of a production rule is
handle
• A handle of a right sentential form ( ) :
a production rule A and a position of
where the string may be found and replaced by A to produce
the previous right-sentential form in a rightmost derivation of
rm
* rm
S A
is a string of terminals
• If the grammar is unambiguous, then every right-sentential form of the
grammar has exactly one handle
Handle Pruning
• A right-most derivation in reverse can be obtained by handle-pruning
S=0
rm 1 rm
2 rm
... rm
n-1
rm n=
input string
• Start from n, find a handle Ann in n, and replace n by An to get n-1
• Then find a handle An-1n-1 in n-1, and replace n-1 by An-1 to get n-2
• Repeat this, until we reach S
A Shift-Reduce Parser
E E+T | T Right-Most Derivation of id + id*id
T T*F | F E E+T E+T*F E+T*id E+F*id
F (E) | id E+id*id T+id*id F+id*id id+id*id
Right-Most Sentential Form Reducing Production
id+id*id F id
F+id*id TF
T+id*id ET
E+id*id F id
E+F*id TF
E+T*id F id
E+T*F T T*F
E+T E E+T
E
Handles are red and underlined in the right-sentential forms.
A Stack Implementation of A Shift-Reduce Parser
Four possible actions of a shift-parser :
1. Shift : The next input symbol is shifted onto the top of the stack
2. Reduce: Replace the handle on the top of the stack by the non-
terminal
3. Accept: Successful completion of parsing
4. Error: Parser discovers a syntax error, and calls an error recovery
routine
• Initial stack: contains only the end-marker $
• End of the input string: marked by the end-marker $
A Stack Implementation of A Shift-Reduce Parser
Stack Input Action
$ id+id*id $ shift
$id +id*id$ reduce by F id Parse Tree
$F +id*id$ reduce by T F
$T +id*id$ reduce by E T E 8
$E +id*id$ shift
$E+ id*id$ shift E 3 + T 7
$E+id *id$ reduce by F id
$E+F *id$ reduce by T F T 2 T 5 * F6
$E+T *id$ shift
$E+T* id$ shift F 1 F 4 id
$E+T*id $ reduce by F id
$E+T*F $ reduce by T T*F id id
$E+T $ reduce by E E+T
$E $ accept
Conflicts During Shift-Reduce Parsing
• For certain class of CFGs, shift-reduce parsers cannot be used
• Stack contents and the next input symbol may not decide action:
– shift/reduce conflict: Whether make a shift operation or a reduction
– reduce/reduce conflict: The parser cannot decide which of several
reductions to make
• If a shift-reduce parser cannot be used for a grammar, that grammar is
called as non-LR(k) grammar
left to right right-most k lookhead
scanning derivation
An ambiguous grammar can never be a LR grammar
Shift-Reduce Parsers
Two main categories of shift-reduce parsers:
1. Operator-Precedence Parser
– simple, but only a small class of grammars
CFG
LR
LALR
2. LR-Parsers
SLR
– covers wide range of grammars
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
– SLR, LR and LALR work same, only their parsing tables are different
LR Parsers
• The most powerful shift-reduce parsing (yet efficient) is:
LR(k) parsing
left to right right-most k lookahead
scanning derivation
• LR parsing is attractive because:
– LR parsers can be constructed to recognize virtually all programming language
constructs for which CFGs can be written
– LR parsing is most general non-backtracking shift-reduce parsing, yet it is still
efficient
– Class of grammars that can be parsed using LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers or LL methods
LL(1)-Grammars LR(1)-Grammars
– LR-parser can detect a syntactic error as soon as it is possible to do so a left-to-
right scan of the input
LR (K) vs. LL (K)
• LR (k) grammar
– It is sufficient to show that a left to right shift reduce parsers be able
to recognize handles of right-sentential forms when they appear on
the top of the stack
– Or, we must be able to recognize the occurrence of the right side of a
production in a right-sentential form, with k input symbols of
lookahead
• LL(k) grammar: we must able to recognize the use of a production
seeing only the first k symbols of what its right side derives
LR Parsing Algorithm
input a1 ... ai ... an $
stack
Sm
Xm
LR Parsing Algorithm output
Sm-1
Xm-1
.
.
Action Table Goto Table
S1 terminals and $ non-terminal
X1 s s
t four different t each item is
S0 a actions a a state number
t t
e e
s s
A Configuration of LR Parsing Algorithm
• A configuration of a LR parsing is:
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )
Stack Rest of Input
• Sm and ai decides the parser action by consulting the parsing action
table (Initial Stack contains just So )
• So : does not represent any grammar symbol
• A configuration of a LR parsing represents the right sentential form:
X1 ... Xm ai ai+1 ... an $
Actions of A LR-Parser
1. shift s -- shifts the next input symbol and the state s onto the stack
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $ )
2. reduce A (or rn where n is a production number)
– pop 2|| (=r) items from the stack;
– then push A and s where s=goto[sm-r,A]
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )
– Output is the reducing production A
3. Accept – Parsing successfully completed
4. Error -- Parser detected an error (an empty entry in the action table)
(SLR) Parsing Tables for Expression Grammar
Action Table Goto Table
1) E E+T state id + * ( ) $ E T F
2) ET 0 s5 s4 1 2 3
1 s6 acc
3) T T*F 2 r2 s7 r2 r2
4) TF 3 r4 r4 r4 r4
5) F (E) 4 s5 s4 8 2 3
6) F id 5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
Actions of A (S)LR-Parser -- Example
stack input action output
0 id*id+id$ shift 5
0id5 *id+id$ reduce by Fid Fid
0F3 *id+id$ reduce by TF TF
0T2 *id+id$ shift 7
0T2*7 id+id$ shift 5
0T2*7id5 +id$ reduce by Fid Fid
0T2*7F10 +id$ reduce by TT*F TT*F
0T2 +id$ reduce by ET ET
0E1 +id$ shift 6
0E1+6 id$ shift 5
0E1+6id5 $ reduce by Fid Fid
0E1+6F3 $ reduce by TF TF
0E1+6T9 $ reduce by EE+T EE+T
0E1 $ accept
Constructing SLR Parsing Tables – LR(0) Item
• LR(0) item: a production of G a dot at some position of the right side
Ex: A aBb Possible LR(0) Items: ..
A aBb
A a Bb
(four different possibility)
..
A aB b
A aBb
• Sets of LR(0) items: states of action and goto table of the SLR parser
• Collection of sets of LR(0) items (the canonical LR(0) collection) is
the basis for constructing SLR parsers
• Augmented Grammar: G’ is
G with a new production rule S’S where S’ is the new starting
symbol
The Closure Operation
• I: set of LR(0) items for a grammar G
closure(I): Set of LR(0) items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I)
..
2. If A B is in closure(I) and B is a production rule of G;
then B will be in the closure(I)
We will apply this rule until no more new LR(0) items can be added
to closure(I)
The Closure Operation -- Example
E’ E .
closure({E’ E}) =
E E+T .
{ E’ E kernel items
ET .
E E+T
T T*F .
E T
TF .
T T*F
F (E) .
T F
F id .
F (E)
.
F id }
Kernel and Non-kernel Items
• Each set of items formed by taking the closure of a set of kernel items
• We are really interested in kernel items
• Non-kernel items can be removed to save storage
• Non-kernel items could be generated by the closure process
Goto Operation
• If I is a set of LR(0) items and X is a grammar symbol (terminal or non-
terminal), then goto (I,X) is defined as follows:
.
– If A X in I
.
then every item in closure({A X }) will be in goto (I,X)
Example:
.. .. .
I ={ E’ E, E E+T, E T,
T T*F, T F,
. .. .
F (E), F id }
goto (I,E) = { E’ E , E E +T }
.. .
goto (I,T) = { E T , T T *F }
goto (I,F) = {T F }
.. .. . .
goto (I,() = { F ( E), E E+T, E
F (E), F id }
T, T T*F, T .
F,
.
goto (I,id) = { F id }
Construction of The Canonical LR(0) Collection
• To create the SLR parsing tables for a grammar G, we will create the
canonical LR(0) collection of the grammar G’
• Algorithm:
.
C is { closure({S’ S}) }
repeat the followings until no more set of LR(0) items can be added to C.
for each I in C and each grammar symbol X
if goto(I,X) is not empty and not in C
add goto(I,X) to C
• goto function is a DFA on the sets in C.
The Canonical LR(0) Collection -- Example
I0: E’ .E I1: E’ E. I6: E E+.T I9: E E+T.
E .E+T E E.+T T .T*F T T.*F
E .T T .F
T .T*F I2: E T. F .(E) I10: T T*F.
T .F T T.*F F .id
F .(E)
F .id I3: T F. I7: T T*.F I11: F (E).
F .(E)
I4: F (.E) F .id
E .E+T
E .T I8: F (E.)
T .T*F E E.+T
T .F
F .(E)
F .id
I5: F id.
Transition Diagram (DFA)
E + T
I0 I1 I6 I9 * to I7
F
( to I3
T
id to I4
to I5
F I2 * I7 F
I10
(
I3 id to I4
(
to I5
I4 E I8 )
id id T
F
to I2 + I11
I5 to I3 to I6
(
to I4
Constructing SLR Parsing Table
(of an augumented grammar G’)
1. Construct the canonical collection of sets of LR(0) items for G’
C{I0,...,In}
2. Create the parsing action table as follows
• If a is a terminal, A.a in Ii and goto (Ii, a)=Ij then action[i, a] is shift j
• If A. is in Ii , then action[i,a] is reduce A for all a in FOLLOW(A)
where AS’
• If S’S. is in Ii , then action[i,$] is accept
• If any conflicting actions generated by these rules, the grammar is not SLR(1)
3. Create the parsing goto table
• for all non-terminals A, if goto (Ii, A)=Ij then goto [i, A]=j
4. All entries not defined by (2) and (3) are errors
5. Initial state of the parser contains S’.S
The Canonical LR(0) Collection -- Example
I0: E’ .E I1: E’ E. I6: E E+.T I9: E E+T.
E .E+T E E.+T T .T*F T T.*F
E .T T .F
T .T*F I2: E T. F .(E) I10: T T*F.
T .F T T.*F F .id
F .(E)
F .id I3: T F. I7: T T*.F I11: F (E).
F .(E)
I4: F (.E) F .id
E .E+T
E .T I8: F (E.)
T .T*F E E.+T
T .F
F .(E)
F .id
I5: F id.
Parsing Tables of Expression Grammar
Action Table Goto Table
state id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
SLR(1) Grammar
• An LR parser using SLR(1) parsing tables for a grammar G is called as
the SLR(1) parser for G
• If a grammar G has an SLR(1) parsing table, it is called SLR(1)
grammar (or SLR grammar in short)
• Every SLR grammar is unambiguous, but every unambiguous grammar
is not a SLR grammar
shift/reduce and reduce/reduce conflicts
• shift/reduce conflict: choice between a shift operation or reduction for
a terminal
• reduce/reduce conflict: If a state does not know whether it will make a
reduction operation using the production rule i or j for a terminal
• If the SLR parsing table of a grammar G has a conflict, we say that the
grammar is not SLR grammar
Conflict Example
S L=R I0: S’ .S I1:S’ S. I6:S L=.R I9: S L=R.
SR S .L=R R .L
L *R S .R I2:S L.=R L .*R
L id L .*R R L. L .id
RL L .id
R .L I3:S R.
I4:L *.R I7:L *R.
Problem R .L
FOLLOW(R)={=,$} L .*R I8:R L.
= shift 6 L .id
reduce by R L
shift/reduce conflict I5:L id.
Conflict Example2
S AaAb I0: S’ .S
S BbBa S .AaAb
A S .BbBa
B A.
B.
Problem
FOLLOW(A)={a,b}
FOLLOW(B)={a,b}
a reduce by A b reduce by A
reduce by B reduce by B
reduce/reduce conflict reduce/reduce conflict
Constructing Canonical LR(1) Parsing Tables
• In SLR method, the state i makes a reduction by A when the current
token is a:
– if A. in Ii and a is in FOLLOW(A)
• In some situations, A cannot be followed by the terminal a in a right-
sentential form when and the state i are on the top stack. This means
that making reduction in this case is not correct.
S AaAb SAaAbAabab SBbBaBbaba
S BbBa
A Aab ab Bba ba
B AaAb Aa b BbBa Bb a
An Example
S L=R I0: S’ .S I1:S’ S. I6:S L=.R I9: S L=R.
SR S .L=R R .L
L *R S .R I2:S L.=R L .*R
L id L .*R R L. L .id
RL L .id
R .L I3:S R.
I4:L *.R I7:L *R.
Problem R .L
FOLLOW(R)={=,$} L .*R I8:R L.
= shift 6 L .id
reduce by R L
I5: L id.
- If L is reduced to R then the contents appear as: R= (no right sentential form can derive it)
- R L is INVALID for the input symbol “=“
LR(1) Item
• To avoid some of invalid reductions, the states need to carry more
information
• Extra information is put into a state by including a terminal symbol as a
second component in an item
• A LR(1) item is:
.
A ,a where a is the look-ahead of the LR(1) item
(a is a terminal or end-marker)
LR(1) Item (cont.)
.
• When ( in the LR(1) item A ,a ) is not empty, the look-ahead
does not have any affect
.
• When is empty (A ,a ), then
Reduce by A only if the next input symbol is a (not for any
terminal in FOLLOW(A))
• A state contains .
A ,a1 where {a1,...,an} FOLLOW(A)
...
A ,an.
Canonical Collection of Sets of LR(1) Items
• Process: similar to the construction of the canonical collection of the
sets of LR(0) items,
but closure and goto operations work a little bit different
closure(I) : ( where I is a set of LR(1) items)
– every LR(1) item in I is in closure(I)
.
– if A B, a in closure(I) and
B is a production rule of G;
then B., b belongs to closure(I) for each terminal b in FIRST(a)
goto operation
• If I is a set of LR(1) items and X is a grammar symbol
(terminal or non-terminal),
then goto (I, X) defined as follows:
– If A .X, a in I
then every item in closure({A X., a}) belongs to
goto (I,X)
Construction of The Canonical LR(1) Collection
• Algorithm:
C is { closure({S’.S,$}) }
repeat the followings until no more set of LR(1) items can be added to
C.
for each I in C and each grammar symbol X
if goto (I, X) is not empty and not in C
add goto (I, X) to C
• goto function is a DFA on the sets in C
A Short Notation for The Sets of LR(1) Items
• A set of LR(1) items containing the following items
.
A ,a1
...
.
A ,an
can be written as
.
A ,a1/a2/.../an
Canonical LR(1) Collection -- Example
S AaAb I0: S’ .S ,$ S I1: S’ S. ,$
S BbBa S .AaAb ,$ A
a
A S .BbBa ,$ I2: S [Link] ,$ to I4
B A . ,a B
B . ,b I3: S [Link] ,$ b to I5
I4: S [Link] ,$ A I6: S AaA.b ,$ b I8: S AaAb. ,$
A . ,b
I5: S [Link] ,$ B I7: S BbB.a ,$ a I9: S BbBa. ,$
B . ,a
Canonical LR(1) Collection – Example 2
S’ S I0:S’ .S,$ I1:S’ S.,$ I4:L *.R,= R to I7
1) S L=R S .L=R,$ S * R .L, = L
to I8
2) S R S .R,$ L I2:S L.=R,$ to I6 L .*R, = *
3) L *R L .*R,= R L.,$ L .id, = to I4
id
4) L id L .id,= R to I5
I3:S R.,$ id
I5:L id., =
5) R L R .L,$
I9:S L=R.,$
R I13:L *R.,$
I6:S L=.R,$ to I9
R .L,$ L I10:R L.,$
to I10
L .*R,$ *
R
I4 and I11
L .id,$ to I11 I11:L *.R,$ to I13
id R .L,$ L I5 and I12
to I12 to I10
L .*R,$ *
I7:L *R.,= L .id,$ to I11 I7 and I13
id
I8: R L.,= to I12 I8 and I10
I12:L id.,$
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G’.
C{I0,...,In}
2. Create the parsing action table as follows
• .
If a is a terminal, A a,b in Ii and goto(Ii,a)=Ij then action[i,a] is shift j.
• .
If A ,a is in Ii , then action[i,a] is reduce A where AS’.
• .
If S’S ,$ is in Ii , then action[i,$] is accept.
• If any conflicting actions generated by these rules, the grammar is not LR(1).
3. Create the parsing goto table
• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j
4. All entries not defined by (2) and (3) are errors.
5. Initial state of the parser contains S’.S,$
LR(1) Parsing Tables – (for Example2)
id * = $ S L R
0 s5 s4 1 2 3
1 acc
2 s6 r5
3 r2
4 s5 s4 8 7
5 r4 r4 no shift/reduce or
6 s12 s11 10 9 no reduce/reduce conflict
7
8
r3
r5
r3
r5
9 r1 so, it is a LR(1) grammar
10 r5
11 s12 s11 10 13
12 r4
13 r3