0% found this document useful (0 votes)
8 views45 pages

Bottom Up Parser

The document provides an overview of bottom-up parsing, specifically focusing on shift-reduce parsing techniques. It explains the actions involved in parsing, such as shifting and reducing, and discusses the concepts of handles and handle pruning. Additionally, it covers the construction of LR parsing tables, the closure operation, and the canonical LR(0) collection necessary for implementing SLR parsers.

Uploaded by

rajprashant3302
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views45 pages

Bottom Up Parser

The document provides an overview of bottom-up parsing, specifically focusing on shift-reduce parsing techniques. It explains the actions involved in parsing, such as shifting and reducing, and discusses the concepts of handles and handle pruning. Additionally, it covers the construction of LR parsing tables, the closure operation, and the canonical LR(0) collection necessary for implementing SLR parsers.

Uploaded by

rajprashant3302
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

CS 2201: Formal Language & Automate Theory

Bottom-up Parsing

1
Bottom-Up Parsing
• Bottom-up parser:
– parse tree created from the given input starting from leaves towards the root
– tries to find the right-most derivation of the given input in the reverse order
S  ...   (the right-most derivation of )
 (the bottom-up parser finds the right-most derivation in the reverse
order)

• Bottom-up parsing: also known as shift-reduce parsing because its two main
actions are shift and reduce
– At each shift action, the current symbol in the input string is pushed to a stack
– At each reduction step, the symbols at the top of the stack (this symbol sequence is
the right side of a production) replaced by the non-terminal at the left side of that
production
– Two more actions: accept and error
Shift-Reduce Parsing
• Shift-reduce parser tries to reduce the given input string into the starting symbol

a string  the starting symbol


reduced to

• At each reduction step, a substring of the input matching to the right side of a
production rule is replaced by the non-terminal at the left side of that production rule

• If the substring is chosen correctly, the right most derivation of that string is created in
the reverse order
rm
*

Rightmost Derivation: S


rm rm
Shift-Reduce Parser finds:   ...  S
Shift-Reduce Parsing -- Example

S  aABb input string: aaabb


A  aA | a aaAbb
B  bB | b aAbb  reduction
aABb
S

 aABb 
S rm  aaAbb 
rm aAbb rm rm aaabb

Right Sentential Forms

• How do we know which substring to be replaced at each reduction step?


Handle
• Handle of a string is a substring that matches the right side of a
production rule
– not every substring that matches the right side of a production rule is
handle
• A handle of a right sentential form  ( ) :
a production rule A   and a position of 
where the string  may be found and replaced by A to produce
the previous right-sentential form in a rightmost derivation of 

rm
* rm

S  A  
 is a string of terminals

• If the grammar is unambiguous, then every right-sentential form of the


grammar has exactly one handle
Handle Pruning
• A right-most derivation in reverse can be obtained by handle-pruning

S=0 
rm 1 rm
 2 rm
 ... rm
 n-1 
rm n= 
input string

• Start from n, find a handle Ann in n, and replace n by An to get n-1

• Then find a handle An-1n-1 in n-1, and replace n-1 by An-1 to get n-2

• Repeat this, until we reach S


A Shift-Reduce Parser
E  E+T | T Right-Most Derivation of id + id*id
T  T*F | F E  E+T  E+T*F  E+T*id  E+F*id
F  (E) | id  E+id*id  T+id*id  F+id*id  id+id*id

Right-Most Sentential Form Reducing Production


id+id*id F  id
F+id*id TF
T+id*id ET
E+id*id F  id
E+F*id TF
E+T*id F  id
E+T*F T  T*F
E+T E  E+T
E
Handles are red and underlined in the right-sentential forms.
A Stack Implementation of A Shift-Reduce Parser

Four possible actions of a shift-parser :

1. Shift : The next input symbol is shifted onto the top of the stack
2. Reduce: Replace the handle on the top of the stack by the non-
terminal
3. Accept: Successful completion of parsing
4. Error: Parser discovers a syntax error, and calls an error recovery
routine

• Initial stack: contains only the end-marker $


• End of the input string: marked by the end-marker $
A Stack Implementation of A Shift-Reduce Parser

Stack Input Action


$ id+id*id $ shift
$id +id*id$ reduce by F  id Parse Tree
$F +id*id$ reduce by T  F
$T +id*id$ reduce by E  T E 8
$E +id*id$ shift
$E+ id*id$ shift E 3 + T 7
$E+id *id$ reduce by F  id
$E+F *id$ reduce by T  F T 2 T 5 * F6
$E+T *id$ shift
$E+T* id$ shift F 1 F 4 id
$E+T*id $ reduce by F  id
$E+T*F $ reduce by T  T*F id id
$E+T $ reduce by E  E+T
$E $ accept
Conflicts During Shift-Reduce Parsing

• For certain class of CFGs, shift-reduce parsers cannot be used


• Stack contents and the next input symbol may not decide action:
– shift/reduce conflict: Whether make a shift operation or a reduction
– reduce/reduce conflict: The parser cannot decide which of several
reductions to make
• If a shift-reduce parser cannot be used for a grammar, that grammar is
called as non-LR(k) grammar

left to right right-most k lookhead


scanning derivation

An ambiguous grammar can never be a LR grammar


Shift-Reduce Parsers
Two main categories of shift-reduce parsers:

1. Operator-Precedence Parser
– simple, but only a small class of grammars

CFG
LR
LALR
2. LR-Parsers
SLR
– covers wide range of grammars
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
– SLR, LR and LALR work same, only their parsing tables are different
LR Parsers
• The most powerful shift-reduce parsing (yet efficient) is:

LR(k) parsing

left to right right-most k lookahead


scanning derivation

• LR parsing is attractive because:


– LR parsers can be constructed to recognize virtually all programming language
constructs for which CFGs can be written
– LR parsing is most general non-backtracking shift-reduce parsing, yet it is still
efficient
– Class of grammars that can be parsed using LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers or LL methods
LL(1)-Grammars  LR(1)-Grammars
– LR-parser can detect a syntactic error as soon as it is possible to do so a left-to-
right scan of the input
LR (K) vs. LL (K)
• LR (k) grammar

– It is sufficient to show that a left to right shift reduce parsers be able


to recognize handles of right-sentential forms when they appear on
the top of the stack

– Or, we must be able to recognize the occurrence of the right side of a


production in a right-sentential form, with k input symbols of
lookahead

• LL(k) grammar: we must able to recognize the use of a production


seeing only the first k symbols of what its right side derives
LR Parsing Algorithm

input a1 ... ai ... an $


stack
Sm
Xm
LR Parsing Algorithm output
Sm-1
Xm-1
.
.
Action Table Goto Table
S1 terminals and $ non-terminal
X1 s s
t four different t each item is
S0 a actions a a state number
t t
e e
s s
A Configuration of LR Parsing Algorithm

• A configuration of a LR parsing is:

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )

Stack Rest of Input

• Sm and ai decides the parser action by consulting the parsing action


table (Initial Stack contains just So )
• So : does not represent any grammar symbol

• A configuration of a LR parsing represents the right sentential form:

X1 ... Xm ai ai+1 ... an $


Actions of A LR-Parser
1. shift s -- shifts the next input symbol and the state s onto the stack
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )  ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $ )

2. reduce A (or rn where n is a production number)


– pop 2|| (=r) items from the stack;
– then push A and s where s=goto[sm-r,A]
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )  ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

– Output is the reducing production A

3. Accept – Parsing successfully completed

4. Error -- Parser detected an error (an empty entry in the action table)
(SLR) Parsing Tables for Expression Grammar
Action Table Goto Table
1) E  E+T state id + * ( ) $ E T F

2) ET 0 s5 s4 1 2 3
1 s6 acc
3) T  T*F 2 r2 s7 r2 r2
4) TF 3 r4 r4 r4 r4
5) F  (E) 4 s5 s4 8 2 3
6) F  id 5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
Actions of A (S)LR-Parser -- Example

stack input action output


0 id*id+id$ shift 5
0id5 *id+id$ reduce by Fid Fid
0F3 *id+id$ reduce by TF TF
0T2 *id+id$ shift 7
0T2*7 id+id$ shift 5
0T2*7id5 +id$ reduce by Fid Fid
0T2*7F10 +id$ reduce by TT*F TT*F
0T2 +id$ reduce by ET ET
0E1 +id$ shift 6
0E1+6 id$ shift 5
0E1+6id5 $ reduce by Fid Fid
0E1+6F3 $ reduce by TF TF
0E1+6T9 $ reduce by EE+T EE+T
0E1 $ accept
Constructing SLR Parsing Tables – LR(0) Item
• LR(0) item: a production of G a dot at some position of the right side
Ex: A  aBb Possible LR(0) Items: ..
A  aBb
A  a Bb
(four different possibility)
..
A  aB b
A  aBb
• Sets of LR(0) items: states of action and goto table of the SLR parser
• Collection of sets of LR(0) items (the canonical LR(0) collection) is
the basis for constructing SLR parsers
• Augmented Grammar: G’ is
G with a new production rule S’S where S’ is the new starting
symbol
The Closure Operation

• I: set of LR(0) items for a grammar G


closure(I): Set of LR(0) items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I)

..
2. If A   B is in closure(I) and B is a production rule of G;
then B  will be in the closure(I)

We will apply this rule until no more new LR(0) items can be added
to closure(I)
The Closure Operation -- Example

E’  E .
closure({E’  E}) =
E  E+T .
{ E’  E kernel items
ET .
E  E+T
T  T*F .
E T
TF .
T  T*F
F  (E) .
T F
F  id .
F  (E)
.
F  id }
Kernel and Non-kernel Items

• Each set of items formed by taking the closure of a set of kernel items

• We are really interested in kernel items

• Non-kernel items can be removed to save storage

• Non-kernel items could be generated by the closure process


Goto Operation
• If I is a set of LR(0) items and X is a grammar symbol (terminal or non-
terminal), then goto (I,X) is defined as follows:
.
– If A   X in I
.
then every item in closure({A  X }) will be in goto (I,X)
Example:
.. .. .
I ={ E’  E, E  E+T, E  T,
T  T*F, T  F,
. .. .
F  (E), F  id }
goto (I,E) = { E’  E , E  E +T }
.. .
goto (I,T) = { E  T , T  T *F }
goto (I,F) = {T  F }
.. .. . .
goto (I,() = { F  ( E), E  E+T, E 
F  (E), F  id }
T, T  T*F, T  .
F,

.
goto (I,id) = { F  id }
Construction of The Canonical LR(0) Collection

• To create the SLR parsing tables for a grammar G, we will create the
canonical LR(0) collection of the grammar G’

• Algorithm:
.
C is { closure({S’ S}) }
repeat the followings until no more set of LR(0) items can be added to C.
for each I in C and each grammar symbol X
if goto(I,X) is not empty and not in C
add goto(I,X) to C

• goto function is a DFA on the sets in C.


The Canonical LR(0) Collection -- Example

I0: E’  .E I1: E’  E. I6: E  E+.T I9: E  E+T.


E  .E+T E  E.+T T  .T*F T  T.*F
E  .T T  .F
T  .T*F I2: E  T. F  .(E) I10: T  T*F.
T  .F T  T.*F F  .id
F  .(E)
F  .id I3: T  F. I7: T  T*.F I11: F  (E).
F  .(E)
I4: F  (.E) F  .id
E  .E+T
E  .T I8: F  (E.)
T  .T*F E  E.+T
T  .F
F  .(E)
F  .id

I5: F  id.
Transition Diagram (DFA)

E + T
I0 I1 I6 I9 * to I7
F
( to I3
T
id to I4
to I5
F I2 * I7 F
I10
(
I3 id to I4
(
to I5
I4 E I8 )
id id T
F
to I2 + I11
I5 to I3 to I6
(
to I4
Constructing SLR Parsing Table
(of an augumented grammar G’)

1. Construct the canonical collection of sets of LR(0) items for G’


C{I0,...,In}
2. Create the parsing action table as follows
• If a is a terminal, A.a in Ii and goto (Ii, a)=Ij then action[i, a] is shift j
• If A. is in Ii , then action[i,a] is reduce A for all a in FOLLOW(A)
where AS’
• If S’S. is in Ii , then action[i,$] is accept
• If any conflicting actions generated by these rules, the grammar is not SLR(1)

3. Create the parsing goto table


• for all non-terminals A, if goto (Ii, A)=Ij then goto [i, A]=j

4. All entries not defined by (2) and (3) are errors


5. Initial state of the parser contains S’.S
The Canonical LR(0) Collection -- Example

I0: E’  .E I1: E’  E. I6: E  E+.T I9: E  E+T.


E  .E+T E  E.+T T  .T*F T  T.*F
E  .T T  .F
T  .T*F I2: E  T. F  .(E) I10: T  T*F.
T  .F T  T.*F F  .id
F  .(E)
F  .id I3: T  F. I7: T  T*.F I11: F  (E).
F  .(E)
I4: F  (.E) F  .id
E  .E+T
E  .T I8: F  (E.)
T  .T*F E  E.+T
T  .F
F  .(E)
F  .id

I5: F  id.
Parsing Tables of Expression Grammar

Action Table Goto Table


state id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
SLR(1) Grammar

• An LR parser using SLR(1) parsing tables for a grammar G is called as


the SLR(1) parser for G

• If a grammar G has an SLR(1) parsing table, it is called SLR(1)


grammar (or SLR grammar in short)

• Every SLR grammar is unambiguous, but every unambiguous grammar


is not a SLR grammar
shift/reduce and reduce/reduce conflicts

• shift/reduce conflict: choice between a shift operation or reduction for


a terminal

• reduce/reduce conflict: If a state does not know whether it will make a


reduction operation using the production rule i or j for a terminal

• If the SLR parsing table of a grammar G has a conflict, we say that the
grammar is not SLR grammar
Conflict Example

S  L=R I0: S’  .S I1:S’  S. I6:S  L=.R I9: S  L=R.


SR S  .L=R R  .L
L *R S  .R I2:S  L.=R L .*R
L  id L  .*R R  L. L  .id
RL L  .id
R  .L I3:S  R.

I4:L  *.R I7:L  *R.


Problem R  .L
FOLLOW(R)={=,$} L .*R I8:R  L.
= shift 6 L  .id
reduce by R  L
shift/reduce conflict I5:L  id.
Conflict Example2

S  AaAb I0: S’  .S
S  BbBa S  .AaAb
A S  .BbBa
B A.
B.

Problem
FOLLOW(A)={a,b}
FOLLOW(B)={a,b}
a reduce by A   b reduce by A  
reduce by B   reduce by B  
reduce/reduce conflict reduce/reduce conflict
Constructing Canonical LR(1) Parsing Tables

• In SLR method, the state i makes a reduction by A when the current
token is a:
– if A. in Ii and a is in FOLLOW(A)

• In some situations, A cannot be followed by the terminal a in a right-


sentential form when  and the state i are on the top stack. This means
that making reduction in this case is not correct.

S  AaAb SAaAbAabab SBbBaBbaba


S  BbBa
A Aab   ab Bba   ba
B AaAb  Aa  b BbBa  Bb  a
An Example
S  L=R I0: S’  .S I1:S’  S. I6:S  L=.R I9: S  L=R.
SR S  .L=R R  .L
L *R S  .R I2:S  L.=R L .*R
L  id L  .*R R  L. L  .id
RL L  .id
R  .L I3:S  R.

I4:L  *.R I7:L  *R.


Problem R  .L
FOLLOW(R)={=,$} L .*R I8:R  L.
= shift 6 L  .id
reduce by R  L
I5: L  id.
- If L is reduced to R then the contents appear as: R= (no right sentential form can derive it)
- R  L is INVALID for the input symbol “=“
LR(1) Item

• To avoid some of invalid reductions, the states need to carry more


information

• Extra information is put into a state by including a terminal symbol as a


second component in an item
• A LR(1) item is:
.
A   ,a where a is the look-ahead of the LR(1) item
(a is a terminal or end-marker)
LR(1) Item (cont.)
.
• When  ( in the LR(1) item A   ,a ) is not empty, the look-ahead
does not have any affect

.
• When  is empty (A   ,a ), then
Reduce by A only if the next input symbol is a (not for any
terminal in FOLLOW(A))

• A state contains .
A   ,a1 where {a1,...,an}  FOLLOW(A)
...
A   ,an.
Canonical Collection of Sets of LR(1) Items
• Process: similar to the construction of the canonical collection of the
sets of LR(0) items,
but closure and goto operations work a little bit different

closure(I) : ( where I is a set of LR(1) items)


– every LR(1) item in I is in closure(I)
.
– if A B, a in closure(I) and
B is a production rule of G;

then B., b belongs to closure(I) for each terminal b in FIRST(a)


goto operation

• If I is a set of LR(1) items and X is a grammar symbol


(terminal or non-terminal),

then goto (I, X) defined as follows:


– If A  .X, a in I
then every item in closure({A  X., a}) belongs to
goto (I,X)
Construction of The Canonical LR(1) Collection

• Algorithm:
C is { closure({S’.S,$}) }
repeat the followings until no more set of LR(1) items can be added to
C.
for each I in C and each grammar symbol X
if goto (I, X) is not empty and not in C
add goto (I, X) to C

• goto function is a DFA on the sets in C


A Short Notation for The Sets of LR(1) Items

• A set of LR(1) items containing the following items


.
A   ,a1
...
.
A   ,an

can be written as

.
A   ,a1/a2/.../an
Canonical LR(1) Collection -- Example
S  AaAb I0: S’  .S ,$ S I1: S’  S. ,$
S  BbBa S  .AaAb ,$ A
a
A S  .BbBa ,$ I2: S  [Link] ,$ to I4
B A  . ,a B
B  . ,b I3: S  [Link] ,$ b to I5

I4: S  [Link] ,$ A I6: S  AaA.b ,$ b I8: S  AaAb. ,$


A  . ,b

I5: S  [Link] ,$ B I7: S  BbB.a ,$ a I9: S  BbBa. ,$


B  . ,a
Canonical LR(1) Collection – Example 2
S’  S I0:S’  .S,$ I1:S’  S.,$ I4:L  *.R,= R to I7
1) S  L=R S  .L=R,$ S * R  .L, = L
to I8
2) S  R S  .R,$ L I2:S  L.=R,$ to I6 L .*R, = *
3) L *R L  .*R,= R  L.,$ L  .id, = to I4
id
4) L  id L  .id,= R to I5
I3:S  R.,$ id
I5:L  id., =
5) R  L R  .L,$

I9:S  L=R.,$
R I13:L  *R.,$
I6:S  L=.R,$ to I9
R  .L,$ L I10:R  L.,$
to I10
L  .*R,$ *
R
I4 and I11
L  .id,$ to I11 I11:L  *.R,$ to I13
id R  .L,$ L I5 and I12
to I12 to I10
L .*R,$ *
I7:L  *R.,= L  .id,$ to I11 I7 and I13
id
I8: R  L.,= to I12 I8 and I10
I12:L  id.,$
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G’.
C{I0,...,In}

2. Create the parsing action table as follows


• .
If a is a terminal, A a,b in Ii and goto(Ii,a)=Ij then action[i,a] is shift j.
• .
If A ,a is in Ii , then action[i,a] is reduce A where AS’.
• .
If S’S ,$ is in Ii , then action[i,$] is accept.
• If any conflicting actions generated by these rules, the grammar is not LR(1).

3. Create the parsing goto table


• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j

4. All entries not defined by (2) and (3) are errors.

5. Initial state of the parser contains S’.S,$


LR(1) Parsing Tables – (for Example2)
id * = $ S L R
0 s5 s4 1 2 3
1 acc
2 s6 r5
3 r2
4 s5 s4 8 7
5 r4 r4 no shift/reduce or
6 s12 s11 10 9 no reduce/reduce conflict
7
8
r3
r5
r3
r5 
9 r1 so, it is a LR(1) grammar
10 r5
11 s12 s11 10 13
12 r4
13 r3

You might also like