0% found this document useful (0 votes)
5 views20 pages

Canonical LR(0) and LR(1) Parsing Tables

The document discusses the construction of parsing tables for grammars using various methods such as SLR, LR(1), and LALR. It explains the algorithms for creating canonical collections of LR items, handling conflicts, and the significance of unambiguous grammars in parsing. Additionally, it highlights the differences between SLR and LALR parsers, including their state management and conflict resolution strategies.

Uploaded by

souryadeep097
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views20 pages

Canonical LR(0) and LR(1) Parsing Tables

The document discusses the construction of parsing tables for grammars using various methods such as SLR, LR(1), and LALR. It explains the algorithms for creating canonical collections of LR items, handling conflicts, and the significance of unambiguous grammars in parsing. Additionally, it highlights the differences between SLR and LALR parsers, including their state management and conflict resolution strategies.

Uploaded by

souryadeep097
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Lecture #21

Construction of The Canonical LR(0) Collection

To create the SLR parsing tables for a grammar G, we will create the canonical LR(0)
collection of the grammar G’.

Algorithm:
C is { closure({S’→.S}) }
repeat the followings until no more set of LR(0) items can be added to C.
for each I in C and each grammar symbol X
if goto(I,X) is not empty and not in C
add goto(I,X) to C
GOTO function is a DFA on the sets in C. Example:
For grammar used above, Canonical LR(0) items are as follows-
I0: E’ → .E I1: E’ → E. I6: E → E+.T I9: E → E+T.
E → .E+T E → E.+T T→ .T*F T → T.*F
E → .T T→ .F
T → .T*F I2: E → T. F→ .(E) I10: T → T*F.
T → .F T → T.*F F→ .id
F → .(E)
F → .id I3: T → F. I7: T → T*.F I11: F → (E).
F → .(E)
I4: F → (.E) F → .id
E → .E+T
E → .T I8: F → (E.)
T → .T*F E → E.+T
T → .F
F → .(E)
F → .id

I5: F → id.

Transition Diagram (DFA) of GOTO Function is as follows-


Parsing Table

1. Construct the canonical collection of sets of LR(0) items for G’.


C←{I0,...,In}
2. Create the parsing action table as follows
a. If a is a terminal, Aα→.aβ in Ii and goto(Ii,a)=Ij then action[i,a] is shift j.

b. If Aα→. is in Ii , then action[i,a] is reduce Aα→ for all a in FOLLOW(A) where

A≠S’.

c. If S’→S. is in Ii , then action[i,$] is accept.


d. If any conflicting actions generated by these rules, the grammar is not SLR(1).
3. Create the parsing goto table
a. for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j
4. All entries not defined by (2) and (3) are errors.
5. Initial state of the parser contains S’→.S

Example:

For the Grammar used above, SLR Parsing table is as follows:


Action Goto
state id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
Lecture #22
Shift/reduce and reduce/reduce conflicts

• If a state does not know whether it will make a shift operation or reduction for a terminal,
we say that there is a shift/reduce conflict.

Example:

S → L=R I0: S’ → .S I1: S’ → S. I6: S → L=.R I9: S → L=R.

S → .L=R
S→ R R → .L

L→ *R S → .R I2: S → L.=R L→ .*R

L → id L → .*R R → L. L → .id

R→ L L → .id

R → .L I3: S → R.

I7: L → *R.
I4: L → *.R
Problem in I2
R → .L

FOLLOW(R)={=,$}
L→ .*R I8: R → L.
= shift 6
L → .id
& reduce by R → L
shift/reduce conflict
I5: L → id.

• If a state does not know whether it will make a reduction operation using the production
rule i or j for a terminal, we say that there is a reduce/reduce conflict.

Example:
S → AaAb I0: S’ → .S

S → BbBa S → .AaAb

A→ε S → .BbBa

B→ε A→.

B→ .

Problem FOLLOW(A)={a,b} FOLLOW(B)={a,b}


a reduce by A → ε b reduce by A → ε
reduce by B → ε reduce by B → ε
reduce/reduce conflict reduce/reduce conflict

If the SLR parsing table of a grammar G has a conflict, we say that that grammar is not SLR
grammar.

Constructing Canonical LR(1) Parsing tables

• In SLR method, the state i makes a reduction by Aα→ when the current token is a:

– if the Aα→. in the Ii and a is FOLLOW(A)

• In some situations, βA cannot be followed by the terminal a in a right-sentential form when

αβ and the state i are on the top stack. This means that making reduction in this
case is not correct.

S → AaAb S⇒AaAb⇒Aab⇒ab S⇒BbBa⇒Bba⇒ba

S → BbBa

A→ε Aab ⇒ ε ab Bba ⇒ ε ba


B→ε AaAb ⇒ Aa ε b BbBa ⇒ Bb ε a

LR(1) Item

• To avoid some of invalid reductions, the states need to carry more information.
• Extra information is put into a state by including a terminal symbol as a second component
in an item.

• A LR(1) item is: item


A → α.β,a where a is the look-head of the LR(1) (a is a terminal or end-marker.)
• When β ( in the LR(1) item A → α.β,a ) is not empty, the look-head does not have any
affect.
• When β is empty (A → α.,a ), we do the reduction by Aα→ only if the next input symbol
is a (not for any terminal in FOLLOW(A)).

• A state will contain A → α.,a1 where {a1,...,an} ⊆ FOLLOW(A)


...
A → α.,an

Closure and GOTO Operations

closure(I) is: ( where I is a set of LR(1) items

every LR(1) item in I is in closure(I)


if Aα→.Bβ,a in closure(I) and Bγ→ is a production rule of G; then B→.γ,b will be in the closure(I)

for each terminal b in FIRST(βa) .

If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is
defined as follows:

If A → α.Xβ,a in I then every item in closure({A → αX.β,a}) will be in goto(I,X).


Lecture #23
Construction of The Canonical LR(1) Collection

Algorithm:
C is { closure({S’→.S,$}) }
repeat the followings until no more set of LR(1) items can be added to C.
for each I in C and each grammar symbol X
if goto(I,X) is not empty and not in C
add goto(I,X) to C

GOTO function is a DFA on the sets in C.

A set of LR(1) items containing the following items


A → α.β,a1
...
A → α.β,an

can be written as A → α.β,a1/a2/.../an

Example:

Parsing Table

1. Construct the canonical collection of sets of LR(1) items for G’.


C←{I0,...,In}

2. Create the parsing action table as follows


a. If a is a terminal, Aα→.aβ,b in Ii and goto(Ii,a)=Ij then action[i,a] is shift j. b. If Aα→.,a is in Ii

, then action[i,a] is reduce Aα→ where A≠S’.

c. If S’→S.,$ is in Ii , then action[i,$] is accept.


d. If any conflicting actions generated by these rules, the grammar is not LR(1).

3. Create the parsing goto table


a. for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j

4. All entries not defined by (2) and (3) are errors.

5. Initial state of the parser contains S’→.S,$

Example:

For the above used Grammar, the parse table is as follows:

id * = $ S L R
0 S5 s4 1 2 3
1 acc
2 s6 r5
3 r2
4 S5 s4 8 7
5 r4 r4
6 s12 s11 10 9
7 r3 r3
8 r5 r5
9 r1
10 r5
11 s12 s11 10 13
12 r4
13 r3
Lecture #24
Constructing LALR Parsing tables

• LALR stands for LookAhead LR.


• LALR parsers are often used in practice because LALR parsing tables are smaller than
Canonical LR parsing tables.
• The number of states in SLR and LALR parsing tables for a grammar G are equal.
• But LALR parsers recognize more grammars than SLR parsers.
• yacc creates a LALR parser for the given grammar.
• A state of LALR parser will be again a set of LR(1) items.

Canonical LR(1) Parser -> LALR Parser shrink # of states

• This shrink process may introduce a reduce/reduce conflict in the resulting LALR parser.
In that case the grammar is NOT LALR.
• This shrink process cannot produce a shift/reduce conflict.

The Core of A Set of LR(1) Items

• The core of a set of LR(1) items is the set of its first component.
Example:
S → L.=R,$ -> S → L.=R R → L.,$ R → L.

• We will find the states (sets of LR(1) items) in a canonical LR(1) parser with same cores.
Then we will merge them as a single state.

Example:

I1:L → id.,= I12: L → id.,=

L → id.,$ I2:L → id.,$ have same core, merge them

• We will do this for all states of a canonical LR(1) parser to get the states of the LALR
parser.
• In fact, the number of the states of the LALR parser for a grammar will be equal to the
number of states of the SLR parser for that grammar.

Parsing Tables

• Create the canonical LR(1) collection of the sets of LR(1) items for the given grammar.
• Find each core; find all sets having that same core; replace those sets having same cores
with a single set which is their union.
C={I0,...,In} ->C’={J1,...,Jm} where m ≤ n
• Create the parsing tables (action and goto tables) same as the construction of the
parsing tables of LR(1) parser.
– Note that: If J=I1 ∪ ... ∪ Ik since I1,...,Ik have same cores
->cores of goto(I1,X),...,goto(I2,X) must be same.
–So, goto(J,X)=K where K is the union of all sets of items having same cores as
goto(I1,X).

• If no conflict is introduced, the grammar is LALR(1) grammar. (We may only


introduce reduce/reduce conflicts; we cannot introduce a shift/reduce conflict)

Shift/Reduce Conflict

• We say that we cannot introduce a shift/reduce conflict during the shrink process for the
creation of the states of a LALR parser.
• Assume that we can introduce a shift/reduce conflict. In this case, a state of LALR parser
must have:
A → α.,a and B → β.aγ,b

• This means that a state of the canonical LR(1) parser must have: A → α.,a and

B → β.aγ,c

But, this state has also a shift/reduce conflict. i.e. The original canonical

LR(1) parser has a conflict.

(Reason for this, the shift operation does not depend on Lookaheads)

Reduce/Reduce Conflict

But, we may introduce a reduce/reduce conflict during the shrink process for the creation of the
states of a LALR parser.

I1 : A → α.,a I2: A → α.,b

B → β.,b B → β.,c

I12: A → α.,a/b - > reduce/reduceconflict

B → β.,b/c
Example:

For the above Canonical LR Parsing table, we can get the following LALR(1) collection
Lecture #25
Using Ambiguous Grammars

• All grammars used in the construction of LR-parsing tables must be un-ambiguous.


• Can we create LR-parsing tables for ambiguous grammars?
– Yes, but they will have conflicts.
– We can resolve these conflicts in favor of one of them to disambiguate
the grammar.
– At the end, we will have again an unambiguous grammar.
• Why we want to use an ambiguous grammar?
– Some of the ambiguous grammars are much natural, and a
corresponding unambiguous grammar can be very complex.
– Usage of an ambiguous grammar may eliminate unnecessary reductions.
Example:
E → E+T | T

E → E+E | E*E |

(E) | id T → T*F

| F

F → (E) | id
Sets of LR(0) Items for Ambiguous Grammar

SLR-Parsing Tables for Ambiguous Grammar

FOLLOW(E) = { $,+,*,) }
State I7 has shift/reduce conflicts for symbols + and *.
when current token is +
shift ->+ is right-associative reduce ->+ is left-associative
when current token is *
shift - > * has higher
precedence than + reduce->+
has higher precedence than *

State I8 has shift/reduce conflicts for symbols +


and *. when current token is *
shift ->* is right-associative reduce ->* is left-associative
when current token is +
shift - > + has higher
precedence than * reduce->*
has higher precedence than +

id + * ( ) $ E

0 s3 s2 1

1 s4 s5 acc

2 s3 s2 6

3 r4 r4 r4 r4

4 s3 s2 7

5 s3 s2 8

6 s4 s5 s9

7 r1 s5 r1 r1

8 r2 r2 r2 r2

9 r3 r3 r3 r3
Module-2
Lecture #26

SYNTAX-DIRECTED TRANSLATION
• Grammar symbols are associated with attributes to associate information with the programming
language constructs that they represent.
• Values of these attributes are evaluated by the semantic rules associated with the production
rules.
• Evaluation of these semantic rules:
– may generate intermediate codes
– may put information into the symbol table
– may perform type checking
– may issue error messages
– may perform some other activities
– In fact, they may perform almost any activities.
• An attribute may hold almost any thing.
A string, a number, a memory location, a complex record.
• Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have
some side effects such as printing a value.

Example:

Production Semantic Rule Program Fragment

L → E return print([Link]) print(val[top-1])


1 1 val[ntop] = val[top-2] + val[top]
E→E +T [Link] = E .val + [Link]
E→T [Link] = [Link]
1 1 val[ntop] = val[top-2] * val[top]
T→T*F [Link] = T .val * [Link]
T→F [Link] = [Link]
F→(E) [Link] = [Link] val[ntop] = val[top-1]
F → digit [Link] = [Link] val[top] = [Link]

• Symbols E, T, and F are associated with an attribute val.


• The token digit has an attribute lexval (it is assumed that it is evaluated by the lexical analyzer).
• The Program Fragment above represents the implementation of the semantic rule for a bottom-up
parser.
• At each shift of digit, we also push [Link] into val-stack.
• At all other shifts, we do not put anything into val-stack because other terminals do not have
attributes (but we increment the stack pointer for val-stack).
• The above model is suited for a desk calculator where the purpose is to evaluate and to
generate code.

1. Intermediate Code Generation

• Intermediate codes are machine independent codes, but they are close to machine instructions.
• The given program in a source language is converted to an equivalent program in an
intermediate language by the intermediate code generator.
• Intermediate language can be many different languages, and the designer of the compiler
decides this intermediate language.
– syntax trees can be used as an intermediate language.
– postfix notation can be used as an intermediate language.
– three-address code (Quadraples) can be used as an intermediate language
• we will use quadraples to discuss intermediate code generation
• quadraples are close to machine instructions, but they are not actual machine instructions.

2 Syntax Tree

Syntax Tree is a variant of the Parse tree, where each leaf represents an operand and each interior
node an operator.

Example:

Production Semantic Rule

E → E1 op E2 [Link] = NODE (op, [Link], [Link]) E → (E1) [Link] = [Link]


E → - E1 [Link] = UNARY ( - , [Link]) E → id [Link] = LEAF ( id )

A sentence a*(b+d) would have the following syntax tree:

Postfix Notation

Postfix Notation is another useful form of intermediate code if the language is mostly expressions.

Example:

Production Semantic Rule Program Fragment


print op print id
E → E1 op E2 [Link] = [Link] || [Link] || op
E → (E1) E → id [Link] = [Link]
[Link] = id
3 Three Address Code

• We use the term “three-address code” because each statement usually contains three addresses
(two for operands, one for the result).
• The most general kind of three-address code is:
x := y op z
where x, y and z are names, constants or compiler-generated temporaries; op is any operator.
• But we may also the following notation for quadraples (much better notation because it looks like a
machine code instruction)
op y,z,x
apply operator op to y and z, and store the result in x.
4 Representation of three-address codes

Three-address code can be represented in various forms viz. Quadruples, Triples and
Indirect Triples. These forms are demonstrated by way of an example below. Example:
A = -B * (C + D) Three-Address code is as follows:

T1 = -B

T2 = C + D T3 = T1 * T2

A = T3

Quadruple:
Operator Operand 1 Operand 2 Result

(1) - B T1
(2) + C D T2
(3) * T1 T2 T3
(4) = A T3

Triple:

Operator Operand 1 Operand 2


(1) - B
(2) + C D
(3) * (1) (2)
(4) = A (3)

Indirect Triple:

Statement
(0) (56)
(1) (57)
(2) (58)
(3) (59)

Operator Operand 1 Operand 2


(56) - B
(57) + C D
(58) * (56) (57)
(59) = A (58)
Lecture #27
Translation of Assignment Statements

A statement A := - B * (C + D) has the following three-address translation: T1 := - B


T2 := C+D
T3 := T1* T2
A := T3

Production Semantic Action

S → id := E [Link] = [Link] || gen( [Link] = [Link] )

E → E1 + E2 [Link] = newtemp();
[Link] = [Link] || [Link] || gen( [Link] = [Link] + [Link] )

E → E1 * E2 [Link] = newtemp();
[Link] = [Link] || [Link] || gen( [Link] = [Link] * [Link] )

E → - E1 [Link] = newtemp();
[Link] = [Link] || gen( [Link] = - [Link] )

E → ( E1 ) [Link] = [Link]; [Link] = [Link]

E → id [Link] = [Link]; [Link] = null

1. Translation of Boolean Expressions


Grammar for Boolean Expressions is: E->E or E
E->E and E
E->not E E->( E ) E->id
E->id relop id

There are two representations viz. Numerical and Control-Flow.

Numerical Representation of Boolean

o TRUE is denoted by 1 and FALSE by 0.


o Expressions are evaluated from left to right, in a manner similar to arithmetic expressions.

Example:
The translation for A or B and C is the three-address sequence:

T1 := B and C T2 := A or T1

Also, the translation of a relational expression such as A < B is the three-address sequence:

(1) if A < B goto (4) (2) T := 0


(3) goto (5) (4) T := 1 (5)
Therefore, a Boolean expression A < B or C can be translated as: (1) if A < B goto (4)
(2) T1 := 0
(3) goto (5) (4) T1 := 1
(5) T2 := T1 or C

Production Semantic Action

E->E1 or E2 T = newtemp ();


[Link] = T;
Gen (T = [Link] or [Link])

E->E1 and E2 T = newtemp (); [Link] = T;


Gen (T = [Link] and [Link])

E->not E1 T = newtemp (); [Link] = T;


Gen (T = not [Link])

E → ( E1 ) [Link] = [Link]; [Link] = [Link]

E → id [Link] = [Link]; [Link] = null

E->id1 relop id2 T = newtemp (); [Link] = T;


Gen (if [Link] relop [Link] goto NEXTQUAD+3) Gen (T = 0)
Gen (goto NEXTQUAD+2)
` Gen (T = 1)

o Quadruples are being generated and NEXTQUAD indicates the next available entry in the
quadruple array.

Control-Flow Representation of Boolean Expressions

o If we evaluate Boolean expressions by program position, we may be able to avoid evaluating


the entire expressions.
o In A or B, if we determine A to be true, we need not evaluate B and can declare the entire
expression to be true.
o In A and B, if we determine A to be false, we need not evaluate B and can declare the entire
expression to be false.
o A better code can thus be generated using the above properties.

Example:

The statement if (A<B || C<D) x = y + z; can be translated as


(1) if A<B goto (4) (2) if C<D goto (4) (3) goto (6)
(4) T = y + z
(5) X = T (6)

Here (4) is a true exit and (6) is a false exit of the Boolean expressions.
Lecture #28
Generating 3-address code for Numerical Representation of Boolean expressions

o Consider a production E->E1 or E2 that represents the OR Boolean expression. If E1 is


true, we know that E is true so we make the location TRUE for E1 be the same as TRUE for E. If E1
is false, then we must evaluate E2, so we make FALSE for E1 be the first statement in the code for
E2. The TRUE and FALSE exits can be made the same as the TRUE and FALSE exits of E,
respectively.
o Consider a production E->E1 and E2 that represents the AND Boolean expression. If
E1 is false, we know that E is false so we make the location FALSE for E1 be the same
as FALSE for E. If E1 is true, then we must evaluate E2, so we make TRUE for E1 be the first
statement in the code for E2. The TRUE and FALSE exits can be made the same as the TRUE and
FALSE exits of E, respectively.
o Consider the production E->not E that represents the NOT Boolean expression. We
may simply interchange the TRUE and FALSE exits of E1 to get the TRUE and FALSE
exits of E.
o To generate quadruples in the manner suggested above, we use three functions- Makelist,
Merge and Backpatch that shall work on the list of quadruples as suggested by their name.
o If we need to proceed to E2 after evaluating E1, we have an efficient way of doing this by
modifying our grammar as follows:

E->E or M E
E->E and M E E->not E
E->( E ) E->id
E->id relop id
M->ε

The translation scheme for this grammar would as follows:

Production Semantic Action

E->E1 or M E2 BACKPATCH ([Link], [Link]); [Link] = MERGE


([Link], [Link]); [Link] = [Link];

E->E1 and M E2 BACKPATCH ([Link], [Link]); [Link] = [Link];


[Link] = MERGE ([Link], [Link]);

E->not E1 [Link] = [Link]; [Link] = [Link];


E->( E1 ) [Link] = [Link]; [Link] = [Link];

E->id [Link] = MAKELIST (NEXTQUAD); [Link] = MAKELIST


(NEXTQUAD + 1); GEN (if [Link] goto _ );
GEN (goto _ );

E->id1 relop id2 [Link] = MAKELIST (NEXTQUAD); [Link] = MAKELIST


(NEXTQUAD + 1); GEN ( if [Link] relop [Link] goto _ );
GEN (goto _ );

M->ε [Link] = NEXTQUAD;

Example:

For the expression P<Q or R<S and T, the parsing steps and corresponding semantic actions are
shown below. We assume that NEXTQUAD has an initial value of 100.

Step 1: P<Q gets reduced to E by E->id relop id. The grammatical form is E1 or R<S
and T.

We have the following code generated (Makelist).

100: if P<Q goto _


101: goto _

E1 is true if goto of 100 is reached and false if goto of 101 is reached.

Step 2: R<S gets reduced to E by E->id relop id. The grammatical form is E1 or E2 and
T.

We have the following code generated (Makelist).

102: if R<S goto _


103: goto _

E2 is true if goto of 102 is reached and false if goto of 103 is reached. Step 3: T gets reduced to E
by E->id. The grammatical form is E1 or E2 and E3.
We have the following code generated (Makelist).

104: if T goto _
105: goto _

E3 is true if goto of 104 is reached and false if goto of 105 is reached.

Step 4: E2 and E3 gets reduced to E by E->E and E. The grammatical form is E1 or E4.

We have no new code generated but changes are made in the already generated code (Backpatch).
100: if P<Q goto _
101: goto _
102: if R<S goto 104
103: goto _
104: if T goto _
105: goto _

E4 is true only if [Link] (goto of 104) is reached. E4 is false if [Link] (goto of 103) or
[Link] (goto of 105) is reached (Merge).

Step 5: E1 or E4 gets reduced to E by E->E or E. The grammatical form is E.

We have no new code generated but changes are made in the already generated code (Backpatch).

100: if P<Q goto _


101: goto 102
102: if R<S goto 104
103: goto _
104: if T goto _
105: goto _

E is true only if [Link] (goto of 100) or [Link] (goto of 104) is reached (Merge). E is false if
[Link] (goto of 103 or 105) is reached.

1. Mixed Mode Expressions

o Boolean expressions may in practice contain arithmetic sub expressions e.g. (A+B)>C.
o We can accommodate such sub-expressions by adding the production E->E op E to our
grammar.
o We will also add a new field MODE for E. If E has been achieved after reduction using
the above (arithmetic) production, we make [Link] = arith, otherwise make [Link] =
bool.
o If [Link] = arith, we treat it arithmetically and use [Link]. If [Link] = bool, we treat
it as Boolean and use [Link] and [Link].

You might also like