University of Science and Technology Houari Boumediene
Faculty of Computer Science
Compilation
3rd ING computer security
2024/2025
Dr [Link] [Link]@[Link]
1
The steps of Input lexical analysis
compilation lexical
entities
Syntactic analysis.
syntax
tree
Symbol
handling semantic analysis table
errors management
generation of
intermediate code
Optimizing the
intermediate code
object
object code generation
code
2
Introduction :
Lexical analysis aims to transform a source program, which is a sequence of characters, into
lexical tokens.
The role of syntactic analysis is to verify whether the sequence of lexical entities provided by
the lexical analyzer respects a precise order of succession described by the syntax of the
language.
This verification is done using a language-specific grammar called a syntax grammar.
We notice that syntax analysis is the most formalized stage of compilation.
lexical Syntax
Input lexical analysis Syntactic analysis.
entities tree
3
Basic concepts
Definition of a syntactic grammar
A syntax grammar G is composed of production rules that can generate all programs written in a
given language. It is defined by a quadruple <T,N,S,P> where:
• T represents a non-empty set of terminal symbols of G,
• N represents a non-empty set of non-terminal symbols,
• S is the axiom,
• P is composed of the set of production rules of the form A→α where A N and α (T N)*.
A represents the left-hand side of the production (LHP) and α represents the right-hand side
of the production (RHP). A production rule specifies that A can be replaced by the sequence
of symbols α.
4
Exemple1:
Soit l’instruction Pascal suivante :
<inst-if> If <condition> then <inst1> else <inst2>
Exemple2:
Soit la grammaire des expressions arithmétiques suivante:
<program> <expression>
<expression> <term> / <expression> + <term>
<term> <fact> / <term> * <fact>
<fact> id / (<expression>)
5
Concept of derivation
A derivation is an operation that consists, using a production rule of the form A →α, of
replacing the left-hand side A with the right-hand side α.
A derivation is the process of building a sentence or program by starting with the initial
symbol and repeatedly replacing non-terminal symbols with their equivalent sequences of
symbols as defined by the grammar's production rules.
Concept of reduction
A reduction is the inverse operation of a derivation. It involves replacing a recognized
right-hand side of a production rule with its corresponding left-hand side non-terminal.
Concept of a Syntax Tree
A syntax tree is a tree where:
•The root represents the axiom of the syntax grammar.
•The nodes represent the right-hand sides of production rules.
•The leaves represent the tokens of the string15/10/2024
being parsed.
6
Example :
o Using the grammar from example 2, provide the derivations for the string id+id*id#
o Give the different reduction operations of the string id+id*id#
Two methods are used to construct a syntax tree:
• Top-down parsing: Starting from the axiom, a series of derivations are performed until the
input string is reached. (c’est le cas descendant.)
• Bottom-up parsing: Starting from the input string, a series of reductions are performed
until the axiom is reached. (c’est le cas ascendant.)
• A tree that has two children is called binary.
7
Example :
Top-down parsing: Starting from
the axiom, a series of derivations are
performed until the input string is
reached. (c’est le cas descendant.)
8
Example :
Bottom-up parsing: Starting from
the input string, a series of reductions
are performed until the axiom is
reached. (c’est le cas ascendant.)
9
Concept of Ambiguity
Concept of Ambiguity A grammar is said to be ambiguous if there exist two or
more different syntax trees corresponding to the same string of words.
Question:
Is the following grammar ambiguous? G: E -> E+E/ E*E/ id / (E)
Analyze the string id + id * id #
10
Concept of Ambiguity
The grammar G is ambiguous because, as an example, two syntax trees can be
associated with the string id +id *id, when applying a top-down method,
11
Eliminating left recursion in a grammar
• A top-down parse cannot be performed on a grammar that is directly or indirectly
left-recursive.
• If in a grammar, there exists a production of the type A → Aα where α (T N)+,
the identification of the string to be parsed with one of the right-hand sides of the
production will be blocked (infinite loop).
• It is thus necessary to eliminate direct and indirect left recursions in a grammar.
12
Definition Direct Left Recursion:
A grammar is said to be directly left-recursive if it has a non-terminal A N such
that A → Aα | β where A N and α,β (T N) .
Definition Indirect Left Recursion:
A grammar is indirectly left-recursive if it has a non-terminal A N such that
A −→ Aα | β where α,β (T N) .
A → B α1 / α2
B → A α / β, α1, α 2, α , β N
If the grammar is of the form A → α A, we will talk about right recursion.
13
Elimination of direct left recursion
Eliminating direct left recursion involves transforming production rules of a non-
terminal A of the type:
A → A α1 / A α2 / β1 / β2 , A N et βi ,αi (T N)
gives A → β1 A ’ / β2 A ’ / β1/ β2
P A ’ → α1 A’ / α2 A ’ / α1 / α2
14
Example :
Let G be the grammar defined by <T,N,S,P>
where T={a,b,c}; N={S,A} and
P: S→Sa | SAb | Ac | a
A→aA | c
15
Example :
Let G be the grammar defined by <T,N,S,P> where T={a,b,c}; N={S,A} and
P: S→Sa | SAb | Ac | a
A→aA | c
G is directly left-recursive in S.
The direct elimination of direct left recursion in S will transform the grammar G
into an equivalent grammar G' defined by <T, N',S,P'> where: T={a,b,c};
N'={S,S',A} and
P′ : or P ′ :
S→ Ac S′ | a S′ | Ac | a S→ Ac S′ | a S′
S′→aS′ | AbS′ | a | Ab S′→aS′ | AbS′ | ԑ
A→aA | c A→aA | c
16
Elimination of indirect left recursion
To eliminate the case of indirect left recursion of a non-terminal A, it is necessary
to perform substitutions in such a way as to make a direct left recursion appear, and
then apply the previous transformation rules.
17
Example:
Let G be the grammar defined by <T,N,S,P> such that T={1,0}; N={S,A,B} and
P: S→ SA10 | AB
A→B1 | 0
B →S0 | 1
1. Ordering of non-terminals: S < A < B < S and S < S. The grammar G
presents an indirect left recursion in S and a direct left recursion in S.
18
Example:
Let G be the grammar defined by <T,N,S,P> such that T={1,0}; N={S,A,B} and
P: S→ SA10 | AB
A→B1 | 0
B →S0 | 1
1. Ordering of non-terminals: S < A < B < S and S < S. The grammar G
presents an indirect left recursion in S and a direct left recursion in S.
2. Successive substitutions:
Substitution of B in A: A→S01 | 11 | 0
Substitution of A in S: S→SA10 | S01B | 11B | 0B
3. Elimination of direct left recursion in S: S → 11BS' | 0BS’
S' → A10S' | 01BS' | ε
19
Example:
Let G be the grammar defined by <T,N,S,P> such that T={1,0}; N={S,A,B} and
P: S→ SA10 | AB
A→B1 | 0
B →S0 | 1
Thus, grammar G is transformed into an equivalent grammar G' defined as
by <T,N', S, P'> such that: T={1,0}; N'={S,S',A',B} and
P': S→11BS' | 0BS’
S'→ A ' 10S' | 01BS' | ε
A'→S01 | 11 | 0
B →S0 | 1
20
Factorization of a grammar
One of the conditions for performing a deterministic top-down parse is
factorization.
A grammar G defined by <T,N,S,P> is considered non-factorized if, in the set P,
there are production rules of the form: A→αX | αY | αZ | β such that α (T N)+
and β,X,Y,Z (T N)*. The factorization operation transforms this type of rule as
follows:
A→ αB | β
B→X|Y|Z
Factorization ensures that during syntactic analysis, when the top of the stack is a
non-terminal A and the current input symbol is a, only one right-hand side can be
applied to recognize the terminal. Factorization helps to guarantee the determinism
of the analysis method.
21
Example:
Let G be the grammar defined by <T,N,S,P> such that T={1,0}; N={S,A,B} and
P: S→ aS | abAS | Ab
A→ bA | bS | b
Generally speaking, it's preferable to find the longest prefix that is common
among the right-hand sides of a non-terminal A for factorization.
22
Example:
Let G be the grammar defined by <T,N,S,P> such that T={1,0}; N={S,A,B} and
P: S→ aS | abAS | Ab
A→ bA | bS | b
Thus, grammar G is transformed into an equivalent grammar G' defined as
by <T,N', S, P'> such that: T={1,0}; N'={S,S’,A, A’} and
P': S→ aS′ | Ab
S’→ S | bAS
A→ bA′
A’ → A | S | ε
23
Free ε grammars
A grammar G defined by <T,N,S,P> is said to be ε-free:
• If no production rule contains ε (the empty string),
• If the axiom S → ε, then S must not appear on the right-hand side of any
production rule in P.
Example:
Let G be a grammar defined by the tuple <T,N,S,P> where T={a,b}, N={S,A,B},
S is the start symbol, and P is the set of production rules:
S→ Ab | Bb | ε
A→aB | bSB | ε
B → AB | bS
24
Grammar G is not ε-free because there is a production rule S→ ε and the non-
terminal S appears on the right-hand side of multiple production rules.
Additionally, A→ ε. An equivalent ε-free grammar G' is obtained by applying
algorithm. It is defined by <T,N,S,P'> where: T={a,b}; N={S,A,B}
P':S′→S | ε
S→ Ab | Bb | b
A→aB | bSB | bB
B → AB | bS | B | b
Since the production rule B →B is useless, it is removed from the set of
production rules. Thus:
P’: S′→S | ε
S→ Ab | Bb | b
A→aB | bSB | bB
B → AB | bS | b
25
Normal Forms of Grammars
Chomsky Normal Form
A grammar is in Chomsky Normal Form if every production in G has one of the
following forms:
• A → BC, where A, B, and C are non-terminal symbols.
• A → a, where a is a terminal symbol.
• If ε ϵ L(G), then only S → ε is allowed, and S must not appear on the right-hand
side of any other production.
Lemma
For every type 2 grammar, there is an equivalent grammar G' in Chomsky Normal
Form.
The advantage of a Chomsky Normal Form grammar is that its syntax tree is binary,
which means that its traversal becomes rapid and the analysis is then accelerated.
26
Example
Convert the following grammar into Chomsky Normal Form:
Let G be the grammar defined by <T,N,S,P> where T={a,c,d}; N={S,A,B} and
P: S→a | A
A→B A | c | ε
B →dB
G is not in Chomsky Normal Form because it is not ε-free. The transformation of
the grammar into CNF requires several steps:
[Link] G ε-free: S→a | A | ε A→B A | c | B B →dB
[Link] the non-terminal A in S with its right-hand sides:
S→a | BA | c | B | ε A→B A | c | B B →dB
[Link] the non-terminal B in S with its right-hand sides
S→a | B A | c | dB | ε A→B A | c | dB B →dB
[Link] the terminal d in S, A, and B with a new non-terminal C:
P ' : S→a | B A | c |CB | ε A→B A | c |CB B →CB C →d
Thus, the grammar G' in Chomsky Normal Form, equivalent to G 27
Normal Forms of Grammars
Greibach Normal Form
A grammar is in Greibach Normal Form (GNF) if it is ε-free and every production
is of the form A → aα, where A N and a T and α N*.
Algorithm for Transformation
[Link] ε-freeness: Remove all productions that generate the empty string.
[Link] left recursion: Remove any direct or indirect left recursion.
[Link] a partial order: Define a partial order among the non-terminals based on
the left-hand side and right-hand side of productions.
[Link] from the largest non-terminal: Beginning with the largest non-terminal in the
partial order, replace productions of the form A → Aα with equivalent productions
where A does not appear at the beginning of the right-hand side.
[Link]: Repeat step 4 until all productions start with a terminal symbol.
28
Advantages
Every production in GNF starts with a terminal symbol. This property accelerates
parsing, as the parser can directly compare the current input symbol with the first
symbol of the production.
Example
G: A → AB | a B → CA | b C→c
1.G is ε-free
2. G is directly left-recursive at A.
After eliminating the direct left-recursion at A
P’: A→aA′ | a A′→B A′ | B B →CA | b C →c
29
Example
G: A → AB | a B → CA | b C→c
1.G is ε-free
2. G is directly left-recursive at A.
After eliminating the direct left-recursion at A
P’: A→aA′ | a A′→B A′ | B B →CA | b C →c
3. Establishing a partial order between the non-terminals: A' < B < C (because A'
calls B and B calls C)
4. Replacing C with its right-hand sides in B
A→aA′ | a A′→B A′ | B B →cA | b C →c
5. Replacing B with its right-hand sides in A’
A→aA′ | a A′→cAA′ | bA′ | cA | b B →cA | b C →c
30
Top-Down Parsing Methods
Top-down parsing methods aim to construct a parse tree starting from the root (the
axiom) and working down towards the leaves until the leaves correspond to the
input string being analyzed. Two main categories of parsers have been developed:
deterministic and non-deterministic. The latter, non-deterministic methods, are
generally not implemented due to their inefficiency.
31
Non-Deterministic Top-Down Parsing Methods
1. Parallel Top-Down Parsing
This approach involves replacing a non-terminal with one or more possible right-
hand sides of its production rules. Unlike deterministic parsers, all alternatives are
explored simultaneously. These replacements are maintained in parallel until it
becomes clear that a particular path cannot lead to a successful parse.
[Link] backtracking method operates sequentially.
It involves fully exploring one alternative before moving on to another. If a
particular choice leads to a dead end, the algorithm backtracks to a previous
decision point and explores a different alternative.
Each time the algorithm begins a new path, the current choice is pushed onto a
stack. If the choice turns out to be unsuccessful, the algorithm pops the top choice
off the stack and tries a different one.
If no alternative works, the input string is considered syntactically incorrect.
32
Deterministic Parsing Methods
Unlike non-deterministic methods, deterministic methods do not require
backtracking. At each step, only the single alternative that can potentially match the
input is selected.
I. LL(1) Parsing
The LL(1) parsing method is a predictive parsing technique. Given a grammar and
an input string, the goal is to determine which productions to apply starting from the
axiom in order to derive the input string. It is possible to construct a non-recursive
predictive parser using an explicit stack.
Informal definition of an LL(1) grammar
A grammar is said to be LL(1) if, by looking at only one character of the program to
be analyzed, we can decide which rule to apply.
33
LL(1) stands for:
• Left-to-right: The parser reads the input from left to right.
• Leftmost derivation: The parser constructs the parse tree by expanding the
leftmost non-terminal at each step.
• 1: The parser looks ahead one token to make parsing decisions.
In essence, an LL(1) parser is a top-down parser that reads the input from left to
right, derives the leftmost non-terminal at each step, and uses only one lookahead
token to make parsing decisions.
A grammar is LL(1) if:
1. G is not left-recursive, either directly or indirectly.
2. G is left-factored.
3. The parsing table is unambiguous.
34
Note:
To mark the successful completion of syntactic analysis of a grammar G defined
by <T, N, S, P>, this grammar is augmented with a new non-terminal symbol Z, as
the starting symbol, with an additional production rule Z → S#.
This gives rise to a new grammar G' defined by <T, N, S, P'> where P' is obtained
from P by adding the production rule Z → S#.
35
Construction of the LL(1) parsing table
The construction of the LL(1) parsing table associated with a grammar G defined by
<T, N, S, P> requires the definition of the set of first symbols and the set of follow
symbols for each non-terminal A N of the grammar G.
1. Calculation of first symbols
This step consists of defining the set of terminals (possibly including ε) that appear
directly or indirectly at the beginning of the right-hand sides of the productions of A.
The calculation of the set of first symbols of a non-terminal A is done according to
algorithm.
36
Algorithm of Deb
• If we have a rule: X → xβ, where X N, β (T N)*, and x T,
then x Deb(X)
• If we have a rule: X → ε, then ε Deb(X)
• If we have a rule: X → Y1Y2...Yn,
then: Deb(Y1)-{ε} Deb(X)
If Yi → ε, then Deb(Yi+1) -{ε} Deb(X)
If all Yn → ε, then {ε} Deb(X)
37
2. Calculating Follow Sets
The set of follow symbols associated with a non-terminal A N in a grammar G
defined by <T, N, S, P> consists of the terminals that appear directly or indirectly to
the right of the non-terminal in a right-most derivation.
If we have a rule A → Xa, where A , X N and a T.
Algorithm of Suiv
• If we have a rule A → aBβ, where A, B N, a, β (T N)* then:
If β T then β Suiv(B)
If β N then deb(β) -{Ɛ} Suiv(B)
If β →* Ɛ then Suiv(A) Suiv(B)
Note:
• If we have A → αB, then Suiv(A) Suiv(B)
• If we have Z → S#, then # Suiv(S)
38
Example
Let G be the grammar defined by <T,N,S,P> where T={+ - * / ( ) nb};
N={E,E’,T,T’,F} and P: Z →E#
E →T E′ 1
E′→+T E′ 2 | -T E′ 3 | ε 4
T →FT ′ 5
T ′→ FT ′ 6 | /FT ′ 7 | ε 8
F →(E) 9 | nb 10
Calculating the set of Deb
Deb(E) = Deb(T) = Deb(F) = { (, nb}
Debuts(E’) = {+,-, ε }
Debuts(T) = Debuts(F) = { (, nb}
Debuts(T’) = { *, /, ε }
Debuts(F) = { (, nb}
39
Example
Let G be the grammar defined by <T,N,S,P> where T={+ - * / ( ) nb};
N={E,E’,T,T’,F} and P: Z →E#
E →T E′ 1
E′→+T E′ 2 | -T E′ 3 | ε 4
T →FT ′ 5
T ′→ FT ′ 6 | /FT ′ 7 | ε 8
F →(E) 9 | nb 10
Calculating the set of Svt
Suiv(E) = { #, )}
Suiv(E’) = Suiv(E) = { #, )}
Suiv(T) = Deb(E’) - {ϵ } Suiv(E) = { +, -, ), #}
Suiv(T’) =Suiv(T)= { +, -, ), #}
Suiv(F) =Deb(T’) - {ϵ } Suiv(T)= { *, /, +, -, ), #}.
40
Example
Let G be the grammar defined by <T,N,S,P> where T={+ - * / ( ) nb};
N={E,E’,T,T’,F} and P: Z →E#
E →T E′ 1
E′→+T E′ 2 | -T E′ 3 | ε 4
T →FT ′ 5
T ′→ FT ′ 6 | /FT ′ 7 | ε 8
F →(E) 9 | nb 10
41
A parsing tableLL(1)
denoted by M, is a two-dimensional table that indicates for each non-terminal A and
each terminal a or #, the production rule to be applied.
Note:
• Any undefined entry in the LL(1) parsing table corresponds to an error.
• An LL(1) parsing table must be single-valued (at most one rule per entry). 42
Example
Let G be the grammar defined by <T,N,S,P> where T={+ - * / ( ) nb};
N={E,E’,T,T’,F} and P: Z →E#
E →T E′ 1
E′→+T E′ 2 | -T E′ 3 | ε 4
T →FT ′ 5
T ′→ FT ′ 6 | /FT ′ 7 | ε 8
F →(E) 9 | nb 10
The LL(1) parsing table is single-valued15/10/2024
implies G is LL(1). 43
LL(1) parsing algorithm
The input program includes an end-of-file
marker (#). At any given moment, the stack
contains symbols from the grammar, with #
also serving as the bottom-of-stack marker.
Initially, the stack contains #Axiom. The
parsing table is two-dimensional, with rows
representing the grammar's non-terminals
and columns representing the terminals.
T(N,a) provides the right-hand side of the
production to be applied, indicating the
parser's action.
44
Let's consider the grammar G from the
previous example. The steps for analyzing the
string nb+nb*nb#.
Z →E#
E →T E′
E′→+T E′ | -T E′ | ε
T →FT ′
T ′→ FT ′ | /FT ′ | ε
F →(E) | nb
45
Definition
A grammar G is said to be LL(1) if:
1.G is not left-recursive, neither directly nor indirectly.
2.G is left-factored.
[Link] every production of the form A→α1 | α2... | αn, where αi (T N),
then (a) Deb(αi) ∩ Deb(αj) = Ɐ i, j [1,n] with i ≠ j,
(b) If αi =>* ε, then Deb(αi) ∩ Svt(A) = ,
(c) If αi =>* ε, then αj ≠> ε Ɐ j ≠ i.
46
II. Recursive Descent Parsing
The parser associates a procedure with each non-terminal of the grammar. Parsing is
performed through procedure calls.
Example
Let G be the grammar defined by <T,N,S,P> where T={a b ( ) ,}; N={S,T} and
P: Z→S# S →a | b(T ) T →T,S | S
Checking LL(1) conditions:
1) The grammar G is directly left-recursive on T. The elimination of direct left
recursion on T is done as follows : T → S T’ | S T’ → ,S T’ | ,S
2) G' is left-factored.
47
G ’ be the grammar defined by <T,N,S,P> where T={a b ( ) ,}; N ’ ={S,T,T ’} and
P ’ : Z→S# S →a | b(T ) T → S T’ T’ → ,S T’ | Ɛ
Deb and Svt associated with non-terminals of grammar G’
Deb Svt
S a b # , )
T a b )
T’ , Ɛ )
3) The parsing table of the grammar G' is single-valued because:
- for S, Deb(a) ∩ Deb(b(T)) = ,
- for T, there is only one production,
- for T', Deb(,ST') ∩ Svt(T') = .
The grammar G' satisfies all the LL(1) conditions.
48
Definition of procedures:
Main program : Z→S# Procedure : S →a | b ( T )
49
Definition of procedures:
Procedure : T → S T’ Procedure : T’ → ,S T’ | Ɛ
50
Example
P ’ : Z→S# S →a | b(T ) T → S T’ T’ → ,S T’ | Ɛ
Analysis of the string b(a,a)# Analysis of the string b(ba)#
51
Remark
The LL(1) parsing method is conceptually simple: It produces compact and
efficient parsers. It is also easy to add error-handling features.
However, it requires transforming the grammar (eliminating left recursion,
factoring the grammar).
Moreover, the method is quite restrictive: it does not apply to all grammars.
It should be noted, however, that some parser generators use the LL(1) method,
such as JavaCC.
The recursive descent method is also simple, efficient, and easy to
implement. However, it has a major drawback: the method is tied to the
grammar. If a rule is modified, then the corresponding procedure will need to be
modified.
52
Bottom-up parsing
The principle of bottom-up parsing is to construct a parse tree starting from the base
(leaves) and working upwards towards the root (represented by the axiom). The
model used is Shift/Reduce, meaning there are two operations:
• Shift: This involves advancing the pointer over the input string by one lexical
unit.
• Reduce: This involves reducing a sequence of terminals and non-terminals to a
non-terminal using a production rule.
Bottom-up parsing algorithms are often more complex than top-down parsing
algorithms. However, they can be applied to a wider range of grammars.
For this reason, they are very commonly used.
They are the basis of the YACC system, which is used to write compilers under the
UNIX system.
53
LR(1) (Left to Right)
LR parsing methods are efficient methods of deterministic bottom-up parsing. They
execute a set of actions based on shifts and reductions in order to reach the
grammar's axiom.
The LR(k) parser operates in the following manner:
•Left-to-right (L): It reads the input terminals from left to right.
•Rightmost derivation (R): It constructs a parse tree based on a rightmost derivation.
•k-lookahead (k): It bases its parsing decisions on the next k input symbols.
A grammar is LR(0) if the contents of the stack are sufficient to determine the action
to be performed.
A grammar is LR(K) if, by looking at the next K characters of the input string, the
action to be performed can be determined.
54
Item-Based LR Parsing
An item is a grammar production with a dot moving through the right-hand side of
the production. The part to the left of the dot represents the subtree of the grammar
that has already been reduced at a given moment during the analysis, and the part
to the right represents what remains to be analyzed.
An LR(k) item has the following form: [A → α.β, w]
where A N; α, β (T N); w T {#}, such that:
• A → αβ is a rule of the grammar.
• α: represents the part that has already been reduced.
• β: represents the part that remains to be analyzed.
• k represents the number of lexical tokens to consult in order to decide on the
action to be performed (shift/reduce). If k = 0, the corresponding item is an LR(0)
item and has the form [A → α.β]
55
Construction of the set of LR(K) items
The construction of LR(K) item sets for a grammar requires the implementation of
two functions:
• Closure of an item:
• The goto function.
1. The calculation of the closure of an item is done using the algorithm
56
2. The goto function
The GOTO function, defined as GOTO(I, X), where:
• I represents the set of items,
• X is a symbol in the grammar G, X (T N)*, is the closure of the set formed
by all items of the form [A → αX.β, w] such that [A → α.Xβ, w] I.
57
Construction of the LR(k) analysis table
58
A grammar G is LR(K) using the item-set method if the LR(K) parsing table is
unambiguous.
Therefore, to verify if a grammar G is LR(1), we must:
1. Calculate the collection C of closures of LR(1) items.
2. Construct the LR(1) parsing table.
3. If the LR(1) parsing table is unambiguous, then G is LR(1)=>G is LR(K) ⱯK≥1.
Remark
The ability to handle a larger class of grammars makes LR parsing a more powerful
and flexible technique than LL parsing. When faced with a grammar that is not
LL(1), but is LR(1), LR parsing is the preferred choice.
Left recursion does not hinder bottom-up parsing
59
Example
60
LALR (1) Analyzer
The LALR(1) (Look Ahead LR(1)) parsing method is an intermediate method
between LR(1) and SLR(1) parsing. It represents a good compromise between the
LR(1) and SLR(1) parsing methods. Indeed,
- It is more powerful than the SLR(1) parsing method because it can handle a
larger class of grammars,
- In some cases, it produces smaller parsing tables than those produced by the
LR(1) parsing method.
The LALR(1) method is based on the calculation of LR(1) items. In order to reduce
the number of generated states, the idea is to group together sets of items that
have the same core to form a set consisting of the union of the two base sets with
different lookahead symbols. This operation can lead to a reduction in the size of
the LR(1) parsing table.
61
Construction of the LR(k) analysis table
The core of an LR(1) item is represented by the left-hand side that is located before
the dot.
62
Example
63
• La table d’analyse LALR(1) est mono-définie la grammaire G est LALR(1)
la grammaire G est LALR(k), k ≥ 1.
• The LALR(1) parsing table is actually obtained by merging the rows of the LR(1)
parsing table that correspond to item sets with the same core
Either to analyze the string ccdd#:
64