0% found this document useful (0 votes)
15 views35 pages

Syntax Analysis: COP5621 Compiler Construction

The document discusses the role of a parser in compiler construction, focusing on syntax analysis and error handling. It outlines various parsing techniques, including top-down and bottom-up methods, as well as the importance of grammars and error recovery strategies. Additionally, it covers concepts such as the viable-prefix property, FIRST and FOLLOW sets, and the characteristics of LL(1) grammars.

Uploaded by

sillymclaren6
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views35 pages

Syntax Analysis: COP5621 Compiler Construction

The document discusses the role of a parser in compiler construction, focusing on syntax analysis and error handling. It outlines various parsing techniques, including top-down and bottom-up methods, as well as the importance of grammars and error recovery strategies. Additionally, it covers concepts such as the viable-prefix property, FIRST and FOLLOW sets, and the characteristics of LL(1) grammars.

Uploaded by

sillymclaren6
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

1

Syntax Analysis
Part I
Chapter 4

COP5621 Compiler Construction


Copyright Robert van Engelen, Florida State University, 2005
2

Position of a Parser in the


Compiler Model
Token,
Source tokenval Parser
Lexical Intermediate
Program and rest of
Analyzer representation
Get next front-end
token
Lexical error Syntax error
Semantic error

Symbol Table
3

The Parser
• The task of the parser is to check syntax
• The syntax-directed translation stage in the
compiler’s front-end checks static semantics and
produces an intermediate representation (IR) of
the source program
– Abstract syntax trees (ASTs)
– Control-flow graphs (CFGs) with triples, three-address
code, or register transfer lists
– WHIRL (SGI Pro64 compiler) has 5 IR levels!
4

Error Handling
• A good compiler should assist in identifying and
locating errors
– Lexical errors: important, compiler can easily recover
and continue
– Syntax errors: most important for compiler, can almost
always recover
– Static semantic errors: important, can sometimes
recover
– Dynamic semantic errors: hard or impossible to detect
at compile time, runtime checks are required
– Logical errors: hard or impossible to detect
5

Viable-Prefix Property
• The viable-prefix property of LL/LR parsers
allows early detection of syntax errors
– Goal: detection of an error as soon as possible
without consuming unnecessary input
– How: detect an error as soon as the prefix of the
input does not match a prefix of any string in
the language
Error is
Error is detected here
… detected here …
Prefix Prefix DO 10 I = 1;0
for (;)
… …
6

Error Recovery Strategies


• Panic mode
– Discard input until a token in a set of designated
synchronizing tokens is found
• Phrase-level recovery
– Perform local correction on the input to repair the error
• Error productions
– Augment grammar with productions for erroneous
constructs
• Global correction
– Choose a minimal sequence of changes to obtain a
global least-cost correction
7

Grammars (Recap)
• Context-free grammar is a 4-tuple
G=(N,T,P,S) where
– T is a finite set of tokens (terminal symbols)
– N is a finite set of nonterminals
– P is a finite set of productions of the form
→
where   (NT)* N (NT)*
and   (NT)*
– S is a designated start symbol S  N
8

Notational Conventions Used


• Terminals
a,b,c,…  T
specific terminals: 0, 1, id, +
• Nonterminals
A,B,C,…  N
specific nonterminals: expr, term, stmt
• Grammar symbols
X,Y,Z  (NT)
• Strings of terminals
u,v,w,x,y,z  T*
• Strings of grammar symbols
,,  (NT)*
9

Derivations (Recap)
• The one-step derivation is defined by
A
where A →  is a production in the grammar
• In addition, we define
–  is leftmost lm if  does not contain a nonterminal
–  is rightmost rm if  does not contain a nonterminal
– Transitive closure * (zero or more steps)
– Positive closure + (one or more steps)
• The language generated by G is defined by
L(G) = {w | S + w}
10

Derivation (Example)
E→E+E
E→E*E
E→(E)
E→-E
E → id

E  - E  - id
E rm E + E rm E + id rm id + id
E * E
E + id * id + id
11

Chomsky Hierarchy: Language


Classification
• A grammar G is said to be
– Regular if it is right linear where each production is of
the form
A→wB or A→w
or left linear where each production is of the form
A→Bw or A→w
– Context free if each production is of the form
A→
where A  N and   (NT)*
– Context sensitive if each production is of the form
A→
where A  N, ,,  (NT)*, || > 0
– Unrestricted
12

Chomsky Hierarchy

L(regular)  L(context free)  L(context sensitive)  L(unrestricted)

Where L(T) = { L(G) | G is of type T }


That is, the set of all languages
generated by grammars G of type T

Examples:
Every finite language is regular
L1 = { anbn | n  1 } is context free
L2 = { anbncn | n  1 } is context sensitive
13

Parsing
• Universal (any C-F grammar)
– Cocke-Younger-Kasimi
– Earley
• Top-down (C-F grammar with restrictions)
– Recursive descent (predictive parsing)
– LL (Left-to-right, Leftmost derivation) methods
• Bottom-up (C-F grammar with restrictions)
– Operator precedence parsing
– LR (Left-to-right, Rightmost derivation) methods
• SLR, canonical LR, LALR
14

Top-Down Parsing
• LL methods (Left-to-right, Leftmost
derivation) and recursive-descent parsing
Grammar: Leftmost derivation:
E→T+T E lm T + T
T→(E) lm id + T
T→-E
T → id
lm id + id
E E E E

T T T T T T

+ id + id + id
15

Left Recursion (Recap)


• Productions of the form
A→A
|
|
are left recursive
• When one of the productions in a grammar
is left recursive then a predictive parser may
loop forever
16

General Left Recursion


Elimination
Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
for j = 1, …, i-1 do
replace each
Ai → Aj 
with
Ai → 1  | 2  | … | k 
where
Aj → 1 | 2 | … | k
enddo
eliminate the immediate left recursion in Ai
enddo
17

Immediate Left-Recursion
Elimination
Rewrite every left-recursive production
A→A
|
|
|A
into a right-recursive production:
A →  AR
|  AR
AR →  AR
|  AR
|
18

Example Left Rec. Elimination


A→BC|a
B→CA|Ab Choose arrangement: A, B, C
C→AB|CC|a

i = 1: nothing to do
i = 2, j = 1: B→CA|Ab
 B→CA|BCb|ab
(imm) B → C A BR | a b BR
BR → C b BR | 
i = 3, j = 1: C→AB|CC|a
 C→BCB|aB|CC|a
i = 3, j = 2: C→BCB|aB|CC|a
 C → C A BR C B | a b BR C B | a B | C C | a
(imm) C → a b BR C B CR | a B CR | a CR
CR → A BR C B CR | C CR | 
19

Left Factoring
• When a nonterminal has two or more productions
whose right-hand sides start with the same
grammar symbols, the grammar is not LL(1) and
cannot be used for predictive parsing
• Replace productions
A →  1 |  2 | … |  n | 
with
A →  AR | 
AR → 1 | 2 | … | n
20

Predictive Parsing
• Eliminate left recursion from grammar
• Left factor the grammar
• Compute FIRST and FOLLOW
• Two variants:
– Recursive (recursive calls)
– Non-recursive (table-driven)
21

FIRST
• FIRST() = the set of terminals that begin all strings
derived from 

FIRST(a) = {a} if a  T
FIRST() = {}
FIRST(A) = A→ FIRST() for A→  P
FIRST(X1X2…Xk) =
if for all j = 1, …, i-1 :   FIRST(Xj) then
add non- in FIRST(Xi) to FIRST(X1X2…Xk)
if for all j = 1, …, k :   FIRST(Xj) then
add  to FIRST(X1X2…Xk)
22

FOLLOW
• FOLLOW(A) = the set of terminals that can
immediately follow nonterminal A

FOLLOW(A) =
for all (B →  A )  P do
add FIRST()\{} to FOLLOW(A)
for all (B →  A )  P and   FIRST() do
add FOLLOW(B) to FOLLOW(A)
for all (B →  A)  P do
add FOLLOW(B) to FOLLOW(A)
if A is the start symbol S then
add $ to FOLLOW(A)
23

LL(1) Grammar
• A grammar G is LL(1) if for each collections of
productions
A → 1 | 2 | … | n
for nonterminal A the following holds:

1. FIRST(i)  FIRST(j) =  for all i  j


2. if i *  then
2.a. j *  for all i  j
2.b. FIRST(j)  FOLLOW(A) = 
for all i  j
24

Non-LL(1) Examples

Grammar Not LL(1) because


S→Sa|a Left recursive
S→aS|a FIRST(a S)  FIRST(a)  
S→aR|
R→S| For R: S →*  and  →* 
S→aRa For R:
R→S| FIRST(S)  FOLLOW(R)  
25

Recursive Descent Parsing


• Grammar must be LL(1)
• Every nonterminal has one (recursive) procedure
responsible for parsing the nonterminal’s syntactic
category of input tokens
• When a nonterminal has multiple productions,
each production is implemented in a branch of a
selection statement based on input look-ahead
information
26

Using FIRST and FOLLOW to


Write a Recursive Descent Parser
procedure rest();
begin
expr → term rest if lookahead in FIRST(+ term rest) then
rest → + term rest match(‘+’); term(); rest()
else if lookahead in FIRST(- term rest) then
| - term rest match(‘-’); term(); rest()
| else if lookahead in FOLLOW(rest) then
term → id return
else error()
end;

FIRST(+ term rest) = { + }


FIRST(- term rest) = { - }
FOLLOW(rest) = { $ }
27

Non-Recursive Predictive
Parsing
• Given an LL(1) grammar G=(N,T,P,S)
construct a table M[A,a] for A  N, a  T
and use a driver program with a stack
input a + b $

stack
Predictive parsing
X output
program (driver)
Y
Z Parsing table
$ M
28

Constructing a Predictive Parsing


Table
for each production A →  do
for each a  FIRST() do
add A →  to M[A,a]
enddo
if   FIRST() then
for each b  FOLLOW(A) do
add A →  to M[A,b]
enddo
endif
enddo
Mark each undefined entry in M error
29

Example Table A→ FIRST() FOLLOW(A)


E → T ER ( id $)
ER → + T ER + $)
E → T ER
ER → + T ER |  ER →  
T → F TR T → F TR ( id +$)
TR → * F TR |  TR → * F TR * +$)
F → ( E ) | id TR →  
F→(E) ( *+$)
F → id id

id + * ( ) $
E E → T ER E → T ER
ER ER → + T ER ER →  ER → 
T T → F TR T → F TR
TR TR →  TR → * F TR TR →  TR → 
F F → id F→(E)
30

LL(1) Grammars are


Unambiguous
Ambiguous grammar A→ FIRST() FOLLOW(A)
S → i E t S SR | a S → i E t S SR i e$
SR → e S |  S→a a
E→b SR → e S e e$
SR →  
E→b b t
Error: duplicate table entry
a b e i t $
S S→a S → i E t S SR
SR → 
SR SR → 
SR → e S
E E→b
31

Predictive Parsing Program


push($)
(Driver)
push(S)
a := lookahead
repeat
X := pop()
if X is a terminal or X = $ then
match(X) // move to next token, a := lookahead
else if M[X,a] = X → Y1Y2…Yk then
push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top
produce output and/or invoke actions
else error()
endif
until X = $
32

Example Table-Driven Parsing


Stack Input Production applied
$E id+id*id$
$ERT id+id*id$ E → T ER
$ERTRF id+id*id$ T → F TR
$ERTRid id+id*id$ F → id
$ERTR +id*id$
$ER +id*id$ TR → 
$ERT+ +id*id$ ER → + T ER
$ERT id*id$
$ERTRF id*id$ T → F TR
$ERTRid id*id$ F → id
$ERTR *id$
$ERTRF* *id$ TR → * F TR
$ERTRF id$
$ERTRid id$ F → id
$ERTR $
$ER $ TR → 
$ $ ER → 
33

Panic Mode Recovery


FOLLOW(E) = { $ ) }
FOLLOW(ER) = { $ ) }
Add synchronizing actions to FOLLOW(T) = { + $ ) }
undefined entries based on FOLLOW FOLLOW(TR) = { + $ ) }
FOLLOW(F) = { * + $ ) }

id + * ( ) $
E E → T ER E → T ER synch synch
ER ER → + T ER ER →  ER → 
T T → F TR synch T → F TR synch synch
TR TR →  TR → * F TR TR →  TR → 
F F → id synch synch F→(E) synch synch
synch: pop A and skip input till synch token
or skip until FIRST(A) found
34

Phrase-Level Recovery
Change input stream by inserting missing *
For example: id id is changed into id * id

id + * ( ) $
E E → T ER E → T ER synch synch
ER ER → + T ER ER →  ER → 
T T → F TR synch T → F TR synch synch
TR insert * TR →  TR → * F TR TR →  TR → 
F F → id synch synch F→(E) synch synch

insert *: insert missing * and redo the production


35

Error Productions
E → T ER
Add error production:
ER → + T ER | 
TR → F TR
T → F TR
to ignore missing *, e.g.: id id
TR → * F TR | 
F → ( E ) | id

id + * ( ) $
E E → T ER E → T ER synch synch
ER ER → + T ER ER →  ER → 
T T → F TR synch T → F TR synch synch
TR TR → F T R TR →  TR → * F TR TR →  TR → 
F F → id synch synch F→(E) synch synch

You might also like