0% found this document useful (0 votes)
4 views33 pages

Syntax Analysis and Context-Free Grammars

This document covers syntax analysis, focusing on context-free grammars (CFG) and their components, including terminal and nonterminal symbols. It explains the role of parsers in checking grammatical correctness and constructing intermediate representations, such as parse trees. Additionally, it discusses concepts like ambiguous grammars, left recursion, and left factoring to eliminate non-determinism in grammar.

Uploaded by

rajkumarpogula22
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views33 pages

Syntax Analysis and Context-Free Grammars

This document covers syntax analysis, focusing on context-free grammars (CFG) and their components, including terminal and nonterminal symbols. It explains the role of parsers in checking grammatical correctness and constructing intermediate representations, such as parse trees. Additionally, it discusses concepts like ambiguous grammars, left recursion, and left factoring to eliminate non-determinism in grammar.

Uploaded by

rajkumarpogula22
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

UNIT - II

Syntax Analysis(Part-A):
• Introduction Context-Free
Grammars
• Writing a Grammar
• Ambiguous Grammars

1
2.1 Introduction to Syntax
Analysis

2
Syntax
Analysis

Syntax Analysis
The parser (syntax analyzer) receives the source
code in the form of tokens from the lexical
analyzer and performs syntax analysis, which
create a tree-like intermediate representation that
depicts the grammatical structure of the token
stream.
Syntax analysis is also called parsing.
A typical representation is a abstract syntax
tree in which
each interior node represents an operation
the children of the node represent the
arguments of the operation

3
Position of Syntax
Analyzer

4
Role of
Parser

Parser
Checks the stream of words and their parts of
speech
(produced by the scanner) for grammatical
correctness
Determines if the input is syntactically well
formed Guides checking at deeper levels
than syntax (static
semantics checking)
Builds an IR representation of the code

5
2.2 Context-Free Grammar

Definition
A context-free grammar (CFG) has four components:
A set of terminal symbols, sometimes referred to as "tokens."
A set of nonterminal symbols. sometimes called "syntactic
variables."
One nonterminal is distinguished as the start symbol.

A set of productions in the form: LHS → RHS


whereLHS (called head, or left side) is a single nonterminal
symbol RHS (called body, or right side) consists of zero
or more terminals and nonterminals.

The terminals are the elementary symbols of the language defined by


the grammar.
Nonterminals impose a hierarchical structure on the language that is
key to
syntax analysis and translation.
Conventionally, the productions for the start symbol are listed first.
6
Benefits Offered by
Grammar
Grammars offer significant benefits for both
language designers and compiler writers:
A grammar gives a precise, yet easy-to-
understand
syntactic specification to a programming
language.
Parsers can automatically be constructed for
certain classes of grammars.
The parser-construction process can reveal
syntactic
ambiguities and trouble spots.
A grammar imparts structure to a language.
The structure is useful for translating source
programs
into correct object code and for detecting
errors. 7
CFG vs. RE

Grammars are a more powerful notation than


regular expressions.
Every construct that can be described by a
regular expression can be described by a
grammar, but not vice-versa.
Every regular language is a context-free
language, but not vice-versa.

8
CFG
Example

A CFG Grammar

9
Another CFG
Example

Grammar for simple arithmetic


expressions
1 expr → expr + term | expr - term | term

2 term → term * factor | term / factor |

3 factor factor → ( expr ) | id

where
expr, term, and factor are
nonterminals
id, +, -, *, /, (, and ) are terminals
expr is the start symbol

10
Notational
Conventions
To avoid confusion between terminals and nonterminals, the following
notational conventions for grammar will be used.

terminal symbols

lowercase letters like a, b, c.


digits, operator and punctuation symbols, such as +, *, (, ), 0, 1, ...,
9.
Boldface strings such as id, or if. Each of which represents
a single terminal symbol.

nonterminal symbols

uppercase letters early in the alphabet like


A, B, C. lowercase italic names such as expr,
or stmt.
Specific symbols begin with a uppercase
letter such as Expr, Stmt.
Unless stated otherwise, the head of the
first production is the start symbol. 11
Notational Conventions
(cont)

To avoid confusion between terminals and nonterminals, the following


notational conventions for grammar will be used.

Grammar symbols (i.e. either terminal or nonterminal)

uppercase letters late in the alphabet, such as X, Y, Z.


lowercase Greek letters, α, β, γ for example, represent strings of
grammar symbols. Thus, a production can be written as: A → α.

12
Derivatio
ns
Derivations
A grammar derives strings by beginning with the start symbol and
repeatedly replacing a nonterminal by the body of a production for
that nonterminal. This sequence of replacements is called
derivation.
Derivation Example
Given the grammar:
1 exp → exp op exp | ( exp ) | number
2 op → + | - | *
The following is a derivation for an expression. At each step the
grammar rule choice used for the replacement is given on the
right.
Context-Free
Language
New Notations: ⇒=∗ and = +

α 1⇒+=∗ α mn eans α de1 rives α in nzero or more steps.

α 1 = α n means α 1 derives α n in one or more steps.

Definition

If S =⇒∗ α, where S is the start symbol of grammar G, then α


is called a sentential form of G. A sentential form may contain
both terminals and nonterminals.
A sentence of G is a sentential form with no nonterminals.
The language generated by a grammar G is its set of sentences,
denoted as L(G).
A language that can be generated by a context-free grammar is
said to be a
context-free language.
If two grammars generate the same language, the grammars
are said to be
equivalent.
Leftmost and Rightmost
Derivations

The point of parsing is to construct a derivation.


At each step, we choose a nonterminal to
replace.
Different choices can lead to different
derivations
Two derivations are of interest
Leftmost derivation - replace leftmost

nonterminal at each lm step, denoted as: .
Rightmost derivation - replace rightmost

nonterminal at
each step, denoted ras:
m
.

Leftmost and rightmost are the two


systematic derivations. We don’t care
about randomly-ordered derivations!
15
Leftmost and Rightmost
Derivations
Leftmost Derivation of (number - number)*number

Rightmost Derivation of (number - number)*number

16
Parse
Trees

Definition
A parse Tree is a labeled tree representation of a
derivation that filters out the order in which
productions are applied to replace nonterminals.
The interior nodes are labeled by nonterminals
The leaf nodes are labeled by terminals
The children of each internal node A are
labeled, from left to right, by the symbols in
the body of the production by which this A was
replaced during the derivation.
Since a parse tree ignores variations in the order in which
symbols in sentential forms are replaced, there is a many-to-one
relationship between derivations and parse tree.

17
Leftmost and Rightmost
Derivations
The following is a parse tree for these two
derivations discussedhere.

18
2.2 .Writing a Grammar

1 Lexical Versus Syntactic Analysis


2. Eliminating Ambiguity
3. Elimination of Left Recursion
[Link] Factoring

19
2.2.1 Lexical Versus Syntactic
Analysis

20
2.2.2 Ambiguous Grammars

21
Ambiguity

22
Removing the ambiguity

23
24
Deeper Ambiguity

25
2.2.3 Left Recursion

26
Left Recursion

27
Removal of Immediate Left
Recursion

28
General Left Recursion

Bad News!
Can only be removed when there is
no empty-string production and no
cycle in the grammar.
Good News!!!!
Never seen in grammars of any
programming languages

29
2.2.4 Left Factoring

Left factoring is required to eliminate non-


determinism of a grammar.
Suppose a grammar, S -> abS | aSb
Here, S is deriving the same terminal a in
the production rule(two alternative choices
for S), which follows non-determinism. We
can rewrite the production to defer the
decision of S as-
S -> aS'
S' -> bS | Sb
Thus, S' can be replaced for bS or S
30
Left Factoring

Left factor causes non-LL(1)


Given A  X Y | X Z. Both A  X Y
and A  X Z can be chosen when A
is on top of stack and a token in
First(X) is the next token.

AXY|XZ
can be left-factored as
A  X A’ and A’  Y | Z
31
Example of Left Factor

ifSt  if ( exp ) st else st | if ( exp ) st


can be left-factored as
ifSt  if ( exp ) st elsePart
elsePart  else st | 

seq  st ; seq | st
can be left-factored as
seq  st seq’
seq’  ; seq | 

32
Next topic

Bottom-up parsing
Top-down parsing Shift-reduce parsers
Recursive-descent parsing
LL(1) parsing LR(0) parsing
LL(1) parsing LR(0) items
algorithm Finite automata of items
First and follow LR(0) parsing algorithm
sets LR(0) grammar
Constructing LL(1) SLR(1) parsing
parsing table SLR(1) parsing algorithm
Error recovery SLR(1) grammar
Parsing conflict

33

You might also like