0% found this document useful (0 votes)
7 views37 pages

Lec 4

The document provides an introduction to parsers, detailing their role in syntax analysis during the compilation process, including grammar verification and error detection. It discusses how parsers work, the use of context-free grammars, parse trees, and various parsing strategies such as top-down and bottom-up parsing. Additionally, it addresses issues like ambiguity in grammars and the importance of parse trees in understanding program structure.

Uploaded by

nayyeraameen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views37 pages

Lec 4

The document provides an introduction to parsers, detailing their role in syntax analysis during the compilation process, including grammar verification and error detection. It discusses how parsers work, the use of context-free grammars, parse trees, and various parsing strategies such as top-down and bottom-up parsing. Additionally, it addresses issues like ambiguity in grammars and the importance of parse trees in understanding program structure.

Uploaded by

nayyeraameen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

INTRODUCTION TO

PARSERS
Presented By:
DR. Ahmed Al-Badawy
Lecturer, Computer Science Department
LECTURE TOPICS
1. The Role of the Parser
2. How Parser Works
3. Context-Free Grammars
4. Parse Trees
5. Ambiguity
6. Top Down Parsing
THE ROLE OF THE PARSER
The parser (syntax analyzer) is the compiler component responsible for analyzing
the syntactic structure of a program. It receives tokens from the lexical analyzer and
checks whether they follow the grammatical rules of the programming language.
• Input from lexical analyzer – The parser processes the sequence of tokens
generated by the scanner.
• Grammar verification – It checks whether the token sequence conforms to the
language grammar defined by context-free rules.
• Program structure identification – The parser determines how tokens combine to
form expressions, statements, and program blocks.
• Hierarchical representation – It organizes tokens into a structured representation
such as a parse tree or abstract syntax tree (AST).
The parser therefore converts a linear stream of tokens into a structured
representation of the program that later compiler phases can analyze.
THE ROLE OF THE PARSER (CONTINUED)
The parser plays a critical role in connecting lexical analysis with later phases of compilation.
• Syntax error detection – Identifies grammatical errors such as missing symbols, misplaced
operators, or incorrect statement structures.
• Error reporting and recovery – Provides meaningful error messages and attempts to
continue parsing so multiple errors can be reported.
• Preparation for semantic analysis – Produces a parse tree or abstract syntax tree used by
semantic analysis for type checking and meaning validation.
• Foundation for code generation – The structural information produced by the parser helps
guide intermediate code generation and optimization.
In the overall compilation process:
Source Code → Lexical Analysis → Syntax Analysis (Parser) → Semantic Analysis →
Code Generation
The parser ensures that the program has a valid syntactic structure before deeper semantic
processing begins.
HOW THE PARSER WORKS
The parser processes the sequence of tokens produced by the lexical analyzer and organizes them according to the grammar rules of the
language.
• Token sequence input – The parser receives tokens such as identifiers, operators, and keywords from the lexical analyzer.
• Grammar rule application – Using context-free grammar rules, the parser determines how tokens can be combined to form valid constructs.
• Construction of syntax structure – The parser builds a parse tree or abstract syntax tree (AST) representing the hierarchical structure of the
program.
• Decision making during parsing – The parser chooses grammar rules based on the current input token and parsing strategy.
Example flow:
Tokens:
id + id * id
Parser interpretation:
Multiplication is evaluated before addition according to grammar rules.
Result:
The parser produces a structured representation showing that:
id * id is evaluated first, then added to id.
This structured representation is then passed to semantic analysis for further processing.
HOW THE PARSER WORKS
The parser processes the sequence of tokens produced by the lexical analyzer and organizes them according to the grammar rules of the
language.
• Token sequence input – The parser receives tokens such as identifiers, operators, and keywords from the lexical analyzer.
• Grammar rule application – Using context-free grammar rules, the parser determines how tokens can be combined to form valid constructs.
• Construction of syntax structure – The parser builds a parse tree or abstract syntax tree (AST) representing the hierarchical structure of the
program.
• Decision making during parsing – The parser chooses grammar rules based on the current input token and parsing strategy.
Example flow:
Tokens:
id + id * id
Parser interpretation:
Multiplication is evaluated before addition according to grammar rules.
Result:
The parser produces a structured representation showing that:
id * id is evaluated first, then added to id.
This structured representation is then passed to semantic analysis for further processing.
HOW THE PARSER WORKS (PARSING STRATEGIES
USED BY THE PARSER)
To analyze program structure, parsers use specific parsing strategies that determine how grammar rules
are applied to the input tokens.
• Top-Down Parsing – The parser begins with the start symbol of the grammar and attempts to derive the
input program by expanding grammar rules step by step.
• Bottom-Up Parsing – The parser starts with the input tokens and gradually reduces them into higher-level
grammar symbols until the start symbol is reached.
• Lookahead Tokens – Parsers often examine one or more upcoming tokens to decide which grammar rule
should be applied.
• Deterministic Parsing Decisions – Modern parsing techniques attempt to avoid backtracking by using
parsing tables or predictive rules.
These strategies allow the parser to efficiently analyze program structure and ensure that the input
follows the syntax rules of the programming language.
CONTEXT-FREE GRAMMARS (CFG)
A context-free grammar consists of several fundamental components that define how
strings in a language are generated.
• Terminals – Basic symbols of the language that appear in the final program (e.g.,
identifiers, operators, keywords).
• Non-terminals – Abstract symbols representing syntactic categories such as expressions
or statements.
• Productions (rules) – Definitions that describe how non-terminals can be expanded into
other symbols.
• Start symbol – The main non-terminal from which all valid program structures are
derived.
Together, these components allow the grammar to generate all syntactically valid
programs in the language.
CONTEXT-FREE GRAMMARS (CFG)
Context-Free Grammars are formal systems used to describe the syntax of
programming languages. They define how valid sentences or program structures can be
formed from basic symbols.
• Formal syntax description – CFG provides a mathematical way to specify
programming language structure.
• Used in compiler design – Parsers rely on CFG rules to analyze source code.
• Defines valid program constructs – Statements, expressions, and program blocks are
described using grammar rules.
• Foundation of syntax analysis – CFG is the basis for most parsing techniques used in
compilers.
In compilers, CFG helps transform a sequence of tokens into a structured representation
of the program.
PRODUCTION RULES
Production rules define how symbols in the grammar can be replaced to form valid strings.
A production rule is typically written as:
A→α
Where:
• A – A non-terminal symbol
• α – A sequence of terminals and/or non-terminals
Example grammar for arithmetic expressions:
E→E|T
E→T
T→T*F
T→F
F → (E)
F → id
These rules describe how expressions are constructed using operators, identifiers, and parentheses.
PARSE TREES IN CONTEXT-FREE GRAMMARS
A parse tree visually represents how a string is derived from the grammar rules.
Key characteristics:
• Root node – Represents the start symbol of the grammar.
• Internal nodes – Represent non-terminal symbols.
• Leaf nodes – Represent terminal symbols (tokens).
• Tree structure – Shows the hierarchical organization of program elements.
Example expression:
a+b*c
The parse tree reveals that:
b*c
is grouped together before addition due to operator precedence.
Parse trees help compilers understand the structure and meaning of expressions.
CFG EXAMPLE
Grammer: Example Expression Generated
Expression → Expression + Term a+b*c
Expression → Expression – Term Using the grammar:
Expression → Term Expression → Expression + Term
Term → Term * Factor Term → Term * Factor
Term → Term / Factor Factor → identifier
Term → Factor So the structure becomes:
Factor → ( Expression ) Expression
→ Expression + Term
Factor → identifier → identifier + Term
→ identifier + Term * Factor
Factor → number → identifier + identifier * identifier
PARSE TREE
A parse tree is a graphical representation that shows how a string in a
language is derived from the grammar rules.
• It represents the syntactic structure of a program.
• It shows how grammar productions are applied to generate a string.
• The tree structure reflects the hierarchical organization of language
constructs.
• Parse trees are produced during syntax analysis (parsing).
The parser uses grammar rules to construct a parse tree that explains how the
input program follows the language grammar.
PARSE TREE (STRUCTURE OF A PARSE TREE)
A parse tree has a well-defined structure based on grammar rules.
• Root node represents the start symbol of the grammar.
• Internal nodes represent non-terminal symbols.
• Leaf nodes represent terminal symbols (tokens).
• Each branch corresponds to the application of a production rule.
The parse tree clearly shows how a sentence is generated from the
grammar.
PARSE TREE (EXAMPLE GRAMMAR FOR
EXPRESSIONS)
Consider the following grammar used to describe arithmetic expressions:
Expression → Expression + Term
Expression → Term
Term → Term * Factor
Term → Factor
Factor → ( Expression )
Factor → identifier
This grammar defines how expressions are formed using:
• Addition operator
• Multiplication operator
• Parentheses
• Identifiers
The parse tree shows how an expression is derived from these grammar rules.
PARSE TREE
(EXAMPLE PARSE
TREE)
Example:
a+b*c
PARSE TREE (IMPORTANCE OF PARSE TREES )

Parse trees are important in compiler design for several reasons.


• They reveal the syntactic structure of the program.
• They help determine operator precedence and associativity.
• They provide a structured representation for semantic analysis.
• They serve as the basis for building Abstract Syntax Trees (AST).
Without parse trees, the compiler cannot correctly interpret how program
components are related.
PARSE TREES VS ABSTRACT SYNTAX TREES

While parse trees contain all grammar details, compilers often simplify them.
Parse Tree
• Contains every grammar symbol.
• Shows complete derivation steps.
• Often large and complex.
Abstract Syntax Tree (AST)
• Removes unnecessary grammar nodes.
• Keeps only important structural information.
• Used for semantic analysis and code generation.
Therefore, the parse tree is usually converted into an AST for later compiler phases.
AMBIGUITY IN CONTEXT-FREE GRAMMARS

A grammar is ambiguous if there exists a string that can be generated by the


grammar in more than one way.
• The same input string can produce multiple parse trees.
• The string can have multiple leftmost or rightmost derivations.
• Different parse trees may represent different meanings of the program.
• Ambiguity creates problems for compilers because the parser cannot determine
the correct interpretation.
Ambiguous grammars make it difficult for the compiler to uniquely determine
the structure of a program.
EXAMPLE OF AN AMBIGUOUS GRAMMAR

Consider the following grammar for expressions:


Expression → Expression + Expression
Expression → Expression * Expression
Expression → identifier
Example expression:
a+b*c
This grammar is ambiguous because the expression can be interpreted in two
different ways depending on how the parse tree is constructed.
TOP-DOWN PARSING

Top-down parsing is a parsing technique where the parser starts from the
start symbol and tries to derive the input string.
• Builds the parse tree from root to leaves
• Expands non-terminals step by step
• Matches generated symbols with input tokens
• Used in many simple and efficient parsing techniques
TOP-DOWN PARSING

Top-down parsing attempts to reconstruct the derivation of the input


string.
• Start with the start symbol
• Apply production rules to expand symbols
• Continue until only terminals remain
• Compare generated terminals with input
This process ensures the input belongs to the language.
LEFTMOST DERIVATION

Top-down parsing uses leftmost derivation.


• Always expand the leftmost non-terminal first
• Provides a deterministic order of rule application
• Matches recursive structure of grammar
Example:
Expression ⇒ Expression + Term
⇒ Term + Term
⇒ identifier + identifier
PARSE TREE CONSTRUCTION

During parsing, a parse tree is constructed incrementally.


• Root node is the start symbol
• Internal nodes represent non-terminals
• Leaves represent terminals (tokens)
• Each expansion adds branches to the tree
The tree shows how the input is derived from the grammar.
RECURSIVE DESCENT PARSING

Recursive descent parsing is a top-down parsing technique where each


non-terminal in the grammar is implemented as a recursive procedure.
• Each grammar rule corresponds to a function
• The parser processes input by calling these functions
• Functions expand non-terminals and match terminals
• The structure of the parser mirrors the grammar
RECURSIVE DESCENT PARSING

Recursive descent parsing works by breaking the problem into smaller


subproblems.
• Start with the start symbol function
• Call functions corresponding to grammar rules
• Match input tokens step by step
• Return success or failure
This creates a natural mapping between grammar rules and code
structure.
EXAMPLE GRAMMAR

Consider the grammar:


Expression → Expression + Term | Term
Term → Term * Factor | Factor
Factor → ( Expression ) | identifier
Each non-terminal will be implemented as a function:
parseExpression()
parseTerm()
parseFactor()
RECURSIVE DESCENT FUNCTIONS (CONCEPT)

Each function follows the grammar rules.


Example:
function parseExpression():
parseTerm()
if next token is '+':
match '+'
parseExpression()
• Functions call each other recursively
• Matching ensures correct syntax
• Input pointer moves forward
MATCHING TERMINALS

The parser must match expected tokens.


• A match() function is used
• It checks the current input token
• If matched → move to next token
• If not → report syntax error
Example:
match(identifier)
match('+')
PROBLEM OF LEFT RECURSION

Recursive descent parsers cannot handle left-recursive grammars.


Example:
Expression → Expression + Term
Problem:
• Infinite recursion
• Function calls itself without consuming input
This must be eliminated before parsing.
GRAMMAR TRANSFORMATION FOR RECURSIVE
DESCENT (LEFT RECURSION ELIMINATION)
To use recursive descent, grammars must be rewritten.
Example:
Expression → Term Expression'
Expression' → + Term Expression' | ε
Benefits:
• Removes left recursion
• Makes parsing possible
• Enables efficient implementation
ADVANTAGES AND LIMITATIONS

Recursive descent parsing has clear trade-offs.


Advantages:
• Simple and easy to implement
• Code structure mirrors grammar
• Good for small or simple languages
Limitations:
• Cannot handle left recursion
• May require grammar rewriting
• Not suitable for complex grammars without modification
BACKTRACKING IN TOP-DOWN PARSING

Backtracking is a technique used in some top-down parsers when the parser tries multiple
production rules to match the input.
• The parser selects a rule and attempts to apply it
• If the rule fails, it returns (backtracks) to try another rule
• Continues until a valid match is found or all options fail
• Common in naive recursive descent parsers
Backtracking allows the parser to handle grammars with multiple alternatives.
BACKTRACKING IN TOP-DOWN PARSING

The parser explores different possibilities when multiple rules exist.


• Choose a production rule for a non-terminal
• Try to match input using that rule
• If mismatch occurs → revert to previous state
• Try another alternative rule
This process is similar to trial-and-error parsing.
BACKTRACKING IN TOP-DOWN PARSING

Consider the grammar: Steps:


S→aA Try S → a A
S→aB Match a, then expect c
A→c Input has d → failure
B→d Backtrack and try S → a B
Input: Match a d → success
ad
To be matched
PROBLEMS WITH BACKTRACKING

Backtracking introduces several inefficiencies.


• May try many unnecessary alternatives
• Can lead to exponential time complexity
• Difficult to implement efficiently
• Not suitable for real-world compilers
Because of these issues, modern parsers aim to avoid backtracking.
ELIMINATING BACKTRACKING

Backtracking can be avoided using improved parsing techniques.


• Left factoring the grammar to remove ambiguity in choices
• Using predictive parsing (LL(1))
• Utilizing lookahead tokens to select correct rules
• Designing grammars that allow deterministic parsing
These approaches make parsing more efficient and practical for compiler implementation.

You might also like