Compiler Construction
(CS4623)
Course Instructor: Ms. Tayyaba Zaheer
Department of Computer Science,
Capital University of Science and Technology, Islamabad
Fall Semester, 2024
Compiler Construction
Lecture # 06 – Part 01
Topics
• Lexical Analysis Phase – Completed
• Input: Sequence of Characters
• Output: Sequence of Valid Tokens
• Generates Stream of Tokens which is the input of 2 nd phase of compiler i.e., Syntax Analysis Phase
• Detects only Lexical Errors
• If any token is misspelled, then generates error message
• No error message would be generated if either any statement is not meaningful or there is any issue in the
structure of the statement
• Lexical Analyzer (Scanner) Vs. Syntax Analyzer (Parser)
• Why Separate Scanner and Parser?
• Syntax Analysis Phase – Also known as Hierarchical Analysis or Parsing
3
Lexical Analyzer (Scanner) Vs.
Syntax Analyzer (Parser)
Scanner Parser
Scan Input Program Perform Syntax Analysis
Identify Tokens Create abstract representation of the code
Insert tokens into Symbol Table Update Symbol Table Entries
It generates Lexical Errors It generates a Parse Tree of the Source Code
4
Why Separate Scanner and
Parser?
• The simplicity of design
• It eases the process of lexical analysis and the syntax
analysis by eliminating unwanted tokens
• To improve compiler efficiency
• Helps you to improve compiler efficiency
• Specialization
• Specialized techniques can be applied to improves
the lexical analysis process
5
Why Separate Scanner and Parser?
• Portability
• Only the Scanner requires to communicate with the
outside world
• Higher Portability
• Input-device-specific peculiarities restricted to the
Scanner
6
Syntax Analysis Phase (Parser)
• It is the second phase of Compiler design
• Input: Sequence of Valid Tokens (Stream of
Tokens)
• Output: Parse Tree
7
Syntax Analysis Phase (Parser)
• It is the second phase of Compiler design
• It takes sequence of Valid Tokens as Input.
• Builds a Parse Tree from these tokens.
8
Syntax Analysis Phase (Parser)
• If tree is built successfully, Syntax Analysis Phase
passes the tree to Semantic Analysis Phase
• Basically checks if the given input is in the correct
syntax or not
• Error generated by this phase is called Syntax Error
9
Syntax Analysis Phase (Parser)
• Syntax means
• the Sequence of Words or
• Arrangement of Words
• to make a Sentence
10
Syntax Analysis Phase (Parser)
• E.g. Basic structure of English
• First Subject (s) then Verb (v) then any Object (ob)
• s+v+ob
• Muhammad is reading a book
11
Syntax Analysis Phase (Parser)
input
sequence of
output
Tokens (provided Parser Parse Tree
by the Lexical
Analysis Phase)
Syntax Error
12
Syntax Analysis Phase (Parser)
• As DFA decides Valid and Invalid Tokens in
Scanner.
• How Parser is generated?
13
Syntax Analysis Phase (Parser)
• Structure of any programming language is
defined using Grammars
• With the help of Context Free Grammars the
structure of the programming language is
defined
14
Syntax Analysis Phase (Parser) -
Example
Productions/ Rules/ Grammar:
15
Syntax Analysis Phase (Parser) -
Example
• Statement: The girl purchased a car
• Parser takes input of a sentence and a grammar
(set of rules)
• Generates parse tree
• If parse tree is generated then the statement is
valid or the syntax of the statement is correct
16
Syntax Analysis Phase (Parser) -
Example
Sentence
Noun Phrase Verb Phrase
(Subject) (Predicate)
Noun Phrase
Article Noun Verb
(Object)
The girl purchased
Article Noun
a car
17
Syntax Analysis Phase (Parser) -
Example
• Statement: Purchased a car the girl
• This statement cannot be generated using the Set
of Rules (Grammar)
• Similarly, the parser uses the Grammar and check
the input statement according to those rules
18
Syntax Analysis Phase (Parser)
• If Parse Tree is generated then Valid Statement.
• If no parse tree can be generated then Invalid
Statement.
19
Syntax Analysis Phase (Parser)
• As we know that Syntax of Programming
Language is described by CFG (Context Free
Grammar)
• So, first it is needed to understand that what is
CFG?
20
Context Free Grammar (CFG)
• It is the combination of 4 tuples:
1. Start Symbol (S)
2. Set of non-terminals (V)
3. Set of terminals (T)
4. Production Rules (P)
• Mathematically
21
Context Free Grammar (CFG)
• Terminals: (Represented by small alphabets)
• A terminal is a symbol that does not appear on left side of
a production
• All tokens such as plus sign, *, -, small alphabets from a-z,
identifiers etc. – We can’t replace them
• Non-Terminals: (Represented by Capital Letters)
• Non-terminal symbols are those symbols that can be
replaced
• Appears on the left side of production rules
22
Context Free Grammar (CFG) -
Examples
1. Example# 1
S = {S}
V={S,B}
T=(a,c,b}
P=2
Input String: abbc 23