0% found this document useful (0 votes)

11 views22 pages

Mod 4 Toc

The document outlines the phases of a compiler, detailing the process of translating high-level language programs into machine-level code through stages like lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and target code generation. It also explains the language processing system, including the roles of the preprocessor, compiler, assembler, linker, and loader. Additionally, it discusses token specification and recognition in lexical analysis, input buffering techniques, and the differences between compilers, interpreters, and assemblers.

Uploaded by

nidhimaa1820

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views22 pages

Mod 4 Toc

Uploaded by

nidhimaa1820

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

1. Illustrate the various phases of a compiler.

Show the translation for an

assignment: Position = initial + rate * 60. Clearly indicate the output of
each phase

A compiler is a system program that translates a high-level language

program into an equivalent machine-level program.
This translation is carried out in a sequence of well-defined phases.

1. Lexical Analysis (Scanner)

• Converts the source program into a sequence of tokens
• Removes white spaces and comments
• Creates entries in the symbol table
Input: Source program
Output: Token stream
Example tokens:
<id, position> <=> <id, initial> <+> <id, rate> <*> <num, 60>
2. Syntax Analysis (Parser)
• Checks whether the token sequence follows grammar rules
• Constructs a parse tree / syntax tree
• Detects syntax errors
Input: Tokens
Output: Parse tree

3. Semantic Analysis
• Checks meaning of statements
• Verifies:
o Type compatibility
o Variable declarations
o Scope rules
Input: Parse tree
Output: Annotated syntax tree

4. Intermediate Code Generation

• Produces an intermediate representation (IR)
• Machine-independent
• Common forms:
o Three-address code
o Quadruples
o Triples
Input: Annotated syntax tree
Output: Intermediate code

5. Code Optimization
• Improves intermediate code for:
o Speed
o Memory usage
• Does not change program meaning
Examples:
• Constant folding
• Dead code elimination
• Strength reduction

6. Target Code Generation

• Converts optimized IR into machine-dependent code
• Produces assembly or machine instructions
Input: Optimized IR
Output: Target machine code

Supporting Activities
• Symbol Table Management
• Error Handling

Translation of the Assignment Statement

Statement
Position = initial + rate * 60

1. Lexical Analysis Output

The source statement is broken into tokens.
<id, position>
<=>
<id, initial>
<+>
<id, rate>
<*>
<num, 60>

2. Syntax Analysis Output (Parse Tree)

=
/ \
Position +
/ \
initial *
/\
rate 60
This confirms the statement is syntactically correct.

3. Semantic Analysis Output

• Check:
o position, initial, rate are declared
o Types are compatible (e.g., integer / float)
Annotated tree (types attached):
rate : int
60 : int
rate * 60 : int
initial : int
initial + (rate * 60) : int
position : int
✔ No semantic errors.
4. Intermediate Code Generation
Three-Address Code
t1 = rate * 60
t2 = initial + t1
position = t2

5. Code Optimization
In this example, no further optimization is required.
(Already minimal form.)

6. Target Code Generation (Sample)

Assembly-like Code
LOAD rate
MUL 60
STORE t1
LOAD initial
ADD t1
STORE position

Final Summary Table

Compiler Phase Output

Lexical Analysis Tokens

Syntax Analysis Parse tree

Semantic Analysis Annotated tree

Intermediate Code Three-address code

Optimization Optimized IR
Compiler Phase Output

Target Code Machine / assembly code

2. Explain language processing system phases in detail with suitable
diagram

A Language Processing System is a collection of programs that translate a

program written in a high-level language into machine-executable code
and then execute it.
It bridges the gap between human-readable languages and machine
language.

Phases of the Language Processing System

Preprocessor
Purpose:
Prepares the source program before actual compilation.
Functions:
• Macro expansion (#define)
• File inclusion (#include)
• Conditional compilation (#if, #ifdef)
• Removal of comments
Input: High-level source program
Output: Modified source program (expanded code)

Compiler
Purpose:
Translates the high-level language into assembly or intermediate code.
The compiler itself works in phases:
• Lexical analysis
• Syntax analysis
• Semantic analysis
• Intermediate code generation
• Code optimization
• Target code generation
Input: Preprocessed source code
Output: Assembly / object code (with possible errors reported)

Assembler
Purpose:
Converts assembly language into machine-level object code.
Functions:
• Converts mnemonics into opcodes
• Resolves symbolic addresses
• Produces object file
Input: Assembly code
Output: Object (machine) code

Linker
Purpose:
Combines multiple object files into a single executable file.
Functions:
• Links library functions
• Resolves external references
• Produces executable file
Input: Object files
Output: Linked executable file

Loader
Purpose:
Loads the executable program into main memory and starts execution.
Functions:
• Allocates memory
• Loads program into RAM
• Transfers control to the program
Input: Executable file
Output: Program execution

Interpreter (Alternative to Compiler)

• Translates and executes line by line
• Does not produce object code
• Slower than compiler
• Used in scripting languages
In some systems, interpreter replaces compiler + assembler stages.

Summary Table
Phase Input Output Function

Preprocessor Source code Expanded code Macro handling

Expanded Assembly/Object
Compiler Translation
code code

Assembly Opcode
Assembler Object code
code generation

Linker Object files Executable Linking

Loader Executable Running program Execution

3. List the difference between compiler interpreter and assembler

Feature Compiler Interpreter Assembler

High-level
Input High-level Assembly
language (C,
Language language language
Java, etc.)

Machine / object No permanent Machine

Output
code machine code (object) code

Translates
Translation Translates entire Translates line
instruction by
Method program at once by line
instruction
Feature Compiler Interpreter Assembler

Slow (interprets
Execution Fast (after
during Very fast
Speed compilation)
execution)

Error Reports errors Reports errors Reports syntax

Reporting after compilation line by line errors

Memory More (stores

Less Moderate
Requirement object code)

Object code can No reusable Object code

Reusability
be reused code reusable

Difficult (errors Easy (step-by-

Debugging Moderate
shown together) step)

GCC, Java
Examples Python, Ruby MASM, NASM
Compiler

Part of Alternative to
Phase in LPS After compiler
compilation compiler
4. Explain recognize the tokens for the below given lexems.
i. Identifier
ii. keywords
iii. Relational operator
iv. Unsigned number
v. Whitespace
A lexeme is a sequence of characters in the source program.
A token is a category (type) to which a lexeme belongs.
Lexical analyzers recognize tokens using patterns, usually expressed as
regular expressions.

i. Identifier
Meaning
Identifiers are names used to identify:
• variables
• functions
• arrays
• user-defined entities
Rules
• Must start with a letter
• Followed by letters or digits
• Cannot start with a digit
• Cannot be a keyword
Regular Expression
(letter)(letter | digit)*
Examples
x
sum
total1
Position
Token form
<id, lexeme>

ii. Keywords
Meaning
Keywords are reserved words with predefined meaning in a language.
Rules
• Fixed set of words
• Cannot be used as identifiers
Examples
if, while, int, float, return, for
Recognition method
• First recognized as identifier
• Then matched against keyword table
Token form
<keyword, if>

iii. Relational Operators

Meaning
Used to compare two operands.
Common relational operators
< <= > >= == !=
Regular Expressions
< | <= | > | >= | == | !=
Examples
a<b
x >= y
Token form
<relop, <>
<relop, <=>

iv. Unsigned Number

Meaning
A number without sign (+ or −).
Types
• Integer
• Real (optional, depending on syllabus)
Regular Expression (Integer)
digit+
Regular Expression (Real)
digit+ . digit+
Examples
10
345
60
3.14
Token form
<num, value>

v. Whitespace
Meaning
Characters used to separate tokens.
Includes
• Space
• Tab
• New line
Representation
blank | tab | newline
Regular Expression
( blank | tab | newline )+
Handling
• Ignored by lexical analyzer
• Not passed to parser

Summary Table (Very Useful in Exam)

Token Pattern / Rule Example

Identifier letter(letter digit)*

Keyword Fixed words while

Relational operator <, <=, >, >=, ==, != a <= b

Unsigned number digit+ 60

Whitespace space, tab, newline (ignored)

5. Demonstrate the concept of input buffering and sentinels in lexical

analysis
Input buffering is a technique used by the lexical analyzer to efficiently read
characters from the source program by minimizing the number of
input/output operations.
Instead of reading one character at a time from disk (which is slow),
characters are read in blocks (buffers).

Why input buffering is needed

• Disk access is slow
• Lexical analysis needs frequent lookahead
• Reading one character at a time reduces performance
Input buffering improves speed and efficiency.

Two-Buffer Scheme (Standard Method)

In this method:
• Two buffers of equal size are used
• While one buffer is being processed, the other is filled
• This avoids frequent disk reads
Sentinel Concept
What is a sentinel?
A sentinel is a special character (usually EOF) placed at the end of each
buffer.

Why sentinels are used

Without sentinel:
• Every character read needs a boundary check
• Increases overhead
With sentinel:
• No need to check buffer end every time
• When forward reaches sentinel → buffer is refilled
Sentinels eliminate explicit end-of-buffer checks.

How it works (Step-by-Step)

1. Characters are read into Buffer 1
2. A sentinel (EOF) is placed at the end
3. forward pointer scans characters
4. When forward reaches sentinel:
o Buffer 2 is loaded
o Sentinel is placed at its end
5. Scanning continues seamlessly

Example
Source input:
position = rate * 60
Buffer content:
Buffer1: p o s i t i o n EOF
Buffer2: = r a t e * 6 0 EOF
Lexical analyzer:
• Reads identifiers, operators, numbers
• Switches buffers automatically using sentinel

Advantages of Input Buffering with Sentinels

• Reduces number of I/O operations
• Faster lexical analysis
• Simple control logic
• Efficient lookahead handling

Summary Table
Concept Purpose

Input Buffering Efficient character reading

Two-Buffer Scheme Continuous scanning

Sentinel Avoid buffer boundary checks

lexemeBegin Start of token

forward Scans characters

Example Input (Source Program)

rate = a1 + 25
Assume:
• Buffer size = 8 characters
• Sentinel symbol = EOF
• Two buffers are used

Step 1: Fill the two buffers

Buffer–1
Index: 0 1 2 3 4 5 6 7
Buffer1: r a t e = EOF
Buffer–2
Index: 0 1 2 3 4 5 6 7
Buffer2: a 1 + 2 5 EOF

Step 2: Initialize pointers

lexemeBegin → Buffer1[0]
forward → Buffer1[0]

Step 3: Scan first lexeme (rate)

forward position Character Action

Buffer1[0] r letter → continue

Buffer1[1] a letter → continue

Buffer1[2] t letter → continue

Buffer1[3] e letter → continue

Buffer1[4] space lexeme ends

Token recognized: <id, rate>

Now:
lexemeBegin → Buffer1[4]
forward → Buffer1[4]

Step 4: Skip whitespace

forward Character Action

B1[4] space ignore

B1[5] = operator found

Token: <=>
Update pointers:
lexemeBegin → Buffer1[6]
forward → Buffer1[6]

Step 5: Sentinel encountered

forward Character Action

B1[7] EOF switch buffer

Buffer–2 is already filled

Move forward to Buffer2[0]
forward → Buffer2[0]

Step 6: Scan identifier (a1)

forward Character Action

B2[0] space ignore

B2[1] a start identifier

B2[2] 1 continue

B2[3] space lexeme ends

Token: <id, a1>

Update:
lexemeBegin → Buffer2[3]
forward → Buffer2[3]

Step 7: Scan operator (+)

forward Character Action

B2[4] + operator

Token: <+>
Step 8: Scan number (25)
forward Character Action

B2[5] 2 digit

B2[6] 5 digit

B2[7] EOF number ends

Token: <num, 25>

Final Tokens Generated

<id, rate>
<=>
<id, a1>
<+>
<num, 25>

Key Observations (VERY IMPORTANT)

1. lexemeBegin marks start of token
2. forward scans characters
3. Sentinel (EOF) tells lexer to switch buffers
4. No explicit buffer-end checking is needed

6. Explain specification and recognition of tokens

In lexical analysis, the source program is converted into a sequence of tokens.
This process involves two important tasks:
1. Specification of tokens – defining what tokens look like
2. Recognition of tokens – identifying tokens from the input program
Token Specification
What is token specification?
Token specification defines the structure or pattern of tokens using formal
rules.
These rules describe how lexemes are formed.
Components involved
• Token: A category (identifier, keyword, operator, etc.)
• Lexeme: Actual character sequence in the program
• Pattern: Rule describing the form of lexemes

How tokens are specified?

Tokens are specified using Regular Expressions (REs).
Examples of token specification
Token Regular Expression

Identifier `letter(letter

Keyword Fixed strings (if, while, int)

Number digit+

Relational operator `<

Whitespace `(space

Example
For the statement:
count = rate * 60
Token specifications identify:
• count, rate → identifiers
• = , * → operators
• 60 → number
Token Recognition
What is token recognition?
Token recognition is the process of matching input characters against token
specifications to identify valid tokens.
It is performed by the lexical analyzer (scanner).

How tokens are recognized?

Token recognition uses:
• Finite Automata (FA)
• Transition diagrams
• Lexical analyzer generator tools (e.g., Lex)

Recognition process (step-by-step)

1. Input characters are read from source program
2. Characters are grouped into lexemes
3. Lexemes are matched against regular expressions
4. Corresponding tokens are generated
5. Whitespace and comments are ignored

Example of token recognition

Input:
x1 = 20
Recognized tokens:
<id, x1>
<=>
<num, 20>
Relationship between Specification and Recognition
• Specification tells what a token looks like
• Recognition tells how the token is identified in input
Specification is design-time, recognition is run-time.

Role of Lexical Analyzer

The lexical analyzer:
• Uses token specifications (REs)
• Recognizes tokens using finite automata
• Supplies tokens to the syntax analyzer
• Maintains symbol table entries
• Reports lexical errors

Advantages
• Simplifies syntax analysis
• Improves compiler efficiency
• Enables modular compiler design
• Allows use of automated tools (Lex)

Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
42 pages
Compiler Design Overview and Phases
100% (1)
Compiler Design Overview and Phases
193 pages
UNIT1 Compiler
No ratings yet
UNIT1 Compiler
195 pages
Compiler Phases and Lexical Analysis Guide
No ratings yet
Compiler Phases and Lexical Analysis Guide
46 pages
Compiler Concepts and Phases Explained
No ratings yet
Compiler Concepts and Phases Explained
19 pages
CD
No ratings yet
CD
46 pages
Lexical Analysis and Parsing Overview
No ratings yet
Lexical Analysis and Parsing Overview
24 pages
Compiler Design 4799dee2 b703 4a1b 9a7e D0e6c4a48d2f
No ratings yet
Compiler Design 4799dee2 b703 4a1b 9a7e D0e6c4a48d2f
67 pages
Language Processors and Compiler Phases
No ratings yet
Language Processors and Compiler Phases
14 pages
Compiler Design and Phases Overview
No ratings yet
Compiler Design and Phases Overview
22 pages
Detailed Compiler Phases and Examples
No ratings yet
Detailed Compiler Phases and Examples
30 pages
Compiler Design Principles Overview
No ratings yet
Compiler Design Principles Overview
71 pages
Ut 3
No ratings yet
Ut 3
51 pages
Compiler Design Fundamentals and Techniques
No ratings yet
Compiler Design Fundamentals and Techniques
45 pages
Compiler Module 1 Key Concepts and Questions
No ratings yet
Compiler Module 1 Key Concepts and Questions
14 pages
Compiler Design and Phases Overview
No ratings yet
Compiler Design and Phases Overview
49 pages
Compiler Phases and Lexical Analysis Guide
No ratings yet
Compiler Phases and Lexical Analysis Guide
17 pages
50XE
No ratings yet
50XE
54 pages
UNIT1 Compiler Notes
No ratings yet
UNIT1 Compiler Notes
258 pages
CD Mod 1
No ratings yet
CD Mod 1
105 pages
Phases of Compiler Explained
No ratings yet
Phases of Compiler Explained
17 pages
Compilerdesignnotes
100% (1)
Compilerdesignnotes
210 pages
1-M1-Language Processor-The Structure of A Compiler
No ratings yet
1-M1-Language Processor-The Structure of A Compiler
56 pages
Compiler Design Principles and Phases
No ratings yet
Compiler Design Principles and Phases
20 pages
Compiler Design and Phases Explained
No ratings yet
Compiler Design and Phases Explained
76 pages
Compiler Phases and Lexical Analysis Guide
No ratings yet
Compiler Phases and Lexical Analysis Guide
12 pages
Compiler Notes 1
No ratings yet
Compiler Notes 1
20 pages
Compiler Design: Overview and Phases
No ratings yet
Compiler Design: Overview and Phases
174 pages
Unit1 1
No ratings yet
Unit1 1
209 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
27 pages
Compiler Design Question Bank: Unit 1
No ratings yet
Compiler Design Question Bank: Unit 1
46 pages
Compiler Design Principles and Techniques
No ratings yet
Compiler Design Principles and Techniques
118 pages
Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
122 pages
Introduction to Compiler Design
No ratings yet
Introduction to Compiler Design
30 pages
Syntax Analysis in Compiler Design
No ratings yet
Syntax Analysis in Compiler Design
52 pages
Compiler Design: Language Processing Phases
No ratings yet
Compiler Design: Language Processing Phases
79 pages
CDTheory
No ratings yet
CDTheory
24 pages
Compiler Overview and Phases Explained
No ratings yet
Compiler Overview and Phases Explained
35 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
30 pages
TK3163 Sem2 2020 1MyCh1.1-1.2 Intro-20200211121547
No ratings yet
TK3163 Sem2 2020 1MyCh1.1-1.2 Intro-20200211121547
39 pages
Compiler Design: Lexical Analysis Overview
No ratings yet
Compiler Design: Lexical Analysis Overview
95 pages
Compiler Mid-Term Notes (Complete)
No ratings yet
Compiler Mid-Term Notes (Complete)
56 pages
Compiler Design: Language Processing Phases
No ratings yet
Compiler Design: Language Processing Phases
81 pages
Compiler Phases and Language Processing
No ratings yet
Compiler Phases and Language Processing
7 pages
Introduction to Compiler Design Concepts
No ratings yet
Introduction to Compiler Design Concepts
27 pages
Comprehensive Guide to Compiler Design
No ratings yet
Comprehensive Guide to Compiler Design
47 pages
Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
19 pages
Compiler Design: Token Recognition and Phases
No ratings yet
Compiler Design: Token Recognition and Phases
75 pages
Unit 1
No ratings yet
Unit 1
14 pages
Module 1
100% (1)
Module 1
135 pages
Compiler Design Overview and Phases
No ratings yet
Compiler Design Overview and Phases
22 pages
Introduction to Compilers Overview
No ratings yet
Introduction to Compilers Overview
77 pages
Phases of Compiler Design Explained
No ratings yet
Phases of Compiler Design Explained
6 pages
Movie Hub Web Application Overview
No ratings yet
Movie Hub Web Application Overview
5 pages
Java Collections and File I/O Overview
No ratings yet
Java Collections and File I/O Overview
44 pages
Advanced Java Programming Exam Questions
No ratings yet
Advanced Java Programming Exam Questions
3 pages
RCV-UM-01695-EN - RCVcalc Valve Sizing Software Installation Manual
No ratings yet
RCV-UM-01695-EN - RCVcalc Valve Sizing Software Installation Manual
2 pages
Selenium Automation Testing Notes
No ratings yet
Selenium Automation Testing Notes
4 pages
Game Template Development Guide
No ratings yet
Game Template Development Guide
3 pages
BOU CSE Admission 2025 Deadline
No ratings yet
BOU CSE Admission 2025 Deadline
30 pages
Software Development Workflow and Tools
No ratings yet
Software Development Workflow and Tools
169 pages
Essential Linux Commands Guide
No ratings yet
Essential Linux Commands Guide
14 pages
CodersBrain Technology Company Profile
No ratings yet
CodersBrain Technology Company Profile
9 pages
C++ Inheritance & Polymorphism Tasks
0% (1)
C++ Inheritance & Polymorphism Tasks
4 pages
U2020 V300R019 Performance Management Guide
No ratings yet
U2020 V300R019 Performance Management Guide
27 pages
Install VMware Workstation on Windows 10
No ratings yet
Install VMware Workstation on Windows 10
19 pages
OracleAI AutonomousDB 1
No ratings yet
OracleAI AutonomousDB 1
12 pages
LIFO Memory Block Design Implementation
No ratings yet
LIFO Memory Block Design Implementation
2 pages
Motor Control Function Block Overview
No ratings yet
Motor Control Function Block Overview
5 pages
WordPress wp-config.php Configuration Guide
0% (1)
WordPress wp-config.php Configuration Guide
2 pages
Age of Empires II Game Hacking Guide
No ratings yet
Age of Empires II Game Hacking Guide
7 pages
Mtech Sms API v3
No ratings yet
Mtech Sms API v3
3 pages
CLIPS Interfaces Guide Version 6.24
No ratings yet
CLIPS Interfaces Guide Version 6.24
119 pages
Essentials of Requirements Engineering
No ratings yet
Essentials of Requirements Engineering
32 pages
Transforming Data in Power BI Lab
No ratings yet
Transforming Data in Power BI Lab
11 pages
C Programming Principles and Functions
0% (1)
C Programming Principles and Functions
52 pages
Caravansary Help: Hostel Management System
No ratings yet
Caravansary Help: Hostel Management System
10 pages
Java Classes and Objects Explained
No ratings yet
Java Classes and Objects Explained
18 pages
JNTUK MCA R20 Syllabus Overview
No ratings yet
JNTUK MCA R20 Syllabus Overview
87 pages
ACG Users Guide
No ratings yet
ACG Users Guide
50 pages
Tushar Metri: Systems Engineer Profile
No ratings yet
Tushar Metri: Systems Engineer Profile
1 page
Chandrachur Mukherjee's Profile Summary
No ratings yet
Chandrachur Mukherjee's Profile Summary
1 page
Understanding OCL in Model Driven Engineering
No ratings yet
Understanding OCL in Model Driven Engineering
65 pages