0% found this document useful (0 votes)

3 views6 pages

Complier Construction

The document discusses the role of lexical analysis in compilers, focusing on the creation and management of symbol tables, input buffering techniques, and scanning strategies. It explains how the lexical analyzer identifies tokens, updates the symbol table, and utilizes input buffering for efficient character reading and token recognition. Additionally, it compares single and double buffer schemes, highlighting their advantages and disadvantages.

Uploaded by

turyamanyak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views6 pages

Complier Construction

Uploaded by

turyamanyak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Questions

a) Describe with an illustration a symbol table in relation to lexical analysis

b) Describe the formation and storage of the symbol table entry by the lexical analyzer

c) Explain the role of the lexical analyser in creating and updating the symbol table

d) Explain the concept of input buffering during the lexical analysis phase of a compiler.
Why is input buffering necessary? In your explanation, discuss the concept of reading
ahead and how it assists token recognition. Use appropriate illustrations, including a
diagram showing the input buffer and the movement of pointers, to support your
explanation.

e) Lexical analysis is carried out using a systematic scanning strategy. Analyse how this
scanning approach enables the lexical analyser to accurately determine the end of a
lexeme. In your discussion, clearly explain the mechanism used to detect lexeme
boundaries, especially in cases where a sequence of characters could match more than
one possible token (e.g., relational operators or keywords versus identifiers). Support
your explanation with a well-chosen example.

f) Compare single and double buffer schemes in lexical analysis and discuss the
advantages and disadvantages of each of these

Solutions

a) Lexical analysis is the process of breaking a program’s source code into small meaningful units
called tokens. Lexical analysis relies on symbol tables to efficiently recognize tokens.

A symbol table is a data structure used to store information about identifiers

Illustration of a symbol table.

Name Token Type Scope

Int keyword Local
X Identifier Global
; Separator global
3.14 Constant local

Reference: Aho et al., Compilers: Principles, Techniques, and Tools, Chapter 2.

b) Formation of a Symbol Table Entry

During lexical analysis, the lexical analyzer scans the source program character by character to
identify tokens.

When it encounters an identifier (for example a variable name), the following steps occur:

I. Token Recognition: The lexical analyzer recognizes a sequence of characters as an

identifier according to language rules.
II. Searching the Symbol Table: The lexical analyzer checks whether the identifier already
exists in the symbol table. If it exists, the analyzer returns a pointer/reference to that
entry and if it does not exist, a new entry is created.
III. Creating the Entry: If the identifier is new, the lexical analyzer creates a symbol table
entry containing important attributes such as: Identifier name, Token type (identifier),
Scope information, etc.
IV. Insertion into the Symbol Table: The newly created entry is inserted into the symbol
table so that other phases of the compiler can access it.

The lexical analyzer then returns a token along with a pointer to the symbol table entry to the
parser.

Symbol table entries are stored using efficient data structures to allow fast searching and
insertion. Most commonly used is the Hash Table

c) The role of a lexical analyzer in creating and updating a symbol table

1. Token Identification: The lexical analyzer reads the source code character by character
and groups them into meaningful sequences called tokens.

2. Insertion of Identifiers: Upon identifying a new identifier, the lexical analyzer inserts it
into the symbol table

3. Avoiding Redundancy: Before inserting an identifier, the lexical analyzer checks whether
it already exists in the symbol table within the current scope. If it does, it avoids
duplication.

4. Supporting Semantic Analysis: By populating the symbol table with accurate

information about identifiers, the lexical analyzer enables later stages of compilation
especially semantic analysis to perform type checking, scope resolution, and error
detection effectively.
5. Efficient Lookup and Update: The lexical analyzer facilitates operations such as insert(),
lookup() and update() on the symbol table, ensuring that identifier information is
accessible and modifiable throughout the compilation process

d) Input Buffering is a technique used to speed up reading characters from the source file.

Instead of fetching one character at a time, the compiler reads a block of characters into
memory (the buffer). The lexical analyzer then processes characters directly from this buffer,
which is much faster.

Why is Input Buffering Necessary?

I. Efficiency: Reading data in blocks reduces the number of disk accesses.

II. Look ahead (reading ahead): Sometimes the scanner needs to peek at upcoming
characters to decide what token is being formed.

Reading Ahead and Token Recognition

Reading ahead means the scanner doesn’t just look at the current character, it may need to
check the next one (or more).

Illustration showing input buffer and movement of pointers

Input Buffer

The buffer holds a block of characters from the source code for example

Int count=1%...

Each character is stored in a cell for easy scanning.

Pointers

lexeme_begin: Points to the start of the current token (I in int).

forward: Moves ahead to find the end of the token (=in count=).

Token Recognition

As forward moves, the characters between lexeme_begin and forward form a lexeme. Once a
valid token is recognized (like int, count, or =), the compiler processes it. Then lexeme_begin is
reset to the next token’s start.

e) The mechanism for detecting lexeme boundaries is primarily driven by a finite state machine
(FSM) or deterministic finite automaton (DFA), which models the recognition of token patterns.
As the scanner reads each character, it transitions between states based on the input.

A lexeme ends when:

 The FSM reaches an accepting state (indicating a valid token) and the next character
would cause a transition to an invalid or error state for the current pattern.
 No further valid extension is possible, adhering to the "maximal munch" rule (also called
the longest-match principle). This rule prioritizes the longest possible valid lexeme
starting from the current position to avoid prematurely truncating tokens.
 Explicit delimiters like spaces, punctuation, or newlines intervene, which are often
ignored (as whitespace) or treated as separate tokens.
 In cases of ambiguity (where a character sequence could match multiple possible
tokens) the lexer resolves this using the longest-match rule combined with rule
priorities.

Lookahead (peeking at one or more subsequent characters without consuming them) is often
employed to disambiguate without backtracking, ensuring efficiency.

This approach prevents errors like splitting a compound operator into separate tokens or
misclassifying a keyword as an identifier. Keywords are typically handled by first matching the
pattern for identifiers (e.g., letter followed by letters/digits), then checking the resulting string
against a keyword table. If it matches a reserved keyword, it's tokenized as such; otherwise, it's
an identifier.

Example: Disambiguating Relational Operators

Consider a programming language like C or Java, where relational operators include >, >=, ==,
and =. Suppose the input stream is x >= y == z;. The lexer starts at 'x' (matches identifier
pattern), ends at space (boundary), then skips whitespace.

 At '>', the FSM enters a state for potential '>' or '>=;

 It looks ahead to the next character '='. Since '>=' is a longer valid match (per maximal
munch), it consumes both, tokenizing >= as a single relational operator token rather than
separate '>' and '='.

 After skipping whitespace, at 'y' (identifier), ends at space.

 At '=', the FSM checks ahead: next is another '=', so it matches the longer == (equality
operator) instead of single '=' (assignment).

 If the input were x > y = z;, the '>' would be tokenized alone (no following '='), and '='
would be assignment (no following '=').

Without longest-match and lookahead, the lexer might incorrectly split >= into '>' and '=',
leading to syntax errors later. This scanning strategy ensures accurate boundaries by greedily
extending matches while respecting language rules, making the process deterministic and
efficient.

f) Comparison of Single Buffer and Double Buffer Schemes in Lexical Analysis

1. Single Buffer Scheme: The source program is read into one memory buffer. The lexical
analyzer scans characters from this buffer to form tokens. When the buffer becomes empty, it is
refilled with the next block of characters from the input file.

Advantages:
- Simple to implement.
- Requires less memory since only one buffer is used.

Disadvantages:
- When the buffer is exhausted, the lexical analyzer must pause while the buffer is refilled.
- It may be difficult to handle lexemes that span across the boundary of two input blocks.
- Less efficient because frequent I/O operations may interrupt token recognition.

2. Double Buffer Scheme: Two buffers are used instead of one. Each buffer holds a block of
input characters. While the lexical analyzer scans characters from one buffer, the other buffer
can be refilled with new input. Special markers called sentinels are often placed at the end of
each buffer to signal when the analyzer should switch to the other buffer.
Advantages:
- More efficient because the next buffer can be loaded while scanning continues.
- Reduces the number of input/output interruptions.
- Makes it easier to handle lexemes that cross buffer boundaries.
- Supports faster lexical analysis.

Disadvantages:
- Slightly more complex to implement.
- Requires more memory because two buffers are maintained.

Unit 2
No ratings yet
Unit 2
20 pages
Lexical Analyzer Overview and Functions
No ratings yet
Lexical Analyzer Overview and Functions
56 pages
Lexical Analysis and Input Buffering
No ratings yet
Lexical Analysis and Input Buffering
37 pages
Regular Expressions and Definitions Explained
No ratings yet
Regular Expressions and Definitions Explained
61 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
17 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
55 pages
Lexical Analysis: Tokens and Recognition
No ratings yet
Lexical Analysis: Tokens and Recognition
58 pages
Unit 2 - Lexical Analyzer
No ratings yet
Unit 2 - Lexical Analyzer
34 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
10 pages
Input Buffering in Lexical Analysis
No ratings yet
Input Buffering in Lexical Analysis
26 pages
CD Module2
No ratings yet
CD Module2
23 pages
Lexical Analysis and Input Buffering
No ratings yet
Lexical Analysis and Input Buffering
23 pages
Understanding Lexical Analyzers
No ratings yet
Understanding Lexical Analyzers
24 pages
Compiler Basics: Lexical Analysis Overview
No ratings yet
Compiler Basics: Lexical Analysis Overview
23 pages
Lexical Analysis and Regex Fundamentals
No ratings yet
Lexical Analysis and Regex Fundamentals
41 pages
Compiler Design Principles Overview
No ratings yet
Compiler Design Principles Overview
27 pages
CD Unit Test 1 Notes
No ratings yet
CD Unit Test 1 Notes
14 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
16 pages
Compiler Design Exam Questions & Solutions
No ratings yet
Compiler Design Exam Questions & Solutions
13 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
68 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
62 pages
Chapter Two
No ratings yet
Chapter Two
13 pages
Language Processors and Lexical Analysis
No ratings yet
Language Processors and Lexical Analysis
85 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
6 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
22 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
20 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
52 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
71 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
17 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
5 pages
Finite Automata and Lexical Analysis Guide
0% (1)
Finite Automata and Lexical Analysis Guide
13 pages
Finite Automata and Lexical Analysis
No ratings yet
Finite Automata and Lexical Analysis
28 pages
Lexical Analyzer in Compiler Design
No ratings yet
Lexical Analyzer in Compiler Design
30 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
73 pages
Language Processing and Lexical Analysis
No ratings yet
Language Processing and Lexical Analysis
60 pages
Lexical Analyzer Role and Error Recovery
No ratings yet
Lexical Analyzer Role and Error Recovery
8 pages
Understanding Lexical Analysis in Compilers
No ratings yet
Understanding Lexical Analysis in Compilers
59 pages
Lexical Analyzer Design and Functions
No ratings yet
Lexical Analyzer Design and Functions
15 pages
Phases of Compiler Explained
No ratings yet
Phases of Compiler Explained
15 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
12 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
71 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
63 pages
CD - CH2 - Lexical Analysis
No ratings yet
CD - CH2 - Lexical Analysis
58 pages
Compiler Lexical Analysis Explained
No ratings yet
Compiler Lexical Analysis Explained
45 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
63 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
48 pages
Lexical Analysis in Compiler Construction
No ratings yet
Lexical Analysis in Compiler Construction
8 pages
Understanding Compilers and Lexical Analysis
No ratings yet
Understanding Compilers and Lexical Analysis
36 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
16 pages
Lexical Analysis: Tokens and Buffering Techniques
No ratings yet
Lexical Analysis: Tokens and Buffering Techniques
16 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
36 pages
Understanding Lexical Analysis in Compilers
No ratings yet
Understanding Lexical Analysis in Compilers
102 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
26 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
44 pages
Lexical Error Handling Techniques
No ratings yet
Lexical Error Handling Techniques
56 pages
Lexical Analysis in Compilers
No ratings yet
Lexical Analysis in Compilers
6 pages
Lexical Analyzer Overview and Functions
No ratings yet
Lexical Analyzer Overview and Functions
15 pages
Lexical Analysis Techniques and Tools
No ratings yet
Lexical Analysis Techniques and Tools
88 pages
Lexical Analyzer Design Overview
No ratings yet
Lexical Analyzer Design Overview
43 pages
Example: Disambiguating Relational Operators
No ratings yet
Example: Disambiguating Relational Operators
1 page
1st Lecture Compiler (1) Phoenix
No ratings yet
1st Lecture Compiler (1) Phoenix
44 pages
Urgent Help: GPS Emergency Alert System
No ratings yet
Urgent Help: GPS Emergency Alert System
8 pages
Industrial Training Report at Fundi Bots
No ratings yet
Industrial Training Report at Fundi Bots
34 pages
Industrial Training Report at Fundibots
No ratings yet
Industrial Training Report at Fundibots
28 pages
Understanding Non-Deterministic Finite Automata
No ratings yet
Understanding Non-Deterministic Finite Automata
27 pages
NFA to DFA Conversion Guide
No ratings yet
NFA to DFA Conversion Guide
30 pages
Understanding Non-Deterministic Finite Automata
No ratings yet
Understanding Non-Deterministic Finite Automata
25 pages
ET-BERT: Transformer for Encrypted Traffic
No ratings yet
ET-BERT: Transformer for Encrypted Traffic
11 pages
AJA FS4 Manual v2.0r1
No ratings yet
AJA FS4 Manual v2.0r1
136 pages
Butterworth Filter Implementations
No ratings yet
Butterworth Filter Implementations
4 pages
Speccy System Overview for Windows 10
No ratings yet
Speccy System Overview for Windows 10
69 pages
C Programming: Overview and Basics
No ratings yet
C Programming: Overview and Basics
59 pages
SolarX Robot Training Manual
No ratings yet
SolarX Robot Training Manual
16 pages
Programming Homework Solutions Summary
No ratings yet
Programming Homework Solutions Summary
93 pages
Smart Dustbin Project Using Arduino
No ratings yet
Smart Dustbin Project Using Arduino
9 pages
Azure DevOps Workflow Guide
No ratings yet
Azure DevOps Workflow Guide
2 pages
BOSS - Suporte - GT-10 - Drivers e Atualizações
No ratings yet
BOSS - Suporte - GT-10 - Drivers e Atualizações
4 pages
RAM and ROM Memory Concepts
No ratings yet
RAM and ROM Memory Concepts
19 pages
Fainguelernt 2008
No ratings yet
Fainguelernt 2008
4 pages
Makalah Isas Io w10 Final
No ratings yet
Makalah Isas Io w10 Final
22 pages
Mastech HY3005 Power Supply Overview
100% (1)
Mastech HY3005 Power Supply Overview
2 pages
HP Laptop System Information Report
No ratings yet
HP Laptop System Information Report
36 pages
Driver Signature Verification Report
No ratings yet
Driver Signature Verification Report
60 pages
Voice-Based Email System for the Blind
No ratings yet
Voice-Based Email System for the Blind
6 pages
Electrical Engineer with AI Expertise
No ratings yet
Electrical Engineer with AI Expertise
2 pages
Encoder Design in Digital Systems
No ratings yet
Encoder Design in Digital Systems
15 pages
Radio - 4499 Data Sheet
0% (1)
Radio - 4499 Data Sheet
2 pages
PCM System Quantization Concepts
No ratings yet
PCM System Quantization Concepts
7 pages
Controlling Your Pumps Has Never Been Easier: Level Control For Emptying and Filling Applications
No ratings yet
Controlling Your Pumps Has Never Been Easier: Level Control For Emptying and Filling Applications
8 pages
Wi-Fi Smart Wall Switch User Manual
No ratings yet
Wi-Fi Smart Wall Switch User Manual
50 pages
A Program Is Not Considered Finished Until
No ratings yet
A Program Is Not Considered Finished Until
10 pages
Enhanced Student ID System Using RFID
No ratings yet
Enhanced Student ID System Using RFID
7 pages
GameCenterBizApplication Startup Log
No ratings yet
GameCenterBizApplication Startup Log
12 pages
QLC Project Block Diagram Overview
No ratings yet
QLC Project Block Diagram Overview
37 pages
Topin GPS Communication Protocol Guide
No ratings yet
Topin GPS Communication Protocol Guide
30 pages
Blackbox RS232 Parallel
No ratings yet
Blackbox RS232 Parallel
71 pages
(QCX Blendexpert Software Manual)
No ratings yet
(QCX Blendexpert Software Manual)
74 pages

Complier Construction

Uploaded by

Complier Construction

Uploaded by

Questions

a) Describe with an illustration a symbol table in relation to lexical analysis

A symbol table is a data structure used to store information about identifiers

Illustration of a symbol table.

Name Token Type Scope

Reference: Aho et al., Compilers: Principles, Techniques, and Tools, Chapter 2.

b) Formation of a Symbol Table Entry

I. Token Recognition: The lexical analyzer recognizes a sequence of characters as an

c) The role of a lexical analyzer in creating and updating a symbol table

4. Supporting Semantic Analysis: By populating the symbol table with accurate

Why is Input Buffering Necessary?

I. Efficiency: Reading data in blocks reduces the number of disk accesses.

Reading Ahead and Token Recognition

Illustration showing input buffer and movement of pointers

Each character is stored in a cell for easy scanning.

lexeme_begin: Points to the start of the current token (I in int).

A lexeme ends when:

Example: Disambiguating Relational Operators

 At '>', the FSM enters a state for potential '>' or '>=;

 After skipping whitespace, at 'y' (identifier), ends at space.

f) Comparison of Single Buffer and Double Buffer Schemes in Lexical Analysis

You might also like