Understanding Lexical Analysis Basics

Uploaded by

Jayesh Wagh

Lexical Analysis

Acknowledgement
• Alfred V. Aho, Monica S. Lam, Ravi Sethi,
Jeffrey D. Ullman, "Compilers: Principles,
Techniques, and Tools"

Girish Kumar Patnaik 2

Lexical Analysis
• The main task of the lexical analyzer is to read
the input characters of the source program,
group them into lexemes, and produce as
output a sequence of tokens, one for each lexeme
in the source program.
• The stream of tokens is sent to the parser for
syntax analysis.


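The Role of the Lexical Analyzer
[Figure: the lexical analyzer reads the source program and returns a (token, tokenval) pair to the parser each time the parser issues "get next token". Both the lexical analyzer and the parser report errors and access the symbol table.]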

The Role of the Lexical Analyzer
• Other tasks besides identifying lexemes:
– Stripping out comments and whitespace
– Correlating error messages generated by the
compiler with the source program
• Associating a line number with each error message
• Producing a copy of the source program with the
error messages inserted
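
The two side tasks above can be sketched in C as follows. This is a minimal illustration rather than the method of any particular compiler: the function name skip_blanks and the choice of C-style /* */ comments are assumptions made for the example. It strips whitespace and comments while counting newlines, so a later error message can carry a line number.

```c
#include <assert.h>

/* Hypothetical helper: skip whitespace and C-style comments starting at p,
   incrementing *line for every newline seen. Returns a pointer to the first
   character that is neither whitespace nor part of a comment. */
const char *skip_blanks(const char *p, int *line)
{
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\r') {
            p++;
        } else if (*p == '\n') {
            (*line)++;              /* keep the line count for error messages */
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;                 /* skip the opening delimiter */
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) {
                if (*p == '\n')
                    (*line)++;
                p++;
            }
            if (*p != '\0')
                p += 2;             /* skip the closing delimiter */
        } else {
            return p;               /* start of the next lexeme, or end of input */
        }
    }
}
```

A scanner would call skip_blanks before recognizing each lexeme and attach the current value of line to any error it reports.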


Lexical Analysis Versus Parsing
• Simplicity of design is the most important
consideration
– Separation of lexical and syntactic analysis
– For example, deal with comments and whitespace
• Compiler efficiency is improved
– specialized techniques can be applied
• Compiler portability is enhanced
– Input-device-specific peculiarities can be restricted to
the lexical analyzer
Tokens, Patterns, and Lexemes
• A token is a pair consisting of a token name and
an optional attribute value.
• A pattern is a description of the form that the
lexemes of a token may take.
• A lexeme is a sequence of characters in the source
program that matches the pattern for a token and is
identified by the lexical analyzer as an instance of
that token.


Tokens, Patterns, and Lexemes
• A token is a classification of lexical units
– For example: id and num
• Lexemes are the specific character strings that
make up a token
– For example: abc and 123
• Patterns are rules describing the set of lexemes
belonging to a token
– For example: “letter followed by letters and digits” and
“non-empty sequence of digits”
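
To make the distinction concrete, the two example patterns can be written as a small C function that decides which token a given lexeme belongs to. This is only a sketch: the names TokenName, TOK_ID, TOK_NUM, TOK_NONE, and classify are invented for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token names for this sketch. */
typedef enum { TOK_ID, TOK_NUM, TOK_NONE } TokenName;

/* Return the token whose pattern the whole lexeme matches, or TOK_NONE. */
TokenName classify(const char *lexeme)
{
    size_t i;
    if (isalpha((unsigned char)lexeme[0])) {
        /* id: a letter followed by letters and digits */
        for (i = 1; isalnum((unsigned char)lexeme[i]); i++)
            ;
        return lexeme[i] == '\0' ? TOK_ID : TOK_NONE;
    }
    if (isdigit((unsigned char)lexeme[0])) {
        /* num: a non-empty sequence of digits */
        for (i = 1; isdigit((unsigned char)lexeme[i]); i++)
            ;
        return lexeme[i] == '\0' ? TOK_NUM : TOK_NONE;
    }
    return TOK_NONE;
}
```

With these definitions, the lexeme abc is classified as TOK_ID and the lexeme 123 as TOK_NUM, while a string such as 1a matches neither pattern.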

Attributes of Tokens
• The lexical analyzer turns the input
    y := 31 + 28*x
  into the token stream
    <id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
• Each pair passed to the parser is <token, tokenval>, where tokenval is the
  token attribute.
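
A hand-written scanner for the example statement might look like the sketch below. The struct layout and the names Token, T_ID, T_NUM, and next_token are assumptions for illustration. As a simplification, the identifier's lexeme is stored directly in the token, whereas a real compiler would usually store a symbol-table reference.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token names for this sketch. */
typedef enum { T_ID, T_NUM, T_ASSIGN, T_PLUS, T_TIMES, T_EOF, T_ERROR } TokenName;

typedef struct {
    TokenName name;
    int  num;        /* attribute of T_NUM: the numeric value */
    char text[32];   /* attribute of T_ID: the lexeme (stand-in for a symbol-table entry) */
} Token;

/* Scan one token starting at *src and advance *src past its lexeme. */
Token next_token(const char **src)
{
    Token t = { T_EOF, 0, "" };
    const char *p = *src;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                                    /* strip whitespace */

    if (*p == '\0') {
        t.name = T_EOF;
    } else if (isalpha((unsigned char)*p)) {
        size_t n = 0;
        while (isalnum((unsigned char)*p)) {
            if (n < sizeof t.text - 1)
                t.text[n++] = *p;               /* overlong names are truncated */
            p++;
        }
        t.text[n] = '\0';
        t.name = T_ID;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            t.num = t.num * 10 + (*p++ - '0');  /* overflow is ignored in this sketch */
        t.name = T_NUM;
    } else if (p[0] == ':' && p[1] == '=') {
        t.name = T_ASSIGN;
        p += 2;
    } else if (*p == '+') {
        t.name = T_PLUS;
        p++;
    } else if (*p == '*') {
        t.name = T_TIMES;
        p++;
    } else {
        t.name = T_ERROR;                       /* no pattern matches this character */
        p++;
    }
    *src = p;
    return t;
}
```

Calling next_token repeatedly on the string y := 31 + 28*x yields the seven tokens listed above, followed by T_EOF. Operators carry no attribute, which corresponds to the empty second component in pairs such as <assign, >.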

Lexical Errors
• The simplest recovery strategy is "panic mode"
recovery. We delete successive characters from
the remaining input, until the lexical analyzer can
find a well-formed token at the beginning of what
input is left.
• Other possible error-recovery actions are:
– Delete one character from the remaining input.
– Insert a missing character into the remaining input.
– Replace a character by another character.
– Transpose two adjacent characters.
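
Panic-mode recovery can be sketched in a few lines of C. The set of characters that may begin a token is an assumption made for this example (letters, digits, whitespace, and the operators + * = < >), and the names can_resume and panic_mode_skip are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumption for this sketch: scanning can resume at a letter, a digit,
   whitespace, or one of the operator characters + * = < >. */
static int can_resume(char c)
{
    return c != '\0' &&
           (isalnum((unsigned char)c) || isspace((unsigned char)c) ||
            strchr("+*=<>", c) != NULL);
}

/* Panic mode: delete successive characters from the remaining input until
   a token can start (or the input ends). Returns the number deleted. */
size_t panic_mode_skip(const char **src)
{
    size_t deleted = 0;
    while (**src != '\0' && !can_resume(**src)) {
        (*src)++;
        deleted++;
    }
    return deleted;
}
```

For the input @#$x = 1, the function deletes the three characters @#$ and leaves the scanner positioned at x, where an identifier can be recognized.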


Exercises
Divide the following C++ program:

float limitedSquare(x) float x {
    /* returns x-squared, but never more than 100 */
    return (x<=-10.0||x>=10.0)?100:x*x;
}

into appropriate lexemes.

Input Buffering
• Need to look one or more characters
beyond the next lexeme before we can be
sure we have the right lexeme.
• Single-character operators like -, =, or <
could also be the beginning of a two-character
operator like ->, ==, or <=.
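
The need for lookahead can be illustrated with a small C function that inspects one character beyond the current one and prefers the longer operator. The enum values and the name scan_operator are assumptions for this sketch.

```c
#include <assert.h>

/* Hypothetical operator tokens for this sketch. */
typedef enum { OP_MINUS, OP_ARROW, OP_ASSIGN, OP_EQ, OP_LT, OP_LE, OP_NONE } Op;

/* Recognize an operator at *src, preferring the two-character form.
   Reading p[1] is safe: p[0] is not the terminator in these cases, so
   p[1] is at worst the terminating '\0'. */
Op scan_operator(const char **src)
{
    const char *p = *src;
    Op op = OP_NONE;

    switch (p[0]) {
    case '-':
        if (p[1] == '>') { op = OP_ARROW; p += 2; }
        else             { op = OP_MINUS; p += 1; }
        break;
    case '=':
        if (p[1] == '=') { op = OP_EQ;     p += 2; }
        else             { op = OP_ASSIGN; p += 1; }
        break;
    case '<':
        if (p[1] == '=') { op = OP_LE; p += 2; }
        else             { op = OP_LT; p += 1; }
        break;
    default:
        break;              /* not an operator handled here; consume nothing */
    }
    *src = p;
    return op;
}
```

When the lookahead character does not extend the operator, it is left unread, so the next call to the scanner sees it again.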


Input Buffering: Buffer Pairs

• Each buffer is of the same size N
• Read N characters into a buffer
• Pointer "lexemeBegin" marks the beginning of the current lexeme
• Pointer "forward" scans ahead until a pattern match is found
• When "forward" reaches the end of one buffer, reload the other
buffer from the input
Input Buffering: Sentinels

• Without sentinels, we make two tests for each character read:
– one for the end of the buffer,
– another to determine what character was read (the latter may be a
multiway branch)
• A sentinel combines the buffer-end test with the test for the current
character
• Each buffer holds a sentinel character at the end
• The sentinel is a special character that cannot be part of the
source program (eof)
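
The buffer-pair scheme with sentinels can be sketched in C as below. Several simplifications are assumed: the input comes from an in-memory string rather than a file, the sentinel is the NUL character (assumed never to occur in the source text), N is tiny so that reloads are easy to observe, and lexemeBegin is omitted. The names Scanner, reload, scanner_init, and next_char are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N 4              /* buffer size; tiny for demonstration */
#define SENTINEL '\0'    /* assumed never to occur in the source text */

typedef struct {
    char buf[2 * (N + 1)];   /* two halves, each N characters plus a sentinel slot */
    char *forward;           /* the scanning pointer */
    const char *input;       /* unread source text; stands in for a file */
} Scanner;

/* Fill one half with up to N characters and end it with the sentinel. */
static void reload(Scanner *s, int half)
{
    char *dst = s->buf + half * (N + 1);
    size_t n = strlen(s->input);
    if (n > N)
        n = N;
    memcpy(dst, s->input, n);
    dst[n] = SENTINEL;       /* slot N if the half is full, earlier at end of input */
    s->input += n;
}

void scanner_init(Scanner *s, const char *text)
{
    s->input = text;
    reload(s, 0);
    s->forward = s->buf;
}

/* Return the next character, or SENTINEL at the true end of input. */
char next_char(Scanner *s)
{
    char c = *s->forward++;
    if (c != SENTINEL)
        return c;                               /* common case: a single test */
    if (s->forward == s->buf + N + 1) {         /* ran off the first half */
        reload(s, 1);
        s->forward = s->buf + N + 1;
        return next_char(s);
    }
    if (s->forward == s->buf + 2 * (N + 1)) {   /* ran off the second half */
        reload(s, 0);
        s->forward = s->buf;
        return next_char(s);
    }
    s->forward--;            /* sentinel inside a half: end of input, stay put */
    return SENTINEL;
}
```

In the common case only the comparison against the sentinel is executed per character. The buffer-end checks run only when a sentinel is actually seen, which is the saving the sentinel technique is meant to provide.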
The Lex and Flex Scanner Generators
• Lex and its newer cousin flex are scanner
generators
• Systematically translate regular definitions
into C source code for efficient scanning
• Generated code is easy to integrate in C
applications


Creating a Lexical Analyzer with Lex and Flex
lex source program (lex.l) → lex or flex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens

Lex Specification
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
Regular Expressions in Lex
x         match the character x
\.        match the character .
"string"  match the contents of the string of characters
.         match any character except newline
^         match the beginning of a line
$         match the end of a line
[xyz]     match one character x, y, or z (use \ to escape -)
[^xyz]    match any character except x, y, and z
[a-z]     match one of a to z
r*        closure (match zero or more occurrences)
r+        positive closure (match one or more occurrences)
r?        optional (match zero or one occurrence)
r1r2      match r1 then r2 (concatenation)
r1|r2     match r1 or r2 (union)
(r)       grouping
r1/r2     match r1 when followed by r2
{d}       match the regular expression defined by d
Example Lex Specification 1
%{
#include <stdio.h>
%}
%%
[0-9]+  { printf("%s\n", yytext); /* yytext contains the matching lexeme */ }
.|\n    { }
%%
int main(void)
{ yylex();   /* invokes the lexical analyzer */
  return 0;
}

The lines between the two %% markers are the translation rules. To build and run:

lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
Example Lex Specification 2
%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
delim     [ \t]+
%%
\n        { ch++; wd++; nl++; }
^{delim}  { ch+=yyleng; }
{delim}   { ch+=yyleng; wd++; }
.         { ch++; }
%%
int main(void)
{ yylex();
  printf("%8d%8d%8d\n", nl, wd, ch);
  return 0;
}
The delim line is a regular definition; the lines between the two %% markers are the translation rules.
Example Lex Specification 3
%{
#include <stdio.h>
%}
digit     [0-9]
letter    [A-Za-z]
id        {letter}({letter}|{digit})*
%%
{digit}+  { printf("number: %s\n", yytext); }
{id}      { printf("ident: %s\n", yytext); }
.         { printf("other: %s\n", yytext); }
%%
int main(void)
{ yylex();
  return 0;
}
The digit, letter, and id lines are regular definitions; the lines between the two %% markers are the translation rules.

Example Lex Specification 4
%{ /* definitions of manifest constants */
#define LT (256)
…
%}
delim     [ \t\n]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}      { }
if        {return IF;}
then      {return THEN;}
else      {return ELSE;}
{id}      {yylval = install_id(); return ID;}
{number}  {yylval = install_num(); return NUMBER;}
"<"       {yylval = LT; return RELOP;}
"<="      {yylval = LE; return RELOP;}
"="       {yylval = EQ; return RELOP;}
"<>"      {yylval = NE; return RELOP;}
">"       {yylval = GT; return RELOP;}
">="      {yylval = GE; return RELOP;}
%%
int install_id()
…
Each return statement returns a token to the parser; yylval carries the token attribute; install_id() installs yytext as an identifier in the symbol table.
