0% found this document useful (0 votes)

27 views62 pages

Lexical Analysis in Compiler Design

The document outlines the role of a lexical analyzer in compiler design, emphasizing the separation of lexical analysis and parsing for improved efficiency and portability. It discusses tokens, patterns, and lexemes, along with input buffering techniques, error recovery methods, and the specification of tokens using regular expressions. Additionally, it covers the construction of finite automata and transition diagrams for recognizing tokens in programming languages.

Uploaded by

AAA School

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

27 views62 pages

Lexical Analysis in Compiler Design

Uploaded by

AAA School

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Prof.

Maya B S
Assistant Professor
Department of CS&E
BIT, Bangalore
Outline
 Role of lexical analyzer
 Input Buffering
 Specification of tokens
 Recognition of tokens
The role of lexical analyzer
token
Source Lexical To semantic
program Parser analysis
Analyzer
getNextToken

Symbol
table
Why to separate Lexical analysis
and parsing
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes
 A token is a pair a token name and an optional token
value
 A pattern is a description of the form that the lexemes
of a token may take.
 A lexeme is a sequence of characters in the source
program that matches the pattern for a token
Example
Token Informal description Sample lexemes
if Characters i, f if
else Characters e, l, s, e else
comparison < or > or <= or >= or == or != <=, !=

id Letter followed by letter and digits pi, score, D2

number Any numeric constant 3.14159, 0, 6.02e23
literal Anything but “ sorrounded by “ “core dumped”

printf(“total = %d\n”, score);

Attributes for tokens
 E = M * C ** 2
 <id, pointer to symbol table entry for E>
 <assign-op>
 <id, pointer to symbol table entry for M>
 <mult-op>
 <id, pointer to symbol table entry for C>
 <exp-op>
 <number, integer value 2>
Lexical errors
 Some errors are out of power of lexical analyzer to
recognize:
 fi (a == f(x)) …
 However it may be able to recognize errors like:
 d = 2r
 Such errors are recognized when no pattern for tokens
matches a character sequence
Error recovery
 Panic mode: successive characters are ignored until we
reach to a well formed token.
 Delete one character from the remaining input
 Insert a missing character into the remaining input
 Replace a character by another character
 Transpose two adjacent characters
Input buffering
The lexical analyser uses two pointers to read tokens.
 Lexeme-begin Pointer (lb)marks the beginning of
the current lexeme ,whose extent are attempting to
determine.
 Search-pointer(forward-pointer) (sp)that keep track
of the portion of the input string scanned.
Initially both pointers point to the beginning of a lexeme
in figure.
The forward pointer “sp” then start scanning forward to
search for the end of lexeme. The end of the lexeme is
indicated by blank space .
Lb sp
Begin I := I + 1 : J
 Reading input character by character from the secondary
storage is costly.

 A block of data is read first into a buffer and then scanned by

the lexical analyser.

 because of the amount of time taken to process character and

the large number of character that must be processed during the
compilation of a large source program , specialized buffering
technique have been developed to reduce the amount of
overhead required to process a single input character.
Two methods are available
 One buffer scheme: There are problems if a lexeme crosses the
buffer boundary. To scan the rest of the lexeme, the buffer has
to be refilled there by overwriting the first part of the lexeme.

 Two buffer scheme: Buffer1 and buffer2 are scanned

alternatively when the end of the current buffer is reached,
other buffer is filled. Hence the problem encountered in the
previous method is solved.
Two buffer scheme
 Each buffer is of the same size N.

 N is usually of disk block 4096bytes.

 Using one system read command we can read N character into

buffer, rather than using one system call per character.

 If fewer than N character remain in the input file, then a special

character represented by eof marks the end of the source file
and is different from any possible character of the source
program.
E = M * C * * 2 eof
 Advancing forward requires that we first test whether we have
reached the end of one of the buffer and if so, we must reload
the other buffer from the input and move forward to the
beginning of the newly loaded buffer.

 Each time we advance forward for each character read, we

make two test.

 One for the end of the buffer

 One to determine what character is read.

 We can combine the buffer end test with the test for current
character if we extend each buffer to hold a sentinel character
at the end.
 The sentinels is a special character that cannot be part of the
source program and a natural choice is the character eof.

 Eof retains its use as a marker for the end of the entire input.

 Any eof that appears other than at the end of buffer means the
input is at an end.
Algorithm for forward pointer with sentinels

E = M eof * C * * 2 eof eof

Switch (*forward++) {
case eof:
if (forward is at end of first buffer) {
reload second buffer;
forward = beginning of second buffer;
}
else if {forward is at end of second buffer) {
reload first buffer;\
forward = beginning of first buffer;
}
else /* eof within a buffer marks the end of input */
terminate lexical analysis;
break;
cases for the other characters;
}
Specification of tokens
 Strings and language:
Alphabet: An alphabet is any finite set of symbols.
Symbols are letter,digit,punctuation.
Ex: set {0,1} is binary alphabet.
String: A string over an alphabet is finite sequence of symbol
drawn from that alphabet.
The length of string is |S|.
Ex: S=banana, |S|= 6
Empty string epsilon.(Ɛ)
1.Ɛs=sƐ=s (empty string is identity under concatenation)
2. x=dog, y=cat, xy=dogcat
3. S^3 =SSS
Language: A language is any countable set of string over some
fixed alphabet.
 Empty set {Ɛ} ,set containing only the empty string , are called
abstract language.

Operations on language:
 UNION
 CONCATENATION
 CLOSURE Kleene , Positive
operations Definition and notation

Union of two languages L and M L U M = {s | s is in L or s is in M}

Concatenation of two languages L and LM = {st | s is in L and t is in M}

The Kleene Closure of a language L L* = Zero or more occurrence of

language L.

The positive Closure of a language L L+= one or more occurrence of language

L.
 Let L be the set of letter {A,B,--Za,b---z},the alphabet of
uppercase and lower case and it is language ,all of whose string
happen to be length one.
 Let D be the set of digits {0,1,---9} ,alphabet of digits and it is
language of string length one.
 Some of the other language can be constructed from language L and D
using operation.
1. L U D= is the set of letter and digits , strictly speaking the language with -
--------- string of length one, each of which strings either one letter or one
digits.
2. LD=is the set of -----------string of length two, each consisting of one
letter followed by one digit.
3. L4 = is the set of all ----------letter string.
4. D+ = is the set of string of one or more digits.
5. L(L U D)*= is the set of all string of letter and digits beginning with
letter.
Regular Expressions
 In theory of compilation regular expressions are used to
formalize the specification of tokens
 Regular expressions are means for specifying regular languages
 Example:
 Letter_(letter_ | digit)*

 Each regular expression is a pattern specifying the form of

strings.
 RE are built recursively out of smaller RE, using below rules.
 Each RE ‘r’ denotes a language L(r ), which is also define
recursively from the language denoted by ‘r’ subexpressions.
The rules denote RE over some alphabet ∑ & language that those expression
denote.

Basis: There are two rules that form basis

1. Ɛ is a regular expression, L(Ɛ) = {Ɛ},i,.e language contain empty
string.
2. If ‘a’ is a symbol in ∑ then a is a regular expression, L(a) =
{a},i.e language with one string of length one.
Induction: suppose r and s are RE denoting language L(r )&
L(s )respectively.
 (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
 (r)(s) is a regular expression denoting the language L(r)L(s)
 (r)* is a regular expression denoting (L(r))*
 (r) is a regular expression denting L(r), this rule say that we can
add additional pair of parenthesis around expressions without
changing the language.
Examples of RE
 Let ∑={a,b}
1. RE a|b denotes language ----------
2. (a|b)(a|b) denotes language_________
3. a * denotes language_________
4. a|a*b denotes language______________
5. (a|b)*___________
6. (a*b*)*______________
Algebraic laws of RE

A language that can be defined by RE is called a “regular set”

If two RE r and s denote the same regular set, then both are
equivalent r=s.

Ex=(a|b)= (b|a)

Below table show the algebraic laws that hold for arbitrary RE r,
s,and t.
No LAW Description

1 r|s=s|r | is commutative

2 r |(s|t)=(r|s)|t | is associative

3 r(st)=(rs)t Concatenation is associative

4 r(s|t)=rs|rt Concatenation distributes over |

5 Ɛr=rƐ=r Ɛ is the identity for concatenation

6 r= (r|Ɛ) Ɛ is guaranteed in a closure

7 r**=r* * is idempotent
Regular definitions
 We may give names to RE and use those names in subsequent expressions,
if the names were themself symbols.
 If ∑ is an alphabet of basic symbols, then regular definition is a sequence of
definition of the form.
d1 -> r1
d2 -> r2
…
dn -> rn
Where ,
 Each di is a new symbol , not in ∑ and not the same as any other of the di
and
 Each ri is RE over the alphabet ∑ U {d1,d2,….di-1}

Example:
letter_ -> A | B | … | Z | a | b | … | Z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions
 One or more instances: (r)+= rr*
 Zero of one instances: r? = r| Ɛ
 Character classes: [abc]=[a|b|c]
[Link] Regular Definition for C language identifier in both
extended and short notation.
 letter_ -> [A-Za-z_]
 digit -> [0-9]
 id -> letter_(letter|digit)*
2. Write a RD for unsigned number(integer or floating point) are string
5280, 0.01234,6.336E4, 1.89E-4

 digit0|1|2|-----|9
 digits digit digit*
 optionalFraction.digits | Ɛ
 optionalExponent(E(+|-| Ɛ)digits)| Ɛ
 Number digits optionalFraction optionalExponent

Short notation:
 digit[0-9]
 digits digits+
 Number digits(.digits)?(E[+-]?digits)?
Recognition of tokens
 Study about how to take pattern for all needed tokens and build
a piece of code that examines the input string and find a prefix
that is a lexeme matching one of the character.
 Starting point is the language grammar to understand the
tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt
|Ɛ
expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.) for braching
statement
 The next step is to define all the terminals recognize from
lexical analysis phase for patterns using RD:
digit -> [0-9]
Digits -> digit+
number -> digit(.digits)? (E[+-]? Digit)?
letter -> [A-Za-z_]
id -> letter (letter|digit)*
If -> if
Then -> then
Else -> else
Relop -> < | > | <= | >= | = | <>
 Write regular definition to handle whitespaces:
ws -> (blank | tab | newline)+
Below table hows the each lexeme , which token name is returned
to the parser and what attribute value.
Lexeme(pattern) Token-name Attribute value
Any ws - -
if if -
Then Then -
Else Else -
Any id Id Pointer to table entry
Any number Number Pointer to table entry
< Relop LT
> Relop GT
<= Relop LE
= Relop EQ
<> Relop NT
>= Relop GE
Transition diagrams
 An intermediate step in the construction of a lexical analyser ,
we first convert pattern into flowcharts called transition
diagram(TD).

 TD have a collection of nodes or circles called states.

 Each states represents a condition that could occur during the

process of scanning the input looking for a lexeme that matches
one of several patterns.

 Edges are directed from one state of the TD to another.

 Each edge is labelled by a symbol or set of symbols.

 Assume all our TD are deterministic, meaning that there is
never more than one edge out of a given state with a given
symbol among its labels.
 Certain states are said to be accepting or final states. These
states indicate that lexeme has been found. We always indicate
an accepting state by a double circle.
 If there is an action to be taken , typically returning s token and
attribute value to the parser . We shall attach that action to the
accepting state.
 In addition , if it is necessary to retract the forward pointer one
position (i.e the lexeme does not include the symbol that got us
to accepting state), then we shall additionally place a * near
that accepting state.
 One state is designated the start state or initial state, it is
indicated by an edge, labelled start entering from no where.
Finite Automata
 Regular expressions = specification
 Finite automata = implementation

 A finite automaton consists of

 An input alphabet 
 A set of states S
 A start state n
 A set of accepting states F  S
 A set of transitions state input state

35
Finite Automata

 Transition
s1 a s2
 Is read
In state s1 on input “a” go to state s2

 If end of input
 If in accepting state => accept, othewise => reject
 If no transition possible => reject

36
Finite Automata State Graphs

 A state

• The start state

• An accepting state

a
• A transition

37
1.A Simple Example

 A finite automaton that accepts only “1”

 A finite automaton accepts a string if we can follow

transitions labeled with the characters in the string
from the start to some accepting state

38
2. Another Simple Example

 A finite automaton accepting any number of 1’s

followed by a single 0
 Alphabet: {0,1}

 Check that “1110” is accepted but “110…” is not

39
3. Another Example
 Alphabet {0,1}
 What language does this recognize?

1 0

0 0

1
1

40
Epsilon Moves

 Another kind of transition: -moves


A B

• Machine can move from state A to state B

without reading input

41
Deterministic and Nondeterministic Automata
 Deterministic Finite Automata (DFA)
 One transition per input per state
 No -moves
 Nondeterministic Finite Automata (NFA)
 Can have multiple transitions for one input in a given state
 Can have -moves
 Finite automata have finite memory
 Need only to encode the current state

42
Execution of Finite Automata

 A DFA can take only one path through the state graph
 Completely determined by input

 NFAs can choose

 Whether to make -moves
 Which of multiple transitions for a single input to take

43
Acceptance of NFAs

 An NFA can get into multiple states

0 1

• Input: 1 0 1

• Rule: NFA accepts if it can get in a final state

44
NFA vs. DFA (1)

 NFAs and DFAs recognize the same set of languages

(regular languages)

 DFAs are easier to implement

 There are no choices to consider

45
NFA vs. DFA (2)

 For a given language the NFA can be simpler than the

DFA
1
0 0
NFA
0

1 0
0 0
DFA
1
1

• DFA can be exponentially larger than NFA

46
Regular Expressions to Finite Automata

 High-level sketch

NFA

Regular
expressions DFA

Lexical Table-driven
Specification Implementation of DFA

47
[Link] diagrams for relational operator (VVIMP)
 Transition diagram for relop, we must retract state 4,
has * to indicate that we must retract the input one
position.
Architecture of a transition-diagram-based lexical analyzer for relational
operator
TOKEN getRelop()
{
TOKEN retToken = new (RELOP)
while (1) { /* repeat character processing until a
return or failure occurs */
switch(state) {
case 0: c= nextchar();
if (c == ‘<‘) state = 1;
else if (c == ‘=‘) state = 5;
else if (c == ‘>’) state = 6;
else fail(); /* lexeme is not a relop */
break;
case 1: c=nextchar();
If(c==“=‘”) state=2;
Else if(c==“>”) state=3;
Else state= 4;
Break;
Case 2: [Link]=LE;
return(retToken);
break;
case 3: [Link]=NE;
return(retToken);
break;
Case 4: retract();
[Link] = LT;
return(retToken);
Break;
Case 5: [Link]=EQ;
return(retToken);
break;
Case 6: c=nextchar();
if(c==“=“) state=7;
else state=8;
Case 7:[Link]=GE;
return(retToken);
break;
case 8: retract();
[Link] = GT;
}
 2. Transition diagram for reserved words and identifiers.

Keywords like if,then,else,char,const etc are reserved words, so they are not
identifier even though they look like identifier. Above diagram work for both
identifier and reserved words.

Install the reserved word in the symbol table initially.

A field of the symbol table entry indicates that these string are never ordinary
identifier and tells which token they represent.
When we find an identifier ,a call to installID place it in the
symbol table if it is not already there and returns a pointer to the
symbol table entry for the lexeme found.

Any identifier not in the symbol table during lexical analysis

cannot be a reserved word, so it is id.
Program segment for identifier or reserverd
{starting in state 0}
If the next character is a letter then
Advance the input;
{now in state 1}
While the next character is a letter or digit do
Advance the input; {stay in state 1}
End while;
{go to state 2, without advancing the input}
Accept;
Else
{error or other cases}
End if
3. Transition diagram for unsigned numbers.
4. Transition diagram for whitespace
 Write separate transition diagram for each keyword If, then,
case, char, continue.
Write transition diagram for C comment lines.
 {state 1}
 If the next character is “/” then
 Advance the input ; {state 2;}
 If the next character is “*” then
 Advance the input ; {state 3;}
 Done:=false;
 While not done do
 While the next input character is not “*” do
 Advance the input;
 End while;
 Advance the input; {state 4;} other *

 / 2 * * /
1 2 3 other 4 5
2
 While the next input character is “*” do
 Advance the input;
 End while;
 If the next input character is “/” then
 Done:=true;
 End if
 Advance the input;
 End while;
 Accept;{state 5;}
 Else { other processing}
 End if;
 Else {other processing}
 End if;
Lexical Analyzer Generator - Lex

Lex Source program Lexical [Link].c

lex.l Compiler

[Link].c
C [Link]
compiler

Sequence
Input stream [Link]
of tokens
Structure of Lex programs

Syntax:

declarations
%%
translation rules Pattern {Action}
%%
auxiliary functions
Example for relational operator
%{
Int installID() {/* funtion to install the
/* definitions of manifest constants
lexeme, whose first character is
LT, LE, EQ, NE, GT, GE, pointed to by yytext, and whose
IF, THEN, ELSE, ID, NUMBER, RELOP */ length is yyleng, into the symbol
%} table and return a pointer thereto
*/
/* regular definitions }
delim [ \t\n]
ws {delim}+ Int installNum() { /* similar to
installID, but puts numerical
letter [A-Za-z]
constants into a separate table */
digit [0-9]
}
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%
{ws} {/* no action and no return */}
if {return(IF);}
then {return(THEN);}
else {return(ELSE);}
{id} {yylval = (int) installID(); return(ID); }
{number} {yylval = (int) installNum(); return(NUMBER);}
…
Thank You

EContent 11 2025 08 12 13 15 41 Unit2 SSPDF 2025 07 15 07 41 15
No ratings yet
EContent 11 2025 08 12 13 15 41 Unit2 SSPDF 2025 07 15 07 41 15
126 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
35 pages
Token Specification in Compiler Design
No ratings yet
Token Specification in Compiler Design
9 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
121 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
11 pages
Lexical Analyzer: Concepts & Techniques
No ratings yet
Lexical Analyzer: Concepts & Techniques
73 pages
Compiler Lexical Analysis Explained
No ratings yet
Compiler Lexical Analysis Explained
45 pages
Lexical Analysis in Compiler Design
100% (1)
Lexical Analysis in Compiler Design
69 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
122 pages
Lexical Analysis Concepts and Techniques
No ratings yet
Lexical Analysis Concepts and Techniques
92 pages
Lexical Analyzer in Compiler Design
No ratings yet
Lexical Analyzer in Compiler Design
64 pages
Lexical Analyzer Overview and Functions
No ratings yet
Lexical Analyzer Overview and Functions
64 pages
Lexical Analyzer in Compiler Design
No ratings yet
Lexical Analyzer in Compiler Design
89 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
85 pages
Lexical Analysis Overview and Techniques
No ratings yet
Lexical Analysis Overview and Techniques
160 pages
Lexical Analysis Fundamentals and Techniques
No ratings yet
Lexical Analysis Fundamentals and Techniques
102 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
16 pages
Lexical Analysis Patterns Explained
No ratings yet
Lexical Analysis Patterns Explained
55 pages
Lexical Analyzer
No ratings yet
Lexical Analyzer
20 pages
Lexical Analysis and Finite Automata Guide
No ratings yet
Lexical Analysis and Finite Automata Guide
104 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
27 pages
Input Buffering in Lexical Analysis
No ratings yet
Input Buffering in Lexical Analysis
28 pages
Lexical Analyzer Techniques and Concepts
No ratings yet
Lexical Analyzer Techniques and Concepts
84 pages
Lexical Analyzer Overview and Concepts
No ratings yet
Lexical Analyzer Overview and Concepts
84 pages
Lexical Analyzer in Compiler Design
No ratings yet
Lexical Analyzer in Compiler Design
30 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
93 pages
Lexical Analyzer Design Overview
No ratings yet
Lexical Analyzer Design Overview
43 pages
Lexical Analysis and Scanning Techniques
No ratings yet
Lexical Analysis and Scanning Techniques
31 pages
CD Module2
No ratings yet
CD Module2
23 pages
Lexical Analysis and Input Buffering
No ratings yet
Lexical Analysis and Input Buffering
37 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
60 pages
Lexical Analyzer in Compiler Design
No ratings yet
Lexical Analyzer in Compiler Design
36 pages
Compiler Design: Lexical Analysis Explained
No ratings yet
Compiler Design: Lexical Analysis Explained
92 pages
Lexical Analyzer: Role and Functions
No ratings yet
Lexical Analyzer: Role and Functions
62 pages
ACD - Unit2 Notes - Part2
No ratings yet
ACD - Unit2 Notes - Part2
12 pages
Lexical Analyzer Functions Explained
No ratings yet
Lexical Analyzer Functions Explained
62 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
53 pages
Understanding Lexical Analysis in Compilers
No ratings yet
Understanding Lexical Analysis in Compilers
102 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
38 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
57 pages
Compiler Design Principles Overview
No ratings yet
Compiler Design Principles Overview
27 pages
Understanding Lexical Analysis in Compilers
No ratings yet
Understanding Lexical Analysis in Compilers
52 pages
Lexical Analysis in Compilers
No ratings yet
Lexical Analysis in Compilers
44 pages
Lexical Analysis: Tokens and Automata
No ratings yet
Lexical Analysis: Tokens and Automata
49 pages
Input Buffering in Lexical Analysis
No ratings yet
Input Buffering in Lexical Analysis
26 pages
Understanding Tokens in Compiler Design
No ratings yet
Understanding Tokens in Compiler Design
23 pages
Understanding Lexical Analyzers
No ratings yet
Understanding Lexical Analyzers
24 pages
Unit 2
No ratings yet
Unit 2
20 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
67 pages
Lexical Analysis in Compiler Design
100% (1)
Lexical Analysis in Compiler Design
59 pages
Lexical Analysis Overview and Concepts
100% (3)
Lexical Analysis Overview and Concepts
51 pages
Lexical Analysis and Input Buffering
No ratings yet
Lexical Analysis and Input Buffering
23 pages
Lexical Analysis in Compiler Design
No ratings yet
Lexical Analysis in Compiler Design
62 pages
Understanding Lexical Analysis in Compilers
No ratings yet
Understanding Lexical Analysis in Compilers
153 pages
Lexical Analysis
No ratings yet
Lexical Analysis
62 pages
Lexical Analysis and Token Recognition
No ratings yet
Lexical Analysis and Token Recognition
52 pages
Lexical Analysis and Token Generation
No ratings yet
Lexical Analysis and Token Generation
80 pages
Token Specification in Compiler Design
No ratings yet
Token Specification in Compiler Design
10 pages
Lexical Analysis and Analyzers Explained
No ratings yet
Lexical Analysis and Analyzers Explained
63 pages
Non-Vanishing Feynman Diagrams in QED
No ratings yet
Non-Vanishing Feynman Diagrams in QED
20 pages
Lidar-Radar Fusion for Obstacle Detection
No ratings yet
Lidar-Radar Fusion for Obstacle Detection
11 pages
Basic Math Statistics Unit Overview
No ratings yet
Basic Math Statistics Unit Overview
31 pages
Traverse Surveying: Techniques & Concepts
No ratings yet
Traverse Surveying: Techniques & Concepts
24 pages
Understanding Auxiliary Views in Drawing
No ratings yet
Understanding Auxiliary Views in Drawing
22 pages
Electronic Devices and Circuits
No ratings yet
Electronic Devices and Circuits
10 pages
Maxima and Minima Assignment Class XII
No ratings yet
Maxima and Minima Assignment Class XII
2 pages
Model Selection vs. Averaging in Statistics
No ratings yet
Model Selection vs. Averaging in Statistics
16 pages
Vector Analysis in Electromagnetism
100% (2)
Vector Analysis in Electromagnetism
27 pages
Samarth Physics Examination Details
No ratings yet
Samarth Physics Examination Details
1 page
Temperature Effects on Bridge Decks
No ratings yet
Temperature Effects on Bridge Decks
12 pages
Class 4 Maths Formative Assessment 3
No ratings yet
Class 4 Maths Formative Assessment 3
4 pages
AI Search Techniques Overview
No ratings yet
AI Search Techniques Overview
5 pages
Teaching the Converse of Pythagorean Theorem
No ratings yet
Teaching the Converse of Pythagorean Theorem
16 pages
Sugarcane Harvester Chopper Performance Evaluation
No ratings yet
Sugarcane Harvester Chopper Performance Evaluation
13 pages
Unit III Inner Product Spaces
No ratings yet
Unit III Inner Product Spaces
35 pages
Math Exam for Grade 5 Students
No ratings yet
Math Exam for Grade 5 Students
3 pages
Class XII Linear Programming MCQs
No ratings yet
Class XII Linear Programming MCQs
5 pages
Digital Electronics Question Bank PDF
100% (24)
Digital Electronics Question Bank PDF
40 pages
Dynamic Production Planning with ARMA
No ratings yet
Dynamic Production Planning with ARMA
5 pages
Amazon Valuation Analysis and DCF Model
50% (2)
Amazon Valuation Analysis and DCF Model
65 pages
P5 Maths Average Word Problems
No ratings yet
P5 Maths Average Word Problems
4 pages
Mean Value Theorem Explained
No ratings yet
Mean Value Theorem Explained
8 pages
Two-Digit Number Problems and Divisibility Tests
No ratings yet
Two-Digit Number Problems and Divisibility Tests
45 pages
Panel Data Regression Analysis: OLS vs TSLS
No ratings yet
Panel Data Regression Analysis: OLS vs TSLS
10 pages
Understanding Orthographic Projections
No ratings yet
Understanding Orthographic Projections
21 pages
Structural Behavior of Encased Steel Beams
No ratings yet
Structural Behavior of Encased Steel Beams
14 pages
Unit V - Paths, Path Products and Regular Expressions
100% (1)
Unit V - Paths, Path Products and Regular Expressions
32 pages
Understanding Numbers in English
No ratings yet
Understanding Numbers in English
26 pages
CorpusSearch Guide
No ratings yet
CorpusSearch Guide
104 pages

Lexical Analysis in Compiler Design

Uploaded by

Lexical Analysis in Compiler Design

Uploaded by

Prof.

id Letter followed by letter and digits pi, score, D2

printf(“total = %d\n”, score);

 A block of data is read first into a buffer and then scanned by

 because of the amount of time taken to process character and

 Two buffer scheme: Buffer1 and buffer2 are scanned

 N is usually of disk block 4096bytes.

 Using one system read command we can read N character into

 If fewer than N character remain in the input file, then a special

 Each time we advance forward for each character read, we

 One for the end of the buffer

 One to determine what character is read.

E = M eof * C * * 2 eof eof

Union of two languages L and M L U M = {s | s is in L or s is in M}

Concatenation of two languages L and LM = {st | s is in L and t is in M}

The Kleene Closure of a language L L* = Zero or more occurrence of

The positive Closure of a language L L+= one or more occurrence of language

 Each regular expression is a pattern specifying the form of

Basis: There are two rules that form basis

A language that can be defined by RE is called a “regular set”

3 r(st)=(rs)t Concatenation is associative

4 r(s|t)=rs|rt Concatenation distributes over |

5 Ɛr=rƐ=r Ɛ is the identity for concatenation

6 r*= (r|Ɛ)* Ɛ is guaranteed in a closure

 TD have a collection of nodes or circles called states.

 Each states represents a condition that could occur during the

 Edges are directed from one state of the TD to another.

 Each edge is labelled by a symbol or set of symbols.

 A finite automaton consists of

• The start state

 A finite automaton that accepts only “1”

 A finite automaton accepts a string if we can follow

 A finite automaton accepting any number of 1’s

 Check that “1110” is accepted but “110…” is not

 Another kind of transition: -moves

• Machine can move from state A to state B

 NFAs can choose

 An NFA can get into multiple states

• Rule: NFA accepts if it can get in a final state

 NFAs and DFAs recognize the same set of languages

 DFAs are easier to implement

 For a given language the NFA can be simpler than the

• DFA can be exponentially larger than NFA

Install the reserved word in the symbol table initially.

Any identifier not in the symbol table during lexical analysis

Lex Source program Lexical [Link].c

You might also like

6 r= (r|Ɛ) Ɛ is guaranteed in a closure