0% found this document useful (0 votes)

8 views17 pages

Compiler Phases and Lexical Analysis Guide

The document provides an overview of compilers, including their phases, such as lexical and syntax analysis, and the concepts of tokens and lexemes. It explains the differences between single-pass and multi-pass compilers, as well as the bootstrapping process in compiler design. Additionally, it covers tools like LEX and YACC for generating lexical analyzers and parsers, respectively.

Uploaded by

alkamansi03

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views17 pages

Compiler Phases and Lexical Analysis Guide

Uploaded by

alkamansi03

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

UNIT I

Compiler
 Compiler is a translator program that translates a program written in (High Level
Language)) the source program and translates it into an equivalent program
in (Machine Level Language) the target program.
 As an important part of a compiler is error showing to the programmer.
Assembler
 Programs known as assembler were written to automate the translation of assembly
language into machine language
language.
The input to an assembler program is called source program,, the output is a machine
language translation
Phases of a Compiler
 A compiler operates in phases. A phase is a logically interrelated operation that
takes source program in one representation and produces output in another
representation.

Tokens and lexemes

In lexical analysis, tokens and lexemes are key concepts. A token is a category, like a
keyword or identifier, representing units of meaning. A pattern defines the structure that
matches a token, such as a regular expression for an identifier. A lexeme is the actual
sequence of characters in the source code that matches a token’s pattern.
In programming, a token is the smallest unit of meaningful data; it may be an identifier,
keyword, operator, or symbol. A token represents a series or sequence of characters that
cannot be decomposed further. In languages such as C, some examples of tokens would
include:
 Keywords : Those reserved words in C like ` int `, ` char `, ` float `, ` const `, ` goto `,
etc.
 Identifiers: Names of variables and user-defined functions.
 Operators : ` + `, ` – `, ` * `, ` / `, etc.
 Delimiters /Punctuators: Symbols used such as commas ” , ” semicolons ” ; ” braces
` {} `.

Example 1:
int a = 10; //Input Source code
Tokens
int (keyword), a(identifier), =(operator), 10(constant) and ;(punctuation-semicolon)
Answer – Total number of tokens = 5

Example 2:
int main() {
// printf() sends the string inside quotation to
// the standard output (the display)
printf("Welcome to GeeksforGeeks!");
return 0;
}
Tokens
'int', 'main', '(', ')', '{', 'printf', '(', ' "Welcome to GeeksforGeeks!" ',
')', ';', 'return', '0', ';', '}'

Answer – Total number of tokens = 14

Lexeme
A lexeme is a sequence of source code that matches one of the predefined patterns and
thereby forms a valid token. For example, in the expression `x + 5`, both `x` and `5` are
lexemes that correspond to certain tokens. These lexemes follow the rules of the language
in order for them to be recognized as valid tokens.
A lexeme is an actual character sequence forming a specific instance of a token, such as
num.

Lexeme- A lexeme is a string of character that is the lowest level syntactic unit in the
programming language.

Token- The token is a syntactic category that forms a class of lexemes that means which
class the lexeme belong is it a keyword or identifier or anything else. One of the major
tasks of the lexical analyzer is to create a pair of lexemes and tokens, that is to collect all the
characters.
Let us take an example:-
if(y<= t)
y=y-3;

Token Lexeme
KEYWORD if
LEFT PARENTHESIS (
IDENTIFIER y
< =COMPARISON =
IDENTIFIER t
RIGHT PARENTHESIS )

IDENTIFIER y

ASSGNMENT =

IDENTIFIER y

ARITHMATIC -

INTEGER 3

SEMICOLON ;

Lexical Analysis
 Lexical analysis is the first phase of a compiler. It takes the source code from user and
produces the output in the form of tokens. The lexical analyzer breaks these syntaxes
into a series of tokens, by removing any whitespace or comments in the source code.

x = a+b*50
x -> Identifier- (id, 1)
= -> Operator - Assignment
a -> Identifier- (id, 2)
+ -> Operator - Binary Addition
b -> Identifier- (id, 3)
* -> Operator - Multiplication
50 -> Constant - Integer

Now, the final tokenized expression is given below.

(id, 1) = (id, 2) + (Id, 3)*50

Syntax Analysis
 The second stage of translation is called syntax analysis or parsing. In this
phase expressions, statements, declarations are identified by using the results of
lexical analysis.
 A Syntax analyzer creates the syntactic structure of the given program.
Compiler Pass
A Compiler pass refers to the traversal of a compiler through the entire
program. Compiler passes are of two types Single Pass Compiler, and Two Pass
Compiler or Multi-Pass Compiler. These are explained as follows.
Types of Compiler Pass
1. Single Pass Compiler
If we combine or group all the phases of compiler design in a single module (unit) known
as a single pass compiler.

Single Pass Compiler: In the above diagram, there are all 6 phases are grouped in a single
module, some points of the single pass compiler are as:
 A one-pass/single-pass compiler is a type of compiler that passes through the part of
each compilation unit exactly once.
 Single pass compiler is faster and smaller than the multi-pass compiler.
 A disadvantage of a single-pass compiler is that it is less efficient in comparison with
the multipass compiler.
 A single pass compiler is one that processes the input exactly once, so going directly
from lexical analysis to code generator, and then going back for the next read.
Note: Single pass compiler almost never done, early Pascal compiler did this as an
introduction.
Problems with Single Pass Compiler
 We can not optimize very well due to the context of expressions are limited.
 As we can’t back up and process, it again sogrammar should be limited or simplified.
 Command interpreters such as bash/sh/tcsh can be considered Single pass compilers,
but they also execute entries as soon as they are processed.

Two-Pass compiler or Multi-Pass compiler

A Two pass/multi-pass Compiler is a type of compiler that processes the source
code or abstract syntax tree of a program multiple times. In multi-pass Compiler, we divide
phases into two or more passes as:

First Pass is referred as

 Front end
 Analytic part
 Platform independent
Second Pass is referred as
 Back end
 Synthesis Part
 Platform Dependent
Problems that can be Solved With Multi
Multi-Pass Compiler

First Pass: If we want to design a compiler for a different programming language for the
same machine. In this case for each programming language, there is a requirement to make
the Front end/first pass for each of them and only one Back end/second pass as:

Second Pass: If we want to design a compiler for the same programming language for
different machines/systems. In this case, we make different Back end for different
Machine/system and make only one Front end for the same programming language as:
Conclusion
In conclusion, the choice between a single-pass and a two-pass compiler depends on
specific requirements and trade-offs. Single-pass compilers are faster and more compact but
have limitations in optimization and grammar complexity. Two-pass compilers, on the
other hand, offer greater flexibility for different programming languages and machine
systems but come at the cost of additional processing. Understanding these distinctions is
crucial for designing effective compilers.

Bootstrapping

Bootstrapping is an important technique in compiler design, where a basic compiler is

used to create a more advanced version of itself. This process helps in building compilers
for new programming languages and improving the ones already in use. By starting with a
simple compiler, bootstrapping allows gradual improvements and makes the compiler more
efficient over time.
 Bootstrapping relies on the idea of a self-compiling compiler, where each iteration
improves the compiler’s ability to handle more complex code.
 It simplifies the development cycle, allowing incremental improvements and faster
deployment of more robust compilers.
 Many successful programming languages, including C and Java, have used
bootstrapping techniques during their development.

A compiler can be used three distinct languages.

Source Language (S): This is the input language provided to the compiler. It represents the
code that needs to be compiled.
Target Language (T): This is the output language produced by the compiler after the
compilation process. It is the language that the source code is translated into.
Implementation Language (I): This is the language used to design and implement the
compiler itself.

The T-Diagram Representation

The relationship between these languages can be visually represented using T-diagrams.
The T-diagram illustrates how the source language (S) is transformed into the target
language (T) through the implementation language (I).
A compiler can be represented using a T Diagram. In this diagram, the source language of
the compiler is positioned at the top-left, the target language (the language produced by the
compiler) is placed at the top-right, and the language in which the compiler is implemented
is shown at the bottom.
Working of Bootstrapping
Bootstrapping is the process of creating compilers. It involves a methodology where a
slightly more complicated compiler is created using a simple language (such as assembly
language). This slightly more complicated compiler, in turn, is used to create an even more
advanced compiler, and this process continues until the desired result is achieved.
Here’s a step-by-step look at how bootstrapping works in compiler design:

Step 1: Start with a Basic Compiler

The first step is to create a basic compiler that can handle the most essential features of a
programming language. This simple compiler is often written in assembly language or
machine language to make it easier to build.

Step 2: Use the Basic Compiler to Create a More Advanced Version

Once the basic compiler is ready, it is used to compile a more advanced version of itself.
This new version can handle more complex features, like better error checking and
optimizations.

Step 3: Gradually Improve the Compiler

With each new version, the compiler becomes more capable. The process is repeated, and
each iteration adds more features, making the compiler stronger and more efficient.
For example, let’s assume a compiler which takes C language as input and generates an
assembly language as an output.
1. To generate this compiler we first write a compiler for a small subset of C language i.e.
C0 in assembly language. Subset of C means C language with less functionality.
Here in the T diagram, source language is subset of C (C0), target language is Assembly
language and implementation language is also Assembly language.
2. Then using the C0 language, we create a compiler for the C language. This compiler C0,
takes C language as source language and generates assembly language as target language as
shown below.

By the help bootstrapping we generated compiler for C language in C language itself i.e.
self-compiling compiler.

Cross-compiler
A cross compiler is a compiler capable of creating executable code for a platform other
than the one on which the compiler is running.
Across compiler is a compiler that runs on one m
machine
achine and produces object code for another
machine. For example, a compiler that runs on a windows 7 PC but generates code that runs
on Android Smartphone is a cross compiler.
The cross compiler is used to implement the compiler which is characterized by three distinct
languages:
Source Language (S)
Target Language (T)
Implementation Language (I)

LEX in Compiler Design

It is a tool or software which automatically generates a lexical analyzer (finite Automata). It
takes as its input a LEX source program and produces lexical Analyzer as its output. Lexical
Analyzer will convert the input string entered by the user into tokens as its output.

Lex is a tool or a computer program that generates Lexical Analyzers (converts the stream
of characters into tokens). The Lex tool itself is a compiler. The Lex compiler takes the
input and transforms that input into input patterns.
Function of Lex
1. In the first step the source code which is in the Lex language having the file name ‘File.l’
gives as input to the Lex Compiler commonly known as Lex to get the output as [Link].c.
2. After that, the output [Link].c will be used as input to the C compiler which gives the
output in the form of an ‘[Link]’ file, and finally, the output file [Link] will take the stream of
character and generates tokens as output.
Lex File Format: A Lex program consists of three parts and is separated by % delimiters:-

Declarations
%%
Translation rules
%%
Auxiliary procedures

Declarations: The declarations include declarations of variables.

Transition rules: These rules consist of Pattern and Action.

Auxiliary procedures: The Auxilary section holds auxiliary functions used in the actions.

For example:
declaration
number[0-9]
%%
translation
if {return (IF);}
%%
auxiliary function
int numberSum()

Example
/*** Definition section ***/

%{
/* C code to be copied verbatim */
#include <stdio.h>
%}

%%
/*** Rules section ***/

/* [0-9]+ matches a string of one or more digits */

[0-9]+ {
/* yytext is a string containing the matched text. */
printf("Saw an integer: %s\n", yytext);
}

.|\n { /* Ignore all other characters. */ }

%%
/*** C Code section ***/

int main(void)
{
/* Call the lexer, then quit. */
yylex();
return 0;
}

YACC in Compiler Design

YACC (Yet Another Compiler Compiler) is a tool used to generate a parser. YACC translates
a given Context Free Grammar (CFG) specifications (input in input_file.y) into a C
implementation ([Link].c) of a corresponding push down automaton (i.e., a finite state machine
with a stack). This C program when compiled yields an executable parser.
 An input with a .y extension is given.
 The file is invoked, and 2 files, [Link].h and [Link].c, are created. These files contain long
codes implementing the LARl (1) Parser for grammar.
 This file then provides yyparse.y, which tries to parse a valid sentence successfully.
The structure of YACC programs
A YACC program consists of three sections: Declarations, Rules and Auxiliary functions.
(Note the similarity with the structure of LEX programs).

DECLARATIONS

RULES

AUXILIARY FUNCTIONS
Examples of DFA and NFA
Example
Design a FA with ∑ = {0, 1} accepts those string which starts with 1 and ends with 0.
The FA will have a start state q0 from which only the edge with input 1 will go to the next
state.

Example
Design FA with ∑ = {0, 1} accepts even number of 0's and even number of 1's.
Solution:
This FA will consider four different sstages
tages for input 0 and input 1. The stages could be:

Example
Design FA with ∑ = {0, 1} accepts the set of all strings with three consecutive 0's.
Solution:
The strings that
at will be generated for this particular languages are 000, 0001, 1000, 10001, ....
in which 0 always appears in a clump of 3. The transition graph is as follows:

Example
Design a FA with ∑ = {0, 1} accepts the strings with an even numb
number
er of 0's followed by
single 1.
Solution:
The DFA can be shown by a transition diagram as:
Design an NFA with ∑ = {0, 1} accepts all string in which the third symbol from the right
end is always 0.
Solution:

Thus we get the third symbol from the right end as '0' always. The NFA can be:

Example
Convert
ert the given NFA to DFA.

Solution: For the given transition diagram we will first construct the transition table.

DFA
Example
Convert the given NFA to DFA.

DFA

Compiler Design Overview and Phases
100% (1)
Compiler Design Overview and Phases
120 pages
Compiler Design Overview and Phases
100% (1)
Compiler Design Overview and Phases
193 pages
UNIT1 Compiler
No ratings yet
UNIT1 Compiler
195 pages
Compiler Functions and Stages Explained
No ratings yet
Compiler Functions and Stages Explained
13 pages
Compiler Design: Lexical Analysis Overview
No ratings yet
Compiler Design: Lexical Analysis Overview
95 pages
Compiler Design: Phases and Tools
100% (1)
Compiler Design: Phases and Tools
36 pages
Automata in Compiler Design Overview
No ratings yet
Automata in Compiler Design Overview
55 pages
Compiler Book Note
No ratings yet
Compiler Book Note
68 pages
Compiler Design Syllabus Overview
No ratings yet
Compiler Design Syllabus Overview
60 pages
Syntax Analysis in Compiler Design
No ratings yet
Syntax Analysis in Compiler Design
31 pages
Compiler Design Concepts and Tools
No ratings yet
Compiler Design Concepts and Tools
31 pages
Compiler Design Principles Explained
No ratings yet
Compiler Design Principles Explained
68 pages
Overview of Compiler Design Concepts
No ratings yet
Overview of Compiler Design Concepts
5 pages
Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
42 pages
Compiler Design Overview and Classification
No ratings yet
Compiler Design Overview and Classification
43 pages
Lect Notes
100% (1)
Lect Notes
40 pages
Compiler Design and Phases Overview
No ratings yet
Compiler Design and Phases Overview
22 pages
Compiler Construction Overview and Phases
No ratings yet
Compiler Construction Overview and Phases
36 pages
Turing Machines and Compiler Design Overview
No ratings yet
Turing Machines and Compiler Design Overview
18 pages
Overview of Compiler Design Phases
No ratings yet
Overview of Compiler Design Phases
39 pages
Introduction to Compiler Design
No ratings yet
Introduction to Compiler Design
30 pages
Compiler Design: Language Processing
No ratings yet
Compiler Design: Language Processing
12 pages
Compiler Design Overview and Phases
No ratings yet
Compiler Design Overview and Phases
67 pages
DFAs for Binary String Languages
No ratings yet
DFAs for Binary String Languages
23 pages
Compiler Construction Overview
No ratings yet
Compiler Construction Overview
41 pages
Language Processors and Compiler Phases
No ratings yet
Language Processors and Compiler Phases
14 pages
Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
19 pages
Introduction to Compiler Basics
No ratings yet
Introduction to Compiler Basics
29 pages
Introduction to Compiler Design
No ratings yet
Introduction to Compiler Design
8 pages
Compiler Design: Lexical Analysis Overview
No ratings yet
Compiler Design: Lexical Analysis Overview
54 pages
Compiler Design Overview and Structure
No ratings yet
Compiler Design Overview and Structure
23 pages
Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
15 pages
Compiler Design Fundamentals and Techniques
No ratings yet
Compiler Design Fundamentals and Techniques
45 pages
Compiler Design Guide for GATE PYQ
No ratings yet
Compiler Design Guide for GATE PYQ
20 pages
Compiler Design Course Material
No ratings yet
Compiler Design Course Material
118 pages
Compiler Design Complete Notes
No ratings yet
Compiler Design Complete Notes
20 pages
50XE
No ratings yet
50XE
54 pages
Compiler Design: Unit 1 Overview
100% (1)
Compiler Design: Unit 1 Overview
13 pages
Compiler Design Overview and Concepts
No ratings yet
Compiler Design Overview and Concepts
31 pages
Automata in Compiler Design Overview
No ratings yet
Automata in Compiler Design Overview
638 pages
Ut 3
No ratings yet
Ut 3
51 pages
Lexical Analysis and Compiler Basics
No ratings yet
Lexical Analysis and Compiler Basics
109 pages
Compiler Phases and Lexical Analysis
No ratings yet
Compiler Phases and Lexical Analysis
46 pages
Compiler Design: Lexical Analysis Overview
No ratings yet
Compiler Design: Lexical Analysis Overview
60 pages
Compiler Design Overview and Phases
100% (1)
Compiler Design Overview and Phases
66 pages
Overview of Compiler Design History
No ratings yet
Overview of Compiler Design History
20 pages
23cs1531 CD - It Notes
No ratings yet
23cs1531 CD - It Notes
74 pages
Compiler Phases and Functions Explained
No ratings yet
Compiler Phases and Functions Explained
31 pages
Compiler Conts 13-11-2015
No ratings yet
Compiler Conts 13-11-2015
7 pages
Overview of Compiler Design Phases
No ratings yet
Overview of Compiler Design Phases
111 pages
Compiler Design: Lexical & Syntax Analysis
No ratings yet
Compiler Design: Lexical & Syntax Analysis
31 pages
Compiler Overview and Phases Explained
No ratings yet
Compiler Overview and Phases Explained
35 pages
System Software-Module 4
No ratings yet
System Software-Module 4
11 pages
Nondeterministic Finite Automata in Compilers
No ratings yet
Nondeterministic Finite Automata in Compilers
15 pages
Introduction to Compilers Overview
No ratings yet
Introduction to Compilers Overview
77 pages
Understanding Compiler Construction Basics
No ratings yet
Understanding Compiler Construction Basics
36 pages
Compiler Design and Structure Overview
No ratings yet
Compiler Design and Structure Overview
23 pages
Project Schedule Development Guide
No ratings yet
Project Schedule Development Guide
2 pages
Android OS Framework and Source Structure
No ratings yet
Android OS Framework and Source Structure
4 pages
Think Python: Learn Programming Basics
No ratings yet
Think Python: Learn Programming Basics
296 pages
ACF Composer Compatibility Report
No ratings yet
ACF Composer Compatibility Report
10 pages
Android Workshop Overview and Details
100% (2)
Android Workshop Overview and Details
72 pages
Coding Basics for Web Development
No ratings yet
Coding Basics for Web Development
14 pages
Understanding Perl Variables and Syntax
No ratings yet
Understanding Perl Variables and Syntax
53 pages
Vivo Gallery Debugging Errors Log
No ratings yet
Vivo Gallery Debugging Errors Log
11 pages
CC Manual
No ratings yet
CC Manual
36 pages
DA1458x Software Release Notes V 3 0 6 0
No ratings yet
DA1458x Software Release Notes V 3 0 6 0
13 pages
HTML Tags and Table Creation Guide
No ratings yet
HTML Tags and Table Creation Guide
34 pages
HTML, CSS, and JavaScript Practice Lab
No ratings yet
HTML, CSS, and JavaScript Practice Lab
3 pages
Comprehensive Bike Registration Data
No ratings yet
Comprehensive Bike Registration Data
1 page
Python Fundamentals Mind Map
No ratings yet
Python Fundamentals Mind Map
5 pages
CLIPS Interfaces Guide Version 6.24
No ratings yet
CLIPS Interfaces Guide Version 6.24
119 pages
Elements of Compiler Design Overview
No ratings yet
Elements of Compiler Design Overview
2 pages
Dynamic Data Structures: Linked Lists
No ratings yet
Dynamic Data Structures: Linked Lists
68 pages
Understanding C++ Templates and Functions
No ratings yet
Understanding C++ Templates and Functions
37 pages
Exploratory Test Charter for Game of Greed
No ratings yet
Exploratory Test Charter for Game of Greed
7 pages
Python Flow of Control for Class 11
No ratings yet
Python Flow of Control for Class 11
48 pages
Cadence Genus Synthesis Tutorial
No ratings yet
Cadence Genus Synthesis Tutorial
13 pages
Class XII Computer Science Test II 2025-26
No ratings yet
Class XII Computer Science Test II 2025-26
4 pages
ANSI SQL Clauses Exercise Guide
No ratings yet
ANSI SQL Clauses Exercise Guide
4 pages
Multithreading in Software Development
No ratings yet
Multithreading in Software Development
8 pages
ROSP Mini Project
No ratings yet
ROSP Mini Project
26 pages
Overview of Loader Types and Functions
No ratings yet
Overview of Loader Types and Functions
17 pages
Class XII Computer Science Sample Paper
No ratings yet
Class XII Computer Science Sample Paper
8 pages
IncrediBuild and Intel Parallel Tools
No ratings yet
IncrediBuild and Intel Parallel Tools
17 pages
Text to Sign Language Emoji Converter
No ratings yet
Text to Sign Language Emoji Converter
3 pages
FortiWeb™ Web Application Firewall (PDFDrive)
No ratings yet
FortiWeb™ Web Application Firewall (PDFDrive)
414 pages