Software Engineering – UNIT-2
Language Processors and Assemblers
1. Language Processors and Tools
1.1 What is a Language Processor?
A language processor is system software that translates a program written in one language
into another language.
Commonly:
Source Language Target Language Processor
High-level language Machine language Compiler / Interpreter
Assembly language Machine language Assembler
1.2 Language Processing Activities
When a source program is translated, the language processor performs the following main
activities:
1. Lexical Analysis
Reads the source program character by character.
Groups characters into meaningful units called tokens.
Example:
int a = 10;
Tokens:
int → keyword
a → identifier
= → operator
10 → constant
2. Syntax Analysis
Checks whether the sequence of tokens follows the grammar rules.
Builds a parse tree or syntax structure.
Example:
a = 10;
is syntactically correct.
3. Semantic Analysis
Checks meaning of statements.
Checks:
o type compatibility
o undeclared variables
o multiple declarations
Example:
int a;
a = "hello"; ❌ type mismatch
4. Intermediate Code Generation
Generates machine-independent intermediate code.
Example:
t1 = 10
a = t1
5. Code Optimization
Improves performance and reduces resource usage.
Example:
a = 2 * 4
Optimized to:
a = 8
6. Target Code Generation
Generates machine-dependent code (assembly / machine code).
Summary of Language Processing Activities
Source Program
↓
Lexical Analysis
↓
Syntax Analysis
↓
Semantic Analysis
↓
Intermediate Code
↓
Optimization
↓
Target Code
1.3 Language Processing Tools
Some important tools used during language processing:
Lexical analyzer generators → LEX
Parser generators → YACC
Code generators
Debuggers
Profilers
Below are short and clear class-notes for the given tools used in Language Processors /
Compiler Design.
1. Lexical Analyzer Generators → LEX
LEX is a tool used to automatically generate a lexical analyzer (scanner).
Meaning
A lexical analyzer generator produces a program that can recognize tokens such as keywords,
identifiers, operators, numbers, etc.
Role of LEX
Takes regular expressions as input
Generates a C program for the scanner
The generated scanner:
o reads the source program
o groups characters into tokens
o sends tokens to the parser
Working idea
Input (patterns in LEX file)
↓
LEX tool
↓
Lexical analyzer program
Main features
Easy specification of tokens
Automatically handles pattern matching
Works together with YACC
Example use
LEX is used to detect:
identifiers
keywords
constants
operators
2. Parser Generators → YACC
YACC (Yet Another Compiler Compiler) is a tool used to generate a syntax analyzer
(parser).
Meaning
A parser generator creates a parser automatically from a grammar specification.
Role of YACC
Takes context-free grammar rules as input
Produces a C program for the parser
The parser checks whether the program follows grammar rules
Working idea
Grammar rules
↓
YACC
↓
Parser program
Important points
YACC mainly generates LALR parsers
Works with tokens generated by LEX
Used to build parse tree or syntax structure
Common use
LEX + YACC are commonly used together:
LEX → tokens
YACC → syntax checking
3. Code Generators
A code generator is a tool or compiler phase that produces the target code from the
intermediate representation.
Meaning
It converts:
Intermediate code → machine code / assembly code
Main responsibilities
Select appropriate machine instructions
Assign registers
Produce efficient target code
Output of code generator
Assembly code
or
Object code
Importance
Affects performance of final program
Ensures correct mapping of operations to hardware
4. Debuggers
A debugger is a tool used to find and fix errors in a program.
Purpose
Help programmers observe program execution
Locate logical and runtime errors
Main functions
Set breakpoints
Execute program step by step
Inspect variable values
Trace program flow
Examples of errors handled
wrong output
runtime crashes
incorrect logic
5. Profilers
A profiler is a performance analysis tool.
Meaning
It measures how a program behaves during execution.
Main objectives
Find time-consuming functions
Identify performance bottlenecks
Measure memory usage
Typical information given by a profiler
function execution time
number of function calls
CPU usage
memory consumption
Importance
Profilers are mainly used for:
program optimization
improving execution speed
2. Symbol Table
2.1 Definition
A symbol table is a data structure used by a compiler or assembler to store information about
identifiers.
2.2 Information Stored in a Symbol Table
For each symbol (identifier), it stores:
name
type
scope
memory location (address)
size
parameter information (for functions)
2.3 Why Symbol Table is Needed
To check declarations
To support type checking
To generate correct addresses
To support scope handling
2.4 Operations on Symbol Table
Main operations:
insert(symbol, information)
lookup(symbol)
update(symbol)
2.5 Example
Name Type Scope Address
a int local 1000
sum function global 2000
3. Search and Allocation Data Structures
These data structures are used mainly to implement:
symbol tables
literal tables
label tables
3.1 Search Data Structures
Used for fast lookup.
(a) Linear List
Simple list of symbols.
Search time is O(n).
(b) Binary Search Tree
Faster search.
Average search time O(log n).
(c) Hash Table (Most commonly used)**
Key → hash function → table index
Very fast lookup.
Average time O(1).
3.2 Allocation Data Structures
Used for memory allocation and management.
(a) Stack Allocation
Used for local variables and parameters.
Follows LIFO.
(b) Heap Allocation
Used for dynamic memory.
Memory can be allocated and freed in any order.
(c) Free List
Keeps track of free memory blocks.
4. LEX and YACC – Overview
4.1 LEX
LEX is a lexical analyzer generator.
Purpose
Automatically generates a lexical analyzer.
Input
Token specifications using regular expressions.
Output
C program implementing yylex().
Example rule in LEX
[0-9]+ { return NUMBER; }
Role of LEX
Converts input stream into tokens.
Passes tokens to the parser.
4.2 YACC
YACC stands for Yet Another Compiler Compiler.
Purpose
Generates a parser.
Input
Grammar rules written in BNF-like format.
Output
C program implementing yyparse().
Example grammar
E : E '+' T
| T
;
Role of YACC
Performs syntax analysis.
Builds parse tree.
Relationship between LEX and YACC
Input Program
↓
LEX → produces tokens
↓
YACC → checks grammar and structure
5. Assemblers
5.1 Elements of Assembly Language Programming
Main elements:
(1) Mnemonics
Symbolic names of machine instructions.
Examples:
MOV
ADD
SUB
JMP
(2) Operands
Registers, memory locations, or constants.
Example:
MOV AX, BX
AX and BX are operands.
(3) Labels
Used to name addresses.
Example:
LOOP1:
ADD AX, BX
(4) Directives (Pseudo-ops)
They guide the assembler but do not generate machine code.
Examples:
DB
DW
EQU
ORG
END
(5) Comments
Used for readability.
5.2 Assembler Design
An assembler mainly performs the following functions:
1. Scan source program
2. Maintain symbol table
3. Translate mnemonics into opcodes
4. Assign addresses to labels
5. Produce object code
General structure of an assembler:
Source Program
↓
Lexical processing
↓
Symbol table handling
↓
Instruction translation
↓
Object code generation
5.3 Types of Assemblers
5.3.1 One-Pass Assembler
Definition
A one-pass assembler processes the source program only once.
Problem
Forward references cannot be resolved immediately.
Example:
JMP NEXT
...
NEXT: MOV AX, BX
Technique used
Backpatching
Forward reference lists
Advantages
Faster
Less memory usage
Disadvantages
More complex
Difficult symbol handling
5.3.2 Two-Pass Assembler
Definition
The source program is processed two times.
Pass 1
Assign addresses
Build symbol table
No object code is generated
Pass 2
Generate object code
Resolve all symbol references
Advantages
Simple design
Easy handling of forward references
Disadvantages
Requires two scans
Slightly slower
Comparison
Feature One-Pass Two-Pass
Number of scans 1 2
Forward reference handling Complex Easy
Implementation Difficult Simple
Speed Faster Slower
6. x86 Single-Pass Assembler Algorithm
This is an algorithmic view of a single-pass assembler for x86-like architecture.
Data Structures Used
SYMTAB – Symbol table
FWDREF list – list of unresolved references
LOCCTR – location counter
OPTAB – opcode table
Algorithm: x86 Single-Pass Assembler
Step 1
Initialize:
LOCCTR = starting address
SYMTAB = empty
FWDREF = empty
Step 2
Read next source statement.
Step 3
If statement contains a label:
If label is not in SYMTAB:
o Enter label with current LOCCTR
o Resolve any pending forward references for this label
Else:
o Report duplicate symbol error
Step 4
If statement is an instruction:
Search mnemonic in OPTAB
Generate opcode
For each operand:
o If symbol is already in SYMTAB:
Use its address
o Else:
Create an entry in SYMTAB with undefined address
Add current object code position to FWDREF list
Step 5
If statement is a directive:
Process directive
Update LOCCTR accordingly
Step 6
Store generated machine code with placeholder zeros for unresolved addresses.
Step 7
Update LOCCTR by instruction length.
Step 8
Repeat steps 2 to 7 until END statement is found.
Step 9
After end of program:
If any unresolved forward references remain:
o Report undefined symbol errors
Key idea of Single-Pass x86 assembler
Address resolution and code generation are done together
Forward references are handled using:
o lists of incomplete address fields
o backpatching when the label is defined