0% found this document useful (0 votes)
4 views17 pages

SS Unit-2

The document provides an overview of language processors and assemblers, detailing the functions and activities involved in language processing, including lexical, syntax, and semantic analysis, as well as code generation and optimization. It also discusses tools like LEX and YACC for lexical analysis and parsing, and the structure and types of assemblers, including one-pass and two-pass assemblers. Additionally, it covers the symbol table's role in storing identifier information and the algorithm for a single-pass assembler in x86 architecture.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views17 pages

SS Unit-2

The document provides an overview of language processors and assemblers, detailing the functions and activities involved in language processing, including lexical, syntax, and semantic analysis, as well as code generation and optimization. It also discusses tools like LEX and YACC for lexical analysis and parsing, and the structure and types of assemblers, including one-pass and two-pass assemblers. Additionally, it covers the symbol table's role in storing identifier information and the algorithm for a single-pass assembler in x86 architecture.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Software Engineering – UNIT-2

Language Processors and Assemblers

1. Language Processors and Tools


1.1 What is a Language Processor?

A language processor is system software that translates a program written in one language
into another language.

Commonly:

Source Language Target Language Processor


High-level language Machine language Compiler / Interpreter

Assembly language Machine language Assembler

1.2 Language Processing Activities

When a source program is translated, the language processor performs the following main
activities:

1. Lexical Analysis

 Reads the source program character by character.


 Groups characters into meaningful units called tokens.

Example:

int a = 10;

Tokens:

 int → keyword
 a → identifier
 = → operator
 10 → constant
2. Syntax Analysis

 Checks whether the sequence of tokens follows the grammar rules.


 Builds a parse tree or syntax structure.

Example:

a = 10;

is syntactically correct.

3. Semantic Analysis

 Checks meaning of statements.


 Checks:
o type compatibility
o undeclared variables
o multiple declarations

Example:

int a;
a = "hello"; ❌ type mismatch

4. Intermediate Code Generation

 Generates machine-independent intermediate code.

Example:

t1 = 10
a = t1

5. Code Optimization

 Improves performance and reduces resource usage.

Example:

a = 2 * 4

Optimized to:

a = 8
6. Target Code Generation

 Generates machine-dependent code (assembly / machine code).

Summary of Language Processing Activities


Source Program

Lexical Analysis

Syntax Analysis

Semantic Analysis

Intermediate Code

Optimization

Target Code

1.3 Language Processing Tools


Some important tools used during language processing:

 Lexical analyzer generators → LEX


 Parser generators → YACC
 Code generators
 Debuggers

Profilers

Below are short and clear class-notes for the given tools used in Language Processors /
Compiler Design.

1. Lexical Analyzer Generators → LEX


LEX is a tool used to automatically generate a lexical analyzer (scanner).

Meaning

A lexical analyzer generator produces a program that can recognize tokens such as keywords,
identifiers, operators, numbers, etc.

Role of LEX

 Takes regular expressions as input


 Generates a C program for the scanner
 The generated scanner:
o reads the source program
o groups characters into tokens
o sends tokens to the parser

Working idea
Input (patterns in LEX file)

LEX tool

Lexical analyzer program

Main features

 Easy specification of tokens


 Automatically handles pattern matching
 Works together with YACC

Example use

LEX is used to detect:

 identifiers
 keywords
 constants
 operators

2. Parser Generators → YACC


YACC (Yet Another Compiler Compiler) is a tool used to generate a syntax analyzer
(parser).

Meaning

A parser generator creates a parser automatically from a grammar specification.

Role of YACC

 Takes context-free grammar rules as input


 Produces a C program for the parser
 The parser checks whether the program follows grammar rules

Working idea
Grammar rules

YACC

Parser program

Important points

 YACC mainly generates LALR parsers


 Works with tokens generated by LEX
 Used to build parse tree or syntax structure

Common use

LEX + YACC are commonly used together:

LEX → tokens
YACC → syntax checking

3. Code Generators
A code generator is a tool or compiler phase that produces the target code from the
intermediate representation.

Meaning

It converts:

Intermediate code → machine code / assembly code

Main responsibilities

 Select appropriate machine instructions


 Assign registers
 Produce efficient target code

Output of code generator

 Assembly code
or
 Object code

Importance

 Affects performance of final program


 Ensures correct mapping of operations to hardware

4. Debuggers
A debugger is a tool used to find and fix errors in a program.

Purpose

 Help programmers observe program execution


 Locate logical and runtime errors

Main functions

 Set breakpoints
 Execute program step by step
 Inspect variable values
 Trace program flow

Examples of errors handled

 wrong output
 runtime crashes
 incorrect logic

5. Profilers
A profiler is a performance analysis tool.

Meaning

It measures how a program behaves during execution.

Main objectives

 Find time-consuming functions


 Identify performance bottlenecks
 Measure memory usage

Typical information given by a profiler

 function execution time


 number of function calls
 CPU usage
 memory consumption

Importance

Profilers are mainly used for:

 program optimization
 improving execution speed

2. Symbol Table

2.1 Definition

A symbol table is a data structure used by a compiler or assembler to store information about
identifiers.

2.2 Information Stored in a Symbol Table

For each symbol (identifier), it stores:

 name
 type
 scope
 memory location (address)
 size
 parameter information (for functions)

2.3 Why Symbol Table is Needed

 To check declarations
 To support type checking
 To generate correct addresses
 To support scope handling

2.4 Operations on Symbol Table

Main operations:

 insert(symbol, information)
 lookup(symbol)
 update(symbol)

2.5 Example
Name Type Scope Address
a int local 1000
sum function global 2000

3. Search and Allocation Data Structures


These data structures are used mainly to implement:

 symbol tables
 literal tables
 label tables

3.1 Search Data Structures

Used for fast lookup.

(a) Linear List

 Simple list of symbols.


 Search time is O(n).

(b) Binary Search Tree

 Faster search.
 Average search time O(log n).

(c) Hash Table (Most commonly used)**

 Key → hash function → table index


 Very fast lookup.
 Average time O(1).

3.2 Allocation Data Structures

Used for memory allocation and management.

(a) Stack Allocation

 Used for local variables and parameters.


 Follows LIFO.
(b) Heap Allocation

 Used for dynamic memory.


 Memory can be allocated and freed in any order.

(c) Free List

 Keeps track of free memory blocks.

4. LEX and YACC – Overview

4.1 LEX
LEX is a lexical analyzer generator.

Purpose

 Automatically generates a lexical analyzer.

Input

 Token specifications using regular expressions.

Output

 C program implementing yylex().

Example rule in LEX


[0-9]+ { return NUMBER; }
Role of LEX

 Converts input stream into tokens.


 Passes tokens to the parser.

4.2 YACC
YACC stands for Yet Another Compiler Compiler.

Purpose

 Generates a parser.

Input

 Grammar rules written in BNF-like format.

Output

 C program implementing yyparse().

Example grammar
E : E '+' T
| T
;

Role of YACC

 Performs syntax analysis.


 Builds parse tree.

Relationship between LEX and YACC


Input Program

LEX → produces tokens

YACC → checks grammar and structure

5. Assemblers

5.1 Elements of Assembly Language Programming


Main elements:

(1) Mnemonics

Symbolic names of machine instructions.

Examples:

 MOV
 ADD
 SUB
 JMP

(2) Operands

Registers, memory locations, or constants.

Example:

MOV AX, BX

AX and BX are operands.

(3) Labels

Used to name addresses.

Example:
LOOP1:
ADD AX, BX

(4) Directives (Pseudo-ops)

They guide the assembler but do not generate machine code.

Examples:

 DB
 DW
 EQU
 ORG
 END

(5) Comments

Used for readability.

5.2 Assembler Design


An assembler mainly performs the following functions:

1. Scan source program


2. Maintain symbol table
3. Translate mnemonics into opcodes
4. Assign addresses to labels
5. Produce object code

General structure of an assembler:

Source Program

Lexical processing

Symbol table handling

Instruction translation

Object code generation
5.3 Types of Assemblers

5.3.1 One-Pass Assembler

Definition

A one-pass assembler processes the source program only once.

Problem

Forward references cannot be resolved immediately.

Example:

JMP NEXT
...
NEXT: MOV AX, BX

Technique used

 Backpatching
 Forward reference lists

Advantages

 Faster
 Less memory usage

Disadvantages

 More complex
 Difficult symbol handling

5.3.2 Two-Pass Assembler


Definition

The source program is processed two times.

Pass 1

 Assign addresses
 Build symbol table
 No object code is generated

Pass 2

 Generate object code


 Resolve all symbol references

Advantages

 Simple design
 Easy handling of forward references

Disadvantages

 Requires two scans


 Slightly slower

Comparison

Feature One-Pass Two-Pass


Number of scans 1 2
Forward reference handling Complex Easy
Implementation Difficult Simple
Speed Faster Slower

6. x86 Single-Pass Assembler Algorithm


This is an algorithmic view of a single-pass assembler for x86-like architecture.

Data Structures Used

 SYMTAB – Symbol table


 FWDREF list – list of unresolved references
 LOCCTR – location counter
 OPTAB – opcode table

Algorithm: x86 Single-Pass Assembler

Step 1

Initialize:

 LOCCTR = starting address


 SYMTAB = empty
 FWDREF = empty

Step 2

Read next source statement.

Step 3

If statement contains a label:

 If label is not in SYMTAB:


o Enter label with current LOCCTR
o Resolve any pending forward references for this label
 Else:
o Report duplicate symbol error

Step 4

If statement is an instruction:
 Search mnemonic in OPTAB
 Generate opcode
 For each operand:
o If symbol is already in SYMTAB:
 Use its address
o Else:
 Create an entry in SYMTAB with undefined address
 Add current object code position to FWDREF list

Step 5

If statement is a directive:

 Process directive
 Update LOCCTR accordingly

Step 6

Store generated machine code with placeholder zeros for unresolved addresses.

Step 7

Update LOCCTR by instruction length.

Step 8

Repeat steps 2 to 7 until END statement is found.

Step 9

After end of program:

 If any unresolved forward references remain:


o Report undefined symbol errors

Key idea of Single-Pass x86 assembler


 Address resolution and code generation are done together
 Forward references are handled using:
o lists of incomplete address fields
o backpatching when the label is defined

You might also like