0% found this document useful (0 votes)
4 views17 pages

Compiler Design Notes

The document covers code generation, runtime storage management, and syntax-directed translation in programming languages. It discusses the design issues of code generators, the structure of target machines, and various intermediate code representations. Additionally, it explains the translation of assignment statements, boolean expressions, control statements, and array references in arithmetic expressions.

Uploaded by

first.nigam
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views17 pages

Compiler Design Notes

The document covers code generation, runtime storage management, and syntax-directed translation in programming languages. It discusses the design issues of code generators, the structure of target machines, and various intermediate code representations. Additionally, it explains the translation of assignment statements, boolean expressions, control statements, and array references in arithmetic expressions.

Uploaded by

first.nigam
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

UNIT 1: CODE GENERATION

1.1 Issues in the Design of a Code Generator


A code generator converts Intermediate Representation (IR) into target machine code. Key design
issues:

Issue Description
Input to Code Generator Syntactically & semantically correct IR + symbol table.
Target Program Absolute code, relocatable code, or assembly output.
Memory Management Map names in IR to runtime addresses using symbol table.
Instruction Selection Choose best machine instruction for each IR construct.
Register Allocation Decide which values live in registers. NP-complete in general.
Evaluation Order Order of computation affects register usage and efficiency.
Correctness Output must be semantically equivalent to the source program.

1.2 The Target Machine


A byte-addressable machine with general-purpose registers R0…Rn-1. Key addressing modes:

Mode Form Meaning


Register LD Ri, Rj Ri = Rj
Indexed LD Ri, a(Rj) Ri = Memory[a + Rj]
Indirect LD Ri, *Rj Ri = Memory[Memory[Rj]]
Immediate LD Ri, #c Ri = constant c
Memory LD Ri, M Ri = Memory[M]

Instruction Cost
Cost = 1 + (addressing mode cost of source) + (cost of destination). Register mode cost = 0.
Memory/indexed = 1. Immediate = 1.

1.3 Runtime Storage Management


Memory layout at runtime:

Segment Contents
Code Segment Machine instructions (static, read-only)
Static Data Global/static variables with fixed compile-time addresses
Segment Contents
Heap Dynamically allocated data (malloc/new) — grows upward
Stack Activation records for procedure calls — grows downward

Activation Record (Stack Frame)


Fields in a typical activation record (from high to low addresses):

Field Purpose
Return Value Space for the function's return value
Actual Parameters Arguments passed by the caller
Control Link (Dynamic) Pointer to caller's activation record
Access Link (Static) Pointer for non-local variable access (block-structured langs)
Saved Machine Status Return address, saved registers
Local Variables Space for procedure-local variables
Temporaries Space for intermediate computed values

1.4 Basic Blocks and Flow Graphs

Basic Block A maximal sequence of three-address instructions with exactly ONE entry
point (first instruction) and ONE exit point (last instruction). No branches into
the middle.

Algorithm to Find Basic Blocks


Step 1 — Identify Leaders (first instruction of each block):
• The first instruction of the program is a leader.
• Any target of a conditional or unconditional jump is a leader.
• Any instruction immediately following a conditional/unconditional jump is a leader.

Step 2 — Each basic block = from one leader up to (not including) the next leader.

Flow Graph
Directed graph where nodes = basic blocks, edges = possible flow of control:
• Edge B1 -> B2 if execution can go from B1 to B2 (jump or fall-through).
• Add a special ENTRY node and EXIT node.

Exam Tip — Drawing a Flow Graph


(1) Find all leaders, (2) Define blocks, (3) Add edges for sequential fall-through and jumps. Label
blocks B1, B2, B3…
1.5 Next-Use Information

Live Variable A variable x is live at point p if there is a path from p to a use of x with no
intervening assignment to x.

Computing Next-Use — Backward Scan


Scan the basic block from LAST to FIRST instruction:
• Initially: all variables are dead (no next use).
• For instruction i of the form x = y op z:
- Attach current next-use/liveness info for x, y, z to instruction i.
- Set x to dead (overwritten). Set y and z to live with next-use = i.

Backward scan example for basic block:


(1) t = a + b
(2) u = a - b
(3) v = t + u
(4) a = d
(5) d = v + u

Scanning from (5) to (1):


After (5): live={v,u} After (4): live={d,v,u}
After (3): live={t,u,d} After (2): live={a,b,t,d}
After (1): live={a,b,d}

1.6 A Simple Code Generator


Uses two descriptors maintained during block traversal:

Descriptor Tracks
Register Descriptor For each register: which variable's current value is stored there.
Address Descriptor For each variable: where its current value can be found (register,
memory, or both).

getReg Function
• If x already in register R: return R.
• If empty register available: return it.
• Else: choose register R holding variable y. If y is dead or also in memory, use R. Otherwise spill y
to memory first.

Code Generation for x = y op z


L = getReg(x, y, z)
if y is not in L: emit LD L, y' // load y to L
emit OP L, L, z' // z' = register or mem address of z
Update descriptors: L holds x; x's addr = L

1.7 DAG Representation of Basic Blocks

DAG Directed Acyclic Graph for a basic block. Leaves = input names/constants.
Interior nodes = operators. Each node can have MULTIPLE labels (variable
aliases).

Uses of DAG
• Common Subexpression Elimination (CSE): if node already exists for (op, n1, n2), reuse it.
• Identify variables dead on exit: nodes with no live labels can be skipped.
• Algebraic simplification: a+0=a, a*1=a detected at construction time.
• Dead code elimination: computations with no live-on-exit variable labels are not generated.

Solved Example
t1 = a + b; t2 = a + b; t3 = t1 * t2 DAG has ONE (+, a, b) node labeled {t1, t2}. t3 node = (*,
that_node, that_node). Result: only ONE addition generated — t3 = t1 * t1.

1.8 Peephole Optimization


Examine a small sliding window of instructions and replace with a more efficient sequence:

Optimization Before After


Redundant Load/Store LD R,a; ST a,R LD R,a (store is redundant)
Null Jump GOTO L; L: Delete the jump
Jump Chaining GOTO L1; L1: GOTO L2 GOTO L2
Dead Code GOTO L; x=5 (no label on x=5) Delete x=5
Strength Reduction x=x*2 x = x << 1
Algebraic Simplify x=x+0 Delete
Machine Idioms ADD R,R,#1 INC R (if available)
UNIT 2: SYNTAX DIRECTED TRANSLATION

2.1 Syntax Directed Translation Schemes


In SDT, each grammar production has semantic rules (or actions). Two attribute types:

Attribute Type Definition Computation Direction


Synthesized Computed from children's attributes Bottom-up (from leaves to
root)
Inherited Computed from parent or sibling attributes Top-down (from root to
leaves)

SDD Type Definition Compatible Parser


S-attributed Only synthesized attributes LR (bottom-up) parsers
L-attributed Inherited attrs depend only on left siblings/parent LL (top-down) parsers

SDD vs SDT
SDD = grammar + semantic rules (no order specified). SDT = grammar + embedded program
fragments (actions) at specific positions in productions.

2.2 Implementation of Syntax Directed Translators


For S-attributed SDDs: augment LR parser stack to hold attribute values. Compute attributes on
reduction. Stack holds (symbol, attribute) pairs.

For L-attributed SDDs: use recursive-descent parser. Inherited attributes = function parameters;
synthesized attributes = return values. Actions embedded directly in parsing procedures.

2.3 Intermediate Code Overview

IC Form Description Key Property


Postfix Notation Operators follow operands No parentheses needed; stack eval
Syntax Tree (AST) Hierarchical tree of operators & Omits grammar artifacts; compact
operands
Three-Address Code At most 1 op on RHS; ≤3 operands Most common; easy to optimize
Quadruples (op, arg1, arg2, result) records Easy reordering for optimization
Triples (op, arg1, arg2) — result = position Compact; hard to reorder
Indirect Triples Triples + separate pointer array Allows reordering; good for
optimizing compilers
2.4 Postfix Notation

Postfix (RPN) Operators come AFTER their operands. No parentheses needed. Evaluated
left-to-right using a stack.

Infix Postfix
a+b ab+
(a + b) * c ab+c*
a+b*c abc*+
(a - b) / (c + d) ab-cd+/
a*b+c*d ab*cd*+

Stack-Based Postfix Evaluation


• Scan left to right.
• Push operand: push onto stack.
• On operator: pop top two, apply op, push result.
• Final top-of-stack = answer.

2.5 Parse Trees vs Syntax Trees

Parse Tree Abstract Syntax Tree (AST)


Shows every grammar symbol Omits redundant nodes (parentheses, grammar
artifacts)
Interior nodes = non-terminals Interior nodes = operators
Leaves = terminals Leaves = operands (identifiers, constants)
Represents full derivation Represents semantic structure only
Larger, more verbose Smaller, more compact

Example — a = b + c * d
Parse tree: E -> E + T -> E + T * F -> ... AST: (=, a, (+, b, (*, c, d))) The AST drops E, T, F nodes;
keeps only meaningful operators.

2.6 Three-Address Code (TAC)


General form: x = y op z. At most ONE operator on the RHS.
TAC Instruction Example
Binary op t1 = a + b
Unary op t1 = -a
Copy x=y
Unconditional jump goto L
Conditional jump if x < y goto L
Indexed assignment (read) x = a[i]
Indexed assignment (write) a[i] = x
Address / pointer x = &y | x = *y | *x = y
Param + call param a1; param a2; call p, 2
Return return x

2.7 Quadruples vs Triples

Quadruples — (op, arg1, arg2, result)


For x = -(a + b * c):

Op Arg1 Arg2 Result


* b c t1
+ a t1 t2
uminus t2 t3
= t3 x

Triples — (op, arg1, arg2); result = index


Index Op Arg1 Arg2
(0) * b c
(1) + a (0)
(2) uminus (1)
(3) = x (2)

Quadruples Triples
Result Explicit name in field Implicit (position number)
Reordering Easy (just change result name) Hard (must update all references)
Memory More (4 fields) Less (3 fields)
Used in Most optimizing compilers Indirect triples for optimization
UNIT 3: SPECIFIC TRANSLATIONS

3.1 Translation of Assignment Statements


SDT grammar with semantic actions for expression TAC generation:

S -> id = E { gen([Link] '=' [Link]); }


E -> E1 + E2 { t = newtemp(); gen(t '=' [Link] '+' [Link]);
[Link]=t; }
E -> E1 * E2 { t = newtemp(); gen(t '=' [Link] '*' [Link]);
[Link]=t; }
E -> - E1 { t = newtemp(); gen(t '=' 'uminus' [Link]); [Link]=t; }
E -> ( E1 ) { [Link] = [Link]; }
E -> id { [Link] = [Link]; }

Example: x = a + b * c
t1 = b * c
t2 = a + t1
x = t2

3.2 Boolean Expressions

Method 1: Numerical Representation (TRUE=1, FALSE=0)


Boolean ops treated like arithmetic. Result stored in a temporary.
E -> E1 or E2: t=newtemp(); gen(t = [Link] 'or' [Link]); [Link]=t
E -> E1 and E2: t=newtemp(); gen(t = [Link] 'and' [Link]); [Link]=t
E -> not E1: t=newtemp(); gen(t = 'not' [Link]); [Link]=t
E -> id1 relop id2: t=newtemp();
gen('if' [Link] relop [Link] 'goto' next+3)
gen(t '=' '0'); gen('goto' next+2); gen(t '=' '1')

Method 2: Short-Circuit / Flow of Control (More Efficient)


Boolean value implicit in which branch is taken. Each boolean expression E has inherited attributes:
• [Link] = label to jump to when E is TRUE
• [Link] = label to jump to when E is FALSE

E -> E1 or E2:
[Link] = [Link]; [Link] = newlabel();
[Link] = [Link]; [Link] = [Link]
CODE: [E1 code] label([Link]): [E2 code]

E -> E1 and E2:


[Link] = newlabel(); [Link] = [Link];
[Link] = [Link]; [Link] = [Link]
CODE: [E1 code] label([Link]): [E2 code]

E -> not E1:


[Link] = [Link]; [Link] = [Link]

E -> id1 relop id2:


gen('if' id1 relop id2 'goto' [Link])
gen('goto' [Link])

3.3 Control Statements

If-Then-Else
S -> if E then S1 else S2
[Link] = newlabel() // L1
[Link] = newlabel() // L2
[Link] = [Link]
[Link] = [Link]

Generated code:
[E code]
L1: [S1 code]
goto [Link]
L2: [S2 code]

While Loop
S -> while E do S1
begin = newlabel()
[Link] = newlabel()
[Link] = [Link]
[Link] = begin

Generated code:
begin: [E code]
[Link]: [S1 code]
goto begin
[Link]/[Link]:

Repeat-Until
S -> repeat S1 until E
begin = newlabel()
[Link] = begin
[Link] = [Link]

Code: begin: [S1 code] [E code]


(loop if E is false; exit when E is true)
3.4 Postfix Translation
In top-down (LL) parsing, all actions occur AFTER children are parsed, producing a natural postfix
ordering. Inherited attributes pass information downward; synthesized attributes carry results upward.

Marker Nonterminal An empty production M -> epsilon inserted mid-rule to force an action at a
specific point during LL parsing. Used to embed 'mid-rule actions'.

3.5 Translation with a Top-Down Parser


Recursive descent parsers embed semantic actions directly in parsing procedures:

function E() -> place:


t1 = T() // parse first term
while lookahead == '+' or '-':
op = lookahead; match(op)
t2 = T()
t = newtemp()
emit(t '=' t1 op t2)
t1 = t
return t1

3.6 Array References in Arithmetic Expressions

One-Dimensional Arrays
For array A with element width w: addr(A[i]) = base(A) + i * w

Two-Dimensional Arrays (Row-Major — C convention)


For A[r][c] with c columns per row, element width w:
addr(A[i][j]) = base(A) + (i * c + j) * w

TAC for x = A[i][j]: (c = number of columns, w = element width)


t1 = i * c
t2 = t1 + j
t3 = t2 * w
x = A[t3] // indexed read

TAC for A[i][j] = x:


t1 = i * c; t2 = t1 + j; t3 = t2 * w
A[t3] = x // indexed write

Column-Major (Fortran convention)


addr(A[i][j]) = base(A) + (j * r + i) * w where r = number of rows.
3.7 Procedure Calls
Calling convention using param and call:

Source: result = p(a1, a2, a3)

TAC:
param a1
param a2
param a3
call p, 3 // 3 = number of arguments

If function returns value:


param a1; param a2
t = call f, 2
x = t

The callee's activation record is set up by the CALLEE (in most conventions). The caller pushes
params in order. Return address is saved in the AR.

3.8 Declarations
During declaration parsing, build the symbol table. Track offset for each variable:

D ->
T id { enter([Link], [Link], offset); offset += [Link]; }
D ->
D ; D
T ->
int { [Link] = integer; [Link] = 4; }
T ->
float { [Link] = real; [Link] = 8; }
T ->
array [num] of T1
{ [Link] = array([Link], [Link]); [Link] = [Link] * [Link]; }
T -> ^ T1 { [Link] = pointer([Link]); [Link] = 4; }

For nested scopes: push/pop scope table on enter/exit block.

3.9 Case (Switch) Statement Translation


Strategies based on number and density of cases:

Strategy When to Use Lookup Time


Series of conditional Few cases or sparse values O(n)
jumps
Jump table (indexed array Dense, contiguous case values O(1)
of labels)
Hash table Many sparse cases O(1) avg
Binary search on sorted Moderate cases, sorted O(log n)
labels
Jump table for: switch(i) { case 1: S1; case 2: S2; default: Sd }

t = i
if t < 1 goto Ldefault
if t > 2 goto Ldefault
goto table[t] // table[1]=L1, table[2]=L2
L1: [S1 code]; goto Lnext
L2: [S2 code]; goto Lnext
Ldefault: [Sd code]
Lnext:
UNIT 4: SYMBOL TABLES & RUNTIME ADMINISTRATION

4.1 Data Structures for Symbol Tables


A symbol table maps identifiers to their attributes (type, scope, offset, etc.).

Data Structure Lookup Pros / Cons


Linear List O(n) Simple. Fine for small tables only.
Hash Table O(1) avg Best in practice. Collision handling needed (chaining or
open addressing).
Binary Search Tree O(log n) Supports ordered traversal. Useful for ordered output.
Linked List of Tables O(n) per scope Naturally models nested scopes (one table per scope).

Symbol Table Entry Fields


Field Contents
name Identifier string (or index into string table)
type int, float, array, pointer, function signature, etc.
scope / level Block nesting level or scope number
offset Byte offset within the activation record / data segment
other Param count (for functions), array dimensions, line number, etc.

4.2 Representing Scope Information

Scope The region of program text where a declaration is visible. In block-structured


languages scopes nest. Inner scope shadows outer.

Most-Closely-Nested Rule
For a reference to x: use the declaration of x in the innermost enclosing scope.

Implementation: Stack of Symbol Tables


• Enter block (begin { }): PUSH a new empty hash table onto the scope stack.
• Declare name x: INSERT into the TOP table.
• Lookup name x: SEARCH from TOP down to bottom; return first match.
• Exit block (end { }): POP the top table; those declarations go out of scope.

Block-Structured Languages
C, Pascal, Java — each { } is a new scope. The scope stack models this naturally: top of stack =
current block, bottom = global scope.

Display Array (for Nested Procedures)


display[i] = pointer to the most recent AR at static nesting level i. Enables O(1) access to variables in
any enclosing scope — no need to traverse static links at runtime.

4.3 Run Time Administration — Overview

Strategy Characteristics Used For


Static Allocation Addresses fixed at compile time. No stack Global vars, constants, simple
needed. languages (FORTRAN-style)
Stack Allocation AR pushed/popped on call/return. Most procedural languages: C,
Supports recursion. Pascal, Java methods
Heap Allocation Explicit alloc/free (malloc/new). Flexible Dynamically created objects,
lifetimes. linked lists, closures

4.4 Implementation of Simple Stack Allocation


For languages WITHOUT nested procedures (e.g., C with no closures):

Event Stack Action


Program start Initialize stack pointer SP to base of stack area.
Procedure call Save return address. Advance SP by size of callee's AR.
Access local var x M[SP + offset(x)] — offset known at compile time.
Access parameter a M[SP + param_offset(a)] — offset in caller's AR.
Return from procedure Restore SP to value before call. Jump to return address.

Stack layout at runtime (stack grows downward):

High addresses:
+-------------------------+
| AR for main() | <- SP at program start
| return addr, locals |
+-------------------------+
| AR for f() | <- SP after calling f()
| params, locals, temps |
+-------------------------+
| AR for g() | <- SP after calling g() from f()
| ... |
+-------------------------+
Low addresses (stack grows down)
4.5 Storage Allocation in Block-Structured Languages
In block-structured languages (Pascal, Algol, nested functions), procedures can access non-local
variables. Requires two pointers in each AR:

Link Points To Purpose


Dynamic Link (Control CALLER's activation record Restore SP/environment on
Link) return
Static Link (Access Statically ENCLOSING procedure's AR Access non-local variables at
Link) compile-time-known nesting
depth

Accessing Non-Local Variables


To access a variable declared at nesting level k, from code at nesting level m (m > k): follow (m - k)
static links. The offset within that AR is known at compile time.

Example nesting:
procedure A (level 1) declares x
procedure B (level 2) declares y
procedure C (level 3) accesses x from A

From C, to access x in A:
C_AR --static link--> B_AR --static link--> A_AR --> x at fixed offset
(2 static link hops: m - k = 3 - 1 = 2)

Heap Allocation in Block-Structured Languages


If a procedure returns a pointer to a local variable, that variable cannot be on the stack (dangling
pointer). It must be allocated on the HEAP instead. Modern languages (Java, Go) use escape
analysis to decide stack vs heap.

QUICK REVISION SHEET

CODE GENERATION (Unit 1)


Issues: instruction selection, register alloc (NP-complete), eval order, correctness DAG: each node =
unique value; multiple labels = aliases; reuse = CSE Basic block: max sequence, single entry + single
exit; flow graph = BBs as nodes Next-use: backward scan; attach live/next-use info at each instruction
getReg: prefer variable already in register; else empty; else spill dead/memory-backed var Peephole:
redundant LD/ST, null jumps, jump chaining, dead code, strength reduction

INTERMEDIATE CODE & TRANSLATION (Units 2–3)


Postfix: ops after operands; a+b*c = a b c * + TAC: max 1 op on RHS; Quad=(op,a1,a2,res) easy
reorder; Triple=(op,a1,a2) pos=result SDT: S-attributed=synthesized only (LR parser); L-
attributed=left-to-right (LL parser) Boolean: numerical (0/1) OR short-circuit ([Link]/[Link] labels —
no actual value) Control: if-then-else and while use [Link], [Link], begin, [Link] labels Array:
addr(A[i])=base+i*w; addr(A[i][j])=base+(i*c+j)*w (row-major) Proc call: param a1; param a2; call p, n
— callee sets up own AR Switch: jump table for dense cases O(1); cond jumps for sparse O(n)

SYMBOL TABLE & RUNTIME (Unit 4)


Hash table: O(1) avg lookup — standard choice in production compilers Scope: push new table on
block entry; insert in TOP; lookup from TOP down; pop on exit AR fields: return val | params |
dynamic link | static link | saved status | locals | temps Dynamic link: restores caller's SP on return
Static link: accesses non-local vars; follow (m-k) hops for var at level k from level m Display:
display[i]=ptr to level-i AR; enables O(1) non-local access without link traversal

You might also like