0% found this document useful (0 votes)
5 views13 pages

Unit III - Syntax-Directed Translation: Compiler Design

This document covers Syntax-Directed Translation in compiler design, detailing Syntax-Directed Definitions, Translation Schemes, Intermediate Code Generation, and various representations such as Postfix Notation and Three-Address Code. It explains the evaluation of attributes, the construction of parse and abstract syntax trees, and the translation of assignment and boolean expressions. Additionally, it discusses control flow statements and techniques like backpatching for efficient code generation.

Uploaded by

workkwitharshit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views13 pages

Unit III - Syntax-Directed Translation: Compiler Design

This document covers Syntax-Directed Translation in compiler design, detailing Syntax-Directed Definitions, Translation Schemes, Intermediate Code Generation, and various representations such as Postfix Notation and Three-Address Code. It explains the evaluation of attributes, the construction of parse and abstract syntax trees, and the translation of assignment and boolean expressions. Additionally, it discusses control flow statements and techniques like backpatching for efficient code generation.

Uploaded by

workkwitharshit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

COMPILER DESIGN

Unit III — Syntax-Directed Translation


Bhagwant Institute of Technology, Muzaffarnagar

Topic Coverage
Syntax-Directed Definitions Attributes, synthesized & inherited, dependency graphs
Translation Schemes SDT rules, ordering semantic actions
Intermediate Code Types and purpose of intermediate representations
Postfix Notation Reverse Polish notation, conversion algorithm
Parse & Syntax Trees AST construction, tree representation
Three-Address Code Quadruples, triples, indirect triples
Assignment Statements Translation of expressions, type conversion
Boolean Expressions Short-circuit evaluation, backpatching
Control Flow Statements if-else, while, for — translation to TAC
Postfix & Top-Down Postfix translation, top-down parser integration
Translation
Arrays, Procedures, Array references, procedure calls, declarations & case
Declarations

1. Syntax-Directed Definitions (SDDs)


A Syntax-Directed Definition (SDD) is a generalization of a context-free grammar in which each
grammar symbol has an associated set of attributes, and each production rule has an associated set of
semantic rules that define how to compute attribute values.

1.1 Attributes
• Synthesized Attributes: Computed from the attribute values of the children of the node in the
parse tree. They pass information up the parse tree.
• Inherited Attributes: Computed from the attribute values of the parent and siblings of the node.
They pass information down the parse tree.

💡 Key Point: S-attributed SDDs use only synthesized attributes and are easier to implement —
they can be evaluated bottom-up during LR parsing.
1.2 Annotated Parse Tree
An annotated parse tree is a parse tree where each node is annotated with its attribute values. For S-
attributed SDDs, attributes are computed in a single bottom-up pass.

1.3 Dependency Graphs


A dependency graph shows the dependencies among attribute instances at nodes of a parse tree. An
edge from attribute b to attribute c means c depends on b and must be computed after b.
• If the dependency graph has a cycle, the SDD has no valid evaluation order
• For S-attributed SDDs, the dependency graph always has edges pointing upward (no cycles)
• For L-attributed SDDs, edges go left-to-right or upward — can be evaluated during top-down
parsing

1.4 S-Attributed vs L-Attributed SDDs


Property S-Attributed L-Attributed
Attributes Used Only synthesized Synthesized + Inherited
Evaluation Bottom-up (LR parsing) Depth-first left-to-right (LL parsing)
Dependency Always upward Left-to-right or upward
Direction
Cycles Possible? No No (if properly defined)

2. Syntax-Directed Translation Schemes (SDTs)


An SDT is a context-free grammar with program fragments called semantic actions embedded within
production bodies. Actions are placed at arbitrary positions inside a production body and are executed
at that point during parsing.

2.1 Placement of Semantic Actions


• An action at the right end of a production executes after reducing by that production
• An action at an interior position executes when the symbol to its left is recognized
• For S-attributed SDDs, all actions are at the right end — easy to implement

2.2 Example — Simple Expression SDD


Production Semantic Rule
──────────────────────────────────────────────────────
E → E₁ + T [Link] = E₁.val + [Link]
E → T [Link] = [Link]
T → T₁ * F [Link] = T₁.val * [Link]
T → F [Link] = [Link]
F → ( E ) [Link] = [Link]
F → digit [Link] = [Link]

2.3 Postfix SDT


A postfix SDT places all actions at the right end of productions. It is easy to implement with a bottom-up
parser:
E → E₁ + T { print('+') }
E → T
T → T₁ * F { print('*') }
T → F
F → ( E )
F → id { print([Link]) }

3. Intermediate Code Generation


An intermediate representation (IR) is a program representation between the source language and the
target machine language. The compiler translates the source program into IR before generating target
code.

3.1 Why Use Intermediate Code?


• Machine-independent optimization can be applied to the IR
• Makes the compiler portable: one front-end + multiple back-ends
• Simplifies the compilation process by separating analysis from synthesis
• Easier to generate code for different target architectures

3.2 Types of Intermediate Representations


Type Description Example
Postfix Notation Operators follow operands ab+c*
Syntax Tree / AST Tree representation of program structure Abstract Syntax Tree
Three-Address Code At most 3 operands per instruction t1 = a + b
Quadruples 4-field representation of TAC (+, a, b, t1)
Triples 3-field, result referenced by position (+, a, b) at pos i
Static Single Assignment Each variable assigned exactly once SSA form

4. Postfix Notation (Reverse Polish Notation)


Postfix notation (also called Reverse Polish Notation or RPN) is a way of writing arithmetic expressions
where operators appear after their operands. It eliminates the need for parentheses.

4.1 Infix vs Postfix


Infix Expression Postfix (RPN)
a+b ab+
a+b*c abc*+
(a + b) * c ab+c*
a*b+c*d ab*cd*+
a-b+c ab-c+

4.2 Algorithm — Infix to Postfix (Shunting Yard)


• Scan the infix expression left to right
• If the token is an operand: output it
• If the token is '(': push to stack
• If the token is ')': pop and output until '(' is found, then discard '('
• If the token is an operator op:
◦ While stack top has operator with >= precedence to op: pop and output
◦ Push op onto stack
• After scanning: pop all remaining operators and output

4.3 Evaluating Postfix Using a Stack


• Scan postfix expression left to right
• If operand: push onto stack
• If operator: pop two operands, apply operator, push result
• Final result is the single value remaining on stack

Example: Evaluate 3 4 2 * + 1 -

Token Stack
3 [3]
4 [3, 4]
2 [3, 4, 2]
* [3, 8] ← pop 4 and 2, push 4*2=8
+ [11] ← pop 3 and 8, push 3+8=11
1 [11, 1]
- [10] ← pop 11 and 1, push 11-1=10

Result: 10
5. Parse Trees and Syntax Trees

5.1 Parse Tree (Concrete Syntax Tree)


A parse tree is a graphical representation of the derivation of a string. Every node corresponds to a
grammar symbol (terminal or non-terminal). Interior nodes are non-terminals; leaves are terminals.
• Captures the exact syntactic structure of the input
• One-to-one correspondence with grammar productions
• Usually too detailed for code generation purposes

5.2 Abstract Syntax Tree (AST)


An AST is a condensed form of the parse tree that omits unnecessary punctuation and intermediate
nodes. Only semantically meaningful nodes are kept.
Parse Tree AST
Has all grammar symbols Only semantically meaningful nodes
Includes punctuation nodes (parentheses etc.) Punctuation omitted
One node per grammar symbol Operators are internal nodes, operands are leaves
Larger, more verbose Compact and efficient

5.3 Constructing an AST for a + b * c


The AST for a + b * c looks like:
+
/ \
a *
/ \
b c

The * node is lower (higher precedence), a and b*c are children of +.

5.4 DAG (Directed Acyclic Graph) Representation


A DAG is like an AST but subexpressions that appear more than once are shared — represented by a
single node with multiple parents. This eliminates redundant computation.
Expression: a + a * (b - c) + (b - c) * (b - c)

In an AST, (b-c) appears three times — three separate subtrees.


In a DAG, (b-c) appears once — shared node with multiple parents.

6. Three-Address Code (TAC)


Three-address code (TAC) is an intermediate representation in which each instruction has at most
three operands. Instructions are of the form: x = y op z or x = op y

6.1 Types of Three-Address Instructions


Instruction Type Form Example
Assignment (binary) x = y op z t1 = a + b
Assignment (unary) x = op y t1 = -a
Copy x=y t1 = a
Unconditional jump goto L goto L3
Conditional jump if x relop y goto L if a < b goto L2
Indexed copy x = y[i] or x[i] = y t1 = a[i]
Procedure call param x; call p, n param a; call f, 2
Return return y return t1
Address / Pointer x = &y; x = *y; *x = y t1 = &a

6.2 TAC for Expression a + b * c


t1 = b * c
t2 = a + t1

6.3 Quadruples
Quadruples store TAC as 4-field records: (operator, arg1, arg2, result)
Pos Op Arg1 Arg2 Result
(0) * b c t1
(1) + a t1 t2
(2) uminus t2 — t3
(3) = t3 — x

6.4 Triples
Triples use 3-field records: (operator, arg1, arg2). Results are not stored explicitly; they are referenced
by their position number.
Pos Op Arg1 Arg2
(0) * b c
(1) + a (0)
(2) uminus (1) —
Pos Op Arg1 Arg2
(3) = x (2)

6.5 Indirect Triples


Indirect triples add a pointer array that lists the triples in order. Optimizations can reorder the pointer
array without physically moving the triple records.
Quadruples Triples
Explicit result field — easy to optimize No result field — compact
Temp variables take symbol table space References by position number
Used in most modern compilers Efficient but harder to reorder

7. Translation of Assignment Statements


Assignment statement translation involves generating TAC for expressions. A function newtemp()
creates a new temporary variable. The attribute [Link] holds the name of the variable that holds the
value of E.

7.1 SDT for Assignment Statements


S → id = E { gen([Link] '=' [Link]) }

E → E₁ + E₂ { [Link] = newtemp();
gen([Link] '=' E₁.place '+' E₂.place) }

E → E₁ * E₂ { [Link] = newtemp();
gen([Link] '=' E₁.place '*' E₂.place) }

E → -E₁ { [Link] = newtemp();


gen([Link] '=' 'uminus' E₁.place) }

E → (E₁) { [Link] = E₁.place }

E → id { [Link] = [Link] }

7.2 Type Conversion (Widening & Narrowing)


When operands in an expression are of different types, type coercion must be applied. Widening (e.g.,
int → float) is automatic; narrowing requires explicit casting.
Example: x = a + b where a is int and b is float

t1 = (float) a ← widen a to float


t2 = t1 + b ← now both are float
x = t2

8. Translation of Boolean Expressions


Boolean expressions are used in conditional and loop statements. They can be translated in two ways:
• Numerical Method: Evaluate Boolean expression as an integer (0 or 1) and use in conditional
test.
• Short-Circuit (Control Flow) Method: Translate Boolean expressions directly into jumps —
the expression is not fully evaluated if the result can be determined early.

8.1 Short-Circuit Evaluation


• A and B: if A is false, B is not evaluated
• A or B: if A is true, B is not evaluated
• This is more efficient and avoids side-effect problems

8.2 Backpatching
Backpatching is a technique to fill in jump targets (labels) in one pass. Instructions with unresolved
jump targets are created with a placeholder. Once the target is known, the placeholder is filled in
(patched).

Key Functions
• makelist(i): Creates a new list containing only the instruction number i.
• merge(p1, p2): Concatenates two lists p1 and p2 into a single list.
• backpatch(p, i): Sets the target label of all instructions in list p to instruction i.

8.3 Attributes for Backpatching


• [Link]: List of instructions that need to be backpatched with the target jumped to when E is
true.
• [Link]: List of instructions that need to be backpatched with the target jumped to when E is
false.

8.4 TAC for Boolean Expression Example


Expression: if (a < b or c < d) and e < f
100: if a < b goto _ (truelist of E1: {100})
101: goto _ (falselist of E1: {101})
102: if c < d goto _ (truelist of E2: {102})
103: goto _ (falselist of E2: {103})
104: if e < f goto _ (truelist of E3: {104})
105: goto _ (falselist of E3: {105})
After backpatching:
- merge(truelist of E1, truelist of E2) = {100, 102}
- backpatch({100, 102}, 104) → go check e < f
- backpatch({101}, 103 next) → false path

9. Statements that Alter the Flow of Control


Control flow statements (if-else, while, for, etc.) are translated into three-address code using conditional
and unconditional jumps.

9.1 Translation of if-else


Source: TAC:
if (E) S1 else S2 <code for E with [Link] and [Link]>
[Link]:
<code for S1>
goto next
[Link]:
<code for S2>
next:

9.2 Translation of while


Source: TAC:
while (E) S begin:
<code for E>
if [Link] goto next
[Link]:
<code for S>
goto begin
next:

9.3 Translation of for loop


Source: TAC:
for (init; E; step) S <code for init>
begin:
<code for E>
if [Link] goto next
<code for S>
<code for step>
goto begin
next:

9.4 SDT for if-else (with Backpatching)


S → if (E) S₁ else S₂ {
backpatch([Link], nextinstr); // S₁ starts here
backpatch([Link], [Link]); // S₂ starts at M
[Link] = merge(S₁.nextlist, S₂.nextlist);
}

10. Postfix Translation & Translation with Top-Down Parser

10.1 Postfix Translation


Postfix translation generates postfix code directly during parsing. Semantic actions are embedded in
the grammar after each production. Since actions execute after all symbols of a production are
matched, output comes out in postfix order automatically.
Grammar with embedded actions:

E → T { A }
A → + T { print('+') } A | ε
T → F { B }
B → * F { print('*') } B | ε
F → ( E ) | id { print([Link]) }

10.2 Translation with a Top-Down Parser


In a top-down (predictive) parser, inherited attributes are used to pass information down to children.
Actions are embedded within predictive parsing routines.
• Each non-terminal is a procedure
• Inherited attributes are passed as parameters to these procedures
• Synthesized attributes are returned as function values
• Actions at interior positions of productions must be carefully ordered to respect the left-to-right
evaluation
💡 Key Point: For L-attributed SDDs, semantic actions in a top-down parser are executed in the
order the symbols are processed — left-to-right depth-first.

11. More About Translation

11.1 Array References in Arithmetic Expressions


Array elements require address computation. For a one-dimensional array A with base address base
and element width w:
Address of A[i] = base + i * w

TAC: t1 = i * w
t2 = base + t1
t3 = *t2 ← contents at address t2

For a two-dimensional array A[m][n] stored in row-major order:


Address of A[i][j] = base + (i * n + j) * w

TAC: t1 = i * n
t2 = t1 + j
t3 = t2 * w
t4 = base + t3
t5 = *t4

11.2 Procedure Calls


Procedure calls are translated using param and call instructions in TAC:
Source: p(a, b, c) TAC:
param a
param b
param c
call p, 3 ← 3 = number of parameters

The calling sequence involves:


• Evaluating actual parameters
• Pushing parameters onto the stack (or into registers)
• Saving the return address
• Transferring control to the callee
• On return: restoring saved registers, popping parameters, resuming

11.3 Declarations
During translation, variable declarations cause entries to be added to the symbol table. The key
attributes recorded are:
• Type: The data type (int, float, char, pointer, array, record, etc.)
• Width: The number of bytes required to store the variable
• Offset: The relative address within the current activation record

SDT for declarations:

D → T id ; { addtype([Link], [Link]);
offset = offset + [Link] }

T → int { [Link] = integer; [Link] = 4 }


T → float { [Link] = float; [Link] = 8 }
T → T₁ [n] { [Link] = array(n, T₁.type);
[Link] = n * T₁.width }

11.4 Case (Switch) Statements


A switch statement can be translated in several ways depending on the number of cases:
• Sequential Test: Generate a sequence of if-goto instructions checking each case value. Simple
but O(n) comparisons.
• Jump Table: Create an array of labels indexed by the case value. O(1) lookup but requires
dense values.
• Hash Table: Use hashing for sparse case values with many options.

switch (E) { TAC (sequential test):


case v1: S1 t = E
case v2: S2 if t == v1 goto L1
default: Sd if t == v2 goto L2
} goto Ldefault
L1: <S1>; goto next
L2: <S2>; goto next
Ldefault: <Sd>
next:

12. Quick Revision — Key Points for Exams


Concept One-Line Summary
SDD Grammar + attributes + semantic rules to compute them
Synthesized Attribute Computed from children — flows up the parse tree
Inherited Attribute Computed from parent/siblings — flows down or left-to-right
S-Attributed SDD Only synthesized attributes; evaluated bottom-up with LR parser
L-Attributed SDD Synthesized + inherited; evaluated depth-first left-to-right
SDT SDD with actions embedded in grammar productions
Postfix Notation Operators after operands; no parentheses needed
AST Condensed parse tree; operators are interior nodes
DAG AST with shared nodes for common subexpressions
Three-Address Code At most 3 operands per instruction; uses temporaries
Quadruples (op, arg1, arg2, result) — explicit result field
Triples (op, arg1, arg2) — result referenced by position number
Backpatching Fill in jump targets retroactively using truelist/falselist
Short-Circuit Eval Boolean ops using jumps; skip evaluation when result known
Concept One-Line Summary
Array Address base + index * width (1D); base + (i*n+j)*w (2D row-major)
Procedure Call TAC param a; param b; call p, n — n = number of params
Case Statement Sequential test (small n) or jump table (dense values)

BIT Muzaffarnagar | Compiler Design | Unit III Notes | Prepared for Academic Use

You might also like