0% found this document useful (0 votes)
11 views23 pages

Unit Iii

The document discusses Syntax Directed Translation (SDT), covering its definitions, evaluation orders, and applications in compiler design. It explains the construction of syntax trees, types of attributes (synthesized and inherited), and methods for implementing L-attributed SDDs, including recursive descent parsing and LL parsers. Additionally, it details intermediate code generation and the use of Directed Acyclic Graphs (DAGs) for optimizing expressions in compilers.

Uploaded by

syamsai0530
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views23 pages

Unit Iii

The document discusses Syntax Directed Translation (SDT), covering its definitions, evaluation orders, and applications in compiler design. It explains the construction of syntax trees, types of attributes (synthesized and inherited), and methods for implementing L-attributed SDDs, including recursive descent parsing and LL parsers. Additionally, it details intermediate code generation and the use of Directed Acyclic Graphs (DAGs) for optimizing expressions in compilers.

Uploaded by

syamsai0530
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

UNIT III

Syntax Directed Translation


Syntax Directed Translation: Syntax-Directed Definitions, Evaluation Orders for SDD’s, Applications of
Syntax Directed Translation, Syntax-Directed Translation Schemes, Implementing L-Attributed SDD’s.

Intermediate Code Generation: Variants of Syntax Trees, Three Address Code, Types and Declarations,
Translation of Expressions, Type Checking, Control Flow, Backpatching, Intermediate Code for
Procedures.

Directed Definitions, Evaluation Orders for SDD’s :


By connecting semantic rules to grammar productions, syntax-directed definition determines the values
of attributes. It resembles an abstract specification in certain ways.
SDT (syntax-directed translation) adds 'semantics rules' or actions to CFG products. As a result, we may
refer to the grammar that has been augmented as attributed grammar.
The 'syntax analysis phase' and the 'semantic analysis phase' are intertwined in syntax-directed
translation. To begin the syntax-directed translation, you must first create a 'parse tree.' Following that,
apply the production's semantic rules. And then assess the property at the parse tree's nodes.
An annotated parse tree is a parse tree that has the evaluated attribute value. 'L-attributed translations'
and 'S-attributed translations' are the two types of syntax-directed translation.
Syntax Directed Definition
A CFG with attributes and rules is called a syntax-directed definition (SDD). The properties are linked to
the grammar symbols in an extended CFG (i.e. nodes of the parse tree). The rules are associated with
grammar production.
A grammar symbol's attribute can now be integers, types, table references, or a string. Let us go through
two kinds of qualities in more detail:
Inherited and Synthesised Attribute
For a non-terminal symbol (node) N, we can declare the value of a synthesized attribute:
 The attribute value of children.
 The total attribute values of N.
For a non-terminal symbol (node) N, we can declare the value of an inherited attribute:
 N's parent's attribute values.
 N's Siblings' attribute values.
 Total attribute values of N.
Evaluation of an SDD at Node of the Parse Tree
To determine the value of attributes at nodes in a parse tree, do the following:
1. We must first construct a parse tree.
2. Then we must use SDD rules to assess the values of all the nodes in the parse tree's attributes.
3. An annotated parse tree is formed via this improvised parse tree.
How to Construct Annotated Parse Tree?
As the name implies, the annotated parse tree displays the value(s) of all node attribute(s). So, before
we analyze a node's attribute value, we must first evaluate all the elements that determine its value.
 To evaluate a node's synthesized attribute, we must parse the tree from the bottom up. Its value is
determined by the value of the child attributes of the concerned node and the node itself.
 To determine a node's inherited attribute, we must parse the tree from the top down. Its value is
determined by its parent, siblings, and the node itself.
 Synthesized and inherited attributes may coexist at some nodes in a parse tree. In this scenario,
we're not sure if there is even a single order in which the nodes' characteristics may be evaluated.
A dependency tree can even determine the order in which the characteristics are examined
Features of Syntax Directed Definition
Features of Syntax Directed Definition are the following:
 An abstract statement of this kind is called Syntax Directed Definition (SDD).
 It is a context-free grammar with attributes and rules that are jointly linked to productions and
grammar symbols, respectively.
 The symbols used in grammar are connected to attributes. The production of grammar is connected
to rules.
 Definitions that are S-attributed and L-attributed are the two categories of Syntax Directed
Definitions.
Types of Syntax Directed Definitions
S-attributed translation and L-attributed translation are two types of syntax-directed translation.
S-attributed Translation
If the node's attributes are synthesised attributes, the SDD is S-attributed. For evaluation of an S-
attributed SDD, the nodes of the parse tree can be traversed in any bottom-up sequence.
The characteristics in S-attributed SDD are evaluated via postorder traversal. When the traverser crosses
node N for the last time in postorder traversal, node N is evaluated.
L-attributed Translation
If the attributes of nodes are synthesized or inherited, an SDD is L-attributed. The parse tree can now be
traversed exclusively from left to right. This is because L stands for left-to-right traversal in 'L-attributed
translation.'
Start analyzing the L-attributed SDD in the following order:
1. The inherited attribute's value.
2. The value of synthesizing attributes.
Types of attributes
For non-terminals, we have the following two characteristics. Attributes that are synthetic and
inherited.
Inherited attributes
The semantic rule connected to the production at the parent of the node defines inherited attributes.
Values for attributes are limited to a node's parent, siblings, and self. The production's body must
contain the concerned non-terminal. A semantic rule connected to the production at the parent of N
defines an inherited attribute for a nonterminal B at a parse-tree node N. An example of inherited
attributes is following:
S --> TUV { [Link] = [Link], [Link] = [Link] }
Synthesized attributes
The semantic rule related to the production at the node defines the synthesised attributes. Values for
attributes are only applicable to a node's children and to that node alone. The non-terminal in question
needs to be in charge of production. The lexical values (designated by lexval) produced by the lexical
analyzer are the synthesised properties of terminals. A semantic rule connected to the production at the
parse-tree node N defines a synthesised attribute for a nonterminal A there. An example of synthesized
attributes are following:
S --> S1 + T { [Link] = [Link] + [Link]}
The values in this were derived by [Link] from [Link] and [Link].
Application of Syntax Directed Translation
We use SDT(Syntax Directed Translation) for
 Executing Arithmetic Expressions
 Conversion from infix to postfix expression
 Conversion from infix to prefix expression
 For Binary to decimal conversion
 Counting the number of Reductions
 Creating a Syntax tree
 Generating intermediate code
 Storing information into the symbol table
 Type checking
Example
Here, we will cover an example of the application of SDT(Syntax Directed Translation) for a better
understanding of the SDT application uses. Let’s consider an example of an arithmetic expression,
and then you will see how we construct an SDT.
For Example: 2*(4+5)
Syntax-Directed Translation Schemes:It is a kind of notation in which each production of Context-Free
Grammar is related with a set of semantic rules or actions, and each grammar symbol is related to a set
of Attributes. Thus, the grammar and the group of semantic Actions combine to make syntax-directed
definitions.
The translation may be the generation of intermediate code, object code, or adding the information in
symbol table about constructs type. The modern compiler uses the syntax-directed translation that
makes the user’s life easy by hiding many implementation details and free the user from having to
specify explicitly the order in which semantic rules are to be evaluated.
Semantic Actions− It is an action that is executed whenever the Parser will recognize the input string
generated by context-free grammar.
For Example, A → BC {Semantic Action}
Semantic Action is written in curly braces Attached with a production.
In Top-Down Parser, semantic action will be taken when A will be expanded to derive BC which will
further derive string w.
In Bottom-Up Parser, Semantic Action is generated when BC is reduced to A.
Syntax Directed Translation Scheme for Postfix Code
In postfix notation, the operator appears after the operands, i.e., the operator between operands is
taken out & is attached after operands.
For example,
Consider Grammar
E → E(1) + E(2)
E → E(1) ∗ E(2)
E → (E(1))
E → id
The following table shows the Postfix Translation for the grammar.
Production Semantic Action

E → E(1) + E(2) E. CODE = E(1). CODE| |E(2). CODE | |'+'

E → E(1) ∗ E(2) E. CODE = E(1). CODE| |E(2). CODE | |'∗'

E → (E(1)) E. CODE = E(1). CODE

E → id E. CODE = id
Translation for Postfix Notation
Here, E. CODE represents an attribute or translation of grammar symbol E. It means the sequence of
three-address statements evaluating E. The translation of the nonterminal on the left of each production
is the concatenation (| |) of the translation of the non-terminals on the right followed by the operator.
In the first productionE → E(1) + E(2), the value of translation E. CODE is the concatenation of two
translation E(1). CODE & E(2). CODE and symbol '+'.
In the second productionE → E(1) ∗ E(2), the value of translation E. CODE is the concatenation of two
translation E(1). CODE & E(2). CODE and symbol '∗'.
Here, concatenation is represented by the symbol (| |).
In the third productionE → (E(1)), the translation of parenthesized expression is the same as that for
unparenthesized expression.
In the fourth productionE → id, the translation of any identifier is the identifier itself.

Implementing L-Attributed SDD’s :

Implementing L-attributed SDDs in compiler design involves methods like recursive descent parsing, on-
the-fly code generation, and using an LL parser with a stack. These methods work by evaluating
attributes using a combination of synthesized and inherited attributes, often with the evaluation order
determined by a depth-first, left-to-right traversal of the parse tree.

Key implementation techniques include:

1. Translation During Recursive-Descent Parsing

This is a common and straightforward method. For each nonterminal in the grammar, a function is
written in the parser.

 Inherited attributes are passed as arguments to the function calls for nonterminals in the production
body.

 Synthesized attributes are the return values of these functions.

 Local variables are used to store attribute values as they are computed and needed to calculate
subsequent attributes in the production.

 The semantic actions are embedded within the parser functions, ensuring they are executed in the
correct left-to-right order as specified by the L-attributed definition.

2. Implementation with LL Parsers

For table-driven LL parsers, attributes are managed using an augmented parsing stack.

 The stack holds not only grammar symbols but also action records and synthesize records to store
attribute values.

 Inherited attributes for a nonterminal A are stored just below A's record on the stack, ensuring they are
available when A is expanded.

 As the parser mimics a leftmost derivation, attribute values are copied to the necessary locations on the
stack via specific actions, following the L-attributed constraints.
3. Implementation with Bottom-Up (LR) Parsers

While L-attributed SDDs are naturally suited for top-down parsing, they can be implemented with
bottom-up parsers (like LR) using a "trick" involving grammar transformation.

 The L-attributed SDD is first converted into a Syntax Directed Translation (SDT) scheme with embedded
actions.

 Each embedded action in the middle of a production is replaced with a unique marker
nonterminal (e.g., M -> ε).

 The attributes needed by the original action become synthesized attributes of the new marker
nonterminal. The actual attribute evaluation logic is attached as a semantic rule to the marker's ε-
production.

 During the LR parse, the marker is reduced to epsilon at the appropriate point, executing the action and
making the computed attributes available on the stack for subsequent computations.

These techniques allow compilers to perform semantic analysis and code generation (such as type
checking or intermediate code generation) incrementally during the parsing phase, without the need to
build a complete, explicit parse tree in memory, which is more efficient in terms of time and space.

Intermediate Code Generation:

A syntax tree is a tree in which each leaf node represents an operand, while each inside node
represents an operator. The Parse Tree is abbreviated as the syntax tree. The syntax tree is usually
used when representing a program in a tree structure.

Rules of Constructing a Syntax Tree

A syntax tree's nodes can all be performed as data with numerous fields. One element of the node for
an operator identifies the operator, while the remaining field contains a pointer to the operand
nodes. The operator is also known as the node's label. The nodes of the syntax tree for expressions
with binary operators are created using the following functions. Each function returns a reference to
the node that was most recently created.
1. mknode (op, left, right): It creates an operator node with the name op and two fields, containing
left and right pointers.
2. mkleaf (id, entry): It creates an identifier node with the label id and the entry field, which is a
reference to the identifier's symbol table entry.
3. mkleaf (num, val): It creates a number node with the name num and a field containing the
number's value, val. Make a syntax tree for the expression a 4 + c, for example. p1, p2,..., p5 are

Example 1: Syntax Tree for the string a - b ∗ c + d is:


pointers to the symbol table entries for identifiers 'a' and 'c', respectively, in this sequence.
Syntax tree for example 1

Example 2: Syntax Tree for the string a * (b + c) – d /2 is:

Syntax tree for example 2

Variants of syntax tree:

A syntax tree basically has two variants which are described below:
1. Directed Acyclic Graphs for Expressions (DAG)
2. The Value-Number Method for Constructing DAGs

Directed Acyclic Graphs for Expressions (DAG)

A DAG, like an expression's syntax tree, includes leaves that correspond to atomic operands and inside
codes that correspond to operators. If N denotes a common subexpression, a node N in a DAG has
many parents; in a syntax tree, the tree for the common subexpression would be duplicated as many
times as the subexpression appears in the original expression. As a result, a DAG not only encodes
expressions more concisely but also provides essential information to the compiler about how to
generate efficient code to evaluate the expressions.
The Directed Acyclic Graph (DAG) is a tool that shows the structure of fundamental blocks, allows you
to examine the flow of values between them, and also allows you to optimize them. DAG allows for
simple transformations of fundamental pieces.
Properties of DAG are:
1. Leaf nodes represent identifiers, names, or constants.
2. Interior nodes represent operators.
3. Interior nodes also represent the results of expressions or the identifiers/name where the values
are to be stored or assigned.
Examples:
T0 = a+b --- Expression 1
T1 = T0 +c --- Expression 2
Expression 1: T0 = a+b

Syntax tree for expression 1

Expression 2: T1 = T0 +c

Syntax tree for expression 2

The Value-Number Method for Constructing DAGs:


An array of records is used to hold the nodes of a syntax tree or DAG. Each row of the array
corresponds to a single record, and hence a single node. The first field in each record is an operation
code, which indicates the node's label. In the given figure below, Interior nodes contain two more
fields denoting the left and right children, while leaves have one additional field that stores the lexical
value (either a symbol-table pointer or a constant in this instance).
Nodes of a DAG for i = i + 10 allocated in an array

The integer index of the record for that node inside the array is used to refer to nodes in this array.
This integer has been referred to as the node's value number or the expression represented by the
node in the past. The value of the node labeled -I- is 3, while the values of its left and right children
are 1 and 2, respectively. Instead of integer indexes, we may use pointers to records or references to
objects in practice, but the reference to a node would still be referred to as its "value number." Value
numbers can assist us in constructing expressions if they are stored in the right data format.
 Algorithm: The value-number method for constructing the nodes of a Directed Acyclic Graph.
 INPUT: Label op, node /, and node r.
 OUTPUT: The value number of a node in the array with signature (op, l,r).
 METHOD: Search the array for node M with label op, left child I, and right child r. If there is such a
node, return the value number of M. If not, create in the array a new node N with label op, left
child I, and right child r, and return its value number.
While Algorithm produces the intended result, examining the full array every time one node is
requested is time-consuming, especially if the array contains expressions from an entire program. A
hash table, in which the nodes are divided into "buckets," each of which generally contains only a few
nodes, is a more efficient method. The hash table is one of numerous data structures that may
effectively support dictionaries. 1 A dictionary is a data type that allows us to add and remove
elements from a set, as well as to detect if a particular element is present in the set. A good dictionary
data structure, such as a hash table, executes each of these operations in a constant or near-constant
amount of time, regardless of the size of the set.
To build a hash table for the nodes of a DAG, we require a hash function h that computes the bucket
index for a signature (op, I, r) in such a manner that the signatures are distributed across buckets and
no one bucket gets more than a fair portion of the nodes. The bucket index h(op, I, r) is
deterministically computed from the op, I, and r, allowing us to repeat the calculation and always
arrive at the same bucket index per node (op, I, r).
The buckets can be implemented as linked lists, as in the given figure. The bucket headers are stored
in an array indexed by the hash value, each of which corresponds to the first cell of a list. Each column
in a bucket's linked list contains the value number of one of the nodes that hash to that bucket. That
is, node (op,l,r) may be located on the array's list whose header is at index h(op,l,r).
Data structure for searching buckets

We calculate the bucket index h(op,l,r) and search the list of cells in this bucket for the specified input
node, given the input nodes op, I, and r. There are usually enough buckets that no list has more than a
few cells. However, we may need to examine all of the cells in a bucket, and for each value number v
discovered in a cell, we must verify that the input node's signature (op,l,r) matches the node with
value number v in the list of cells (as in fig above). If a match is found, we return v. We build a new
cell, add it to the list of cells for bucket index h(op, l,r), and return the value number in that new cell if
we find no match.
Three Address Code:

What is three-address code in compiler design?


Three-address code, often known as TAC or 3AC, is an intermediate code in computer science that is
used by optimizing compilers to help in the implementation of code-improving transformations. At
most, three operands can be used in a TAC instruction, which is normally made up of an assignment and
a binary operator. For illustration, t1:= t2 + t3. Despite the possibility of instructions with fewer
operands, the name of these statements comes from their use of three operands.
Since three-address code is used as an intermediate language within compilers, the operands will most
likely not be physical memory addresses or processor registers but rather symbolic addresses that will
be transformed into actual addresses during register allocation. Because three-address code is often
generated by the compiler, operand names are frequently numbered sequentially.
General illustration
a = b op c

Where a, b, or c stand for operands such as names, constants, compiler-generated temporaries, and op
stands for the operator.
There can only be one operator on the right side of instruction in a three-address code; no built-up
arithmetic expressions are allowed. As a result, a source-language statement like x+y*z might be
translated into a three-address instruction sequence.
t₁=y*z
t₂=x+t₁

Where t₁ and t₂ are temporary names produced by the compiler. The unwinding of multi-operator
arithmetic expressions and nested flow-of-control statements make three-address code ideal for target-
code creation and optimization. Using names for the intermediate values computed by a computer
makes it simple to rearrange three-address codes.
Examples
1. Convert a = (-c * b) + (-c * d) into three address code.
Solution
t₁ = -c
t₂ = b*t₁
t₃ = -c
t₄ = d * t₃
t₅ = t₂ + t₄
a = t₅

In the target program, t is utilized as a register.


2. For the following code, write the three address codes.
for(i = 1; i<=10; i++) { a[i]=x * 5; }

Solution
i=1
L:t1=x*5
t2=&a
t3=sizeof(int)
t4=t3*i
t5=t2+t4
*t5=t1
i=i+1
if i<10 goto L

Implementation of Three Address Code


There are three different ways to express three address codes:
 Quadruple

 Triples

 Indirect Triples
Quadruple
It is a structure that has four fields: op, arg1, arg2, and result. The operator is denoted by op, arg1, and
arg2 denote the operands, and the result is used to record the outcome of the expression.
These quadruples play a crucial role in breaking down high-level language statements into more
digestible parts, facilitating compilation-stage analysis and optimization procedures.
Benefits of Quadrule
 For global optimization, it's simple to restructure code.

 Using a symbol table, one may rapidly get the value of temporary variables.
Drawbacks of Quadrule
 There are a lot of temporary items.
 The establishment of temporary variables adds to the time and space complexity.

Example
Convert a = -b * c + d into three address codes.
The following is the three-address code:
t₁ = -b
t₂ = c + d
t₃ = t₁ * t₂
a = t₃

Quadruples are used to symbolize these statements:


# Op Arg1 Arg2 Result

(0) unimus b - t₁

(1) + c d t₂

(2) * t₁ t₂ t₃

(3) = t₃ - a

Triples
Instead of using an additional temporary variable to represent a single action, a pointer to the triple is
utilized when a reference to another triple's value is required. As a result, it only has three fields: op,
arg1, and arg2.
Benefits of Triples
 Triples make it easier to analyze and optimize code by disassembling difficult high-level language
constructs into smaller, more manageable parts

 Triples facilitate error, data flow, and control flow analysis of code, facilitating improved debugging
and comprehension

Drawbacks of Triples

 Temporaries are difficult to rearrange since they are implicit.

 It's tough to optimize since it necessitates the relocation of intermediary code. When a triple is
relocated, all triples that relate to it must likewise be changed. The symbol table entry can be
accessed directly using the pointer.

Example
Convert a = -b * c + d into three address codes.
The following is the three-address code:
t₁ = -b
t₂ = c + dM
t₃ = t₁ * t₂
a = t₃

The following triples represent these statements:


# Op Arg1 Arg2

(0) unimus b -

(1) + c d

(2) * (0) (1)

(3) = (2) -

Also See, Top Down Parsing


Indirect Triples
This approach employs a pointer to a list of all references to computations that are created and kept
separately. Its usefulness is comparable to quadruple representation, however, it takes up less space.
Temporaries are easy to rearrange since they are implicit.
Benefits of Indirect Triples
 For languages that use dynamic memory allocation and pointer manipulation, indirect triples are
essential for representing complex pointer operations and memory accesses

 They simplify the intricate address calculations needed for nested structures, multi-dimensional
arrays, and other memory architectures

Drawbacks of Indirect Triples


 Indirect triples can increase the complexity of the intermediate representation and optimization
phases of the compiler, complicating the design and implementation of the compiler

 Due to the additional pointer dereferencing and memory access operations required by using
indirect triples, there may be performance overhead that could slow down execution

Example
Convert a = b * – c + b * – c into three address codes.
The following is the three-address code:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
# Op Arg1 Arg2

(14) unimus c -

(15) * (14) b

(16) unimus c -

(17) * (16) b

(18) + (15) (17)

(19) = a (18)

List of pointers to the table


# Statement

(0) (14)

(1) (15)

(2) (16)

(3) (17)

(4) (18)

(5) (19)

Applications of Types and Declarations:


• Type checking: It uses logical rules to reason regarding the behavior of a program at run time. It
ensures that the operands match the type that an operation expects. For example, the && operator
in Java expects two booleans as operands and returns a boolean result.
• Translation Applications: A compiler can determine the storage that we will need for that
name(declared within a procedure or a class) at run time from the type of name. Type information is
also required to calculate the address denoted by an array reference, to insert explicit type conversions,
and to choose the correct version of an arithmetic operator, among other things.
Type Expressions
A primary type is a type of expression. Typical basic types for a language contain boolean, char, integer,
float, and void; the void implies "the absence of a value."
We can form a type name or type expression by applying the array type constructor to a number and a
type expression.
Types have structure, which we can represent using type expressions: a type expression is either a
primary type or is formed by applying a type constructor operator to a type expression. The basic types
and constructors depend on the language to be checked.
For Example,

The array type int [2][3][2] can be read as an "array of 2 arrays each of 3 arrays of 2 integers each" and
written as a type expression array(2, array(3, array(2, integer))).
Basic types
char, int, double, float are type expressions.
Type names
It is convenient to consider that names of types are type expressions.
Arrays
If E is a type expression and n is an int, then
array(n, E)
is a type expression indicating the type of an array with elements in T and indices in the range
0 ... n - 1.
Products
If E1 and E2 are type expressions then
E1 × E2
is a type expression indicating the type of an element of the Cartesian product of E1 and E2. The
products of more than two types are defined similarly, and this product operation is left-
associative. Hence
(E1 × E2) × E3 and E1 × E2 × E3
are equivalent.
Records
The only difference between a record and a product is that the fields of a record have names.
If abc and xyz are two type names, if E1 and E2 are type expressions then
record(abc: E1, xyz: E2)
is a type expression indicating the type of an element of the Cartesian product of T1 and T2.
Pointers
Let E is a type expression, then
pointer(E)
is a type expression denoting the type pointer to an object of type T.
Function types
If E1 and E2 are type expressions then
E1 -> E2
is a type expression denoting the type of functions associating an element from E1 with an
element from E2.
Also See, Top Down Parsing
Type Equivalence
When graphs represent type expressions, two types are structurally equivalent if and only if one of the
following conditions is true:
 They are of the same basic type.
 They are formed by applying the identical constructor to structurally equivalent types.
 One is a type name that refers to the other.

The first two conditions in the above definition lead to the name equivalence of type expressions if type
names are interpreted as though they stand for themselves.
"If two type expressions are equal, return a specified type else error," says a common type-checking
rule. Ambiguities might develop when type expressions are given names and then utilized in later type
expressions. The critical question is whether a type expression's name refers to itself or is an
abbreviation for another type of expression.
Type names can create implicit cycles by denoting type expressions. The resulting graph can have cycles
owing to recursive types if edges to type names are redirected to the type expressions given by the
names.
Example of a non Cyclic graph:
Declarations
When we encounter declarations, we need to lay out storage for the declared variables.
For every local name in a procedure, we create a Symbol Table(ST) entry including:
1. The type of the name
2. How much storage the name requires
Grammar:
D -> real, id
D -> integer, id
D -> D1, id
We use ENTER to enter the symbol table and use ATTR to trace the data type.

Production Rule (PR) Semantic Action

D -> real, id ENTER ([Link], real)


[Link] = real

D -> integer, id ENTER ([Link], integer)


[Link] = integer

D -> D1, id ENTER ([Link], [Link])


[Link] = [Link]

A succession of declarations is generated by nonterminal D: Basic, array, and record types are developed
by nonterminal T. One of the basic types int and float is caused by nonterminal B. Nonterminal C, which
stands for "component," creates strings of zero or more integers, each encircled by brackets. A primary
kind given by B is followed by array components specified by nonterminal C in an array type. A record
type (T's second product) is a set of declarations for the record fields, all of which are enclosed in curly
braces.

Translation of Expressions :

The translation of expressions in intermediate code generation involves converting high-level language
expressions into a simpler, machine-independent representation that facilitates optimization and target
code generation. The most common intermediate representations (IRs) for expressions are Three-
Address Code (TAC), Syntax Trees, and Directed Acyclic Graphs (DAGs).

Common Intermediate Representations

 Syntax Trees: A condensed form of a parse tree where internal nodes represent operators and child
nodes represent operands. They retain the structure of the source code expression and are easy to
generate from the parser, though less suited for complex optimizations compared to linear IRs.

 Directed Acyclic Graphs (DAGs): A DAG is an extension of a syntax tree that identifies common
subexpressions by using a single node for all occurrences of the same computed value. This eliminates
redundant computations and is highly effective for optimization within basic blocks.

 Three-Address Code (TAC): A linear, pseudo-code representation where each instruction has at most
three addresses (two operands and one result) and one operator. This format breaks down complex
expressions into a sequence of simple, atomic operations, making it ideal for analysis, optimization, and
eventual translation into machine code.
Type Checking in Compiler Design


Type checking is the process of checking and enforcing the constraints of types assigned to values in a
program. A compiler has to check that a source program conforms both to the syntactic and semantic
rules of the language as well as its type rules. The process also helps in limiting the kinds of types that
can be used in certain contexts, assigning types to values, and then checking that these values are used
appropriately.
Object types are checked by the type checker - a separate module in the compiler, which flags type
errors when a value is being used in an inappropriate manner or the type rules of that type are being
violated. The type checker reports a type error in such cases, and then the programmer must fix the
wrong types. Whichever compiler is used for compilation, the type rules for a language must be
enforced.
What is Type Checking?
Type checking is the process of verifying that the types of all variables, expressions, and functions are
according to the rules of the programming language. It means operations operate on compatible data
types, which helps in preventing errors and hence ensures correctness of the program. Type checking
might be performed at compile time, called static type checking, or it could be performed during
execution, and this is termed dynamic type checking. Both techniques have different advantages
concerning error detection and flexibility of programs.
Every programming language has a collection of type rules that dictate how integers, floats, and
characters can legally be used. A compiler keeps track of this type information and performs a set of
computations to verify that a program is correct with respect to the type rules. Type checking therefore
offers a guarantee that programs are type-safe and compatible with the language specifications.
Conversion
Conversion from one type to another type is known as implicit if it is to be done automatically by the
compiler. Implicit type conversions are also called Coercion and coercion is limited in many languages.
Example: An integer may be converted to a real but real is not converted to an integer.
Conversion is said to be Explicit if the programmer writes something to do the Conversion.
Tasks
 Has to allow "Indexing is only on an array"
 Has to check the range of data types used
 INTEGER (int) has a range of -32,768 to +32767
 FLOAT has a range of 1.2E-38 to 3.4E+38
Types of Type Checking
There are two kinds of type checking:
 Static Type Checking.
 Dynamic Type Checking.
Static Type Checking
Static type checking is defined as type checking performed at compile time. It checks the type variables
at compile-time, which means the type of the variable is known at the compile time. It generally
examines the program text during the translation of the program. Using the type rules of a system, a
compiler can infer from the source text that a function (fun) will be applied to an operand (a) of the right
type each time the expression fun(a) is evaluated.
Examples of Static Checks
 Type-checks: A compiler should report an error if an operator is applied to an incompatible operand.
For example, if an array variable and function variable are added together.
 The flow of control checks: Statements that cause the flow of control to leave a construct must have
someplace to which to transfer the flow of control. For example, a break statement in C causes
control to leave the smallest enclosing while, for, or switch statement, an error occurs if such an
enclosing statement does not exist.
 Uniqueness checks: There are situations in which an object must be defined only once. For example,
in Pascal an identifier must be declared uniquely, labels in a case statement must be distinct, and
else a statement in a scalar type may not be represented.
 Name-related checks: Sometimes the same name may appear two or more times. For example in
Ada, a loop may have a name that appears at the beginning and end of the construct. The compiler
must check that the same name is used at both places.
Benefits of Static Type Checking
 Runtime Error Protection.
 It catches syntactic errors like spurious words or extra punctuation.
 It catches wrong names like Math and Predefined Naming.
 Detects incorrect argument types.
 It catches the wrong number of arguments.
 It catches wrong return types, like return "70", from a function that's declared to return an int.
Dynamic Type Checking
Dynamic Type Checking is defined as the type checking being done at run time. In Dynamic Type
Checking, types are associated with values, not variables. Implementations of dynamically type-checked
languages runtime objects are generally associated with each other through a type tag, which is a
reference to a type containing its type information. Dynamic typing is more flexible. A static type system
always restricts what can be conveniently expressed. Dynamic typing results in more compact programs
since it is more flexible and does not require types to be spelled out. Programming with a static type
system often requires more design and implementation effort.
Languages like Pascal and C have static type checking. Type checking is used to check the correctness of
the program before its execution. The main purpose of type-checking is to check the correctness and
data type assignments and type-casting of the data types, whether it is syntactically correct or not
before their execution.
Static Type-Checking is also used to determine the amount of memory needed to store the variable.
Design of the type-checker depends on
 Syntactic Structure of language constructs.
 The Expressions of languages.
 The rules for assigning types to constructs (semantic rules).
The Position of the Type checker in the Compiler
Type checking in Compiler

The token streams from the lexical analyzer are passed to the PARSER. The PARSER will generate a
syntax tree. When a program (source code) is converted into a syntax tree, the type-checker plays a
Crucial Role. So, by seeing the syntax tree, you can tell whether each data type is handling the correct
variable or not. The Type-Checker will check and if any modifications are present, then it will modify. It
produces a syntax tree, and after that, INTERMEDIATE CODE Generation is done.

control flow :

In compiler design, control flow refers to the order in which program instructions are executed.
Intermediate code generation uses specific techniques, primarily involving labels and jump statements,
to translate high-level language control flow structures (like if-else and loops) into a machine-
independent, linear representation, typically three-address code (TAC).

Key Concepts

 Intermediate Representation (IR): A machine-independent code format (e.g., three-address code,


abstract syntax trees, Static Single Assignment (SSA) form) that bridges the gap between the source
code and the target machine code.

 Boolean Expressions: These expressions evaluate to true or false and determine the path of execution.
In the context of control flow, they are translated into conditional or unconditional jumps rather than
simple boolean values (true/false as 1/0).

 Labels: Symbolic names used to mark specific instructions in the intermediate code, serving as targets
for jump statements.

 Jump Statements: Instructions that alter the normal sequential flow of execution, such as goto
L (unconditional jump) and if condition goto L (conditional jump).

 Basic Blocks and Control Flow Graph (CFG): The intermediate code is partitioned into "basic blocks"
(sequences of instructions with a single entry and exit point). These blocks and the possible paths
between them are then visualized in a Control Flow Graph, which is essential for code analysis and
optimization.

Translation of Control Flow Statements

The compiler uses syntax-directed definitions to generate intermediate code for various control flow
statements:

 if (B) S1: The code for boolean expression B is generated first. If B is true, control flows to the code for
statement S1. If B is false, control jumps to the instruction immediately following S1.

 if (B) S1 else S2: The code for B jumps to S1 if true, or to S2 if false. Jumps are added after S1 and S2 to
ensure they both proceed to the same subsequent instruction (the one after the entire if-else construct).

 while (B) S1: A loop requires a label at the beginning (begin) to jump back to for the next iteration. The
code for B jumps to S1 if true (label [Link]) or out of the loop if false (label [Link] points to [Link]).
After S1's code, an unconditional jump to begin is inserted.

 Short-Circuit Evaluation: This optimization avoids evaluating the entire boolean expression if the value
can be determined from the first part (e.g., in B1 || B2, if B1 is true, B2 is skipped). The translation uses
a series of conditional jumps to achieve this efficient flow.

Implementation Techniques

Compilers use several methods to manage the labels and jumps during a single pass:

 Backpatching: In cases where the target label of a jump instruction is unknown when the jump is
generated, a list of all such "unspecified" jumps is maintained. In a subsequent pass, these jumps are
"patched" with the correct, now-known, labels.

 Boolean Values and Jumping Code: Boolean expressions can either be used to alter the flow of control
(using jumps) or to compute a logical value (e.g., for an assignment like x = a > b). Compilers handle
these two roles using different code generation functions or by constructing a syntax tree and using
different methods for generating the appropriate code.

Backpatching:

It is a technique used during intermediate code generation (like Three-Address Code) to handle forward
jumps and control flow (if-else, loops) in a single pass, by leaving jump targets as placeholders (blank
labels) and filling them in later when the actual target address becomes known. It uses lists
(truelist, falselist) to manage these unresolved jumps, allowing the compiler to efficiently generate code
for complex expressions like boolean conditions by patching all related jump instructions at once, thus
resolving forward references and optimizing code generation.

How it works:
1. Placeholder Generation: When generating code for conditional statements (e.g., if (E) {S1} else {S2}), the
compiler generates jump instructions (like goto L) but leaves the label L blank.

2. List Management: It maintains true and false lists for boolean expressions.

1. [Link]: List of jump instructions to patch when E is true.

2. [Link]: List of jump instructions to patch when E is false.

3. Patching: Once the target code for S1 or S2 is generated, the compiler knows the correct address (label)
and uses a backpatch(list, address) function to fill in all blank labels in the corresponding true or false list
with the actual target address.

4. Merging Lists: For complex expressions (e.g., E1 and E2), merge() function concatenates true/false lists
from sub-expressions to handle all potential jumps.

Key uses:

 Short-circuit Evaluation: For and/or conditions, it ensures jumps go to the correct next instruction or
final statement.

 Flow-of-Control Statements: Efficiently translates if-then-else, while, for loops in one pass.

 One-Pass Code Generation: Avoids needing multiple passes over the code to resolve all jump targets.

Example: if (a < b) then S1 else S2

 Generate if a < b goto __ (add to truelist) and goto __ (add to falselist).

 Generate code for S1.

 Patch truelist with the start of S1's code.

 Generate code for S2.

 Patch falselist with the start of S2's code.

Intermediate Code for Procedures:

Intermediate code generation for procedures in compiler design involves representing function or
subroutine calls and their associated actions in a machine-independent format. This representation
facilitates optimization and eventual translation into target machine code.

Key aspects of intermediate code for procedures:

 Parameter Handling:
 Parameter Passing: Intermediate code needs to represent how parameters are passed to a
procedure. This can involve param instructions for each argument, often followed by
a call instruction. The order of param instructions typically corresponds to the order of arguments in the
procedure definition.
 Argument Evaluation: Expressions used as arguments to a procedure must be evaluated and their
results stored in temporary locations or registers before being passed as parameters.
 Procedure Call Representation:
 call instruction: A call instruction is used to invoke a procedure, specifying the procedure's entry point
(e.g., its label or address).
 Return Value: If the procedure returns a value, the intermediate code will include instructions to store
this return value in a designated location (e.g., a temporary variable or register).
 Procedure Body Translation:
 The body of the procedure is translated into intermediate code in the same manner as the main
program, with instructions for computations, control flow, and local variable management.
 Symbol Tables: Managing separate symbol tables for local variables within procedures is crucial to
ensure correct scope and prevent naming conflicts.
 Control Flow for Procedures:
 return instruction: A return instruction marks the end of a procedure's execution and transfers control
back to the caller.
 Stack Management: Intermediate code often implicitly or explicitly handles stack operations for saving
and restoring the execution context (e.g., return address, caller's registers) during procedure calls and
returns.
Example using Three-Address Code (TAC):
Consider a procedure add(a, b):
Code
int add(int a, int b) {
int sum;
sum = a + b;
return sum;
}

Its intermediate code using Three-Address Code might look like this:
Code
// Intermediate code for calling 'add'
param x // Assuming x is the first argument
param y // Assuming y is the second argument
call add, 2 // Call 'add' with 2 parameters
t1 = retval // Store the return value in t1

// Intermediate code for the 'add' procedure itself


add:
t2 = a + b
sum = t2
return sum

In this example, param instructions set up the arguments, call invokes the procedure, and retval (or a
similar construct) represents the return value. Within the procedure, standard arithmetic operations are
translated into TAC

You might also like