0% found this document useful (0 votes)
12 views43 pages

Code Optimization Techniques in Compilers

Uploaded by

ritu shiroha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views43 pages

Code Optimization Techniques in Compilers

Uploaded by

ritu shiroha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

UNIT-4

Code Optimization in Compiler Design


Code optimization is a crucial phase in compiler design aimed at enhancing the performance and
efficiency of the executable code. By improving the quality of the generated machine code
optimizations can reduce execution time, minimize resource usage, and improve overall system
performance. This process involves the various techniques and strategies applied during
compilation to produce more efficient code without altering the program's functionality.
The code optimization in the synthesis phase is a program transformation technique, which tries
to improve the intermediate code by making it consume fewer resources (i.e. CPU, Memory) so
that faster-running machine code will result. The compiler optimizing process should meet the
following objectives:
 The optimization must be correct, it must not, in any way, change the meaning of the
program.
 Optimization should increase the speed and performance of the program.
 The compilation time must be kept reasonable.
 The optimization process should not delay the overall compiling process.
When to Optimize?
Optimization of the code is often performed at the end of the development stage since it reduces
readability and adds code that is used to increase performance.
Why Optimize?
Optimizing an algorithm is beyond the scope of the code optimization phase. So the program is
optimized. And it may involve reducing the size of the code. So, optimization helps to:
 Reduce the space consumed and increases the speed of compilation.
 Manually analyzing datasets involves a lot of time. Hence, we make use of software like
Tableau for data analysis. Similarly, manually performing the optimization is also tedious
and is better done using a code optimizer.
 An optimized code often promotes re-usability.
Types of Code Optimization
The optimization process can be broadly classified into two types:
 Machine Independent Optimization: This code optimization phase attempts to improve
the intermediate code to get a better target code as the output. The part of the intermediate
code which is transformed here does not involve any CPU registers or absolute memory
locations.
 Machine Dependent Optimization: Machine-dependent optimization is done after
the target code has been generated and when the code is transformed according to the target
machine architecture. It involves CPU registers and may have absolute memory references
rather than relative references. Machine-dependent optimizers put efforts to take
maximum advantage of the memory hierarchy.
Ways to Optimize Code
There are several ways to optimize code. Some of them are mentioned below.
1. Compile Time Evaluation:
(i) A = 2*(22.0/7.0)*r
Perform 2*(22.0/7.0)*r at compile time.
(ii) x = 12.4
y = x/2.3
Evaluate x/2.3 as 12.4/2.3 at compile time.
2. Variable Propagation:
//Before Optimization
c=a*b
x=a
till
d=x*b+4

//After Optimization
c=a*b
x=a
till
d=a*b+4
3. Constant Propagation:
 If the value of a variable is a constant, then replace the variable with the constant. The
variable may not always be a constant.
Example:
(i) A = 2*(22.0/7.0)*r
Performs 2*(22.0/7.0)*r at compile time.
(ii) x = 12.4
y = x/2.3
Evaluates x/2.3 as 12.4/2.3 at compile time.
(iii) int k=2;
if(k) go to L3;
It is evaluated as :
go to L3 ( Because k = 2 which implies condition is always true)
4. Constant Folding:
 Consider an expression : a = b op c and the values b and c are constants, then the value of a
can be computed at compile time.
Example:
#define k 5
x=2*k
y=k+5

This can be computed at compile time and the values of x and y are :
x = 10
y = 10
Note: Difference between Constant Propagation and Constant Folding:
 In Constant Propagation, the variable is substituted with its assigned constant where as in
Constant Folding, the variables whose values can be computed at compile time are
considered and computed.

5. Copy Propagation:
 It is extension of constant propagation.
 After a is assigned to x, use a to replace x till a is assigned again to another variable or value
or expression.
 It helps in reducing the compile time as it reduces copying.
Example :
//Before Optimization
c=a*b
x=a
till
d=x*b+4

//After Optimization
d=a*b+4
6. Common Sub Expression Elimination:
 In the above example, a*b and x*b is a common sub expression.
7. Dead Code Elimination:
 Copy propagation often leads to making assignment statements into dead code.
 A variable is said to be dead if it is never used after its last definition.
 In order to find the dead variables, a data flow analysis should be done.
Example:
8. Unreachable Code Elimination:
 First, Control Flow Graph should be constructed.
 The block which does not have an incoming edge is an Unreachable code block.
 After constant propagation and constant folding, the unreachable branches can be eliminated.

#include <iostream>
using namespace std;

int main() {
int num;
num=10;
cout << "GFG!";
return 0;
cout << num; //unreachable code
}
//after elimination of unreachable code
int main() {
int num;
num=10;
cout << "GFG!";
return 0;
}
9. Function Inlining:
 Here, a function call is replaced by the body of the function itself.
 This saves a lot of time in copying all the parameters, storing the return address, etc.
10. Function Cloning:
 Here, specialized codes for a function are created for different calling parameters.
 Example: Function Overloading
11. Induction Variable and Strength Reduction:
 An induction variable is used in the loop for the following kind of assignment i = i +
constant. It is a kind of Loop Optimization Technique.
 Strength reduction means replacing the high strength operator with a low strength.
Example 1 :
Multiplication with powers of 2 can be replaced by shift left operator which is less
expensive than multiplication
a=a*16
// Can be modified as :
a = a<<4

Example 2 :
i = 1;
while (i<10)
{
y = i * 4;
}

//After Reduction
i=1
t=4
{
while( t<40)
y = t;
t = t + 4;
}
Loop Optimization Techniques
1. Code Motion or Frequency Reduction:
 The evaluation frequency of expression is reduced.
 The loop invariant statements are brought out of the loop.
Example:
a = 200;
while(a&gt;0)
{
b = x + y;
if (a % b == 0)
printf(“%d”, a);
}

//This code can be further optimized as


a = 200;
b = x + y;
while(a&gt;0)
{
if (a % b == 0}
printf(“%d”, a);
}
2. Loop Jamming
 Two or more loops are combined in a single loop. It helps in reducing the compile time.
Example:
// Before loop jamming
for(int k=0;k<10;k++)
{
x = k*2;
}

for(int k=0;k<10;k++)
{
y = k+3;
}

//After loop jamming


for(int k=0;k<10;k++)
{
x = k*2;
y = k+3;
}
3. Loop Unrolling
 It helps in optimizing the execution time of the program by reducing the iterations.
 It increases the program's speed by eliminating the loop control and test instructions.
Example:
//Before Loop Unrolling

for(int i=0;i<2;i++)
{
printf("Hello");
}

//After Loop Unrolling

printf("Hello");
printf("Hello");
Where to Apply Optimization?
Now that we learned the need for optimization and its two types,now let's see where to apply
these optimization.
 Source program: Optimizing the source program involves making changes to the algorithm
or changing the loop structures. The user is the actor here.
 Intermediate Code: Optimizing the intermediate code involves changing the address
calculations and transforming the procedure calls involved. Here compiler is the actor.
 Target Code: Optimizing the target code is done by the compiler. Usage of registers, and
select and move instructions are part of the optimization involved in the target code.
 Local Optimization: Transformations are applied to small basic blocks of statements.
Techniques followed are Local Value Numbering and Tree Height Balancing.
 Regional Optimization: Transformations are applied to Extended Basic Blocks. Techniques
followed are Super Local Value Numbering and Loop Unrolling.
 Global Optimization: Transformations are applied to large program segments that include
functions, procedures, and loops. Techniques followed are Live Variable Analysis and
Global Code Replacement.
 Interprocedural Optimization: As the name indicates, the optimizations are applied inter
procedurally. Techniques followed are Inline Substitution and Procedure Placement.
Advantages of Code Optimization
 Improved performance: Code optimization can result in code that executes faster and uses
fewer resources, leading to improved performance.
 Reduction in code size: Code optimization can help reduce the size of the generated code,
making it easier to distribute and deploy.
 Increased portability: Code optimization can result in code that is more portable across
different platforms, making it easier to target a wider range of hardware and software.
 Reduced power consumption: Code optimization can lead to code that consumes less
power, making it more energy-efficient.
 Improved maintainability: Code optimization can result in code that is easier to understand
and maintain, reducing the cost of software maintenance.
Disadvantages of Code Optimization
 Increased compilation time: Code optimization can significantly increase the compilation
time, which can be a significant drawback when developing large software systems.
 Increased complexity: Code optimization can result in more complex code, making it
harder to understand and debug.
 Potential for introducing bugs: Code optimization can introduce bugs into the code if not
done carefully, leading to unexpected behavior and errors.
 Difficulty in assessing the effectiveness: It can be difficult to determine the effectiveness of
code optimization, making it hard to justify the time and resources spent on the process.

What are Basic Blocks?

A basic block is a sequence of instructions or operations with a single entry point and a single
exit point. It means, the execution enters at the beginning and leaves at the end without any
jumps or branches in between.

Characteristics of Basic Blocks

To understand basic blocks properly, we must understand its characteristics. Listed below are the
important characteristics of basic blocks −

 No Jump or Branch Statements Inside − A basic block does not contain labels (LBL), jump
(JMP), or test (TST) statements inside it.
 Single Entry, Single Exit − Execution will start at the first instruction and continues
sequentially until the last instruction.
 Self-contained − The basic blocks do not interfere with other code segments. This is making
them easier to analyze and optimize.

Example: Identifying Basic Blocks

Consider the following atom sequence (three-address code) generated from a Java expression −

(ADD, b, c, T1)
(LBL, L1)
(ADD, b, c, T2)
(MUL, T1, T2, T3)
(TST, b, c,, 1, L3)
(MOV, T3,, a)

We can divide it into three basic blocks −

Basic Block Instructions

Block 1 (ADD, b, c, T1)

Block 2 (LBL, L1) (ADD, b, c, T2) (MUL, T1, T2, T3)

Block 3 (TST, b, c, 1, L3) (MOV, T3, , a)

Each block is independent and does not contain any control flow changes within it.

Optimizing Basic Blocks Using DAGs

A Directed Acyclic Graph (DAG) is a graph-based structure that helps in eliminating redundant
computations and detecting common sub expressions in a basic block. Before further discussion
let us understand this properly.

Why Use DAGs?

We use DAGs for the following reasons −

 Reduces Redundant Computations − If there is an operation like b + c and it appears multiple


times, a DAG helps compute it only once.
 Optimizes Instruction Count − The DAG simplifies the atom sequence. Generate a shorter,
more efficient version.
 Minimizes Temporary Storage − Redundant temporary variables are removed. This is reducing
register or memory usage.

Example: DAG for Expression Optimization

Consider the Java statement −

a = (b + c) * (b + c);

The un-optimized atom sequence produced by the parser −

(ADD, b, c, T1)
(ADD, b, c, T2)
(MUL, T1, T2, T3)
(MOV, T3,, a)

This sequence re-computes b + c twice, and this is wasting time and resources.

Using DAG representation, we recognize that b + c is the same in both cases. For that reason we
store it once and reuse it −

(ADD, b, c, T1)
(MUL, T1, T1, a)

Now, instead of two ADD operations, we have only one. Here the MUL operation stores the
result directly in a.

How to Construct a DAG for Optimization

We understood how DAGs are useful, but we need to understand how we can form the DAG. To
build a DAG from an atom sequence, we follow these steps:

 Step 1: Read an Atom (Instruction) − If the operation and operands already exist in the DAG,
reuse the node. Otherwise, create a new node.
 Step 2: Connect Nodes − Draw directed arcs from operation nodes to operand nodes. Then keep
track of which variables are associated with each node.
 Step 3: Generate Optimized Code − Now traverse the DAG in reverse order. This will help to
generate a minimal atom sequence.

Example: Building a DAG for Optimization

Consider the Java expression −

a*b+a*b+a*b

The parser generates the following atoms −

(MUL, a, b, T1)
(MUL, a, b, T2)
(ADD, T1, T2, T3)
(MUL, a, b, T4)
(ADD, T3, T4, T5)

Using the DAG Construction Process


Create a single node for MUL, a, b (instead of three).

Reuse it in both ADD operations instead of recomputing.


Optimized Atom Sequence (Generated from DAG)

(MUL, a, b, T1)
(ADD, T1, T1, T3)
(ADD, T3, T1, T5)

Here, we have eliminated redundant multiplications, and reduced five instructions to just three.

Basic Blocks and DAG Optimization: Advantages and Disadvantages

The following table highlights the advantages and disadvantages of Basic Blocks and DAG
Optimization −

Advantages Disadvantages

Faster Execution − Fewer instructions Limited to Basic Blocks − DAGs work only within a single
mean less CPU time, speeding up basic block and cannot optimize across different blocks due to
program execution. potential jump and branch statements.

Reduced Memory Usage − Fewer Cannot Handle Code with Labels or Jumps − DAG-based
temporary variables reduce register and optimization is not possible when labels (LBL) or jump (JMP)
memory storage needs. instructions exist, as control flow alters execution order.

Better Code Optimization − DAG-based Overhead in DAG Construction − Building and maintaining a
optimizations help compilers generate DAG adds complexity to the compiler, and large expressions
compact, efficient code. may require more memory for DAG storage.
What is Data Flow Analysis?
Data flow analysis is a technique used in compiler design to analyze how data flows through a
program. It involves tracking the values of variables and expressions as they are computed
and used throughout the program, with the goal of identifying opportunities for optimization
and identifying potential errors.
The basic idea behind data flow analysis is to model the program as a graph, where the nodes
represent program statements and the edges represent data flow dependencies between the
statements. The data flow information is then propagated through the graph, using a set of
rules and equations to compute the values of variables and expressions at each point in the
program.
Types of Data Flow Analysis
Some of the common types of data flow analysis performed by compilers include:
1. Reaching Definitions Analysis: This analysis tracks the definition of a variable or
expression and determines the points in the program where the definition "reaches" a
particular use of the variable or expression. This information can be used to identify
variables that can be safely optimized or eliminated.
2. Live Variable Analysis: This analysis determines the points in the program where a
variable or expression is "live", meaning that its value is still needed for some future
computation. This information can be used to identify variables that can be safely
removed or optimized.
3. Available Expressions Analysis: This analysis determines the points in the program
where a particular expression is "available", meaning that its value has already been
computed and can be reused. This information can be used to identify opportunities for
common sub expression elimination and other optimization techniques.
4. Constant Propagation Analysis: This analysis tracks the values of constants and
determines the points in the program where a particular constant value is used. This
information can be used to identify opportunities for constant folding and other
optimization techniques.
Advantages of Data flow Analysis
1. Improved code quality: By identifying opportunities for optimization and eliminating
potential errors, data flow analysis can help improve the quality and efficiency of the
compiled code.
2. Better error detection: By tracking the flow of data through the program, data flow
analysis can help identify potential errors and bugs that might otherwise go unnoticed.
3. Increased understanding of program behavior: By modeling the program as a graph
and tracking the flow of data, data flow analysis can help programmers better understand
how the program works and how it can be improved.
Basic Terminologies
 Definition Point: a point in a program containing some definition.
 Reference Point: a point in a program containing a reference to a data item.
 Evaluation Point: a point in a program containing evaluation of expression.
Data Flow Properties
 Available Expression - A expression is said to be available at a program point x if along
paths its reaching to x. A Expression is available at its evaluation point.
An expression a+b is said to be available if none of the operands gets modified before
their use.

Example -
Advantages
It is used to eliminate common sub expressions.

 Reaching Definition - A definition D is reaches a point x if there is path from D to x in


which D is not killed, i.e., not redefined.

Example -

 Advantage -
It is used in constant and variable propagation.
 Live variable - A variable is said to be live at some point p if from p to end the variable
is used before it is redefined else it becomes dead.

Example -
 Advantage -

1. It is useful for register allocation.


2. It is used in dead code elimination.
 Busy Expression - An expression is busy along a path if its evaluation exists along that
path and none of its operand definition exists before its evaluation along the path.
Advantage -
It is used for performing code movement optimization.
Features
 Identifying dependencies: Data flow analysis can identify dependencies between
different parts of a program, such as variables that are read or modified by multiple
statements.
 Detecting dead code: By tracking how variables are used, data flow analysis can detect
code that is never executed, such as statements that assign values to variables that are
never used.
 Optimizing code: Data flow analysis can be used to optimize code by identifying
opportunities for common subexpression elimination, constant folding, and other
optimization techniques.
 Detecting errors: Data flow analysis can detect errors in a program, such as uninitialized
variables, by tracking how variables are used throughout the program.
 Handling complex control flow: Data flow analysis can handle complex control flow
structures, such as loops and conditionals, by tracking how data is used within those
structures.
 Inter procedural analysis: Data flow analysis can be performed across multiple
functions in a program, allowing it to analyze how data flows between different parts of
the program.
 Scalability: Data flow analysis can be scaled to large programs, allowing it to analyze
programs with many thousands or even millions of lines of code.

Runtime Environments in Compiler Design


A translation needs to relate the static source text of a program to the dynamic actions that must
occur at runtime to implement the program. The program consists of names for procedures,
identifiers, etc., that require mapping with the actual memory location at runtime. Runtime
environment is a state of the target machine, which may include software libraries, environment
variables, etc., to provide services to the processes running in the system.
SOURCE LANGUAGE ISSUES
Activation Tree
A program consist of procedures, a procedure definition is a declaration that, in its simplest form,
associates an identifier (procedure name) with a statement (body of the procedure). Each
execution of the procedure is referred to as an activation of the procedure. Lifetime of an
activation is the sequence of steps present in the execution of the procedure. If ‘a’ and ‘b’ be two
procedures then their activations will be non-overlapping (when one is called after other) or
nested (nested procedures). A procedure is recursive if a new activation begins before an earlier
activation of the same procedure has ended. An activation tree shows the way control enters and
leaves activations. Properties of activation trees are :-
 Each node represents an activation of a procedure.
 The root shows the activation of the main function.
 The node for procedure ‘x’ is the parent of node for procedure ‘y’ if and only if the control
flows from procedure x to procedure y.
Example - Consider the following program of Quicksort
main() {

Int n;
readarray();
quicksort(1,n);
}

quicksort(int m, int n) {
Int i= partition(m,n);
quicksort(m,i-1);
quicksort(i+1,n);
}
The activation tree for this program will be:

First main function as the root then main calls readarray and quicksort. Quicksort in turn calls
partition and quicksort again. The flow of control in a program corresponds to a pre-order depth-
first traversal of the activation tree which starts at the root.
CONTROL STACK AND ACTIVATION RECORDS
Control stack or runtime stack is used to keep track of the live procedure activations i.e the
procedures whose execution have not been completed. A procedure name is pushed on to the
stack when it is called (activation begins) and it is popped when it returns (activation ends).
Information needed by a single execution of a procedure is managed using an activation record
or frame. When a procedure is called, an activation record is pushed into the stack and as soon as
the control returns to the caller function the activation record is popped.
A general activation record consists of the following things:
 Local variables: hold the data that is local to the execution of the procedure.
 Temporary values : stores the values that arise in the evaluation of an expression.
 Machine status: holds the information about the status of the machine just before the
function call.
 Access link (optional): refers to non-local data held in other activation records.
 Control link (optional): points to activation record of caller.
 Return value: used by the called procedure to return a value to calling procedure
 Actual parameters
Control stack for the above quick sort example:

SUBDIVISION OF RUNTIME MEMORY


Runtime storage can be subdivided to hold :
 Target code- the program code, is static as its size can be determined at compile time
 Static data objects
 Dynamic data objects- heap
 Automatic data objects- stack

STORAGE ALLOCATION TECHNIQUES


I. Static Storage Allocation
The names are bound with the storage at compiler time only and hence every time procedure is
invoked its names are bound to the same storage location only So values of local names can be
retained across activations of a procedure. Here compiler can decide where the activation records
go with respect to the target code and can also fill the addresses in the target code for the data it
operates on.
 For any program, if we create a memory at compile time, memory will be created in the
static area.
 For any program, if we create a memory at compile-time only, memory is created only once.
 It doesn’t support dynamic data structure i.e memory is created at compile-time and
deallocated after program completion.
 The drawback with static storage allocation is recursion is not supported.
 Another drawback is the size of data should be known at compile time
Eg- FORTRAN was designed to permit static storage allocation.
II. Stack Storage Allocation
 Storage is organized as a stack and activation records are pushed and popped as activation
begins and end respectively. Locals are contained in activation records so they are bound to
fresh storage in each activation.
 Recursion is supported in stack allocation
III. Heap Storage Allocation
 Memory allocation and deallocation can be done at any time and at any place depending on
the requirement of the user.
 Heap allocation is used to dynamically allocate memory to the variables and claim it back
when the variables are no more required.
 Recursion is supported.
PARAMETER PASSING: The communication medium among procedures is known as
parameter passing. The values of the variables from a calling procedure are transferred to the
called procedure by some mechanism.
Basic terminology :
 R- value: The value of an expression is called its r-value. The value contained in a single
variable also becomes an r-value if its appear on the right side of the assignment operator. R-
value can always be assigned to some other variable.
 L-value: The location of the memory(address) where the expression is stored is known as
the l-value of that expression. It always appears on the left side if the assignment operator.
 [Link] Parameter: Variables that take the information passed by the caller procedure are
called formal parameters. These variables are declared in the definition of the called
function. [Link] Parameter: Variables whose values and functions are passed to the
called function are called actual parameters. These variables are specified in the function call
as arguments.
Different ways of passing the parameters to the procedure:
 Call by Value: In call by value the calling procedure passes the r-value of the actual
parameters and the compiler puts that into called procedure's activation record. Formal
parameters hold the values passed by the calling procedure, thus any changes made in the
formal parameters do not affect the actual parameters.
 Call by Reference In call by reference the formal and actual parameters refers to same
memory location. The l-value of actual parameters is copied to the activation record of the
called function. Thus the called function has the address of the actual parameters. If the
actual parameters does not have a l-value (eg- i+3) then it is evaluated in a new temporary
location and the address of the location is passed. Any changes made in the formal parameter
is reflected in the actual parameters (because changes are made at the address).
 Call by Copy Restore In call by copy restore compiler copies the value in formal
parameters when the procedure is called and copy them back in actual parameters when
control returns to the called function. The r-values are passed and on return r-value of
formals are copied into l-value of actuals.

#include <iostream>
using namespace std;

void swap(int& a, int& b)


{
/* A hybrid between call-by-value and call-by-reference
is copy-restore linkage (also known as copy in and
copy out ,or value-result) */

int copy_a, copy_b;


copy_a = a; // copy in phase
copy_b = b;

int temp = copy_a; // function proper


copy_a = copy_b;
copy_b = temp;

a = copy_a; // copy out phase


b = copy_b;
}

int main()
{
int a = 10, b = 20;
swap(a, b);
cout << a << " " << b << endl;
return 0;
}
// code added by raunakraj232

 Call by Name In call by name the actual parameters are substituted for formals in all the
places formals occur in the procedure. It is also referred as lazy evaluation because
evaluation is done on parameters only when needed.
Advantages:
Portability: A runtime environment can provide a layer of abstraction between the compiled
code and the operating system, making it easier to port the program to different platforms.
Resource management: A runtime environment can manage system resources, such as memory
and CPU time, making it easier to avoid memory leaks and other resource-related issues.
Dynamic memory allocation: A runtime environment can provide dynamic memory allocation,
allowing memory to be allocated and freed as needed during program execution.
Garbage collection: A runtime environment can perform garbage collection, automatically
freeing memory that is no longer being used by the program.
Exception handling: A runtime environment can provide exception handling, allowing the
program to gracefully handle errors and prevent crashes.
Disadvantages:
Performance overhead: A runtime environment can add performance overhead, as it requires
additional processing and memory usage.
Platform dependency: Some runtime environments may be specific to certain platforms,
making it difficult to port programs to other platforms.
Debugging: Debugging can be more difficult in a runtime environment, as the additional layer
of abstraction can make it harder to trace program execution.
Compatibility issues: Some runtime environments may not be compatible with certain operating
systems or hardware architectures, which can limit their usefulness.
Versioning: Different versions of a runtime environment may have different features or APIs,
which can lead to versioning issues when running programs compiled with different versions of
the same runtime environment.

Storage organization in compiler design involves several key concepts:


 Storage Allocation Strategies: These strategies manage memory effectively during program
execution, determining how memory is allocated and freed for variables and data structures.

Storage Allocation Strategies are important in compiler design because they help manage
memory effectively for program execution. These techniques govern how a compiler allocates
and frees memory for variables and data structures throughout the compilation process. One
typical solution is static allocation, which allocates memory at build time, giving stability but
restricting flexibility. Another option, Dynamic Allocation, enables runtime memory allocation,
which improves flexibility but adds overhead. The decision between these tactics influences
program performance and resource allocation. Achieving a balance between static and dynamic
allocation in compiler design is essential for optimizing memory utilization and guaranteeing
smooth and efficient program execution.

Storage Allocation Strategies in Compiler Design

Storage Allocation Strategies are important in compiler design because they enable efficient
program execution. These techniques control how memory is allocated to variables and data
structures throughout the compilation process, resulting in optimal resource utilization.

Static Storage Allocation is a fundamental strategy. Memory is allocated at build time, ensuring
simplicity and predictability. However, this strategy lacks flexibility since it does not respond
well to changing program requirements. In contrast, Dynamic Storage Allocation is a more
adaptive technique that assigns memory during runtime. While dynamic allocation
accommodates changing memory requirements, it presents the difficulty of effective deallocation
to prevent memory leaks.

Heap and stack are important components in storage allocation. The Stack, a Last-In-First-
Out (LIFO) structure, is primarily used to store local variables and make function calls. Its
automated and organized nature facilitates management but may limit adaptability. Meanwhile,
the Heap, a dynamic storage pool, can handle variables with unpredictable lifetimes. Although
adaptable, manual memory management is essential here, necessitating a cautious developer to
avoid memory-related errors.

Let us look at the various storage allocation strategies in compiler design in detail.

Static Allocation

Static allocation assigns RAM to programme variables before runtime and keeps them constant
during execution. This implies that the compiler decides the storage requirements during
compilation, resulting in a predictable and easy technique.

Example:

#include <stdio.h>

int main() {
static int myStaticVar = 10;
int myLocalVar = 20;

// Rest of the code...

return 0;
}

Here, myStaticVar is statically allocated, as its space is determined at compile-time.

Let us now look at the advantages of Static Allocation.

 Predictable Performance:
With memory allocated beforehand, there's no runtime overhead for memory
management. This enhances the predictability of program behavior.
 Efficient Access:
Direct addressing simplifies variable access, leading to faster execution times compared
to dynamic allocation.

Let us now look at the disadvantages of Static Allocation.

 Limited Flexibility:
The fixed allocation poses challenges for programs with dynamic memory requirements.
It may lead to inefficient use of memory if space is reserved but not fully utilized.
 No Adaptability:
The inability to adjust memory during runtime hinders the execution of programs with
evolving memory needs.

Heap Allocation

Heap Allocation, often known as dynamic memory allocation, is a method that enables a
program to request memory during runtime. Unlike static memory allocation, which is specified
at build time, heap allocation is flexible, allowing the program to adapt to changing memory
requirements while running.

Example:

#include <stdlib.h>

int main() {
int *dynamicArray = (int*)malloc(5 * sizeof(int));
// Use dynamicArray as needed
free(dynamicArray); // Release allocated memory
return 0;
}

In this snippet, malloc reserves the memory on the heap for an array of integers. The allocated
memory is then freed using free when it's no longer needed.

Let us now look at its advantages.

Heap Allocation provides programs with dynamic memory management, resulting in benefits
such as flexibility and efficient memory utilization. It supports varying data volumes and
promotes more durable, scalable applications.

Let us now look at its disadvantages.

However, tremendous authority comes with great responsibility. Mismanagement of heap


memory can result in memory leaks or fragmentation, reducing speed and reliability.
Additionally, dynamic allocation has a lower runtime overhead than static allocation.

Stack Allocation

Consider a real-life example where you're at a bakery, with trays stacked high. The first tray you
take is the one at the top. Similarly, with stack allocation, the last function called receives the
memory at the top of the stack. It takes a Last In, First Out (LIFO) method. The program
maintains track of the current function's state, neatly organizing variables and regulating the
execution flow.

Example:

void exampleFunction() {
int localVar = 42;
// other code
}

After exampleFunction is invoked, the variable localVar is allocated stack space, which is
automatically recovered after the function terminates.

Let us look at the advantages of Stack Allocation:

 Speedy Access:
As stack memory is quite well-structured, accessing variables is fast. No need to search;
it's right on top.
 Automatic Cleanup:
Manual memory management is not required. The stack takes care of it. Variables leave
when their function is done.
 Simple Implementation:
Implementing a stack is straightforward, making it a practical choice for many compilers.

Let us look at the disadvantages of Stack Allocation.

 Limited Size:
The stack size is finite, and if exceeded, it leads to a stack overflow. Recursive functions
or deep nesting can cause issues.
 Static Structure:
Unlike the heap, the stack's structure is fixed, making dynamic memory management a
no-go.

Choosing the Best According to Our Use

Static Allocation:

 When: Memory requirements are known at compile time.


 Pros: Simple, efficient.
 Cons: Limited flexibility, can't handle dynamic memory needs.

Stack Allocation:
 When: Dealing with local variables, and function calls.
 Pros: Automatic management, fast access.
 Cons: Limited size, and scope; potential for stack overflow.

Heap Allocation:

 When: Dynamic memory needs, unknown memory requirements.


 Pros: Dynamic, flexible.
 Cons: Slower, manual management, the potential for leaks.

Register Allocation:

 When: Performance-critical sections, loops, computations.


 Pros: Fastest access, optimization.
 Cons: Limited registers, the potential for spills.

Guidelines:=

 Compile-Time Knowledge:
Use static allocation when memory needs are known in advance.
 Dynamic Requirements:
Opt for heap allocation for dynamic memory needs.
 Function Scope:
Stick to stack allocation for local variables and function calls.
 Performance Optimization:
Employ register allocation for critical performance sections.

 Types of Storage Allocation: There are mainly three types: Static Storage Allocation, where
memory is allocated at compile time; Dynamic Storage Allocation, where memory is allocated
at runtime; and Automatic Storage Allocation, which is managed by the compiler.

Storage Allocation Strategies in Compiler Design


A compiler is a program that converts HLL(High-Level Language) to LLL(Low-Level
Language) like machine language. It is also responsible for organizing the memory layout of a
program by allocating space for global and local variables, constants, and dynamic data
structures.
 As the program begins execution, it is under the control of the operating system, which sets
up the environment by allocating space for the whole process.
 Compiler is responsible for deciding which variable gets which part of the memory.
 For example, the compiler needs to decide whether variables are best placed in the stack (for
local variables) or in the heap (for dynamically allocated memory) or data segment (global
and stack variables)
Below is an example of a C code to demonstrate how a C compiler typically decided memory
allocation.
There are mainly three types of Storage Allocation Strategies which compiler uses:
Static Storage Allocation
Static storage allocation is the simplest form of memory allocation. In this strategy, the memory
for variables is allocated at compile time. The compiler determines the memory addresses for all
variables before the program starts running, and these addresses remain the same throughout the
program's execution.
For example, in languages like C or C++, when you declare global variables or static variables,
the memory for these variables is allocated using static storage.
int number = 1; // Global
static int digit = 1;
Advantages of Static Storage Allocation
1. The allocation process is straightforward and easy to understand.
2. The memory is allocated once only at compile time and remains the same throughout the
program completion.
3. Memory allocation is done before the program starts taking memory only on compile time.
Disadvantages of Static Storage Allocation
1. Not highly scalable and offers limited memory flexibility.
2. The size of the data must be known at the compile time.
3. Static storage allocation is not very efficient and there is potential for memory wastage.
Stack Storage Allocation
Stack storage allocation is used for memory allocation of local variables of a function (including
main), where memory is allocated at when the function is called. In this case, memory is
assigned in a Last In, First Out (LIFO) manner, meaning that memory is allocated when a
function is called and deallocated when the function returns.
In C/C++, the local variables of a function are stored in the stack. Each time a function is called,
a new activation record (stack frame) is pushed onto the stack to store the local variables. Once
the function completes, the stack frame is popped off, and the memory used is released.
// when we call the sum function below, memory
// will be allotted for the variables, a, b and ans

void sum(int a, int b){int ans = a+b; cout<<ans;}


Advantages of Stack Storage Allocation
1. Stack allocation provides very fast memory access and management.
2. Memory is automatically managed and deallocated when a function completes.
3. Memory allocation and deallocation is simple (compared to heap) as completely controlled
by compiler.
Disadvantages of Stack Storage Allocation
1. Stack has a limited size which can lead to potential stack overflow errors.
2. It cannot handle dynamic or large memory requirements effectively.
3. The fixed memory size restricts flexibility as programmers have no control about allocation
and deallocation of memory.
Heap Storage Allocation
Heap storage allocation is another form of dynamic memory allocation, but unlike the stack, it
allows memory to be allocated at runtime for variables that can grow or shrink during execution.
The heap is used for memory that is not automatically deallocated when a function call ends.
This type of memory allocation is used when you need to store objects or data that are not tied to
a specific function scope.
In C/C++, functions like malloc() or new are used to allocate memory in the heap, and functions
like free() or delete are used to deallocate the memory.
int* ans = new int[5];
Advantages of Heap Storage Allocation
1. Heap allocation is useful when we have data whose size is not fixed and can change during
the run time.
2. We can retain the values of variables even if the activation records end.
3. Heap allocation is the most flexible allocation scheme as programmers can decide when to
allocate and when to deallocate.
Disadvantages of Heap Storage Allocation
1. Heap allocation is slower as compared to stack allocation.
2. There is a chance of memory leaks (in languages where automatic garbage collection does
not happen) if programmer forgets to deallocate.
3. It can create dangling pointers / references if not handled carefully.
Comparison of Storage Allocation Strategies

Memory
Allocation Memory
Strategy Time Management EfficiencyFlexibility Usage

Static Fixed, no Global and


Compile time High Low
Allocation deallocation static variables

Local
Stack
Runtime Automatic (LIFO) Fast Limited variables,
Allocation
function calls

Heap Runtime Manual (allocation Slower High Large or


Memory
Allocation Memory
Strategy Time Management EfficiencyFlexibility Usage

dynamic data
Allocation and deallocation)
structures

 Runtime Storage Organization: This refers to the structure of the target computer's registers
and memory that manages memory and tracks information required to execute a program.

Introduction
The runtime environment is the structure of the target computer's registers and memory that
manages memory and keeps track of information required to execute a program. The runtime
support package is responsible for data object allocation and de allocation.
The target program executes in its own logical address space, including a location for each
program value. The management and organization of this logical address space are shared
between the compiler, operating system and target machine. The operating system converts this
logical address to a physical address, which is typically distributed throughout the memory.
Runtime Memory
The runtime representation in the logical address space includes data and program areas.
The runtime storage is assumed to be in blocks of contiguous bytes, where a byte is the smallest
unit of addressable memory. A byte is eight bits long, while a machine word can be made from
four bytes. The address of the first byte is used to store multi byte objects in consecutive bytes.
Static Versus Dynamic Storage Allocation
Storage management is concerned with the layout and allocation of data to memory locations in
a runtime environment. Since the same name in a program text might refer to several locations at
run time, these difficulties are tricky.
Compile-time and runtime are distinguished by the terms static and dynamic, respectively. A
storage-allocation decision is said to be static if it can be decided by the compiler based just on
the program's text, rather than what the program does as it executes. On the other hand, a
decision is dynamic if it can only be made while the program is running.
Static Storage
 Fully static runtime environments may be helpful for languages that do not support
pointers, dynamic allocation, or recursive function calls.
 Names are connected to storage locations in static allocation. If memory is generated at
compile-time, it will be created only once in the static area.
 The dynamic data structure is supported by static allocation, which means memory is
allocated only at build time and released after the application is finished.
For dynamic storage allocation, several compilers employ a combination of the two strategies
listed below:
Stack Storage
 Storage is organized as a stack in static storage allocation.
 When activation begins, an activation record is pushed into the stack, and when it ends, it is
popped.
 It runs on the last-in-first-out (LIFO) principle, and this allocation supports
the recursion process.
Heap Storage
 The most flexible allocation strategy is Heap Allocation. Memory allocation and de
allocation can occur at any time and in any location, depending on the user's needs.
 Heap allocation is a technique for dynamically allocating memory to variables and claiming
it back when the variables are no longer used.
 It supports the recursion process.

 Data Object Layout: The storage layout for data objects is influenced by addressing constraints,
ensuring efficient memory usage.

STORAGE ORGANIZATION

The executing target program runs in its own logical address space in which each
program value has a location. The management and organization of this logical address space is
shared between the complier, operating system and target machine. The operating system maps
the logical address into physical addresses, which are usually spread throughout memory.

Fig. 2.9 Typical subdivision of run-time memory into code and data areas

form a machine word. Multibyte objects are bytes and given the address of first byte.Run-
time storage comes in blocks, where a byte is the smallest unit of memory. Four bytes

The storage layout for data objects is strongly influenced by the addressing constraints of the
target machine.

A character array of length 10 needs only enough bytes to hold 10 characters, a compiler may
allocate 12 bytes to get alignment, leaving 2 bytes unused.
This unused space due to alignment considerations is referred to as padding.

The size of some program objects may be known at run time and may be placed in an area
called static.
The dynamic areas used to maximize the utilization of space at run time are stack and heap.

Activation records:

Procedure calls and returns are usually managed by a run time stack called the control
stack. Each live activation has an activation record on the control stack, with the root of the
activation tree at the bottom, the latter activation has its record at the top of the stack. The
contents of the activation record vary with the language being implemented.
Temporary values such as those arising from the evaluation of expressions.
Local data belonging to the procedure whose activation record this is.

A saved machine status, with information about the state of the machine just before the call
to procedures.

An access link may be needed to locate data needed by the called procedure but found
elsewhere.
A control link pointing to the activation record of the caller.

Space for the return value of the called functions, if any. Again, n return a value, and if one
does, we may prefer to place that value i efficiency.
The actual parameters used by the calling procedure. These are not placed in record but
rather in registers, when possible, for greater efficiency.

 Overall Storage Organization: This includes various types of storage such as source code
storage, symbol tables, and memory allocation strategies.
Introduction to Compiler Design and Storage Organization

Compiler design is a crucial aspect of computer science that involves the creation of software
programs known as compilers. These compilers are responsible for translating high-level
programming languages into machine code that can be executed by a computer. One important
aspect of compiler design is storage organization, which deals with how data is stored and
accessed during the compilation process.

Types of Storage in Compiler Design

1. Source Code Storage

Source code is the human-readable representation of a program written in a high-level


programming language. The source code is stored in files with specific extensions like .c, .cpp,
or .java. During the compilation process, the compiler reads the source code from these files and
performs various operations to generate the corresponding machine code.

For example, consider a C program stored in a file named “hello.c”. The compiler reads the
contents of this file and analyzes the code to identify variables, functions, and other program
elements.

2. Symbol Table
A symbol table is a data structure used by compilers to store information about variables,
functions, and other program symbols. It acts as a dictionary that maps symbols to their
attributes, such as data type, memory location, and scope.

For example, let’s say we have a C program that declares a variable named “count”. The
compiler stores information about this variable in the symbol table, including its data type (e.g.,
integer), memory location, and scope (e.g., global or local).

3. Memory Allocation

During the compilation process, the compiler needs to allocate memory for variables, arrays, and
other data structures used in the program. Memory allocation can be static or dynamic.

Static memory allocation refers to the allocation of memory at compile-time. In this case, the
compiler determines the memory requirements of variables and allocates the necessary memory
in advance. This memory remains allocated throughout the execution of the program.

For example, consider a C program that declares an array with a fixed size. The compiler
allocates memory for this array based on its size and ensures that the allocated memory remains
reserved for the array during program execution.

Dynamic memory allocation, on the other hand, refers to the allocation of memory at runtime. In
this case, the program requests memory from the operating system as needed. Dynamic memory
allocation is typically used for data structures whose size is not known in advance or may change
during program execution.

For example, consider a C program that uses the “malloc” function to dynamically allocate
memory for a linked list. The compiler generates code to request memory from the operating
system when a new node is added to the linked list.

Examples of Storage Organization in Compiler Design

1. Lexical Analysis

Lexical analysis is the first phase of the compilation process, where the source code is divided
into tokens. Tokens are the smallest meaningful units of a programming language, such as
keywords, identifiers, operators, and literals.
During lexical analysis, the compiler uses storage organization techniques to store and
manipulate tokens. For example, the compiler may use a symbol table to store information about
identifiers encountered in the source code.

Let’s consider an example:

#include <stdio.h>int main() {int num1 = 10;int num2 = 20;int sum = num1 + num2;printf("The
sum is %dn", sum);return 0;}

In this example, the compiler performs lexical analysis to identify tokens like “int”, “main”, “=”,
“+”, “printf”, etc. It stores information about variables like “num1”, “num2”, and “sum” in the
symbol table.

2. Intermediate Code Generation

After lexical analysis, the compiler proceeds to the intermediate code generation phase. In this
phase, the compiler translates the source code into an intermediate representation that is easier to
analyze and optimize.

Storage organization plays a role in storing and manipulating the intermediate code. The
compiler may use data structures like stacks, queues, or arrays to store intermediate results and
perform operations on them.

Let’s consider an example:

#include <stdio.h>int main() {int num1 = 10;int num2 = 20;int sum = num1 + num2;if (sum >
30) {printf("The sum is greater than 30n");} else {printf("The sum is less than or equal to
30n");}return 0;}

In this example, the compiler generates intermediate code to evaluate the condition “sum > 30”.
It may use a stack to store intermediate results and perform the comparison operation. The
compiler also generates intermediate code for the “printf” statements based on the condition
result.

3. Code Optimization
Code optimization is an important phase of compiler design that aims to improve the efficiency
and performance of the generated machine code. Storage organization techniques play a crucial
role in code optimization.

For example, the compiler may analyze the usage of variables and optimize their storage to
minimize memory access operations. It may perform register allocation to store frequently used
variables in CPU registers, which can result in faster execution.

Let’s consider an example:

#include <stdio.h>int main() {int num1 = 10;int num2 = 20;int sum = num1 + num2;int product
= num1 * num2;printf("The sum is %dn", sum);printf("The product is %dn", product);return 0;}

In this example, the compiler may optimize the storage of variables “num1”, “num2”, “sum”,
and “product” based on their usage. It may allocate registers to store these variables instead of
using memory locations, resulting in faster arithmetic operations.

Error Detection and Recovery in Compiler


Error detection and recovery are essential functions of a compiler to ensure that a program is
correctly processed. Error detection refers to identifying mistakes in the source code, such as
syntax, semantic, or logical errors. When an error is found, the compiler generates an error
message to help the programmer fix it.
Error recovery allows the compiler to handle errors gracefully without abruptly stopping the
compilation process. It ensures that minor errors do not prevent the compiler from analyzing the
rest of the program. Common recovery techniques include skipping incorrect parts, suggesting
corrections, and continuing compilation.
Effective error handling improves the debugging process, enhances code reliability, and helps
developers write error-free programs efficiently.
Classification of Errors
Compile-time errors
Compile-time errors are of three types:-
1. Lexical phase errors
These errors are detected during the lexical analysis phase. Typical lexical errors are:
 Exceeding length of identifier or numeric constants.
 The appearance of illegal characters
 Unmatched string
Example 1 : printf("Geeksforgeeks");$
This is a lexical error since an illegal character $ appears at the end of statement.
Example 2 : This is a comment */
This is an lexical error since end of comment is present but beginning is not present
Error recovery for lexical phase errors
Panic Mode Recovery
 In this method, successive characters from the input are removed one at a time until a
designated set of synchronizing tokens is found. Synchronizing tokens are delimiters such
as; or }
 The advantage is that it is easy to implement and guarantees not to go into an infinite loop.
 The disadvantage is that a considerable amount of input is skipped without checking it for
additional errors.
2. Syntactic phase errors
These errors are detected during the syntax analysis phase. Typical syntax errors are:
 Errors in structure
 Missing operator
 Misspelled keywords
 Unbalanced parenthesis
Example : swich(ch)
{
.......
.......
}

The keyword switch is incorrectly written as a swich. Hence, an "Unidentified


keyword/identifier" error occurs.

Error recovery for syntactic phase error:


Panic Mode Recovery
 In this method, successive characters from the input are removed one at a time until a
designated set of synchronizing tokens is found. Synchronizing tokens are deli-meters such
as; or }
 The advantage is that it's easy to implement and guarantees not to go into an infinite loop.
 The disadvantage is that a considerable amount of input is skipped without checking it for
additional errors.
Statement Mode recovery
 In this method, when a parser encounters an error, it performs the necessary correction on
the remaining input so that the rest of the input statement allows the parser to parse ahead.
 The correction can be deletion of extra semicolons, replacing the comma with semicolons, or
inserting a missing semicolon.
 While performing correction, utmost care should be taken for not going in an infinite loop.
 A disadvantage is that it finds it difficult to handle situations where the actual error occurred
before pointing of detection.
Error production
 If a user has knowledge of common errors that can be encountered then, these errors can be
incorporated by augmenting the grammar with error productions that generate erroneous
constructs.
 If this is used then, during parsing appropriate error messages can be generated and parsing
can be continued.
 The disadvantage is that it's difficult to maintain.
Global Correction
 The parser examines the whole program and tries to find out the closest match for it which is
error-free.
 The closest match program has less number of insertions, deletions, and changes of tokens to
recover from erroneous input.
 Due to high time and space complexity, this method is not implemented practically.
3. Semantic errors
These errors are detected during the semantic analysis phase. Typical semantic errors are :
 Incompatible type of operands
 Undeclared variables
 Not matching of actual arguments with a formal one
Example : int a[10], b;
.......
.......
a = b;
It generates a semantic error because of an incompatible type of a and b.

Error recovery for Semantic errors


 If the error "Undeclared Identifier" is encountered then, to recover from this a symbol
table entry for the corresponding identifier is made.
 If data types of two operands are incompatible then, automatic type conversion is done by
the compiler.
Advantages of Error Detection and Recovery in Compiler
1. Better Code Quality – Detects and fixes errors early, preventing bigger issues.
2. Increased Productivity – Allows compilation to continue after errors, saving time.
3. Improved Debugging – Provides clear error messages, helping developers fix bugs quickly.
4. Consistent Error Handling – Ensures uniform handling of errors for better reliability.
5. Lower Maintenance Costs – Fixing errors early reduces time and effort in later stages.
6. Better Software Performance – Helps identify inefficient code that may affect
performance.
7. Enhanced User Experience – Well-handled errors make applications more stable and user-
friendly.
Disadvantages of Error Detection and Recovery in Compiler
1. Slower Compilation – Error handling adds extra processing, increasing compile time.
2. Increased Complexity – Makes the compiler harder to develop and maintain.
3. Silent Errors – Some errors may be masked, leading to unnoticed issues.
4. Incorrect Recovery – Poor error handling may introduce new bugs.
5. Over-Reliance on Recovery – Developers may neglect proper debugging.
6. Difficult Error Diagnosis – Recovery mechanisms can make it harder to pinpoint the root
cause.
7. Compatibility Issues – Some error recovery methods may not work across all platforms.

You might also like