0% found this document useful (0 votes)
64 views32 pages

Compiler Structure and Optimization Overview

The document summarizes the major phases of a compiler: 1. Lexical analysis converts source code characters into tokens. 2. Syntax analysis parses tokens to create an abstract syntax tree. 3. Semantic analysis checks for errors and collects type information. 4. Intermediate code generation outputs an intermediate representation. 5. Code optimization improves efficiency before 6. Code generation outputs target code. Symbol tables are managed throughout.

Uploaded by

Vivek Sharma
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
64 views32 pages

Compiler Structure and Optimization Overview

The document summarizes the major phases of a compiler: 1. Lexical analysis converts source code characters into tokens. 2. Syntax analysis parses tokens to create an abstract syntax tree. 3. Semantic analysis checks for errors and collects type information. 4. Intermediate code generation outputs an intermediate representation. 5. Code optimization improves efficiency before 6. Code generation outputs target code. Symbol tables are managed throughout.

Uploaded by

Vivek Sharma
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Introduction

From: Chapter 1, The Dragon Book, 2nd ed.


1. The Structure of a Compiler

Source program
Source program
Analysis
Compiler

Synthesis
Target program
Target program

Introduction 2
1.2 The Structure of a Compiler

Phases of a compiler

Introduction 3
1.2.1 Lexical Analysis

• The lexical analyzer reads the stream of characters making up the source
program and groups the characters into meaningful sequences called
lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the form
token-name, attribute-value.

Introduction 4
Translation of an assignment
statement

Introduction 5
1.2.2 Syntax Analysis

• The second phase of the compiler is syntax analysis or parsing.


• The parser uses the first components of the tokens produced by the lexical
analyzer to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream.

Introduction 6
1.2.3 Semantic Analysis

• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency with
the language definition.
• It also gathers type information and saves it in either the syntax tree or the
symbol table, for subsequent use during intermediate-code generation.
• Coercions

Introduction 7
1.2.4 Intermediate Code Generation

• In the process of translating a source program into a target code, a compiler


may construct one or more intermediate representations (IRs), having a var
iety of forms.
– Syntax trees: commonly used during syntax and semantic analysis
– After syntax and semantic analysis, compilers generate an explicit lower-level
or machine-like IR, a a program for an abstract machine.
– Three-address code

Introduction 8
1.2.5 Code Optimization

• The machine-independent code optimization phase attempts to improve the


intermediate code so that better target code will result.

Introduction 9
1.2.6 Code Generation

• The code generator takes as input an intermediate representation of the


source program and maps it into the target language.
• If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
• Then the intermediate instructions are translated into sequence of machine
instructions that perform the same task.

Introduction 10
1.2.7 Symbol-Table Management

• An essential function of a compiler is to record the variable names used in


the source program and collect information about various attributes of each
name.
• These attributes may provide information about the storage allocated for a
name, its type, its scope, and in the case of procedure names, such things as
the number and types of its arguments, the method of passing each
argument, and the type returned.
• The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name.

Introduction 11
1.2.8 The Grouping of Phases into Passes

• Logical organization of a compiler


• Activities from several phases may be grouped together into a pass that
reads an input file and writes an output file.
• For example,
– The front-end phases of lexical analysis, syntactic analysis, semantic
analysis, and intermediate code generation might be grouped into one
pass.
– Code optimization might be an optional pass.
– There could be a back-end pass consisting of code generation for a
particular machine.

Introduction 12
Code Optimization

• Code optimization aims at improving the execution efficiency of a


program. This is achieved in two ways:
1. Redundancies in a program are eliminated.
2. Computation in a program are rearranged or rewritten to make
it execute efficiently.
• Code optimization must not change the meaning of a program.
• The front end generates an IR which could consist of triples,
quadruples. The optimization phase transforms this to achieve
optimization. The transformed IR is input to the back end.

Source Target
Front end Optimization phase Back end
prog prog

Intermediate representation (IR)


Introduction 13
Code optimization

• Optimization Transformation is a rule for rewriting a segment of a


program to improve its execution efficiency without affecting its
meaning.
• Optimizing transforms are classified as:-
• Local optimization : The optimization transformations are applied
over small segments of a program consisting of a few statements.
• Global Optimization : The optimization transformations are applied
over a program unit, i.e. over a function or procedure.

Introduction 14
Compile time evaluation

• Execution efficiency can be improved by performing certain actions


specified in a program during compilation itself. This eliminates the
need to perform them during execution of the program, thereby
reducing the execution time of the program.
• Constant folding is the main optimization of this kind. When all the
operands in an operation are constants, the operation can be
performed at compilation time. The result of the operation, also a
constant, can be replaced the original evaluation in the program. For
ex.- a:=3.14157/2 can be replaced by a:=1.570785, thereby eliminating
a division operation.

Introduction 15
Elimination of common subexpression
• Common subexpressions are occurrences of expression yielding the same
value. Such expressions are called equivalent expressions. Let CSi designate a
set of common subexpressions. It is possible to eliminate an occurrence ej
belong to CSi if, no matter how the evaluation of ej is reached during the
execution of the program, the value of some ek belongs to CSi would have been
already computed. Provision is made to save this value and use it at the place
of occurrence of ej.
• For Example
t := b*c
a := b*c; a := t;
--------- --------
x := b*c + 5.2 ; x := t + 5.2

• Hence CSi contains the 2 occurrences of b*c. the second occurrence of b*c can
be eliminated because the first occurrence of b*c is always evaluated before
the second occurrence is reached during execution of the program. The value
computed at the first occurrence is saved in t. this value is used in the
assignment to x.

Introduction 16
Dead Code Elimination

• Code which can be omitted from a program without affecting its result
is called dead code. Dead code is detected by checking whether the
value assigned in an assignment statement is used anywhere in a
program.
• For Example:
An assignment
x := <exp>
constitutes dead code if the value assigned to x is not used in the
program, no matter how control flows after executing this assignment.

Introduction 17
Frequency Reduction
• Execution time of a program can be reduced by moving code from a part of a
program which is executed very frequently to another part of the program
which is executed fewer times..
• For example:
The transformation of loop optimization moves loop invariant code out of a
loop and places it prior to loop entry.

x: = 25*a;
for i := 1 to 100 do for i := 1 to 100 do
begin begin
z := i ; z := i ;
x := 25 *a; y := x+z;
y: = x+z; end;
end;
• Here x:= 25*a is loop invariant. Hence in the optimized program it is
computed only once before entering the for loop. Y:= x+z is not loop invariant.
Hence it cannot be subjected to frequently reduction.

Introduction 18
Strength Reduction
• The strength reduction optimization replaces the occurrence of a time
consuming operation ( a high strength operation ) by an occurrence of a faster
operation ( a low strength operation ) e.g replacement of a multiplication by an
addition.
• For example:
Itemp: =5;
For i: = 1 to 10 do for i := 1 to 10 do;
begin begin
------ ------
K := i*5; k := itemp;
---- -----
end; Itemp := itemp+5;
end;
• Strength reduction is very important for array accesses occurring within
program loops. Note that strength reduction optimization is not performed on
operations involving floating point operands because finite precision of
floating point arithmetic cannot guarantee equivalence of results after strength
reduction.

Introduction 19
Debugging
• It is process of removing errors from programs. Debuggers are the computer programs
responsible for localization (tracking the errors) and removal of errors of given program.
• Debuggers also provide some debug information while debugging.
• This information is produced by two ways:
• 1. Statically 2. Dynamically

Statically produced debug information:- It is produced by analyzing the source program. It


takes the form of
• Cross reference listing
• List of undefined symbols/variables
• Unreachable statements
• All this is useful in determining the cause of program malfunction.

Dynamically produced debug information:-It is produced during program execution. It


takes the form of
• value dumps
• Execution traces produced during execution.
• This information helps to determine the execution path followed during execution and
sequence of values assumed by the variables during execution.

Introduction 20
Facilities provided by the debugger

• Setting breakpoints in the program.


• Initializing a debug conversation when control reaches to a break point.
• Displaying values of variables
• Assigning new values to variables
• Testing user defined conditions implied and some rules to follow involving
program variables.

Introduction 21
Functions of debugger

• Unit Set Function:- Set of unit test functions deals with execution
sequence which controls flow of the program.
• Tracing and trace Back:- Tracing can be used to track the flow of
execution logic and data modifications. The controlled flow can be traced
at different levels of details like procedure, branch, individual instruction
and so on.
• The tracing can also be based on condition expressions. E.g. It tracebook
can show the path by which the current statement can reach. It can also
show which statements have modified a given variable or parameter.
• Program Display Capabilities:- It makes possible to display the program
being debugged complete with statement number. The user is able to
control the level at which display occurs. The system saves all debugging
specification such as break point-Definition, display modes etc.

Introduction 22
Procedure for Debugging of a program
• The user compiles his program under the debug option. The compiler
produces 2 files:-
• Compiled code file
• Debug information file
• The user activates the debug monitor and indicates the name of the program
to be debugged. The debug monitor opens the compiled code file and debug
information file for the program.
• The user specifies his debug requirements i.e. a list of breakpoints and actions
to be performed at a breakpoint. The debug monitor instruments the program
and builds a debug table containing the pairs ( statement number, debug
action).
• The instrumented program gets control and executes up to a breakpoint.
• A software interrupt is generated and control goes to debug monitor which
consults debug table and perform the debug actions specified for the current
statement. A debug conversation is now opened during which the user may
issue some debug commands or modify breakpoints and debug actions
associated with breakpoints. Control now returns to instrumented program..
• Step 4 and Step 5 repeated until the end of debug sessions.

Introduction 23
printf() debugging

• printf debugging is our term for a debugging technique we encounter all too
often. It consists of ad hoc addition of lots of printf (C) or cerr or cout (C+
+) statements to track the control flow and data values in the execution of a
piece of code.

Disadvantages:
• It is very ad hoc. Code is temporarily added, to be removed as soon as the
bug at hand is solved; for the next bug, similar code is added etc. There are
better ways of adding debugging information, as you shall see shortly.
• It clobbers the normal output of your program, and slows it down
considerably.
• Often, it does not help. Output to stdout is buffered, and the contents of the
buffer are usually lost in case of a crash. Thus, you'll most likely miss the
most important information, causing you to start looking in all the wrong
places.
Introduction 24
There are some circumstances where printf debugging is appropriate. If
you want to use it, here are some tips:
• Produce your output on stderr. Unlike stdout, stderr is unbuffered. Thus
using stderr, you're much less likely to miss the last information before a
crash.
• Do not use printf directly, but define a macro around it, so that you can
switch your debugging code on and off easily.
• Use a debugging level, to manage the amount of debugging information.

Introduction 25
ANWB debugging

• `ANWB debugging' is based on a simple principle: the best way to learn


things is to teach them.
• In `ANWB debugging' you find a, preferably innocent and willing,
bystander and explain to her how your code works . This forces you to
rethink your assumptions, and explain what is really happening; often you
find the cause of your problems this way.

Introduction 26
Debugging by Watching

• Sometimes minor problems can be tracked down by watching the behavior


of an application in user space. Watching programs can also help in
building confidence that a driver is working correctly. For example, we
were able to feel confident about scull after looking at how its read
implementation reacted to read requests for different amounts of data.

Introduction 27
Single Step control

• In this technique user can check the contents of register, memory and
variables after executing each step of the program. This helps in finding
the errors location/code of line and eliminating them there itself.
Debugging can also be done procedure wise in order to skip the debugging
of already checked procedures. This method is useful for small size
program codes. It is more time consuming and may not be suitable for
larger programs.

Introduction 28
Breakpoint method

• It allows the user to set a breakpoint in the program to check the contents
of registers, memory and variables. The program execution is halted at
break points for the user to examine intermediate results of program
execution. This method is faster for debugging the loops and larger
programs. This also helps to localize the error and skip the code that is
error free.

Introduction 29
Linking for overlay

• An overlay is a part of a program which has the same load origin as some
other parts of the program.
• Overlays are used to reduce the main memory requirement of a program.

Introduction 30
Overlay structured programs

• We refer to a program containing overlays as an overlay structured


program. Such a program consist of
– A permanently resident portion, called the root,
– A set of overlays.
• Execution of an overlay structured program proceeds as follow:
• To start with, the root is loaded in memory and given control for the
purpose of execution. Other overlays are loaded as and when neede. Note
that the loading of an overlay overwrites a previously loaded overlay with
the same load origin. This reduce the memory requirement of a program. It
also makes it possible to execute programs whose size exceeds the amount
of memory which can be allocated to them.
• The overlay structure of a program is designed by identifying mutually
exclusive modules_ i.e. modules which do not call each other. Such modules
do not need to reside simultaneously in memory. Hence they are located in
different overlays with the same load origin.
Introduction 31
Example

• Consider a program with 6 sections named init, read, trans_a, trans_b,


trans_c and [Link] performs some initializationa and passes control to
read. Read reads one set of data and invokes one of trans_a, trans_b or
trans_c depending on the values of the data. Print is called to print the
results.
• Trans_a, trans_b and trans_c are mutually exclusive. Hence they can be
made into separate overlays. Read and print are put in the root of the
program since they are needed for each set of data. For simplicity, we put
init also in the root.

Introduction 32

You might also like