0% found this document useful (0 votes)
7 views61 pages

Compilers: Principles and Techniques

The document provides an overview of compilers, detailing the principles, techniques, and tools involved in programming languages and their translation processes. It explains the hierarchy of programming languages, the advantages of high-level languages, and the various phases of compilation, including lexical analysis, syntax analysis, and semantic analysis. Additionally, it discusses error detection, the role of translators, and the structure of a language-processing system.

Uploaded by

haithamalrawipc
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views61 pages

Compilers: Principles and Techniques

The document provides an overview of compilers, detailing the principles, techniques, and tools involved in programming languages and their translation processes. It explains the hierarchy of programming languages, the advantages of high-level languages, and the various phases of compilation, including lexical analysis, syntax analysis, and semantic analysis. Additionally, it discusses error detection, the role of translators, and the structure of a language-processing system.

Uploaded by

haithamalrawipc
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Lectures in :  



Compilers
Principles & Techniques

By
Dr. Esam T. Yassen
[Link]-Anbar,Computers College
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Introduction
Programming languages :-
Interactions involving humans are most effectively carried out
through the medium of language . language permits the expression of
thoughts and ideas , and without it , communication as we know it
would be very difficult indeed .
In computer programming , programming language serves as
means of communication between the person with a problem and the
computer used to solve it . programming language is a set of symbols ,
words , and rules used to instruct the computer .
A hierarchy of programming languages based on increasing
machine independence include the following :-
1- machine language : is the actual language in which the
computer carries out the instructions of program . otherwise , " it is
the lowest from of computer language , each instruction in
program is represented by numeric cod , and numeric of addresses
are used throughout the program to refer to memory location in the
computer memory .
2- Assembly languages : is a symbolic version of a machine
language ,each operation code is given a symbolic code such a
Add , SUB ,…. Moreover , memory location are given symbolic
name , such as PAY , RATE .
3-high – level language :Is a programming language where the
programming not require knowledge of the actual computing
machine to write a program in the language .H.L.L . offer a more
enriched set of language features such as control structures , nested
statements , block …
4- problem-oriented language : It provides for the expression of
problems in a specific application . Examples of such language
are SQL for Database application and COGO for civil engineering
applications .

With My Best Wishes 1 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Advantages of H.L.L over L.L.L include the following :


1- H.L.L are easier to learn then L.L.L
2- A programmer is not required to know how to convert data
from external from to internal within memory .
3- Most H.L.L offer a programmer a variety of control structures
which are not available in L.L.L
4- Programs written in H.L.L are usually more easily debugged
than L.L.L. equivalents.
5- Most H.L.L offer more powerful data structure than L.L.L.
6- Finally ,High level languages are relatively machine-
independent. Consequently certain programs are portable

Translator: High- Level language programs must be translated


automatically to equivalent machine- language programs .
A translator input and then converts a " source program" into an object
or target program . the source program is written in a source language
and the object program belong to an object language .

Source Translator Object


program program

1- If the source program is written in assembly language and the


target program in machine language .the translator is
called " Assembler "
2- If the source language is H.L.L. and the object language is L.L.L.
,then the translator is called " Compiler " .
3- If the source language is L.L.L. and the object language is H.L.L.,
then the translator is called "Decompiler"

Source Object Executing


Compiler Result
program program computer

compile Time run Time

(compilation Process)

With My Best Wishes 2 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

- The time at which conversion of a source program to an object


program occurs is called " Compile time " .The object program is
executed at " Run time " ,note that the source program and data
are process at different time .

Another kind of translator ,called an " Interpreter " in which


processes an internal form of source program and data at the same
time . that is interpretation of the internal source from occurs at run
time and no object program is generated .

Data

Source Program Interpreter Result

( Interpretive process )

Compiled programs usually run faster than interpreter ones


because the overhead of understanding and translating has already
been done .However ,Interpreters are frequent easier to write than
Compilers , and can more easily support interactive debugging of
program .
Remark :Some programming language implementations support
both interpretation and compilation.

A bad workman quarrels his


tools

With My Best Wishes 3 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Compilation concepts

What is compiler?
A compiler is a program that translates a computer
program(source program) written in H.L.L (such as Pascal,C++)
into an equivalent program (target program) written in L.L.L.

Source program compiler target program

Error messages

Model of Compiler:
The task of constructing a compiler for a particular source
language is complex. The complexity of the compilation process
depend on the source language.A compiler must perform two
major tasks:
1. Analysis :deals with the decomposition of the source program
into its basic parts.
2. Synthesis:builds their equivalent object program using these
basic parts.
To perform these tasks, compiler operates in phases each of
which transforms the source program from one representation to
another. A typical decomposition of a compiler is shown in the
following figure.

With My Best Wishes 4 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools
Source program

Lexical analyzer

Syntax analyzer

Semantic analyzer

Symbol Table Error Handler

Intermediate code
generation

Code optimization

Code generation

Target program
Compiler phases

1. Lexical Analyzer: whose purpose is to separate the incoming


source code into small pieces (tokens) , each representing a
single atomic unit of language, for instance "keywords",
"Constant "," Variable name" and "Operators".
2. Syntax Analyzer : whose purpose is to combine the tokens into
well formed expressions (statements) and program and it check
the syntax error
3. Semantic Analyzer: whose function is to determine the
meaning of the source program.
4. Intermediate Code Generator: at this point an internal form
of a program is usually [Link] example:

(+,a,b,t1)
Y=(a+b)*(c+b) (+,c,d,t2)
(*,t1,t2,t3)

With My Best Wishes 5 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

[Link] Optimizer :Its purpose is to produce a more efficient


object program (Run faster or take less space or both)
[Link] Generator: Finally, the transformed intermediate
representation is translated into the target language.

The grouping of phases : the phases of compiler are collection


into :
1. Front-End :It consists of those phases that depend on the
source language and are largely independent of the target
machine ,those include : (lexical analysis ,syntax analysis ,
semantic analysis, and intermediate code generation )
2. Back-End : Includes those phases of compiler that depend on
the target machine and not depend on the source language .
these include:( code optimization phase and code
generation phase )

The grouping of Compiler Phases

With My Best Wishes 6 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Symbol–Table Management: An essential function of a compiler


is to record the identifiers used in the source program and collect
information about various attributes of each identifier .These
attributes may provide information about the storage allocated for
an identifier , its type and in case of procedure , the number and
types of its arguments and so on .
Symbol-Table is a data structure containing a record for each
identifier , with fields for the attributes of the identifier.
Error Detection and Reporting
Each phase can encounter errors. However ,after defection an
error a phase must somehow deal with that error, so the
compilation can proceed, allowing further errors in the source
program to be detected .A compiler that stops where it finds the
first error is not as helpful as it could be.
The syntax and semantic analysis phases usually handle a large
fraction of the error detectable by the compiler .

• Types of Errors
Lexical errors: The lexical phase can detect errors where the
characters remaining in the input do not form any token of the
language.
Syntax errors: The syntax analysis phase can detect errors
Errors where the token stream violates the structure rules (syntax)
of the language .
Semantic errors: During semantic analysis the compiler tries to
detect constructs that have the right syntactic structure but no
meaning to the operation involved, e.g. to add two identifiers, one
of which is the name of an array, and the other the name of a
procedure .

With My Best Wishes 7 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Where errors show themselves


Compile-time errors
Many errors are detected by the compiler, the compiler will
generate an error message - Most compiler errors have a file name ,
line number, and Type of error. This tells you where the error was
detected .
File name ([Link])

With My Best Wishes 8 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Runtime Errors
Runtime errors occur while the program is running, although the
compilation is successful. The causes of Runtime Errors are [5]:
1) Errors that only become apparent during the course of execution of the
program
2) External Factors – e.g.
Out of memory
Hard disk full
Insufficient i/o privileges
etc.
3) Internal Factors – e.g.
Arithmetic errors
Attempts to read beyond the end of a file
Attempt to open a non-existent file
Attempts to read beyond the end of an array
etc.

With My Best Wishes 9 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

A Language- Processing System :-


In addition to a compiler ,several other programs may be required
to create an executable target program.
Source program

Preprocessor

Compiler
Assembly program

Assembler
Machine program

Loader / Link editor Library

Machine code

(Language- Processing System)

Preprocessing : During this stage , comments ,macros and


directives are processed :
• Comments are removed from the source file.
• Macros :If the language supports macros ,the macros are
replaced with the equivalent text,Example:
# define pi 3.14
When the preprocessor encounter the word(pi) it would replace (pi)
with ( 3.14 )

With My Best Wishes 10 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

• Directives : The preprocessor also handles directives. In 'C'


language , including statement looks like:
# include<"file">
this line is replaced by the actual file.
Loader - Link Editor : Is program that performs two functions:
1. Loading :taking relocatable machine code and placed the
altered instructions and data in memory at the proper
locations.
2. Link-Editing :Allows us to make a single program from
several files . these files may have been result of different
compilers and one or more may be library files.

Actions Speak Louder than Words

With My Best Wishes 11 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Lexical Analyzer
The analysis of source program during compilation is often
complex . The construction of compiler can often be made easier
if the analysis of source program is separated into two parts , with
one part identifying the low – level language constructs , such as
variable names , keyword , labels , and operations , and the
second part determine the syntactic organization of the program .

Lexical Analyzer : the job of the lexical analyzer , or scanner , is


to read the source program ,one character at a time and produce as
output a stream of tokens . the tokens produced by the scanner
serve as input the next phase , parser . Thus , the lexical analyzers
job is the translate the source program into a form more
conductive the recognition by the parser .
Tokens : are used to represent low – level program units such as:-
- Identifiers , such as sum , value , and X .
- Numeric literals , such as 123 and 1.35e02 .
- Operators , such as +,*,&&, < = , and % .
- Keywords , such as if , else and returns.
- Many other language symbols .
There are many ways we could represent the tokens of a
programming language . one possibility is to use a 2- duple of the
form < token – class, value > .
For example :-
- The identifiers sum and value may be represented as :
< ident , “ sum “ >
< ident , “ value” >

- The numeric literals 123 and 1.35E02 may be represented


as :
< numericliteal , “ 123” >
< numericliteral , “ 1.35E02” >
- The operators > = and + may be represented as :

With My Best Wishes 12 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

< relop , “ >= “ >


< addop , “ + “ >
- The scanner may take the expression x = 2+ f(3) , and
produce the following stream of tokens :

< ident , “ x “ > < lparent , “ ( “ >


< assign – op , “ = “ > < numlit , “ 3 “ >
< numlit , “ 2 “ > < rporent, “ ) “ >
< addop , “ + “ > < semicolon , “ ; “ >
< ident ,“ f ”>

Interaction of Scanner with Parser :


Using only parser can become costly in terms of time and
memory requirements .The complexity and time can be reduced
by using a scanner .
The separation of scanner and parser can have other
advantages, scanning characters is typically slow in compilers
and separating it from parsing particular emphasis can be given to
making the process efficient .
Therefore, The scanner usually interacts with the parser in one of
two ways :-
1- The scanner may process the source program in separate
pass before parsing begins . Thus the tokens are stored in
file or large table .
2- The second way involves an interaction between the
parser and scanner , the scanner called by the parser
whenever the next token in the source program is required .
Source Token
program SCANNER PARSER
Get next
Token

SYMBOL
Table

Interaction of Scanner with Parser


With My Best Wishes 13 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

The latter approach is the preferred method of operation ,


since an internal form of the complete source program dose not
need to be constructed and stored in memory before parsing can
begin .
Note : The lexical analyzer may also perform certain secondary
tasks at the user interface : such task is stripping out from source
program comments and white space in the form of bank , tab and
new line characters.
Lexical Errors : the lexical phase can detect errors where the
characters remaining in the input do not form any token of the
language for example if the string “ fi “ is encountered in ‘ C ‘
program :-
fi ( A = = f(x) ) ...

A lexical analyzer can not tell whether “ fi “ is misspelling of the


keyword “ if “ or an undeclared function identifier since “ fi “ is a
valid identifier , the lexical must return the token for an identifier
and let some other phase of compiler handle any error. The
possible error – recovery actions are :

1. Deleting an extraneous character .


2. Inserting a missing character .
3. Replacing an incorrect character by a correct char .
4. Transposing two adjacent characters .

Finally , the scanner breaks the source program into tokens .


the type of token is usually represented in the form of unique
internal representation number or constant. For example, a
variable name may be represented by 1 ,a constant by 2 , a label
by 3 and so on .
The scanner then returns the internal type of token and some
time the location in the table where the tokens are stored . Not all
tokens may be associated with location , while variable name and
constant are stored in table , operators , for example , may not be .

With My Best Wishes 14 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Example : Suppose that the value of tokens are :


Variable name ____ 1
Constant ______ 2
Label _______ 3
Keyword ______4
Add operator _____ 5
Assignment ____ 6
and the program is :
Sum : A = a+b ;
Goto Done ;
The output is :

Token Internal represent Location


Sum 3 1
: 11 0
A 1 2
= 6 0
A 1 2
+ 5 0
B 1 3
; 12 0
Goto 4 0
Done 3 4
; 12 0

By their Fruit Ye shall know them

With My Best Wishes 15 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Symbol Table ( ST )
A symbol table is a data structure containing a record for
each identifier, with fields for attributes of the [Link]
data structure allows us to find the record for each identifier
quickly and to store or retrieve data from that record quickly.
When an identifier in source program is detected by the
lexical analyzer, the identifier is entered into ST. however, the
attributes of an identifier can not be determined during lexical
analysis, the remaining phases enter information about identifier
into ST and then use this information in various ways.
Symbol Table Contents :-
A symbol table is most often conceptualized as series of
rows, each row containing a list of attributes values that are
associated with a particular variable. The kinds of attributes
appearing in ST are dependent to some degree on the nature of
language for which compiler is written. For example, a language
may be typeless, and therefore the type attribute need not appear
in ST .The following list of attribute are not necessary for all
compilers, however, each should be considered for a particular
compiler:-
1- Variable Name: A variable's name most always reside in
the ST. major problem in ST organization can be the
variability in the length of identifier names. For languages
such as BASIC with its one – and two – character names
and FORTRAN with names up to six characters in length,
this problem is minimal and can usually be handled by
storing the complete identifier in a fixed – size maximum
length fields. While there are many ways of handling the
storage of variable names, two popular approaches will be
outlined, one which facilitates quick table access and
another which supports the efficient storage of variable
names. The provide quick access, yet sufficiently large,
maximum variable name length. A length of sixteen or
greater is very likely adequate (‫)ﻣﻼﺋ ﻢ‬, the complete
identifier can then be stored in a fixed – length fields in

With My Best Wishes 16 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

ST, in this approach, table access is fast but the storage of


short variable names is inefficient.
A second approach is to place a string Descriptor in the
name filed of the table. The descriptor contains (position and
length) subfields. The pointer subfield indicates the position
of the first character of the name in a general string area, and
the length subfield describes the number of characters in the
name. therefore, this approach results in slow table access,
but the savings in storage can be considerable.
2- Object Time Address:- The relative location for values of
variable at run time.
3- Type:- This fields is stored in ST when compiling
language having either implicit or explicit data type. For
typeless language such as "BASIC" this attribute is
excluded. " FORTRAN" provides an example of what
mean by implicit data typing. Variables which are not
declared to be particular type are assigned default types
implicitly (variables with names starting with I, J, K, L, M,
or N are integer, all other variable are real).
4- Dimension of array or Number of parameters for a
procedure.
5- Source line number at which the variable is declared.
6- Source line number at which the variable is referenced.
7- Link filed for listing in alphabetical order.

Operation on ST:-
The two operations that are most commonly performed on
ST are: Insertion & Lookup (Retrieval) .For language in
which explicit declaration of all variables is mandatory
(‫)إﺟﺒ ﺎري‬, an insertion is required when processing a
declaration. If ST is Ordered, then insertion may also involve
a lookup operation to find allocations at which the variable's
attributes are to be placed. In such a situation an insertion is

With My Best Wishes 17 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

at least as expensive as retrieval. If the ST is not ordered, the


insertion is simplified but the retrieval is expensive.
Retrieval operations are performed for all references to
variables which don’t involve declaration statements.
The retrieved information is used for semantic checking and
code generation. Retrieval operations for variables which
have not been previously declared are detected at this stage
and appropriate error messages can be emitted. Some
recovery from such semantic errors can be achieved by
posting a warning message and incorporation ( ‫ )ﯾﻨ ﺸﺊ‬the
nondeclared variable in ST.
When a programming language permits implicit declarations
of variable reference must be treated as an initial reference,
since there is no way of knowing a priori of the variable's
attributes have been entered in ST. Hence any variable
reference generates a lookup operation followed by an
insertion if the variable's name is not found in ST.
For block – structured languages, two additional operations
are required: Set & Reset.
The Set operation is invoked when the beginning of a block is
recognized during compilation. The complementary
operation, the Reset operation is applied when the end of
block is encountered. Upon block entry, the set operation
establishes a new subtable (within the ST) in which the
attributes for the variables declared in the new block can be
stored. Because an new subtable is established for each block,
the duplicated variable- name problem can be resolve.
Upon block exit the reset operation removes the subtable
entries for the variables of the completed block.

ST Organizations:-
The primary measure which is used to determine the
complexity of a ST operation is the average length of search.
This measure is the average number of comparisons required
to Retrieve a ST record in a particular table organization, the

With My Best Wishes 18 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

name of variable for which an insertion or lookup operation is


to be performed will be referred to as the search argument.
1- Unordered ST: The simplest method of organization ST,
is to add the attribute entries to the table in the order in
which the variable are declared. In an insertion operation
no comparisons are required.
2- Ordered ST : In this and following organization, we
described ST organization in which the table position of a
variable's set of attributes is based on the variable's name.
An insertion operation must be accompanied by lookup
procedure which determines where in ST the variables
attribute should be placed. The insertion of new of
attributes may generate some additional overhead
primarily because other sets of attributes may have to be
moved in order to a chive the insertion.
3- Tree – structured ST :The time to performed an insertion
operation can be reduced by using a tree – structured type
of storage organization.

Constant Dropping Wears Away a


Stone

With My Best Wishes 19 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Syntactic Analyzer ( Parser )


Every programming language has rules that prescribe the
syntactic structure of well formed programs. The syntax of
programming language constructs can be described by context
free grammars. In syntax analysis we are concerned with
groping tokens into larger syntactic classes such as expression ,
statements , and procedure. The syntax analyzer (parser) outputs
a syntax tree, in which its leaves are the tokens and every non-
leaf node represents a syntactic class type. For example:-
Consider the following grammars:-
E E+E E*E (E) -E id

Then the parse tree for -(id+id) is:-

- E

( E )

E + E

id id

Syntax Error Handling :-


Often much of the error detection and recovery in a
compiler is central around the parser. One reason of this is that
many errors are syntactic in nature. Errors where the token
stream violates the structure of the language are determined by
parser, such as an arithmetic expression with unbalanced
parentheses.

With My Best Wishes 20 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Derivations :-
This derivational of view gives a precise description of the
top-down construction of parse tree. The central idea here is that
a production is treated as rewriting rule in which the
nonterminal on the left is replaced by the string on the right side
of the production. For example, consider the following
grammar:

E E+E
E E*E
E (E)
E -E
E id
The derivation of the input string id + id* id is:

Left-most derivation Right-most derivation


E E
E+E E+E
id +E E+E*E
id+E*E E+E*id
id+id*E E+id*id
id+id*id id+id*id

Note:- parse tree may by viewed as a graphical representation


for a derivation :
E

E + E

id E * E

id id

With My Best Wishes 21 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Ambiguity :-
A grammar that produce more that one parse tree for same
sentence is said to be Ambiguous. In the another way, by
produced more that one left-most derivation or more that one
Right-most derivation for the same sentence.
E

E + E

id E * E

id id

(1)

E * E

E + E id

id id

(2)
two parse tree for id+id*id
E E
E+E E*E
id+E E+E*E
id+E*E id+E*E
id*id*E id+id*E
id+id*id id+id*id
(1) (2)
Two left-most derivations for id+id*id
With My Best Wishes 22 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Sometimes am ambiguous grammar can be rewritten to


eliminate the ambiguity. Such as:

E E+E E E+T T
E E*E E (E)
E (E) E -E
E -E T T*F F
E id F id

E + T

T T * F

F F id

id id
parse tree for id+id*id

Left-Recursion :-
A grammar is left-recursion if has a nonterminal A such
that there is a derivation A Aα for some string α . Top-
down parser cannot handle left-recursion grammars, so a
transformation that eliminates left-recursion is needed:

A → Aα ß

A → ß A'
A' → αA' ε
OR:

With My Best Wishes 23 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

A→Aα1 Aα2 … Aαm … ß1 ß2 … ßn

A→ ß1A' ß2 A' … ßn A'


A'→ α1 A' α2 A' … αmA' ε .

Example:
E→ E+T T
T→ T*F F
F→ (E) id

E→ T E'
E' → +T E' ∊
T→ FT'
T'→ *F T' ∊
F→ (E) id

Left-Factoring :-
The basic idea is that when is not clear which of two
alternative production to use to expand a nonterminal A . We
may be able to rewrite the A-productions to defer the decision
until we have seen enough of the input to make the right choice.

A→ α ß1 α ß2 where α ≠ ∊

A→ α A'
A'→ß1 ß2

With My Best Wishes 24 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

OR :-

A→ α ß1 α ß2 .. α ßn γ where α ≠ ∊

A→ α A' γ
A'→ ß1 ß2 .. ßn

Example:

S→ iEtS iEtSeS a
E→b

S → iEtSS' a
S' → eS ∊
E→ b

Easy Come , Easy Go

With My Best Wishes 25 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Top-Down Parsing :-
Top-down parsing can be viewed as an attempt to
find a leftmost derivation for an input string. Equivalently, A top
down parser, such as LL(1) parsing, move from the goal symbol
to a string of terminal symbols. in the terminology of trees, this
is moving from the root of the tree to a set of the leaves in the
syntax tree for a program. in using full backup we are willing to
attempt to create a syntax tree by following branches until the
correct set of terminals is reached. in the worst possible case,
that of trying to parse a string which is not in the language, all
possible combinations are attempted before the failure to parse
is recognized. the nature of top down parsing technique is
characterized by:
1-Recursive-Descent Parsing : It is a general form of Top-
Down Parsing that may involve " Backtracking ",that is ,making
repeated scans of the input.
Example: consider the grammar
S cAd
A ab a

Input : cad
Then the implementation of Recursive-Descent Parsing is:
S S S

c A d c A d c A d

a b a

-a- -b- -c-

2-Predictive parsing : In many cases, by carefully writing a


grammar , eliminating left-recursion from it and left-factoring
the resulting grammar, we can obtain agrammar that can be
parsed by recursive-descent parser that needs no
"Backtracking",i.e.,a Predictive parser.

With My Best Wishes 26 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

2.1. Transition Diagrams for Predictive parsers


It is useful plan or flowchart for a predictive parser. There
is one diagram for each nonterminal, the labels of edges are
tokens and [Link] example:

E→ E+T T
T→ T*F F // Original grammar
F→ (E) id
Eliminate left-recursion and left factoring
E→ T E'
E' → +T E' ∊
T→ FT'
T'→ *F T' ∊
F→ (E) id

Transition Diagrams
E: T E'
0 1 2

E' : + T E'
3 4 5 6

ε
T: F T'
7 8 9

* F T'
T' : 10 11 12 13

ε
( E )
F: 14 15 16 17

id

With My Best Wishes 27 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

First & Follow :

• First : To compute First(X) for all grammar symbols


apply the following rules until no more terminal or ε
can be added to any First set :
1. If x is terminal, then FIRST(x) is {x}.
2. If x є is a production ,then add є to FIRST(x).
3. If x is nonterminal and x y1y2 … yk is a production,
then place a in FIRST(x) if for some i ,a is in
FIRST(yi),and є is in all of FIRST(y1)… FIRST(yi-1).
• Follow :To compute Follow(A) for all nonterminals
apply the following rules until nothing can be added to
any Follow set.
1. Place $ in FOLLOW(S),where S is the start symbol.
2. If there are a production A αBß, then everything in
FIRST(ß)except for є is placed in FOLLOW(B).
3. If there are a production A αB ,or a production
A αBß where FIRST(ß) contains є, then everything in
FOLLOW(A) is in FOLLOW(B).
Example : suppose the following grammar

E→ T E'
E' → +T E' ∊
T→ FT'
T'→ *F T' ∊
F→ (E) id

Nonterminals First Follow


E ( , id ) , $
E' + ,∊ ) , $
T ( , id +,) , $
T' * , ∊ +,) , $
F ( , id *,+,) , $

With My Best Wishes 28 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

2.2. Nonrecursive Predictive Parsing :-The nonrecursive


parser in following figure lookup the production to be
applied in a parsing table.

Input a + b $

X Predictive Parsing
Output
Program
Y
Z

$ Parsing Table
M

Stack

Model of a Nonrecursive Predictive Parsing

• Construction of Predictive Parsing Table :


1. For each production A α of the grammar, do
steps 2 and 3 .
2. For each terminal a in First(α),add A α to
M[A, a].
3. If ε is in First(α) ,add A α to M[A,b] for
each b in Follow(A).
4. Make each undefined entry of M be error.
• Predictive Parsing Program :The parser is
controlled by a program that behaves as follows:
The program consider X- the symbol on top of the
stack- and a – the current input symbol-. These two
symbols determine the action of the parser. There are
three possibilities :

With My Best Wishes 29 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

1. If X = a = $ , the parser halt, and successful


completion of parsing.
2. If X = a ≠ $ , the parser pops X off the stack and
advances the input pointer to the next input
symbol.
3. If X is nonterminal , the program consults entry
M[X,a] of the parsing [Link] M[X,a]=
{X UVW }the parser replaces X on top of
stack by WVU ( with U on top ).
Example:

E→ E+T T
T→ T*F F // Original grammar
F→ (E) id
Eliminate left-recursion and left factoring
E→ T E'
E' → +T E' ∊
T→ FT'
T'→ *F T' ∊
F→ (E) id
Predictive Parsing Table M
Input symbol
Nonterminals
id + * ( ) $

E TE' TE'

E' +TE' є є

T FT' FT'

T' є *FT' є є

F id (E)

With My Best Wishes 30 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Implement Predictive Parsing Program


stack Input output
$E id+id*id$
$E'T id+id*id$ E TE'
$E'T'F id+id*id$ T FT'
$E'T'id id+id*id$ F id
$E'T' +id*id$
$E' +id*id$ T' є
$E'T+ +id*id$ E' +TE'
$E'T id*id$
$E'T'F id*id$ T FT'
$E'T'id id*id$ F id
$E'T' *id$
$E'T'F* *id$ T' *FT'
$E'T'F id$
$E'T'id id$ F id
$E'T' $ T' є
$E' $ E' є
$ $ Accept

LL( 1 )Grammar :A grammar whose parsing table has no


multiply-defined entries is said to be LL(1).
Example :- (H.W)
S iEtSS' a

S' eS ε
E b

Everything Comes to him who Waits

With My Best Wishes 31 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Bottom-Up Parsing
The term "Bottom-Up Parsing" refer to the order in which
nodes in the parse tree are constructed, construction starts at the
leaves and proceeds towards the root. Bottom-Up Parsing can
handle a large class of grammars.
1. Shift-Reduce Parsing: Is a general style of Bottom-up
syntax analysis , it attempts to construct a parse tree for an
input string beginning at leaves and working up towards
the root,(reducing a string w to the start symbol of
grammar).At each reduction step a particular substring
matching the right side of production is replaced by the
symbol on the left of that production.
Example : consider the grammar

S aABe
A Abc b
B d

And the input is abbcde


The implementation Bottom-Up Parsing is
abbcde
aAbcde
aAde
aABe
S
Accept
Handle : Is a substring that matches the right side of a
production.

Stack Implementation of Shift-Reduce Parsing:


A convenient way to implement a shift-reduce parser is to use a
Stack to hold a grammar symbols and an input buffer to hold the
sting w to be parsed. We use $ to mark the bottom of stack and
also the right end of the input string. There are actually four
possible actions:

With My Best Wishes 32 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

1. Shift : The next input symbol is Shifted onto the top


of stack.
2. Reduce : Replace the handle with nonterminal.
3. Accept : The parser announces successful
completion of parsing .
4. Error : The parser discovers that syntax error has
occurred and calls an error recovery routine.
Example: Consider the following grammar
E E+E E*E (E) id
And the input string is id + id * id, then the
implementation is :

Action
Stack Input Buffer

$ id+id*id$ Shift
$id +id*id$ Reduce: E→id
$E +id*id$ Shift
$E+ id*id$ Shift
$E+id *id$ Reduce: E→id
$E+E *id$ Shift(*)
$E+E* id$ Shift
$E+E*id $ Reduce: E→id
$E+E*E $ Reduce: E→E*E
$E+E $ Reduce: E→E+E
$E $ Accept

Conflicts During Shift-Reduce Parsing:


There are context free grammars for which shift-reduce
parsing cannot be used. Ambiguous grammars lead to parsing
conflicts. Can fix by rewriting grammar or by making
appropriate choice of action during parsing. There are two type
of conflicts :

With My Best Wishes 33 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

1. Shift/Reduce conflicts: should we shift or reduce? (See


previous example (*))
2. Reduce/Reduce conflicts: which production should we
reduce with? for example:
stmt → id(param)
param → id
expr → id(expr) | id

Stack Input Buffer Action


$...id(id ,id)...$ Reduce by ??

Should we reduce to param or to expr ?

Example is Better than Precept

With My Best Wishes 34 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

LR Parsers
This section presents an efficient Bottom-Up syntax
analysis technique that can be used to parse a large class of
context-free grammars. The technique is called LR(k) parsing,
the "L" is for left to right scanning of input, the "R" for
constructing a rightmost derivation in reverse, and "k" for the
number of input symbols of lookahead that are used in making
parsing decisions-when "k" is omitted , k is assumed to be 1).
LR parsing is attractive for a variety of reasons:-
1. LR parsers can be constructed to recognize virtually all
programming language constructs for which context-free
grammars can be written.
2. The LR parsing method is the most general
nonbacktracking shift-reduce parsing method known, yet it
can be implemented as efficiently as other shift-reduce
methods.
3. The class of grammars that can be parsed using LR
methods is a proper superset of the class of grammars that
can be parsed with predictive parsers.
The schematic form of an LR parser is shown in following
figure .It consists of an Input,an Output,a Stack,a Driver
program,and a Parsing table that has two parts (action and
goto) .

a1 …… ai ……. an $
Input

LR
Sm Output
Parsing program
Xm
Sm-1
Xm-1
………
action goto
S0

Stack
Model of an LR parser
With My Best Wishes 35 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

There are three techniques for LR parser depending on the


construct of LR parsing table for a grammar :
1. Simple LR parser (SLR for short):Is the easiest to
implement but the least powerful of the [Link] may be
fail to produce a parsing table for certain grammars on
which the other methods succeed.
2. Canonical LR parser: It is most powerful, and most
expensive.
3. Lookahead LR parser (LALR for short):It is
intermediate in power and cost between other two. The
LALR method will work on most programming-language
grammars and ,with some effort ,can be implemented
efficiently.
The LR parsing Algorithm :- The LR program is the same for
all LR parsers, only the parsing table changes from one parser to
another.

With My Best Wishes 36 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Implementation of SLR parser:-


The SLR-parser Extremely tedious to build by hand, so
need a generator. The following steps represents the main
stages, which are used to build system that is used for
implementing the SLR-parser:
1. Input stage : In this state the grammar has been reading
and the symbols of grammar (terminals and nonterminals)
could be specified and each production of grammar must be
on one straight line. Finally, the productions has been
numbered.
For example, consider the grammar
1E E+T
E E+T T 2E T
T T*F F 3T T*F
4T F
F ( E ) id 5F (E)
6F id
2. Compute First & Follow stage : Through this state First
& Follow could be detected for each nonterminal.
3. Construct DFA stage:By using a deterministic finite
automaton ( DFA )the SLR-parser know when to shift and
when to reduce. the edges of DFA are labeled by symbols
of grammar( terminals & nonterminals).In this state, where
the input begins with S'(root),that means that it begins with
any possible right-hand side of an S-production we indicate
that by
S' .S$
S .
.
.

Call this state1 or state0,a productions combined with the


dot(.) that indicates a position of [Link] ,for each
production in state1 we exam the symbol that occur after
dot, there are three cases :
1. If the symbol is null (the dot has been occurred in the end
of right side of production),then there are no new state .
2. If the symbol is "$" sign, then there are no new state.
With My Best Wishes 37 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

3. If the symbol is a terminal or nonterminal, then there are


new state, this state start with current production after the dot
has been proceeded one step forward. If the symbol has been
occurred after the dot(in new position)is nonterminal such as A,
then we add all possible right hand side of A to a new state, and
so on.
You must know that any new state must built firstly in a buffer,
and we compare it with a previous states in DFA, if there are no
similarity situation then the new state is added to DFA and give
it a new number equal to number of states in DFA plus one.
Finally ,we repeat this steps on all new states until the DFA
completed.
Example : consider grammar
E T+E
E T
T x
Initially, it will have an empty stack, and the input will be
a complete S-sentence followed by $;that is the right-hand side
of the S' rule will be on the input. we indicate this as S' . S$
where the dot indicates the current position of the parser. So:
0 S' E$
1E T+E
2E T
3T x
1
0
E S' E.
S' . E$
E . 2
T+E T
E .T E T.
+E
x
T + 3
4 E T+ 5
T x. x .E E T+ E
E . E .
T+E
E .T

Deterministic finite automaton (DFA)


With My Best Wishes 38 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

4. Construct SLR table stage:The SLR-table is a data


structure consist of many rows equal to the number of the
states in DFA,also many columns equal to the number of
grammar symbols plus "$" sign .As know ,data structure
presents fast in information treatment and information
retrieve .In this stage SLR-table is constructed .this table had
seen as two subtables:
1. The Action table: consist of many rows equal to number
of states in DFA,and many columns equal to number of
terminals plus "$" sign(the end of input).
2. The Goto table: consist of many rows equal to number of
states in DFA,and many columns equal to number of
nonterminals.
the elements(entries) in the SLR-table are labeled with four
kinds of actions:
• Snshift into state n
• gngoto state n
• rkreduce by production k
• a accept
• error (denoted by blank entry in the table)
For the construction of this table and the contribution the
actions on the tables cells must pass to each state in DFA
individually :
t State
State
n+1
n-1
N State
n

• Shift action & Goto action could be specified according to the


edge which has been moved from the current state(n) to the
new state.
If the edge was terminal symbol (t) then
Cell[n-1,t]= sn
If the edge was nonterminal symbol (N) then
Cell[n-1,N]= gn

With My Best Wishes 39 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

• If there are production in current state has the form


A ß. (the dot in the end of right hand side, ß is any
string ),then the action is reduce
Cell[n-1,f]=rk { f in Follow(A), k is the no. of
production}
• If there are production in current state has the form
A ß.$ {the dot occurred before $ sign, ß is any string },then
the action is accept
Cell[n-1,$]=a
• Finally, any empty cell in row n-1 means error action.
Repeat the above steps for each states in DFA.

state x + $ E T
0 S4 g1 g2
1 Accept
2 S3 r2
3 S4 g5 g2
4 r3 r3
5 r1

[Link] LR Algorithm : Suppose input string is x+x.


After insert input string the LR-program is executed, as
follows:

Stack Input Action


0 x+x $ shift
0S4 +x$ Reduce by T x
0T2 +x$ shift
0T2S3 x$ Shift
0T2S3S4 $ Reduce by T x
0T2S3T2 $ Reduce by E T
0T2S3E5 $ Reduce by E T+E
0E1 $ Accept

With My Best Wishes 40 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Semantic Analysis
The semantic analysis phase of compiler connects variable
definition to their uses ,and checks that each expression has a
correct type.
This checking called "static type checking" to distinguish it
from "dynamic type checking" during execution of target
program. This phase is characterized be the maintenance of
symbol tables mapping identifiers to their types and locations.
Examples of static type checking:-
1. Type checks : A compiler should report an error if an
operator is applied to an incompatible operand.
2. Flow of control checks:- Statements that cause flow of
control to leave a construct must have some place to which
to transfer the flow of control. For example, a "break"
statement in 'C' language causes control to leave the
smallest enclosing while , for , or switch statement ;an
error occurs if such an enclosing statement does not exist.
3. Uniqueness checks:- There are situations in which an
object must be defined exactly once. For example, in
'Pascal' language, an identifier must be declared uniquely.
4. Name-related checks:- Sometimes, the same name must
appear two or more times. For example, in 'Ada' language
a loop or block may have a name that appear at the
beginning and end of the construct. The compiler must
check that the same name is used at both places.
Type system:-
The design of type checker for a language is based on
information about the syntactic constructs in the language, the
notation of types, and the rules for assigning types to language
constructs.
The following excerpts are examples of information that a
compiler writer might have to start with.
• If both operands of the arithmetic operators "addition",
"subtraction", and "multiplication" are of type integer ,
then the result is of type integer.

With My Best Wishes 41 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

• The result of Unary & operator is a pointer to the object


referred to by the operand. If the type of operand is T , the
type of result is ' pointer to T '.
We can classify type into :
1. Basic type: This type are the atomic types with no
internal structure , such as Boolean, Integer, Real,
Char, Subrange, Enumerated, and a special basic
types " type-error , void ".
2. Construct types: Many programming languages
allows a programmer to construct types from basic
types and other constructed types. For example
array, struct, set.
3. Complex type: Such as link list, tree, pointer.
Type system:- is a collection of rules for assigning type
expressions to the various parts of a program. A type checker
implements a type system.
Specification of a simple type checker:-
The type checker is a translation scheme that synthesizes
the type of each expression from the types of its subexpressions.
In this section, we specify a type checker for simple language in
which the type of each identifier must be declared before the
identifier is used.
Suppose the following grammar to generates program,
represented by nonterminal P, consisting of a sequence of
declarations D followed by a single expression E.
P D;E

D D ; D id : T

T char int array[num] of T T

E literal num id E mod E E[E] E

Type checker ( translation scheme) produce the following part


that saves the type of an identifier:

With My Best Wishes 42 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

P D;E
D D;D
D id:T {addtype([Link],[Link])}
T char {[Link]=char}
T int {[Link]=int}
T T1 {[Link]=pointer([Link])}
T array[num] of T1 {[Link]=array(1..[Link],[Link])}

• The type checking of expression: the following some


of semantic rules:
E literal {[Link]=char} //constants represented
E num {[Link]=int} // = =

We can use a function lookup( e ) to fetch the type saved in ST


,if identifier " e " appears in an expression:
E id {[Link]=lookup([Link])}
The following expression formed by applying (mod) to two subexpression:
E E1 mod E2 {[Link]= if [Link]=int and [Link]=int then int
Else type-error }
An array reference:
E E1[E2] { [Link]= if [Link]=int and [Link]=array[s,t] then t
Else type-error}

E E1 { [Link]= if [Link]=pointer(t) then t


Else type-error}

• The type checking of statements :

S id=E {[Link]=if [Link] = [Link] then void


Else type-error )}

S if E then S1 {[Link]= if [Link]=boolean then [Link]


Else type-error }

S while E do S1 {[Link]= if [Link]=boolean then [Link]


Else type-error }

S S1 ; S2 { [Link]= if [Link]=void and [Link]=void then void


Else type-error}

With My Best Wishes 43 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Intermediate Code Generation ( IR)

IR is an internal form of a program created by the


compiler while translating the program from a H.L.L to
L.L.L.(assembly or machine code),from IR the back end of
compiler generates target code.
Although a source program can be translated directly into the
target language,some benefits of using a machine independent
IR are:
1. A compiler for different machine can be created by
attaching a back end for a new machine into an existing
front end.
2. Certain optimization strategies can be more easily
performed on IR than on either original program or L.L.L.
3. An IR represents a more attractive form of target code.
Intermediate Languages:-
1. Syntax Tree and Postfix Notation are tow kinds of
intermediate representations, for example a=b*-c+b*-c

= =

a + a +

* * *

b - b - b -

c c c

Syntax Tree DAG

• A DAG give the same information in syntax tree but in


compact way because common subexpressions are
identified.
• Postfix notation is a linearized representation of a syntax
tree, for example: a b c - * b c - * + =
• Two representation of above syntax tree are:

With My Best Wishes 44 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

0 id b
= • •
1 id c
id a 2 - 1
3 * 0 2
+ • • 4 id b
5 id c
6 - 5
7 * 4 6
* • • * • •
8 + 3 7

id b id b 9 id a

10 = 9 8
…. ….. …..
- • - •
….. ….. …..

id c id c

1 2

2. Three-Address Code is a sequence of statements of the


general form :
X= Y op Z // op is binary arithmetic
operation
For example : x + y * z

t1 = y * z
t2 = x + t1

where t1 ,t2 are compiler generated temporary.

With My Best Wishes 45 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Types of three address code statement:-

1. Assignment statements of the form X= Y op Z ( where


op is a binary arithmetic or logical operator).
2. Assignment instructions of the form X= op Y ( op is
a unary operator).
3. Copy statements of the form X= Y .
4. Unconditional jump ( Goto L ).
5. Conditional jump ( if X relop Y goto L).
6. Param X & Call P,N for procedure call and and
return Y , for example :
Param x1
Param x2
……..
Param xn
Call P,n
7. Index assignments of the form X=Y[i] & X[i]=Y.
8. Address & Pointer Assignments
X= &Y
X= * Y
*X= Y
Example : a= b * -c + b * -c

t1 = - c t1 = - c
t2 = b * t1 t2 = b * t1
t3 = - c t5 = t2 + t2
t4 = b * t3 a = t5
t5 = t2 + t4
a = t5
Three address code Three address code
For syntax tree For DAG

With My Best Wishes 46 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Note: Three-address statements are a kin to assembly code


statements can have symbolic labels and there are statements for
flow of control.

Implementation of Three Address Code :-


In compiler , three-address code can be implement as
records, with fields for operator and operands.

1. Quadruples :- It is a record structure with four


fields:
• OP // operator
• arg1 , arg2 // operands
• result

2. Triples :- To avoid entering temporary into ST , we


might refer to a temporary value by position of the
statement that compute it . So three address can be
represent by record with only three fields:

• OP // operator
• arg1 , arg2 // operands

With My Best Wishes 47 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Example: a = b * -c + b * -c

i. By Quadruples
Position OP arg1 arg2 result
0 - c t1
1 * b t1 t2
2 - c t3
3 * b t3 t4
4 + t2 t4 t5
5 = t5 a

ii. By Triples
Position OP arg1 arg2
0 - c
1 * b (0)
2 - c
3 * b (2)
4 + (1) (3)
5 = a (4)

With My Best Wishes 48 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Code Optimization
Compilers should produce target code that is as good as can be
written by hand. This goal is achieved by program
transformations that are called " Optimization " . Compilers that
apply code improving transformations are called " Optimizing
Compilers ".
Code optimization attempts to increase program efficiency by
restructuring code to simplify instruction sequences and take
advantage of machine specific features:-
• Run Faster , or
• Less Space , or
• Both ( Run Faster & Less Space ).
The transformations that are provided by an optimizing compiler
should have several properties:-
1. A transformation must preserve the meaning of
program. That is , an optimizer must not change the
output produce by program for an given input, such
as division by zero.
2. A transformation must speed up programs by a
measurable amount.

Source Intermediate Code Target


Front End Representation
Code Generation Code

High-Level Mid-Level Low-Level


Optimization Optimization Optimization

Places for Optimization

This lecture concentrates on the transformation of intermediate


code ( Mid-Optimization or Independent Optimization ),this
optimization using the following organization:-
With My Best Wishes 49 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Front End Code


Optimizer Generation

Control Data
Flow Flow Transformations
Analysis Analysis

Organization of the Optimizer

This organization has the following advantages :-


1. The operations needed to implement high-level constructs
are made explicit in the intermediate code.
2. The intermediate code can be independent of the target
machine, so the optimizer does not have to change much if
the code generator is replaced by one for different
machine.

Basic Blocks:-
The code is typically devided into a sequence of "Basic
Blocks". A Basic Block is a sequence of straight-line code,with
no branches " In " or " Out " except a branch "In" at the top of
block and a branch "Out" at the bottom of block.
• Set of Basic Block : The following steps are used to set
the Basic Block:
1. Determine the Block beginning:
i- The First instruction
ii- Target of conditional & unconditional
Jumps.
iii- Instruction follow Jumps.

With My Best Wishes 50 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

2. Determine the Basic Blocks:


i-There is Basic Block for each Block beginning.
ii-The Basic Block consist of the Block beginning
and runs until the next Block beginning or
program end.

Example\\ 1) i=0
B1
2) t=0
1) i=0
2) t=0
3) t=t+1
3) t=t+1
B2 4) i=i+1
4) i=i+1
5) if I < 10 then goto 3
5) if I < 10 then goto 3
6) x=t
6) x=t
B3

Basic Blocks

1) i=0
B1
2) t=0

3) t=t+1
B2 4) i=i+1
5) if I < 10 then goto B2

6) x=t
B3

Control Flow
With My Best Wishes 51 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Data – Flow Analysis ( DFA )


In order to do code optimization a compiler needs to
collect information about program as a whole and to distribute
this information to each block in the flow graph. DFA provides
information about how the execution of a program may
manipulate its data , and it provides information for global
optimization .
There are many DFA that can provide useful information
for optimizing transformations. One data-flow analysis
determines how definitions and uses are related to each other,
another estimates what value variables might have at a given
point, and so on. Most of these DFAs can be described by data
flow equations derived from nodes in the flow graph.
Reaching Definitions Analysis: All definitions of that variable,
which reach the beginning of the block, as follow:
1. Gen[B] : contains all definitions d:v= e , in block B that v
is not defined after d in B.
2. Kill[B] : if v is assigned in B , then Kill[B] contains all
definitions d:v= e,in block different from B.
3. In[B] : the set of definitions reaching the beginning of B.
In[B] = ∪ Out[H] where H ∈ Pred[B]
4. Out[B] : the set of definitions reaching the end of B.
Out[B] = Gen[B] ∪ ( In[B] – Kill[B] )
Example d1 : a=
d2 : b= B1
d3 : c=

B2 d4 : b= d5 : c= B3

d6 : b=
d7 : c=
B4

d8 : a=
B5
With My Best Wishes 52 Esam
Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Block Gen Kill In Out


B1 d1d2d3 d4d5d6d7d8 Ø d1d2d3
B2 d4 d2d6 d1d2d3 d1d3d4
B3 d5 d3d7 d1d2d3d6d7 d1d2d5d6
B4 d6d7 d2d3d4d5 d1d2d5d6 d1d6d7
B5 d8 d1 d1d2d3d4d5d6 d2d3d4d5d6d8

Loop Information: The simple iterative loop which causes the


repetitive execution of one or more basic blocks becomes the
prime area in which optimization will be [Link] we
determine all the loops in program and limit headers &
preheaders for every loop, for example:

B1

B2
Loop No. Header Preheader Blocks
1 B2 B1 2-3-4-5-2
B3 2 B2 B1 2-2
3 B3 B2 3-3

B4
Loop Information

B5 B6

Flow Graph

With My Best Wishes 53 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Code Optimization Methods


A transformation of program is called " Local " if it can
performed by looking only at the statements in a Basic Block,
otherwise, it is called " Global " .
Local Transformations:
1. Structure-Preserving Transformations:-
• Common Subexpression Elimination
• Dead Code Elimination

2. Algebraic Transformations:-This transformations uses to


change the set of expressions ,computed by a basic block,
with an algebraically equivalent set. The useful ones are
those that simplify expressions or replace expensive
operations by cheaper one, such as:
x:= x+ 0
x:= x*1 Eliminated
x:= x/1

x:= y^2 x:= y*y


Another class of algebraic transformations is Constant Folding
,that is, we can evaluate constant expressions at compiler time
and replace the constant expressions by their values, for
example, the expression 2*3.14 would be replaced by 6.28.

Global Transformations:
1. Common Subexpression Elimination
a=b+c a=b+c
c=b+c c=a
d=b+c d=b+c
2. Dead Code Elimination: Variable is dead if never used

x=y+1
y=1 y=1
x=2*z x=2*z

With My Best Wishes 54 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

3. Copy Propagation
Origin Copy Propagation Dead Code
x=t3 x=t3
a[t2]=t5 a[t2]=t5 a[t2]=t5
a[4]=x a[4]=t3 a[4]=t3
Goto B2 Goto B2 Goto B2
4. Constant Propagation
Origin Copy Propagation Dead Code
x=3 x=3
a[t2]=t5 a[t2]=t5 a[t2]=t5
a[4]=x a[4]=3 a[4]=3
Goto B2 Goto B2 Goto B2

5. Loop Optimization
• Code Motion: An important modification that
decreases the amount of code in a loop is Code
Motion. If result of expression does not change
during loop( Invariant Computation ),can hoist its
computation out of the loop.

For(i=0;i<n;i++)
A[i]=a[i]+( x*x )/( y*y );
c=( x*x )/( y*y );
For(i=0;i<n;i++)
A[i]=a[i]+c;

With My Best Wishes 55 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

• Strength Reduction: Replaces expensive operat-


-ions (Multiplies, Divides)by cheap ones ( Adds,
Subs ).For example, suppose the following
expression:
For(i=1;i<n;i++){v=4*i;s=s+v;} i is induction variable
Then:
v=0;
For(i=1;i<n;i++){ v=v+4; s=s+v; }
Induction Variable: is a variable whose value changes by a
constant amount on each loop iteration.

With My Best Wishes 56 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

Code Generation
In computer science, code generation is the process by
which a compiler's code generator converts some internal
representation of source code into a form( e.g., machine
code)that can be readily executed by a machine.
Issues in the Design of a Code Generator:-
1. Input to the Code Generator :The input to the code
generator consists of the intermediate representation of the
source program(Optimized IR),together with information
in ST that is used to determine the Run Time Addresses of
the data objects denoted by the names in IR. Finally, the
code generation phase can therefore proceed on the
assumption that its input is free of the errors.
2. Target Programs : The output of the code generator is the
target program. The output code must be Correct and of
high Quality, meaning that it should make effective use of
the resources of the target machine. Like the IR ,this
output may take on a variety of forms:
a. Absolute Machine Language // Producing this form
as output has the advantage that it can placed in a
fixed location in memory and immediately executed.
A small program can be compiled and executed
quickly.
b. Relocatable Machine Language // This form of the
output allows subprograms to be compiled
separately. A set of relocatable object modules can
be linked together and loaded for execution by
linking-loader.
3. Memory Management : Mapping names in the source
program to addresses of data objects in run time memory.
This process is done cooperatively by the Front-end &
code generator.
4. Major tasks in code generation : In addition to the basic
conversion from IR into a linear sequence of machine
instructions, a typical code generator tries to optimize the
generated code in some way. The generator may try to use

With My Best Wishes 57 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

faster instructions, use fewer instructions ,exploit available


registers ,and avoid redundant computations. Tasks which
are typically part of a compiler's code generation phase
include:
i. Instruction selection: Is a compiler optimization
that transforms an internal representation of program
into the final compiled code(either Binary or
Assembly).The quality of the generated code is
determined by its Speed & Size. For example, the three
address code ( x=y+z ) can be translated into:

MOV y,R0
ADD z,R0
MOV R0,x
If three-address code is :
a=b+c
d=a+e
then the target code is :
MOV b,R0
ADD c,R0
MOV R0,a
MOV a,R0
ADD e,R0
MOV R0,d
Finally, A target machine with "Rich" instruction set
may be provide several ways of implementing a given
operation. For example, if the target machine has an
"increment" instruction ( INC ) ,then the IR a=a+1
may be implemented by the single instruction ( INC a )
rather than by a more obvious sequence :
MOV a,R0
ADD #1,R0
MOV R0,a
ii. Instruction Scheduling : In which order to put
those instructions. Scheduling is a speed optimization.
The order in which computations are performed can

With My Best Wishes 58 Esam


Compilers Anbar University – Computer College
Principle ,Techniques, and Tools

effect the efficiency of the target code, because some


computation orders require fewer registers to hold
intermediate results than others.
iii. Register Allocation : Is the process of
multiplexing a large number of target program
variables onto a small number of CPU registers. The
goal is to keep as many operands as possible in
registers to maximize the execution speed of software
programs ( instructions involving register operands
are usually shorter and faster than those involving
operands in memory ).

Hitch your Wagon to a Star

With My Best Wishes 59 Esam


References

1. [Link],[Link],[Link]," Compilers- Principles, Techniques and


Tools"Addison-Weseley,2007

2. [Link],[Link],"The Theory and Practice of Compiler


Writing ",McGRAW-HILL,1985

3. [Link],[Link],"An Introduction to Compiler


Construction",Harper Collins,New york,1993

4. [Link],"Modern Compiler Implementation in ML"


,CambridgeUniversity Press,1998
5. Internet Papers

You might also like