Compiler Design and Translation Basics
Compiler Design and Translation Basics
iq
rd
3 class
Compilers
المترجمات
عبير طارق مولود.م. م:استاذة الماده
Compiler
References:
1. Principle of compiler design
Alfred V. Aho & Jeffrey D. Ullman
2. Basics of compiler Design
Torben Egidius Mogensen
3. Compilers : principles, techniques, and tools
Alfred V. Aho & Jeffrey D. Ullman
Programming Languages
Hierarchy of Programming Languages based on increasing machine
independence includes the following:
1- Machine – level languages.
2- Assembly languages.
3- High – level or user oriented languages.
4- Problem - oriented language.
1- Machine level language: is the lowest form of computer. Each instruction
in program is represented by numeric code, and numerical addresses are
used throughout the program to refer to memory location in the computers
memory.
2- Assembly language: is essentially symbolic version of machine level
language, each operation code is given a symbolic code such ADD for
addition and MULT for multiplication.
3- A high level language such as Pascal, C.
4- A problem oriented language provides for the expression of problems in
specific application or problem area .examples of such as languages are SQL
for database retrieval application problem oriented language.
Translator
A translator is program that takes as input a program written in a given
programming language (the source program) and produce as output program
in another language (the object or target program). As an important part of
this translation process, the compiler reports to its user the presence of errors
in the source program.
If the source language being translated is assembly language, and the object
program is machine language, the translator is called Assembler.
Fig (1)
Fig (2)
Compiler
Is a program (translator) that reads a program written in one language, (the
source language) and translates into an equivalent program in another
language (the target language).
Fig (3)
The time at which the conversion of the source program to an object
program occurs is called (compile time) the object program is executed at
(run time).
Fig (4) illustrate the compilation process Note that the program and data are
processed at different times, compile time and run time respectively.
Data
Compiler structure :
A compiler operates in phases, each of which transforms the source program
from one representation to another. A typical decomposition of a compiler is
shown in fig (5).
Source program
Compiler structure:
1- lexical analysis
The lexical analyzer is the first stage of a compiler. Its main task is to read
the input characters and produce as output a sequence of tokens that the
parser uses for syntax analysis.
2- syntax analysis (parsing)
The syntax analysis (or parsing) is the process of determining if a string of
tokens can be generated by grammar. Every programming language has
rules that prescribe the syntactic structure of well-formed programs.
Syntax Analyzer takes an out of lexical analyzer and produces a large tree
3- Semantic analysis
The semantic analysis phase checks the source program for semantics errors
and gathers type information for the subsequent code-generation phase. It
uses the hierarchical structure determined by the syntax-analysis phase to
identify the operators and operands of expressions and statements.
Semantic analyzer takes the output of syntax analyzer and produces another
tree.
5- Code Optimization
Attempts to improve the intermediate code so that faster running machine
code will result.
6- code generation
Generates a target code consisting normally of machine code or an assemble
code. Memory locations are selected for each of the variables used by the
program. Then intermediate instructions are each translated in to a sequence
of machine instructions that perform the same task.
-Error handler:
Is called when an error in the source program is detected. It must warn the
programmer by issuing a diagnostic, and adjust the information being passed
from phase to phase so that each phase can produced.
Types of errors
The syntax and semantic phases usually handle a large fraction of errors
detected by compiler.
1. Lexical error: The lexical phase can detect errors where the characters
remaining in the input do not form any token of the language . few errors are
discernible at the lexical level alone ,because a lexical analyzer has a very
localized view of the source program. Example : If the string fi is
encountered in a C program for the first time in context:
fi ( a== f(x)….
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or
an undeclared function name. since fi is a valid identifier, the lexical
analyzer must return the token for an identifier and let some other phase of
the compiler handle any error.
2- syntax error: The syntax phase can detect Errors where the token stream
violates the structure rules (syntax) of the language.
4- runtime error.
Passes:
In an implementation of a compiler, portions of one or more phases are
combined into a module called a pass. A pass reads the source program or
the output of the previous pass, makes the transformations specified by its
phases and writes output into an intermediate file, which may then be read
by subsequent pass.
Lexical Analyzer
The lexical analyzer is the first phase of compiler. The main task of lexical
Analyzer is to read the input characters and produce a sequence of tokens
such as names, keywords, punctuation marks etc.. for syntax analyzer. This
interaction, summarized in fig.6, is commonly implemented by making the
lexical analyzer be a subroutine of the parser. Up on receiving a "get next
token" command from the parser, the lexical analyzer reads input characters
until it can identify the next token.
Preliminary scanning :
Since the lexical analyzer is the part of compiler that reads the source text; it
may also perform certain secondary tasks at the user interface. One such task
is stripping out from the source program comments and white space in the
form of blank, tab, and new line characters. Another is correlating error
messages from the compiler with the source program. For example, the
lexical analyzer may keep track of the number of new line characters seen,
so that a line number can be associated with an error message.
Some times, lexical analyzers are divided into a cascade of two phases, the
first called "scanning" and the second "lexical analysis". The scanner is
responsible for doing simple tasks, while the lexical analyzer proper does the
more complex operations. For example, a FORTRAN compiler might use a
scanner to eliminate blanks from the input.
Tokens, Patterns, Lexemes
In general, there is a set of strings in the input for which the same token is
produced as output. This set of strings is described by a rule called a pattern
associated with the token. The pattern is said to match each string in the set.
A lexeme is a sequence of characters in the source program that is matched
by the pattern for a token.
Example: Const pi = 3.1416
The sub string pi is a lexeme for the token “identifier”
Token
A lexical token is a sequence of characters that can be treated as a unit in the
grammar of the programming languages. In most programming language ,
the following constructs are treated as tokens: keywords, operators,
identifiers, constants, literal strings (any characters between " and "), and
punctuations symbols such as parentheses, commas, and semicolons.
The lexical analyzer returns to parser a representation for the token it has
found. This representation is:
• an integer code if there is a simple construct such as a left
parenthesis, comma or colon .
• or a pair consisting of an integer code and a pointer to a table if the
token is more complex element such as an identifier or constant .
This integer code gives the token type. The pointer points to the value of the
token. For example we may treat "operator" as a token and let the second
component of the pair indicate whenever the operator found is +, *, and so
on.
Symbol Table
A symbol table is a table with two fields. A name field and an information
field. This table is generally used to store information about various source
language constructs. The information is collected by the analysis phase of
the compiler and used by the synthesis phase to generate the target code.
We required several capabilities of the symbol table we need to be able to:
1- Determine if a given name is in the table, the symbol table routines are
concerned with saving and retrieving tokens.
insert(s,t) : this function is to add a new name to the table
Lookup(s) : returns index of the entry for string s, or 0 if s is not found.
2- Access the information associated with a given name, and add new
information for a given name.
3- Delete a name or group of names from the tables.
For example consider tokens begin , we can initialize the symbol-table using
the function: insert("begin",1)
beginning pointer
Constant:
start
26
relops:
S2 = SS or
o S.S
3
S = SSS or S.S.S
S4 = SSSS or S.S.S
S.S and so on
Languagges
A languaage is a sett of stringss over som
me fixed allphabet. The
T languaage may co
ontain a
finite or an
a infinitee number ofo strings.
L* = L0 U L` U L2 U L3 . . . U Ln . . .
L* = { , a, b, aa, ab,
a ba, bb, aaa, aba, baa, . . . }
L+ = L` U L2 U L3 . . . U Ln . . .
For exam
mple, If L = {a, b}, then
t
L+ = {a, b,
b aa, ba, bb,
b aaa, abba, . . . }
Regular Definition
D ns
Regular expression
e ns is a useful notatioon suitablee for descrribing tokens . A reggular
definition
n gives names to certain regu ular expresssions and d uses thoose namess in other
regular expressions.
Example1
1:
keywo
ord = BEGIN|END|IF|THEN|EELSE
constaant = digitt+
relop = <|<=|=|<>|>|>=
example22: Here is a regular definitionn for the seet of Pascaal identifieers that is define as
the set off strings off letter annd digits beginning with
w a lettters.
letter → A | B | . . . | Z | a | b | . . . | z
digit → 0 | 1 | 2 | . . . | 9
id → leetter (letteer | digit)*
digit → 0 | 1 | 2 | . . . | 9
digit → digit
d digit**
Optimaal-fractionn → . diigits | ε
Optimaal-exponeent → (E (+( | - | ) digits)
d |ε
num m → digiits optimaal-fraction
n optimal--exponent
Finite au
utomata
A recognnizer for a language is a progrram that taakes a strinng x as ann input and
d answers
"yes" if x is a senteence of thee languagee and "no"" otherwisse.
1. a set of states S;
2. a set of input symbol, ∑, called the input symbols alphabet.
3. a transition function move that maps state-symbol pairs to sets of states.
4. a state so called the initial or the start state.
5. a set of states F called the accepting or final state.
An NFA can be described by a transition graph (labeled graph) where the nodes are
states and the edges shows the transition function.
The labeled on each edge is either a symbol in the set of alphabet, ∑, or denoting
empty string.
Following figure shows an NFA that recognizes the language: (a | b)* a bb.
The transition is
The advantage of transition table is that it provides fast access to the transitions of
states and the disadvantage is that it can take up a lot of space.
The following diagram shows the move made in accepting the input strings aabb :
In general, more than one sequence of moves can lead to an accepting state. If at least
one such move ended up in a final state
The language defined by an NFA is the set of input strings that particular NFA
accepts.
A DFA has at most one transition from each state on any input. It means that each
entry in the transition table is a single state (as oppose to set of states in NFA).
it is very easy to determine whether a DFA accepts an input string, since there is at
most one path from the start state labeled by that string.
INPUT:
• string x
• a DFA with start state, so . . .
• a set of accepting state's F.
OUTPUT:
The function move (S, C) gives a new state from state S on input character C.
The function 'nextchar' returns the next character in the string.
Initialization:
S := S0
C := nextchar;
end
If S is in F then
return "yes"
else
return "No".
example :
state a b
0 1 0
1 1 2
2 1 3
3 1 0
With this DFA and the input string "ababb", above algorithm follows the sequence of
states: 0,1,2,1,2,3 and returns "yes"
Syntax Analysis
In our compiler model, the parser obtains a string of tokens from the lexical
analyzer, and verifies that the string can be generated by the grammar for the
source program. We expect the parser to report any syntax errors in an
intelligible fashion. It should also recover from commonly occurring errors
so that it can continue processing the remainder of its input.
The methods commonly used in compilers are classified as being either Top-
down or bottom up. As indicated by their names, Top down parsers build
parse trees from the top (root) to the bottom (leaves) and work up to the root.
In both cases, the input to the parser is scanned from left to right, one
symbol at time.
We assume the output of the parser is some representation of the parse tree
for the stream of tokens produced by the lexical analyzer. In practice there
are a number of tasks that might be conducted during parsing, such as
collecting information about various tokens into the symbol table,
performing type checking and other kinds of semantic analysis, and
generating intermediate code.
Example
Consider the previous arithmetic expression grammar, the sentence id+id*id
has the two distinct left most derivations:
With the two corresponding parse trees shown below:
Writing Grammar
Grammars are capable of describing most, but not all of syntax of
programming languages. A limited a mount of syntax Analysis is done by
lexical analyze as it produce the sequence of tokens from the input
characters, certain constraints on the input, such as the requirement that
identifiers be declared before they are used, can not be described by a
context-free-grammar.
Every construct that can be described by a regular expression can also be
described by a grammar. For example, the regular expression (a|b)*abb the
NFA is:
A0 aA0 | bA0 | aA1
A1 bA2
A2 bA3
A3 λ
The grammar above was constructed from NFA using the following
constructed:
• For each state i of NFA, create a non terminal symbol Ai.
• If state i has a transition to state j on symbol a, introduction the production
Ai aAj
• If state i goes to state j on input λ, introduce the production Ai Aj
• If state i is on accepting state introduce Ai λ
• If state i is the start state, make Ai be symbol of the grammar.
RE' S are most useful for describing the structure of lexical constructs such
as identifiers, constants, keywords, and so forth.
Grammars, on the other hand, are most useful in describing nested structures
such as balanced parenthesis, matching begin- end's. corresponding if - then-
else's.
These nested structures cannot be described by RE.
If El then S1 else if E2 then S2 else S3 has the parse tree link below:
The grammar above is ambiguous since the string
If E1 then if E2 then S1 else S2
Has the two parse trees shown below:
2- Left Recursion
A grammar is left recursion if it has a non terminal A, such that there is a
derivation A αA Fer some string α. Top-down parsing methods cannot
handle left recursion grammars, so a transformation that eliminates left
recursion is needed.
In the following example, we show how that left recursion pair of production
A αA|β could be replaced by the non left recursional productions:
A αA|β
A βA'
A' αA'| λ
E TE'
E' +T E'| λ
T FT'
T' *FT' | λ
F (E) | id
This produce eliminates all immediate left recursion from A and A'
production. but it does not eliminate left recursion involving derivation of
two or more steps.
S Aa | b
A Ac | Sd | λ
The non terminal S is left recursion because S Aa Sda, but is not
immediately left recursion.
3- Left Factoring
left factoring is a grammar transformation that is useful for producing a
grammar suitable for predictive parsing. The basic idea is that when it is not
clear which of two alternative productions to use to expand a nonterminal A,
we may be able to rewrite the A-productions to defer the decision until we
have seen enough of the input to make the right choice.
For example, if we have the two productions
on seeing the input token if, we cannot immediately tell which production to
choose to expand stmt. In general, if A α β1| α β2 are two A
productions, and the input begins with a non-empty string derived from α
,we do not know whether to expand A to α β1 or to α β2. However, we may
defer the decision by expanding A to αA', then after seeing the input derived
from α, we expand A' to β1 or to β2. that is, left factored, the original
production become:
A αA'
A' β1| β2
Top down parser
In this section there are basic ideas behind top-down parsing and show how
constructs an efficient non- backtracking form of top-down parser called a predictive
parser.
Top down parsing can be viewed as attempt to find a left most derivation for an input
string. Equivalently, it can be viewed as an attempt to construct a parse tree for the
input starting from the root and creating the nodes of the parse tree in preorder.
1
We now try the second alternative for A to obtain the tree:
In many cases, by carefully writing a grammar eliminating left recursion from it,
and left factoring the resulting grammar, we can obtain a grammar that can be parsed
by a non backtracking predictive parser.
We can build a predictive parser by maintaining a stack. The key problem during
predictive parser is that of determining the production to be applied for a
nonterminal. The nonrecursive parser looks up the production to be applied in a
parsing table.
A table-driven predictive parser has an input buffer, a stack, a parsing table, and
an output stream. The input buffer contains the string to be parsed, followed by $, (a
symbol used as a right endmarker to indicate the end of the input string). The stack
contains a sequence of grammar symbols with $ on the bottom,( indicating the
bottom of the stack). Initially, the stack contains the start symbol of the grammar on
2
the top of $. The parsing table is a two-dimensional array M[A,a], where A is a
nonterminal, and a is a terminal or the symbol $.
3
Algorithm nonrecusive predictive parser
Example:
E → TE'
E' → +TE' │ ε
T → FT'
4
T' → *FT' │ ε
F → (E) │ id
5
FIRST and FOLLOW:
Define the FIRST(α) to be the set of terminals that begin the strings
derived from α, and the FOLLOW(A) for nonterminal A, to be the set of terminals
a that can appear immediately to the right of A in some sentential form.
To compute FIRST(x) for all grammar symbols x, apply the following rules until no
more terminals or ε can be added to any first set.
S → aSb │ba │ ε
6
FIRST (a) ={a}
To compute FOLLOW(A) for all non terminals A, is the set of terminals that can
appear immediately to the right of A in some sentential form S → aAxB... To
compute Follow, apply these rules to all nonterminals in the grammar:
1- Place $ in FOLLOW(S) , where S is the start symbol and $ is the input right
end marker.
FOLLOW(START) = {$}
is placed in FOLLOW(A).
Example 1:
S → aSb │X
X → cXb │b
X → bXZ
7
Z→ n
First Follow
S a,c,b $,b
X c,b b,n,$
Z n b,n,$
Example 2:
S → bXY
X→b|c
Y→b|ε
First Follow
S b $
X b, c b,$
Y b,ε $
Example 3:
S → ABb │bc
A → ε │abAB
B → bc │cBS
First Follow
S b,a,c $,b,c,a
A ε, a b,c
8
B b,c b,c,a
Example 4:
X → ABC │ nX
A → bA │ bb │ ε
B → bA │CA
C → ccC │CA │ cc
First Follow
X n,b,c $
A b,ε b,c,$
B b,c c
C c b, $
H.W:
S → bSX | Y
X → XC | bb
Y → b | bY
C → ccC | CX | cc
Note: there is a left recursion problem here trying to solve this Problem and find
the first and follow for this grammar.
S → bSX | Y
X → bbX’ X’ → CX’| ε
Y → b | bY C → ccC | CX | cc
9
First Follow
S b $, b
X b $, b,c
X’ c,ε $, b ,c
Y b $, b
C c $, b ,c
10
Example:
E→ E+T │ T
T → T*F │ F
F → (E) │id
1- we must solve the left recursion and left factoring if it founded in the grammar
E → TE'
E' → +TE' │ ε
T → FT'
T' → *FT' │ ε
F → (E) │ id
11
First Follow
E ( , id $, )
T ( , id + , ) , id
E’ + , ε $, )
T’ * , ε + , ( , id
F ( , id + , * , ( , id
12
Since FIRST(TE’) = FIRST(T) = {(,id}, production E→TE’ causes M[E,( ] and
M[E,id] to acquire the entry E→TE’.
Production E’→+TE’ causes m[E’,+] to acquire E’→+TE’. Production
E’→ causes M[E’,)] and M[E’,$] to acquire E’→ since FOLLOW(E’) =
{ ),$}. So the parsing table produced by the previous algorithm
LL(1) grammars:
The previous algorithm can be applied to any grammar G to produce a parsing
table M. For some grammars, M may have some entries that are multiply
defined. If G is left recursive or ambiguous, then M will have at least one
multiply-defined entry.
Example:
FIRST(S) = { i, a} FOLLOW(S) = { $, e }
FIRST(S’) = { e, ε} FOLLOW(S’) = { $, e }
FIRST(E) = { b } FOLLOW(E) = { t }
So the parsing table for our grammar is:
13
The entry for M[S’,e] contains both S’→ eS and S’→ ε, since
FOLLOW(S’) ={e, $}. The grammar is ambiguous and the ambiguity is
manifested by a choice in what production to use when an e (else) is seen. We can
resolve the ambiguity if we choose S’→ eS. Note that the choice S’→ ε would
prevent e from ever being put on the stack or removed from the input, and is
therefore surely wrong.
An error is detected during predictive parsing when the terminal on top of the stack
does not match the next input symbol or when nonterminal A is on top of the stack, a
is the next input symbol, and the parsing table entry M[A,a] is empty.
Error recovery is based on the idea of skipping symbols on the input until a token in
selected set of synchronizing tokens appears. The sets should be chosen so that the
parser recovers quickly from errors that are likely to occur in practice.
14
We can place all symbols in FOLLOW(A) into the synchronizing set for
nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A
from the stack, it is likely that parsing can continue.
Example:
The parsing table for this grammar is repeated with synchronizing tokens
If the parser looks up entry M[A,a] and finds that it is blank, then the input
symbol a is skipped. If the entry is synchronize, then the nonterminal on top of
the stack is popped in an attempt to resume parsing.
15
16
Bottom – Up Parsing
Bottom up parsers start from the sequence of terminal symbols and work
their way back up to the start symbol by repeatedly replacing grammar
rules' right hand sides by the corresponding non-terminal. This is the reverse
of the derivation process, and is called "reduction".
Shift reduces parsing attempts to construct a parse tree for an input string
beginning at the leaves (bottom) and working up towards the root (the top).
There are two problems that must be solved if we are to parse by handle
pruning. The first is to determine the handel, and the second is to determine
what production to choose in case there is more than one production with
that substring on the right side. A convenient way to implement a shift
reduce parser is to use stack to hold grammar symbols and an input buffer to
hold the string (W) to be parsed. Use $ to mark the bottom of the stack and
also the right end of the input. Initially the stack is empty and the string (W)
is on the input, as follows:
Stack Input
$ W$
The parser operates by shifting zero or more input symbol onto the stack
until a handle β is on top of the stack. The parser then reduces β to the left
side of the appropriate production. The parser repeat this cycle until it has
detected an error or until the stack contains the start symbol and the input is
empty.
Stack Input
$ S $
After entering this configuration the parser halts and announces successful
completion of parsing.
1- Shift one symbol from the input onto the parse stack
2- Reduce one handle on the top of the parse stack. The symbols from
the right hand side of a grammar rule are popped of the stack, and the
nonterminal symbol is pushed on the stack in their place.
3- Accept is the operation performed when the start symbol is alone on
the parse stack and the input is empty.
4- Error actions occur when no successful parse is possible.
E → E+E
E → E*E
E → (E)
E → id
Example 2: parse the input id +* id for the same grammar
$ E+ * id2 $ Shift
$ E +* id $ Shift
$ E +*E $ Reduce by E→ id
E → E+T | T
T → T*F | F
F → id | (E)
1
Then the string with the precedence relations inserted is:
2
Stack Input
$ <. id + id $
We shift id giving:
$ <. id .> + id $
The handle id is reduced to F
$E <. + id $
$ <. E + <. id $
$ E + <.id .> $
$ <. E + E .> $
Now we are told to reduce the handle E+E:
$E $
h.w.:
try input with the following relations
3
LR parser
This section presents an efficient bottom-up syntax analysis technique
that can be used to parse a large class of context-free grammars. These
technique is called LR parsing; the L is for left-right scanning of the
input, the R for constructing a rightmost derivation in reverse.
This method present three techniques for construct an LR parsing table
for grammar .
The first method, called simple LR (SLR), is easiest to implement. But
the least powerful.
The second method, called canonical LR, is the most powerful and will
work on a very large class of grammars and the most expensive.
The third method, called look ahead LR (LALR), is intermediate in
power and cost between the SLR and the canonical LR methods.
[Link] Parser
This method of parsing is the weakest of three in terms of the number
of grammar for which it succeeds, but it is easiest to implement this
parsing method there are four basic steps:
1. Find first & follow.
2. Find set of I.
3. Find parsing table.
4. Check the sentence (parse the input).
GO TO operation
The second useful function is GOTO( I, X) where I is a set of items and
X is a grammar symbol. GOTO( I, X) is defined to be the closure of the
set of all items A αX. β such that A αX. β is in I.
procedure ITEMS(G');
begin
c = {closure({S̀ .S})};
repeat
for each set of items I in c and each grammar symbol x
such that GOTO(I, x) is not empty and is not in c do
add GOTO(I, x) to C
until no more sets of items can be added to C
end
Example: consider the grammar:
E E+T | T
T T*F | F
F (E) | id
2. Find set of I.
3. Find parsing table.
Notes:
1. A α . Bβ, X
α, β : anything
X : terminal or signdolar
B: nonterminal
2. first(βX)
α = λ , B= S, β= λ, X= $
first(βX)= first($)=$
0 cdcd$
0c3 dcd$
0c3d4 cd$
0c3C8 cd$
0C2 cd$
0C2c6 d$
0C2c6d7 $
0C2c6C9 $
0C2C5 $
0S1 $
Accept
LALR Parser
C d . , c|d C d.,$
Their union :
I47: C d . , c|d |$
Their union:
I89: C cC. , c|d |$
Type checking
A type checker verifies that the type of a construct matches that
expected by its context. For example, the –in arithmetic operator mod
in pascal requires integer operands, so a type checker must verify that
the operands of mod have type integer.
Intermediate Code Generation
Translate from abstract-syntax trees to intermediate codes.
Generating a low-level intermediate representation with two properties:
o It should be easy to produce
o It should be easy to translate into the target machine
One of the popular intermediate code is three-address code. A three-
address code:
Each statement contains at most 3 operands; in addition to “: =”,
i.e., assignment, at most one operator.
An” easy” and “universal” format that can be translated into most
assembly languages.
some of the basic operations which in the source program, to change in the
Assembly language:
operations H.L.L Assembly language
Math. OP +,-,*,/ Add, sub, mult, div
Boolean. OP &, | , ~ And, or, not
Assignment = mov
Jump goto JP, JN, JC
conditional If, then CMP
Loop instruction For, do, repeat These most have and I.C.G before
until, while do change it to assembly language.
The operation which change H.L.L to assembly language, is called the
intermediate code generation and there is the division operation come with
it, which mean every statement have a single operation.
Ex (1):
X=A+B*C/D–Y*N
T1 T3
T2
T4
T5
T1 = B * C
T2 = T1 / D
T3 = Y * N
T4 = A + T2
T5 = T4 - T3
Ex (2):
Y = cos (A * B) + C / N - Y * P
T1 T4 T3
T2
T5
T6
T1 = A * B
T2 = cos T1
T3 = Y * P
T4 = C / N
T5 = T2 + T4
T6 = T5 – T3
Mathematical operation
There are two kinds of operation, which are deals with
mathematical operation, such as the parsing for these operations:
1. Triple form
Ex: X = A + B * C / ( - N )
OP Arg1 Arg2
(0) * B C
(1) - N __
(2) / (0) (1)
(3) + A (2)
= X (3)
Ex: Y = A + C * X / B [i]
OP Arg1 Arg2
(0) * C X
(1) =[ ] B i
(2) / (0) (1)
(3) + A (2)
= Y (3)
Ex: X = A + B * ( c / d ) - y
OP Arg1 Arg2
(0) / c d
(1) * B (0)
(2) + A (1)
(3) - (2) y
= X (3)
Ex: A = C * X [i,j]
OP Arg1 Arg2
(0) =[ ] X P
(1) * C (0)
= A (1)
2. Quadruple form
Ex: X = A * C / N + P
OP Arg1 Arg2 Result
* A C t1
/ t1 N t2
+ t2 P t3
= t3 X
Ex: A = N[i] * C / N
OP Arg1 Arg2 Result
=[ ] N i t1
* t1 C t2
/ t2 N t3
= t3 A
Ex: A = C * y / X[i,j]
OP Arg1 Arg2 Result
* C y t1
=[ ] X P t2
/ t1 t2 t3
= t3 A
Ex: X = A + B * ( c / d ) - y
OP Arg1 Arg2 Result
/ c d t1
* B t1 t2
+ A t2 t3
- t3 y t4
= t4 X
Ex1: X = A + C * N – M
Y=B+C*N*e
Sol: Q = C * N
X=A+Q–M
Y=B+Q*e
Ex2:
Note: The value of the variable which are optimize will not be change.
2. Copy Propagation
Ex:
Ex1: X=3
If X > 4 Then
.
.
end
Ex2: A= false
If A Then
Begin
.
.
end
Code Motion
An important modification that decreases the amount of code in a
loop is code motion. This transformation takes an expression that yields
the same result independent of the number of times a loop is executed and
places the expression before the loop.
Sol: t = limit-2
While (I <= t )
Sol: P = C * y
While X<A + P do
Begin
.
.
end
Induction Variable
Ex: X = 2 * y + 2 * h + 4
Sol: X = 2 * (y + h) +4
Reduction in strength
Reduction in strength, which replaces an expensive operation by a
cheaper one, such as a multiplication by an addition.
Ex: t4 = 4*j - 4
Code Generation
Ex:
+
+
The code
Mov R0,a
Mov R1,b
Add R1,R0
Mov R2,c
Sub R0,R2
Add R1 , R0
Add R0 , R1
Mov d, R0