0% found this document useful (0 votes)
5 views76 pages

Compiler Design and Translation Basics

The document provides an overview of compilers, detailing their structure, phases, and the role of translators in programming languages. It explains the hierarchy of programming languages, the compilation process, and the types of errors that can occur during compilation. Additionally, it discusses the functions of lexical analyzers, symbol tables, and the importance of error detection and reporting in the compilation process.

Uploaded by

phd202520056
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views76 pages

Compiler Design and Translation Basics

The document provides an overview of compilers, detailing their structure, phases, and the role of translators in programming languages. It explains the hierarchy of programming languages, the compilation process, and the types of errors that can occur during compilation. Additionally, it discusses the functions of lexical analyzers, symbol tables, and the importance of error detection and reporting in the compilation process.

Uploaded by

phd202520056
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Save from: [Link].

iq

rd
3 class
Compilers
‫المترجمات‬
‫ عبير طارق مولود‬.‫م‬.‫ م‬:‫استاذة الماده‬
Compiler
References:
1. Principle of compiler design
Alfred V. Aho & Jeffrey D. Ullman
2. Basics of compiler Design
Torben Egidius Mogensen
3. Compilers : principles, techniques, and tools
Alfred V. Aho & Jeffrey D. Ullman

Programming Languages
Hierarchy of Programming Languages based on increasing machine
independence includes the following:
1- Machine – level languages.
2- Assembly languages.
3- High – level or user oriented languages.
4- Problem - oriented language.
1- Machine level language: is the lowest form of computer. Each instruction
in program is represented by numeric code, and numerical addresses are
used throughout the program to refer to memory location in the computers
memory.
2- Assembly language: is essentially symbolic version of machine level
language, each operation code is given a symbolic code such ADD for
addition and MULT for multiplication.
3- A high level language such as Pascal, C.
4- A problem oriented language provides for the expression of problems in
specific application or problem area .examples of such as languages are SQL
for database retrieval application problem oriented language.

Translator
A translator is program that takes as input a program written in a given
programming language (the source program) and produce as output program
in another language (the object or target program). As an important part of
this translation process, the compiler reports to its user the presence of errors
in the source program.

If the source language being translated is assembly language, and the object
program is machine language, the translator is called Assembler.
Fig (1)

A translator, which transforms a high level language such as C in to a


particular computers machine or assembly language, called Compiler.

Another kind of translator called an Interpreter process an internal form of


the source program and data at the same time. That is interpretation of the
internal source from occurs at run time and an object program is generated
Fig (2) which illustrate the interpretation process.
Data

source program Interpreter Result

Fig (2)

Compiler
Is a program (translator) that reads a program written in one language, (the
source language) and translates into an equivalent program in another
language (the target language).

Fig (3)
The time at which the conversion of the source program to an object
program occurs is called (compile time) the object program is executed at
(run time).
Fig (4) illustrate the compilation process Note that the program and data are
processed at different times, compile time and run time respectively.
Data

Source Compiler Object Execution Results


Program Program

Compile time Run time


Fig (4) Compilation process

The Analysis - Synthesis model of compilation


There are two parts to compilation: analysis and synthesis. The analysis part
breaks up the source program into constituent pieces and creates an
intermediate representation of the source program. The synthesis part
constructs the desired target program from the intermediate representation.
During analysis, the operations implied by the source program are
determined and recorded in a hierarchical structure called a tree. Often, a
special kind of tree called asyntax tree is used, in which each node
represents an operation and the children of a node represent the arguments of
the operation. For example, a syntax tree for an assignment statement is
shown below

Compiler structure :
A compiler operates in phases, each of which transforms the source program
from one representation to another. A typical decomposition of a compiler is
shown in fig (5).
Source program

Fig (5) Phases of a Compiler

Compiler structure:
1- lexical analysis
The lexical analyzer is the first stage of a compiler. Its main task is to read
the input characters and produce as output a sequence of tokens that the
parser uses for syntax analysis.
2- syntax analysis (parsing)
The syntax analysis (or parsing) is the process of determining if a string of
tokens can be generated by grammar. Every programming language has
rules that prescribe the syntactic structure of well-formed programs.
Syntax Analyzer takes an out of lexical analyzer and produces a large tree

3- Semantic analysis
The semantic analysis phase checks the source program for semantics errors
and gathers type information for the subsequent code-generation phase. It
uses the hierarchical structure determined by the syntax-analysis phase to
identify the operators and operands of expressions and statements.
Semantic analyzer takes the output of syntax analyzer and produces another
tree.

4- Intermediate code generation


Generate an explicit intermediate representation of the source program. This
representation should have two important properties, it should be easy to
produce and easy to translate into the target program.

5- Code Optimization
Attempts to improve the intermediate code so that faster running machine
code will result.

6- code generation
Generates a target code consisting normally of machine code or an assemble
code. Memory locations are selected for each of the variables used by the
program. Then intermediate instructions are each translated in to a sequence
of machine instructions that perform the same task.

- Symbol table management :


Portion of the compiler keeps tracks of the name used by the program and
records essential information about each, such as type ( integer, real, etc.).
The data structure used to record this information is called symbolic table.

-Error handler:
Is called when an error in the source program is detected. It must warn the
programmer by issuing a diagnostic, and adjust the information being passed
from phase to phase so that each phase can produced.
Types of errors
The syntax and semantic phases usually handle a large fraction of errors
detected by compiler.
1. Lexical error: The lexical phase can detect errors where the characters
remaining in the input do not form any token of the language . few errors are
discernible at the lexical level alone ,because a lexical analyzer has a very
localized view of the source program. Example : If the string fi is
encountered in a C program for the first time in context:
fi ( a== f(x)….
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or
an undeclared function name. since fi is a valid identifier, the lexical
analyzer must return the token for an identifier and let some other phase of
the compiler handle any error.

2- syntax error: The syntax phase can detect Errors where the token stream
violates the structure rules (syntax) of the language.

3- semantic error: During semantic analysis the compiler tries to detect


constructs that have the right syntactic structure but no meaning to the
operation involved, e.g., if we try to add two identifiers, one of which is the
name of an array, and the other the name of a procedure.

4- runtime error.

Error Detection and Reporting


Each phase can encounter errors after detecting an error, a phase must some
how deal with that error, so the compilation can proceed, allowing further
error in the source program to be detected. A compiler that stops when it
finds the first error is not as helpful as it could be.

Passes:
In an implementation of a compiler, portions of one or more phases are
combined into a module called a pass. A pass reads the source program or
the output of the previous pass, makes the transformations specified by its
phases and writes output into an intermediate file, which may then be read
by subsequent pass.
Lexical Analyzer
The lexical analyzer is the first phase of compiler. The main task of lexical
Analyzer is to read the input characters and produce a sequence of tokens
such as names, keywords, punctuation marks etc.. for syntax analyzer. This
interaction, summarized in fig.6, is commonly implemented by making the
lexical analyzer be a subroutine of the parser. Up on receiving a "get next
token" command from the parser, the lexical analyzer reads input characters
until it can identify the next token.

Fig. (6 ) Interaction of lexical analyzer with parser

Preliminary scanning :
Since the lexical analyzer is the part of compiler that reads the source text; it
may also perform certain secondary tasks at the user interface. One such task
is stripping out from the source program comments and white space in the
form of blank, tab, and new line characters. Another is correlating error
messages from the compiler with the source program. For example, the
lexical analyzer may keep track of the number of new line characters seen,
so that a line number can be associated with an error message.
Some times, lexical analyzers are divided into a cascade of two phases, the
first called "scanning" and the second "lexical analysis". The scanner is
responsible for doing simple tasks, while the lexical analyzer proper does the
more complex operations. For example, a FORTRAN compiler might use a
scanner to eliminate blanks from the input.
Tokens, Patterns, Lexemes
In general, there is a set of strings in the input for which the same token is
produced as output. This set of strings is described by a rule called a pattern
associated with the token. The pattern is said to match each string in the set.
A lexeme is a sequence of characters in the source program that is matched
by the pattern for a token.
Example: Const pi = 3.1416
The sub string pi is a lexeme for the token “identifier”

Fig (7) Examples of tokens

Token
A lexical token is a sequence of characters that can be treated as a unit in the
grammar of the programming languages. In most programming language ,
the following constructs are treated as tokens: keywords, operators,
identifiers, constants, literal strings (any characters between " and "), and
punctuations symbols such as parentheses, commas, and semicolons.
The lexical analyzer returns to parser a representation for the token it has
found. This representation is:
• an integer code if there is a simple construct such as a left
parenthesis, comma or colon .
• or a pair consisting of an integer code and a pointer to a table if the
token is more complex element such as an identifier or constant .
This integer code gives the token type. The pointer points to the value of the
token. For example we may treat "operator" as a token and let the second
component of the pair indicate whenever the operator found is +, *, and so
on.
Symbol Table
A symbol table is a table with two fields. A name field and an information
field. This table is generally used to store information about various source
language constructs. The information is collected by the analysis phase of
the compiler and used by the synthesis phase to generate the target code.
We required several capabilities of the symbol table we need to be able to:
1- Determine if a given name is in the table, the symbol table routines are
concerned with saving and retrieving tokens.
insert(s,t) : this function is to add a new name to the table
Lookup(s) : returns index of the entry for string s, or 0 if s is not found.
2- Access the information associated with a given name, and add new
information for a given name.
3- Delete a name or group of names from the tables.
For example consider tokens begin , we can initialize the symbol-table using
the function: insert("begin",1)

Any subsequent call lookup("begin") returns the token begin, so begin


cannot be used as an identifier.
Input buffer
Lexical analyzer scans the characters of the source program one at a time to
discover tokens. It is desirable for the lexical analyzer to input from buffer.
One pointer marks the beginning of the token being discovered. A look
ahead pointer scans ahead of the beginning pointer, until a token is
discovered.

beginning pointer

A simple approach to the design of lexical analysis


One way to begin the design of any program is to describe the behavior of
the program by a flowchart.
Remembering previous character by the position flowchart is a valuable tool,
so that a specialized kind of flowchart for lexical analyzer, called transition
diagram, has evolved.
In transition diagram, the boxes of the flowchart are drawn as circle and
called states. The states are connected by arrows called edge. The labels on
the various edges leaving a state indicate the input characters that can appear
after that state.
To turn a collection of transition diagrams into a program, we construct a
segment of code for each state. The first step to be done in the code for any
state is to obtain the next character from the input buffer. For this purpose
we use a function GETCHAR, which returns the next character, advancing
the look ahead pointer at each call. The next step is to determine which edge,
if any out of the state is labeled by a character, or class of characters that
includes the character just read. If no such edge is found, and the state is not
one which indicates that a token has been found ( indicated by a double
circle), we have failed to find this token. The look ahead pointer must be
retracted to where the beginning pointer is, and another token must be search
for using another token diagram. If all transition diagrams have been tried
without success, a lexical error has been detected and an error correction
routine must be called.
State 0 : C = GETCHAR ( )
if LETTER(C) then goto state1
else FAIL( )
State1 : C= GETCHAR ( )
if LETTER(C) or DIGIT(C) then goto state1
else if DELIMITER(C) then goto state2
else FAIL ( )
State2: RETRACT( )
return( id,INSTALL( ) )

LETTER(C) is a procedure which return true if and only if C is a letter.


DIGIT(C) is a procedure which return true if and only if C is one of the
digits 0,1,…,9.
DELIMITER(C) is a procedure which return true whenever C is character
that could follow an identifier. The delimiter may be: blank, arithmetic or
logical operator, left parenthesis, equals sign, comma,…
State2 indicates that an identifier has been found.
Since the delimiter is not part of the identifier, we must retract the look
ahead pointer one character, for which we use a procedure RETRAC. We
must install the newly found identifier in the symbol table if it is not already
there , using the procedure INSTALL, In state2 we return to the parser a pair
consisting of integer code for an identifier, which we denoted by id, and a
value that is a pointer to the symbolic table returned by INSTALL.
A more efficient program can be constructed from a single transition
diagram than from a collection of diagrams, since there is no need to
backtrack and rescan using a second transition diagram.
identifier:

Constant:

start
26
relops:
S2 = SS or
o S.S
3
S = SSS or S.S.S
S4 = SSSS or S.S.S
S.S and so on

By definiition S0 is an empty string, , and S` = S. For exaample, if x =ba and


d na then
2
xy = bannana.

Languagges
A languaage is a sett of stringss over som
me fixed allphabet. The
T languaage may co
ontain a
finite or an
a infinitee number ofo strings.

Let L andd M be two languagges where L = {dog


g, ba, na} and
a M = {{house, baa} then

• Unnion: LUM M = {dog, ba, na, hoouse}


• Cooncatenatioon: LM = {doghousse, dogba, bahouse, baba, nahhouse, nab
ba}
2
• Exxponentiatiion: L = LL
L
0
• Byy definitionn: L ={ } and L` = L

The kleenne closuree of languaage L, dennoted by L*,


L is "zeroo or more Concaten
nation of"
L.

L* = L0 U L` U L2 U L3 . . . U Ln . . .

mple, If L = {a, b}, then


For exam t

L* = { , a, b, aa, ab,
a ba, bb, aaa, aba, baa, . . . }

The posittive closurre of Langguage L, denoted


d y L+, is "onne or moree Concateenation
by
of" L.

L+ = L` U L2 U L3 . . . U Ln . . .

For exam
mple, If L = {a, b}, then
t

L+ = {a, b,
b aa, ba, bb,
b aaa, abba, . . . }

Regular Definition
D ns
Regular expression
e ns is a useful notatioon suitablee for descrribing tokens . A reggular
definition
n gives names to certain regu ular expresssions and d uses thoose namess in other
regular expressions.
Example1
1:

keywo
ord = BEGIN|END|IF|THEN|EELSE

Identiifier = letter (letter||digit)*

constaant = digitt+

relop = <|<=|=|<>|>|>=

example22: Here is a regular definitionn for the seet of Pascaal identifieers that is define as
the set off strings off letter annd digits beginning with
w a lettters.

letter → A | B | . . . | Z | a | b | . . . | z
digit → 0 | 1 | 2 | . . . | 9
id → leetter (letteer | digit)*

The regullar expression id is the patternn for the Pascal


P idenntifier tokken and deefines
letter andd digit.
Where letter is a reegular exppression foor the set of
o all uppeer-case annd lower- case
c
letters in the alphabbet and diigit is the regular
r for the set of
o all decim mal digits..

Example33: The patttern for thhe Pascal unsigned


u token
t can be speciffied as folllows:

digit → 0 | 1 | 2 | . . . | 9
digit → digit
d digit**

Optimaal-fractionn → . diigits | ε
Optimaal-exponeent → (E (+( | - | ) digits)
d |ε
num m → digiits optimaal-fraction
n optimal--exponent

This reguular definittion says that


t

• Ann optimal-fraction iss either a decimal


d pooint followwed by onne or more digits or
it is
i missing (i.e., an empty
e strinng).
• Ann optimal-exponent is either an a empty string
s or itt is the lettter E follo
owed by
an ' optimal + or - signn, followeed by one or
o more diigits.

Finite au
utomata

A recognnizer for a language is a progrram that taakes a strinng x as ann input and
d answers
"yes" if x is a senteence of thee languagee and "no"" otherwisse.

One can compile


c any regularr expressioon into a recognizer
r r by constrructing a
generalizzed transitiion diagraam called a finite au
utomation.
Nondeterministic Finite Automata (NFA)

A nondeterministic finite automation is a mathematical model consists of

1. a set of states S;
2. a set of input symbol, ∑, called the input symbols alphabet.
3. a transition function move that maps state-symbol pairs to sets of states.
4. a state so called the initial or the start state.
5. a set of states F called the accepting or final state.

An NFA can be described by a transition graph (labeled graph) where the nodes are
states and the edges shows the transition function.

The labeled on each edge is either a symbol in the set of alphabet, ∑, or denoting
empty string.

Following figure shows an NFA that recognizes the language: (a | b)* a bb.

This automation is nondeterministic because when it is in state-0 and the input


symbol is a, it can either go to state-1 or stay in state-0.

The transition is
The advantage of transition table is that it provides fast access to the transitions of
states and the disadvantage is that it can take up a lot of space.

The following diagram shows the move made in accepting the input strings aabb :

In general, more than one sequence of moves can lead to an accepting state. If at least
one such move ended up in a final state

The language defined by an NFA is the set of input strings that particular NFA
accepts.

Following figure shows an NFA that recognize aa* | bb*.


String aaa is accepted by moving through states 0,1,2,2 and 2

Deterministic Finite Automata (DFA)

A deterministic finite automation is a special case of a non-deterministic finite


automation (NFA) in which

1. no state has an -transition


2. for each state s and input symbol a, there is at most one edge labeled a leaving
s.

A DFA has at most one transition from each state on any input. It means that each
entry in the transition table is a single state (as oppose to set of states in NFA).

it is very easy to determine whether a DFA accepts an input string, since there is at
most one path from the start state labeled by that string.

Algorithm for Simulating a DFA

INPUT:

• string x
• a DFA with start state, so . . .
• a set of accepting state's F.

OUTPUT:

• The answer 'yes' if D accepts x; 'no' otherwise.

The function move (S, C) gives a new state from state S on input character C.
The function 'nextchar' returns the next character in the string.

Initialization:
S := S0
C := nextchar;

while not end-of-file do


S := move (S, C)
C := nextchar;

end

If S is in F then
return "yes"

else
return "No".

example :

Following figure shows a DFA that recognizes the language (a|b)*abb.

The transition table is

state a b
0 1 0
1 1 2
2 1 3
3 1 0

With this DFA and the input string "ababb", above algorithm follows the sequence of
states: 0,1,2,1,2,3 and returns "yes"
Syntax Analysis
In our compiler model, the parser obtains a string of tokens from the lexical
analyzer, and verifies that the string can be generated by the grammar for the
source program. We expect the parser to report any syntax errors in an
intelligible fashion. It should also recover from commonly occurring errors
so that it can continue processing the remainder of its input.

Fig(8) Position of parser in Compiler model

The methods commonly used in compilers are classified as being either Top-
down or bottom up. As indicated by their names, Top down parsers build
parse trees from the top (root) to the bottom (leaves) and work up to the root.
In both cases, the input to the parser is scanned from left to right, one
symbol at time.
We assume the output of the parser is some representation of the parse tree
for the stream of tokens produced by the lexical analyzer. In practice there
are a number of tasks that might be conducted during parsing, such as
collecting information about various tokens into the symbol table,
performing type checking and other kinds of semantic analysis, and
generating intermediate code.

Parse Tree and Derivations


A parse tree may be viewed as a graphical representation for an derivation
that fillers out the choice regarding replacement order, that each interior
node of parse tree is labeled by some non terminal A, and that the children
of the node are labeled, from left to right, by the symbols in the right side of
the production by which A was replaced in the derivation, the leaves of the
parse tree are labeled by non terminals or terminals and, read from left to
right, they constitute a sentential form, called the yield or frontier of the tree.
For example, the parse tree for -(id+id) implied previously is shown bellow,
For the grammar
E E+E | E-E | E*E | E/E

Parse tree ignores variations in the order in which symbols in sentential


forms are replaced, these variations in the order in which productions are
applied can also be eliminated by considering only left- most or right- most
derivations. It is not hard to see that every parse tree has associated with it
unique left most and unique right most derivations.

Example
Consider the previous arithmetic expression grammar, the sentence id+id*id
has the two distinct left most derivations:
With the two corresponding parse trees shown below:

Two parse trees for id+id*id

Writing Grammar
Grammars are capable of describing most, but not all of syntax of
programming languages. A limited a mount of syntax Analysis is done by
lexical analyze as it produce the sequence of tokens from the input
characters, certain constraints on the input, such as the requirement that
identifiers be declared before they are used, can not be described by a
context-free-grammar.
Every construct that can be described by a regular expression can also be
described by a grammar. For example, the regular expression (a|b)*abb the
NFA is:
A0 aA0 | bA0 | aA1
A1 bA2
A2 bA3
A3 λ
The grammar above was constructed from NFA using the following
constructed:
• For each state i of NFA, create a non terminal symbol Ai.
• If state i has a transition to state j on symbol a, introduction the production
Ai aAj
• If state i goes to state j on input λ, introduce the production Ai Aj
• If state i is on accepting state introduce Ai λ
• If state i is the start state, make Ai be symbol of the grammar.

RE' S are most useful for describing the structure of lexical constructs such
as identifiers, constants, keywords, and so forth.
Grammars, on the other hand, are most useful in describing nested structures
such as balanced parenthesis, matching begin- end's. corresponding if - then-
else's.
These nested structures cannot be described by RE.

1- Ambiguity: (problems of grammar)


A grammar that produces more than one parse tree for some sentence is
said to be ambiguous. An ambiguous grammar is one that produces more
than one leftmost or more than one right most derivation for the same
sentence. For certain types of parsers, it is desirable that the grammar be
made unambiguous, for if it is not, we can not uniquely determine which
parse tree to select for a sentence.

Sometimes an ambiguous grammar can be rewritten to eliminate the


ambiguity. As an example ambiguous "else" grammar

Stmt Expr then Stmt


| if Expr then Stmt else Stmt
|other

According to this grammar, the compound conditional statement

If El then S1 else if E2 then S2 else S3 has the parse tree link below:
The grammar above is ambiguous since the string
If E1 then if E2 then S1 else S2
Has the two parse trees shown below:
2- Left Recursion
A grammar is left recursion if it has a non terminal A, such that there is a
derivation A αA Fer some string α. Top-down parsing methods cannot
handle left recursion grammars, so a transformation that eliminates left
recursion is needed.
In the following example, we show how that left recursion pair of production
A αA|β could be replaced by the non left recursional productions:
A αA|β
A βA'
A' αA'| λ

Example: Consider the following grammar for arithmetic expressions.


E E+T |T
T T*F|F
F (E) | id

Eliminating the immediate left recursion (productions of the form A Aα


to the production for E and then for T, we obtain:

E TE'
E' +T E'| λ
T FT'
T' *FT' | λ
F (E) | id

No matter how many A productions there are, we can eliminate immediate


left recursion from them by the following technique. First we group the A
production as
A Aα1| Aα2| …| Aαn| β1| β2| … βn

where no βi begins with an A. then, we replace the A -productions by


A β1A'| β2A'| …| βnA'
A α1A'| α2A'| …| αnA' | λ

This produce eliminates all immediate left recursion from A and A'
production. but it does not eliminate left recursion involving derivation of
two or more steps.
S Aa | b
A Ac | Sd | λ
The non terminal S is left recursion because S Aa Sda, but is not
immediately left recursion.

3- Left Factoring
left factoring is a grammar transformation that is useful for producing a
grammar suitable for predictive parsing. The basic idea is that when it is not
clear which of two alternative productions to use to expand a nonterminal A,
we may be able to rewrite the A-productions to defer the decision until we
have seen enough of the input to make the right choice.
For example, if we have the two productions

Stmt if Expr then Stmt else Stmt


| if Expr then Stmt

on seeing the input token if, we cannot immediately tell which production to
choose to expand stmt. In general, if A α β1| α β2 are two A
productions, and the input begins with a non-empty string derived from α
,we do not know whether to expand A to α β1 or to α β2. However, we may
defer the decision by expanding A to αA', then after seeing the input derived
from α, we expand A' to β1 or to β2. that is, left factored, the original
production become:

A αA'
A' β1| β2
Top down parser

In this section there are basic ideas behind top-down parsing and show how
constructs an efficient non- backtracking form of top-down parser called a predictive
parser.

Top down parsing can be viewed as attempt to find a left most derivation for an input
string. Equivalently, it can be viewed as an attempt to construct a parse tree for the
input starting from the root and creating the nodes of the parse tree in preorder.

The following grammar requires backtracking:

1
We now try the second alternative for A to obtain the tree:

Predictive Parsing Method

In many cases, by carefully writing a grammar eliminating left recursion from it,
and left factoring the resulting grammar, we can obtain a grammar that can be parsed
by a non backtracking predictive parser.

We can build a predictive parser by maintaining a stack. The key problem during
predictive parser is that of determining the production to be applied for a
nonterminal. The nonrecursive parser looks up the production to be applied in a
parsing table.

A table-driven predictive parser has an input buffer, a stack, a parsing table, and
an output stream. The input buffer contains the string to be parsed, followed by $, (a
symbol used as a right endmarker to indicate the end of the input string). The stack
contains a sequence of grammar symbols with $ on the bottom,( indicating the
bottom of the stack). Initially, the stack contains the start symbol of the grammar on

2
the top of $. The parsing table is a two-dimensional array M[A,a], where A is a
nonterminal, and a is a terminal or the symbol $.

The parser is controlled by a program that behaves as follows. The program


considers X, the symbol on top of the stack, and a, the current input symbol. These
two symbols determine the action of the parser. There are three possibilities.

If M[X,a]= error, the parser calls an error recovery routine.

3
Algorithm nonrecusive predictive parser

Example:

Parse the input id * id + id in the grammar:

E → TE'

E' → +TE' │ ε

T → FT'

4
T' → *FT' │ ε

F → (E) │ id

The parse table M for the grammar:

The moves made by predictive parser on input id+id*id

5
FIRST and FOLLOW:

The construction of a predictive parser is aided by two functions


associated with a grammar G. These functions, FIRST and FOLLOW,
allow us to fill in the entries of a predictive parsing table for G, whenever
possible.

Define the FIRST(α) to be the set of terminals that begin the strings
derived from α, and the FOLLOW(A) for nonterminal A, to be the set of terminals
a that can appear immediately to the right of A in some sentential form.

To compute FIRST(x) for all grammar symbols x, apply the following rules until no
more terminals or ε can be added to any first set.

1- If x is terminal, then FIRST(x) is {x}.

2- If X→ a ; is a production, then add a to FIRST(X) and

If X → ε ; is a production, then add ε to FIRST(X).

3- If X is nonterminal and X→Y1,Y2…Yi ; is a production, then add FIRST(Y1) to


FIRST(X).
4- a- for (i = 1; if Yi can derive epsilon ε; i++)
b- add First(Yi+1) to First(X)
If Y1 does not derive ε , then we add nothing more to FIRST(X), but if
Y1→ ε , then we add FIRST(Y2) and so on .

First function example

1- FIRST (terminal) = {terminal }

S → aSb │ba │ ε

6
FIRST (a) ={a}

FIRST (b) ={b}

2- FIRST(non terminal) = FIRST (first char)

FIRST (S)= {a,b, ε }

To compute FOLLOW(A) for all non terminals A, is the set of terminals that can
appear immediately to the right of A in some sentential form S → aAxB... To
compute Follow, apply these rules to all nonterminals in the grammar:

1- Place $ in FOLLOW(S) , where S is the start symbol and $ is the input right
end marker.

FOLLOW(START) = {$}

2- If there is a production X→ α Aβ , then everything in FIRST(β) except for ε

is placed in FOLLOW(A).

i.e. FOLLOW(A) = FIRST(β)

3- If there is a production X→ α A, or a production X→ α Aβ , where FIRST(β)

Contains ε (β → ε ), then everything in FOLLOW(X) is in FOLLOW(A).

i.e. : FOLLOW(A)= FOLLOW(X)

Follow function examples:

Example 1:

S → aSb │X

X → cXb │b

X → bXZ

7
Z→ n

First Follow

S a,c,b $,b

X c,b b,n,$

Z n b,n,$

Example 2:

S → bXY

X→b|c

Y→b|ε

First Follow

S b $

X b, c b,$

Y b,ε $

Example 3:

S → ABb │bc

A → ε │abAB

B → bc │cBS

First Follow

S b,a,c $,b,c,a

A ε, a b,c

8
B b,c b,c,a

Example 4:

X → ABC │ nX

A → bA │ bb │ ε

B → bA │CA

C → ccC │CA │ cc

First Follow

X n,b,c $

A b,ε b,c,$

B b,c c

C c b, $

H.W:

S → bSX | Y

X → XC | bb

Y → b | bY

C → ccC | CX | cc

Note: there is a left recursion problem here trying to solve this Problem and find
the first and follow for this grammar.

S → bSX | Y

X → bbX’ X’ → CX’| ε

Y → b | bY C → ccC | CX | cc

9
First Follow

S b $, b

X b $, b,c

X’ c,ε $, b ,c

Y b $, b

C c $, b ,c

Construction of predictive parsing tables:

The following algorithm can be used to construct a predictive parsing


table for a grammar G. The idea behind the algorithm is the following :
Suppose A→ α is a production with a in FIRST(α). Then the parser will expand
A by α when the current input symbol is a. The only complication occurs when
α→ . In this case, we should again expand A by α if the current input symbol is
in FOLLOW(A).

Algorithm construction of predictive parsing table:

10
Example:

E→ E+T │ T

T → T*F │ F

F → (E) │id

Parse the input id * id + id by using predictive parsing:

1- we must solve the left recursion and left factoring if it founded in the grammar

E → TE'

E' → +TE' │ ε

T → FT'

T' → *FT' │ ε

F → (E) │ id

2- we must find the first and follow to the grammar:

11
First Follow

E ( , id $, )

T ( , id + , ) , id

E’ + , ε $, )

T’ * , ε + , ( , id

F ( , id + , * , ( , id

3- We must find or construct now the predictive parsing table

12
Since FIRST(TE’) = FIRST(T) = {(,id}, production E→TE’ causes M[E,( ] and
M[E,id] to acquire the entry E→TE’.
Production E’→+TE’ causes m[E’,+] to acquire E’→+TE’. Production
E’→ causes M[E’,)] and M[E’,$] to acquire E’→ since FOLLOW(E’) =
{ ),$}. So the parsing table produced by the previous algorithm

LL(1) grammars:
The previous algorithm can be applied to any grammar G to produce a parsing
table M. For some grammars, M may have some entries that are multiply
defined. If G is left recursive or ambiguous, then M will have at least one
multiply-defined entry.
Example:

FIRST(S) = { i, a} FOLLOW(S) = { $, e }
FIRST(S’) = { e, ε} FOLLOW(S’) = { $, e }
FIRST(E) = { b } FOLLOW(E) = { t }
So the parsing table for our grammar is:

13
The entry for M[S’,e] contains both S’→ eS and S’→ ε, since
FOLLOW(S’) ={e, $}. The grammar is ambiguous and the ambiguity is
manifested by a choice in what production to use when an e (else) is seen. We can
resolve the ambiguity if we choose S’→ eS. Note that the choice S’→ ε would
prevent e from ever being put on the stack or removed from the input, and is
therefore surely wrong.

A grammar whose parsing table has no multiply-defined entries is said to be


LL(1). The first “L” in LL(1) indicates the reading direction (left-to-right), the
second “L” indicates the derivation order (left), and the “1” indicates that there is
a one-symbol or lookahead at each step to make parsing action decisions.

Error Detection and Reporting

An error is detected during predictive parsing when the terminal on top of the stack
does not match the next input symbol or when nonterminal A is on top of the stack, a
is the next input symbol, and the parsing table entry M[A,a] is empty.

Error recovery is based on the idea of skipping symbols on the input until a token in
selected set of synchronizing tokens appears. The sets should be chosen so that the
parser recovers quickly from errors that are likely to occur in practice.

14
We can place all symbols in FOLLOW(A) into the synchronizing set for
nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A
from the stack, it is likely that parsing can continue.

Example:

Using FOLLOW symbols as synchronizing tokens works reasonably well when


expressions are parsed according to the grammar:

The parsing table for this grammar is repeated with synchronizing tokens

If the parser looks up entry M[A,a] and finds that it is blank, then the input
symbol a is skipped. If the entry is synchronize, then the nonterminal on top of
the stack is popped in an attempt to resume parsing.

15
16
Bottom – Up Parsing

Bottom up parsers start from the sequence of terminal symbols and work
their way back up to the start symbol by repeatedly replacing grammar
rules' right hand sides by the corresponding non-terminal. This is the reverse
of the derivation process, and is called "reduction".

Example: consider the grammar

The sentence abbcde can be reduced to S by the following steps:

Definition: a handle is a substring that

1- matches a right hand side of a production rule in the grammar and


2- Whose reduction to the nonterminal on the left hand side of that
grammar rule is a step along the reverse of a rightmost derivation.

There is a general style of bottom-up syntax analysis, known as shift


reduces parsing.

An easy to implement form of this parsing, called operator precedence


parsing.
A much more general method of shift reduce parsing called LR parsing ,
used in a number of automatic parsing generators.

Shift reduces parsing attempts to construct a parse tree for an input string
beginning at the leaves (bottom) and working up towards the root (the top).

Shift Reduce Parsing Method

There are two problems that must be solved if we are to parse by handle
pruning. The first is to determine the handel, and the second is to determine
what production to choose in case there is more than one production with
that substring on the right side. A convenient way to implement a shift
reduce parser is to use stack to hold grammar symbols and an input buffer to
hold the string (W) to be parsed. Use $ to mark the bottom of the stack and
also the right end of the input. Initially the stack is empty and the string (W)
is on the input, as follows:

Stack Input

$ W$

The parser operates by shifting zero or more input symbol onto the stack
until a handle β is on top of the stack. The parser then reduces β to the left
side of the appropriate production. The parser repeat this cycle until it has
detected an error or until the stack contains the start symbol and the input is
empty.

Stack Input

$ S $
After entering this configuration the parser halts and announces successful
completion of parsing.

At each step, the parser performs one of the following actions.

1- Shift one symbol from the input onto the parse stack
2- Reduce one handle on the top of the parse stack. The symbols from
the right hand side of a grammar rule are popped of the stack, and the
nonterminal symbol is pushed on the stack in their place.
3- Accept is the operation performed when the start symbol is alone on
the parse stack and the input is empty.
4- Error actions occur when no successful parse is possible.

Example 1: parse the input id +id *id for this grammar

E → E+E

E → E*E

E → (E)

E → id
Example 2: parse the input id +* id for the same grammar

Stack Input Action

$ id1 + * id2 $ Shift

$ id1 + * id2 $ Reduce by E → id

$ E+ * id2 $ Shift

$ E+* id2 $ Shift

$ E +* id $ Shift

$ E +*E $ Reduce by E→ id

$ E +*E $ Not Accept

H.W. : For this grammar

E → E+T | T

T → T*F | F

F → id | (E)

Parse the input id * id + id


1­ Operator­precedence parser:
The operator-precedence parser is a shift –reduce parser that can be
easily constructed by hand. It used for a small class of grammars which
is called operator grammar. These grammars have the property:
- That no production right side is ε
- And no production right side has two adjacent nonterminals.
Example: The following grammar for expressions:
E → EAE | (E) | –E |id
A→+|-|*|/|^
Is not an operator grammar, because the right side EAE has two (in fact
three) consecutive nonterminals. However, if we substitute for A each of
its alternatives, we obtain the following operator grammar:
E → E+E | E-E | E*E | E/E | E^E| (E) | –E | id
In operator-precedence parser, we define three disjoint precedence
relations, <. , =, and .>, between certain pairs of terminals. These
precedence relations guide the selection of handlers. If a <. b, we say a
“yields precedence” to b; if a .> b , a “takes precedence over” b.
Now suppose we remove the nonterminals from the string and place the
correct relation, <. , = or .> between each pair of terminals and between
the endmost terminal and the $’s marking the ends of the string. For
example, suppose we initially have the right-sentential form id + id * id
and precedence relations are those given in below

1
Then the string with the precedence relations inserted is:

The handle can be found by the following process:


1- Scan the string from the left end until the first .> is encountered. In
our example, this occurs between the first id and +.
2- Then scan backwards (to the left) over any = until a<. is
encountered, we scan back to $
3- The handle contains everything to the left of the first .> and to the
right of the <. encountered in step (2).
So our first handler is the first id. We then reduce id to E. At this
point we have the sentential form E+id *id. After reducing the two
remaining id’s to E by the same steps, we obtain the right-sentential
form E+E*E. consider now the string $+*$ obtained by deleting the
nonterminals. Inserting the precedence relations, we get:

These precedence relations indicate that, in E+E*E, the handle is E*E


and then E+E.
Stack implementation of operator precedence parser:
Suppose we are given the expression id+id to parse we set up the
stack and input as:

2
Stack Input
$ <. id + id $
We shift id giving:
$ <. id .> + id $
The handle id is reduced to F
$E <. + id $
$ <. E + <. id $
$ E + <.id .> $
$ <. E + E .> $
Now we are told to reduce the handle E+E:
$E $
h.w.:
try input with the following relations

3
LR parser
This section presents an efficient bottom-up syntax analysis technique
that can be used to parse a large class of context-free grammars. These
technique is called LR parsing; the L is for left-right scanning of the
input, the R for constructing a rightmost derivation in reverse.
This method present three techniques for construct an LR parsing table
for grammar .
The first method, called simple LR (SLR), is easiest to implement. But
the least powerful.
The second method, called canonical LR, is the most powerful and will
work on a very large class of grammars and the most expensive.
The third method, called look ahead LR (LALR), is intermediate in
power and cost between the SLR and the canonical LR methods.

[Link] Parser
This method of parsing is the weakest of three in terms of the number
of grammar for which it succeeds, but it is easiest to implement this
parsing method there are four basic steps:
1. Find first & follow.
2. Find set of I.
3. Find parsing table.
4. Check the sentence (parse the input).

The CLOSURE operation


If I is a set of items for a grammar G then the set of items CLOSURE(I)
is constructed from I by the rules:
1. Every item in I is in CLOSURE(I).
2. lf A α.Bβ is in CLOSURE(I) and B ‫ ץ‬is a production, then add
the item B . ‫ ץ‬to I, if it is not already there.

Example: consider the grammar:


The function closure can be computed bellow:
function CLOSURE(I);
begin
J=I;
repeat
for each item A α.Bβ in I and each production B ‫ ץ‬in G such
that B . ‫ ץ‬is not in I do
add B . ‫ ץ‬to J.
until no more items can be added to J;
return J
end

GO TO operation
The second useful function is GOTO( I, X) where I is a set of items and
X is a grammar symbol. GOTO( I, X) is defined to be the closure of the
set of all items A αX. β such that A αX. β is in I.

The Set of items Constructions:

procedure ITEMS(G');
begin
c = {closure({S̀ .S})};
repeat
for each set of items I in c and each grammar symbol x
such that GOTO(I, x) is not empty and is not in c do
add GOTO(I, x) to C
until no more sets of items can be added to C
end
Example: consider the grammar:
E E+T | T
T T*F | F
F (E) | id

1. Find first and follow


First Follow
E ( , id $,),+
T ( , id $ , ) , + ,*
F ( , id $ , ) , +, *

2. Find set of I.
3. Find parsing table.

4. Check the sentence (parse the input).


SLR Parsing tables

INPUT: An augmented grammar G̀


OUTPUT: The SLR parsing table functions action and GOTO for G̀
Canonical LR Parser
It is possible to carry more information in the state by re- defining items,
to include a terminal symbol as a second component. The general form of
an item becomes { A α . Bβ, X}, where A α . B is a production
and X is a terminal or the right end marker $, we call such an object an
LR(1) item.
parsing method there are four basic steps:
1. Find first function
2. Find set of I.
3. Find canonical parsing table.
4. Check the sentence (parse the input).

Algorithm :Construction of the sets of LR(1) items for a grammar


Input : a grammar G
Output: The sets of LR(l) items
Method: Using the procedure Closure and GO TO and the main routine
for constructing the sets of items.
An LR parser using a previous table is called the canonical LR parser.

Notes:
1. A α . Bβ, X
α, β : anything
X : terminal or signdolar
B: nonterminal

2. first(βX)

α = λ , B= S, β= λ, X= $
first(βX)= first($)=$

Example: consider the following grammar


S CC
C cC | d

1. find the first function


first
S c,d
C c,d
2. find set of I

[Link] canonical LR parsing table

4. check the input string cdcd


stack Input

0 cdcd$
0c3 dcd$
0c3d4 cd$
0c3C8 cd$
0C2 cd$
0C2c6 d$
0C2c6d7 $
0C2c6C9 $
0C2C5 $
0S1 $
Accept
LALR Parser

we now introduce LALR { Look ahead LR } technique. For a


comparison of parsing size, the SLR and LALR tables for a grammar
always have the same number of states, and a smaller than canonical LR
table. Thus, it is much easier and more economical to construct SLR or
LALR tables than the canonical LR tables.

Example: Consider the grammar


S CC
C cC | d

As we mentioned, there are three pairs of sets of items that can be


merged.
I3 and I6

C c.C, c|d C c.C, $


C .cC, c|d C .cC, $
C .d, c|d C .d, $
These two sets are replaced by their union:
I36: C c.C, c|d |$
C .cC, c|d |$
C .d, c|d |$
Also I4 and I7

C d . , c|d C d.,$

Their union :
I47: C d . , c|d |$

And also I8 and I9

C cC. , c|d C cC. , $

Their union:
I89: C cC. , c|d |$

The LALR parsing table is:

State Action Goto


c d $ S C
0 s36 s47 1 2
1 acc
2 s36 s47 5
36 s36 s47 89
47 r3 r3 r3
5 r1
89 r2 r2 r2
Conflict in shift-Reduce parsing
"Conficts" occur when an ambiguity in the grammar creates a situation where
the parser does not know which step to perform at a given point during parsing. There
are two kinds of conflicts that occur
1. shift-reduce: a shift reduce conflict occurs when the grammar indicates that
different successful parses might occur with either a shift or a reduce at a given point
during parsing.
The vast majority of situations where this conflict occurs can be correctly resolved by
shifting.
2- reduce-reduce : a reduce-reduce conflict occurs when the parser has two or more
handles at the same time on the top of the stack. Whatever choice the parser makes is
just as likely to be wrong as not. In this case it is usually best to rewrite the grammar
to eliminate the conflict, possibly by factoring.

Example1: shift reduce conflict:

Example2: reduce reduce conflict:


The relative precedence of + followed by * uniquely determines how the parsing
action conflict between reducing E E+E and shifting on * in state 7 should be
resolved.
Error recovery in LR parsing
An LR parser will announce error as soon as there is no valid continuation for
the portion of the input thus for scanned. Thus we may fill in each blank entry in the
action field with a pointer to an error routine that will take an appropriate action
selected by the compiler designer. The action may include insertion or deletion of
symbols from the stack or input or both, or alteration and transposition of input
symbols.
Semantic Analysis
 The semantic analysis phase checks the source program for
semantic errors and gathers type information for the subsequent
code-generation phase.
 It uses the parse tree to identify the operators and operands of
expressions and statements, see the figure below
 An important component is type checking.
 Here the compiler checks that each operator has operands that are
permitted by the source language specification.
 Static semantic checks are performed at compile time
 Type checking
 Every variable is declared before used
 Identifiers are used in appropriate contexts
 Check subroutine call arguments
 Dynamic semantic check are performed at run time, and the
compiler produces code that performs these checks
 Array subscript values are within bounds
 Arithmetic errors, e.g. division by zero
 A variable is used but hasn't been initialized
 When a check fails at run time, an exception is raised

Type checking
A type checker verifies that the type of a construct matches that
expected by its context. For example, the –in arithmetic operator mod
in pascal requires integer operands, so a type checker must verify that
the operands of mod have type integer.
Intermediate Code Generation
 Translate from abstract-syntax trees to intermediate codes.
 Generating a low-level intermediate representation with two properties:
o It should be easy to produce
o It should be easy to translate into the target machine
 One of the popular intermediate code is three-address code. A three-
address code:
 Each statement contains at most 3 operands; in addition to “: =”,
i.e., assignment, at most one operator.
 An” easy” and “universal” format that can be translated into most
assembly languages.

some of the basic operations which in the source program, to change in the
Assembly language:
operations H.L.L Assembly language
Math. OP +,-,*,/ Add, sub, mult, div
Boolean. OP &, | , ~ And, or, not
Assignment = mov
Jump goto JP, JN, JC
conditional If, then CMP
Loop instruction For, do, repeat These most have and I.C.G before
until, while do change it to assembly language.
The operation which change H.L.L to assembly language, is called the
intermediate code generation and there is the division operation come with
it, which mean every statement have a single operation.
Ex (1):
X=A+B*C/D–Y*N
T1 T3
T2
T4
T5
T1 = B * C
T2 = T1 / D
T3 = Y * N
T4 = A + T2
T5 = T4 - T3

Ex (2):
Y = cos (A * B) + C / N - Y * P
T1 T4 T3
T2
T5
T6

T1 = A * B
T2 = cos T1
T3 = Y * P
T4 = C / N
T5 = T2 + T4
T6 = T5 – T3
Mathematical operation
There are two kinds of operation, which are deals with
mathematical operation, such as the parsing for these operations:
1. Triple form
Ex: X = A + B * C / ( - N )
OP Arg1 Arg2
(0) * B C
(1) - N __
(2) / (0) (1)
(3) + A (2)
= X (3)

Ex: Y = A + C * X / B [i]
OP Arg1 Arg2
(0) * C X
(1) =[ ] B i
(2) / (0) (1)
(3) + A (2)
= Y (3)

Ex: X[i] = N * C / Y[i]


OP Arg1 Arg2
(0) * N C
(1) =[ ] Y i
(2) / (0) (1)
(3) [ ]= X i
= (3) (2)

Ex: X = A + B * ( c / d ) - y
OP Arg1 Arg2
(0) / c d
(1) * B (0)
(2) + A (1)
(3) - (2) y
= X (3)
Ex: A = C * X [i,j]
OP Arg1 Arg2
(0) =[ ] X P
(1) * C (0)
= A (1)

2. Quadruple form

Ex: X = A * C / N + P
OP Arg1 Arg2 Result
* A C t1
/ t1 N t2
+ t2 P t3
= t3 X

Ex: A = N[i] * C / N
OP Arg1 Arg2 Result
=[ ] N i t1
* t1 C t2
/ t2 N t3
= t3 A

Ex: A = C * y / X[i,j]
OP Arg1 Arg2 Result
* C y t1
=[ ] X P t2
/ t1 t2 t3
= t3 A
Ex: X = A + B * ( c / d ) - y
OP Arg1 Arg2 Result
/ c d t1
* B t1 t2
+ A t2 t3
- t3 y t4
= t4 X

Ex: X[i] = a * c + y[i] – n[j] / V


OP Arg1 Arg2 Result
* a c t1
=[ ] n j t2
/ t2 V t3
=[ ] y i t4
+ t1 t4 t5
- t5 t3 t6
[ ]= X i t7
= t6 t7
Code Optimization
Compilers should produce target code that is as good as can be
written by hand. The code produced by straightforward compiling
algorithms can often be made to run faster or take less space, or both.
This improvement is achieved by program transformations that are
traditionally called Optimizations.

Function- Preserving Transformations


There are a number of ways in which a compiler can improve a
program without changing the function it computes. Common sub
expression elimination, copy propagation, dead- code elimination, and
constant floding are common examples of such function- preserving
transformations.

1. Common sub expressions

Ex1: X = A + C * N – M
Y=B+C*N*e

Sol: Q = C * N
X=A+Q–M
Y=B+Q*e

Ex2:

Before optimize after optimize


t6 = 4 * i t6 = 4 * i
X = a [t6] X = a [t6]
t7 = 4 * i t8 = 4 * j
t8 = 4 * j t9 = a [t8]
t9 = a [t8] a [t6] = t9
a [t7] = t9 a[t8] = X
t10 = 4 * j
a[t10] = X

Note: The value of the variable which are optimize will not be change.
2. Copy Propagation

Ex:

When the common sub expression in c = d + e is eliminated in


previous section, the algorithm uses a new variable t to hold the value of
d + e. Since control may reach c = d + e either after the assignment to a
or after the assignment to b, it would be incorrect to replace c = d + e by
either c=a or by c=b

3. Dead- Code Elimination


A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A related idea is dead or
useless code, statements that compute values that never get used.

Ex1: X=3
If X > 4 Then
.
.
end

*This condition will not do, so we must use the optimization.

Ex2: A= false
If A Then
Begin
.
.
end

*This condition also will not do


4. Loop Optimization
The running time of a program may be improved if we decrease the
number of instructions in an inner loop, even if we increase the amount of
code outside the loop. Three techniques are important for loop
optimization: Code Motion, which moves code outside a loop; Induction
Variable elimination; and, reduction in strength.

 Code Motion
An important modification that decreases the amount of code in a
loop is code motion. This transformation takes an expression that yields
the same result independent of the number of times a loop is executed and
places the expression before the loop.

Ex1: While ( I <= limit-2)

Sol: t = limit-2
While (I <= t )

Ex2: While X<A + C*y do


Begin
.
.
end

Sol: P = C * y
While X<A + P do
Begin
.
.
end

 Induction Variable

Ex: X = 2 * y + 2 * h + 4
Sol: X = 2 * (y + h) +4

Note: use the algebra method to optimize the mathematical expression.

 Reduction in strength
Reduction in strength, which replaces an expensive operation by a
cheaper one, such as a multiplication by an addition.

Ex: t4 = 4*j - 4
Code Generation

The final phase in compiler is the code generator. It takes as input


an intermediate representation of the source program and produces as
output an equivalent target program, as indicated in Fig

Position of code generator

Code generation takes a linear sequence of 3-address intermediate code


instructions, and translates each instruction into one or more instructions.
The big issues in code generation are
 Instruction selection
 Register allocation and assignment

Instruction selection: for each type of three-address statement, we can


design a code skeleton that outlines the target code to be generated for
that construct.

Example: every three address statement of the form X = Y + Z, where


X,Y and Z are statically allocated, can be translated into the code
sequence

Mov Y , R0 /* load Y into register R0 */


Add Z , R0 /* add Z to R0 */
Mov R0 , X /* store R0 into X */

Register allocation and assignment


The efficient utilization of registers involving operands is
particularly important in generating good code. The use of registers is
often subdivided into two sub problems:
Register allocation: selecting the set of variables that will reside in
registers at each point in the program
Resister assignment: selecting specific register that a variable reside in,
the goal of these operations is generally to minimize the total number of
memory accesses required by the program.

Ex:

+
+

 The code
Mov R0,a
Mov R1,b
Add R1,R0
Mov R2,c
Sub R0,R2
Add R1 , R0
Add R0 , R1
Mov d, R0

You might also like