Compiler Design
[Link] Krishna
Professor in CSE
Agenda
01 UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of Programming
Languages
4. The Science of building a
Compiler
5. Applications of Compiler
Technology
6. Programming Language
Basics
Agenda
01 UNIT I : PART I
1. Language
Processors
2. Structure of a Compiler
3. Evolution of Programming
Languages
4. The Science of building a
Compiler
5. Applications of Compiler
Technology
6. Programming Language Basics
Agenda UNIT I : PART I
01
[Link]
Processors
Compiler
Language Processing
System
Language Processors
Programming languages are notations for describing
computations to people and to machines.
But, before a program can be run, it first must be
translated into a form in which it can be executed by a
computer.
The software systems that do this translation are called
compilers.
Compiler
Simply stated, a compiler is a program that can read a
program in one language — the source language — and
translate it into an equivalent program in another language
— the target language
If the target program is an executable machine-language
program, it can then be called by the user to process
inputs and produce outputs
An important role of the compiler is to report any
errors in the source program that it detects during
the translation process.
Language Processing System
Several other programs may be required to create
an executable target program.
1. Preprocessor
2. Compiler
3. Assembler
4. Linker/Loader
1. Preprocessor
The task of collecting the source program is sometimes
entrusted to a separate program, called a preprocessor.
The preprocessor may also expand shorthands, called
macros, into source language statements.
Preprocessors produce input to compilers.
They may perform the following functions:
1. Macro processing
2. File inclusion
3. "Rational" preprocessing
4. Language Extensions
2. Compiler
The (modified or pre-processed or pure) source
program is then fed to a compiler.
The compiler may produce an assembly-language
program as its output, because assembly language is
easier to produce as output and is easier to debug.
3. Assembler
The assembly language is then processed by a program
called an assembler that produces relocatable machine
code as its output.
Assembly code is a mnemonic version of machine
code, in which
names are used instead of binary codes for
operations, and names are also given to memory
addresses.
4. Linker/Loader
The linker resolves external memory addresses,
where the code in one file may refer to a
location in another file.
The loader then puts together the entire
executable object files into memory for
execution.
Agenda UNIT I : PART I
01
[Link]
Processors
Compiler
Language Processing
System
…. Thank you
Agenda
01 UNIT I : PART I
1. Language Processors
2. Structure of a
Compiler
3. Evolution of Programming
Languages
4. The Science of building a
Compiler
5. Applications of Compiler
Technology
6. Programming Language Basics
Agenda UNIT I : PART I
01
2. Structure of a
Compiler
Analysis - Synthesis
Phases of a Compiler
Structure of a compiler
A compiler maps a source program into a semantically equivalent target program. A
compiler breaks into two parts:
1. Analysis and
2. Synthesis.
1. Analysis
The analysis part often called the front end of the compiler,
breaks up the source program into constituent pieces and
imposes a grammatical structure on them. It then uses this
structure to create an intermediate representation of the source
program.
detects that the source program is either syntactically ill
formed or semantically unsound, then it must provide
informative messages, so the user can take corrective action.
collects information about the source program and stores it in
a data structure called a symbol table, which is passed along with
the intermediate representation to the synthesis part.
2. Synthesis
The synthesis part often called the back end of the
compiler
constructs the desired target program from the
intermediate representation and the information in
the symbol table.
Phases of a Compiler
Compiler operates as a sequence of phases, each
of which transforms one representation of the source
program to another.
A typical decomposition of a compiler into phases is
shown in Fig.
1 Lexical Analysis
The first phase of a compiler is called lexical analysis or
scanning (scanner).
The lexical analyzer
o reads the stream of characters making up the source
program
o groups the characters into meaningful sequences called
lexemes
o produce output as a sequence of tokens
For example, suppose a source program contains the assignment statement
position = initial + rate * 60
Representation of the assignment statement, after lexical analysis as the sequence of tokens
(id,1) (=) (id, 2) (+) (id, 3) (*) (60)
2 Syntax Analysis
The second phase of the compiler is syntax analysis or
parsing (Parser).
The parser creates a tree-like intermediate representation
that depicts the grammatical structure of the token stream.
A typical representation is a syntax tree in which
• each interior node represents an operation and
• the children of the node represent the arguments of the
operation.
A syntax tree for the token stream is shown as the output of the syntactic analyzer. This tree
shows the order in which the operations in the assignment position = initial + rate * 60
are to be performed.
3 Semantic Analysis
An important part of semantic analysis is type checking
where the compiler checks that each operator has
matching operands.
The language specification may permit some type
conversions called coercions.
Notice that the output of the semantic analyzer has an extra
node for the operator inttofloat, which explicitly converts its
integer argument into a floating-point number
4 Intermediate Code Generation
In the process of translating a source program into target
code, a compiler may construct one or more intermediate
representations, which can have a variety of forms.
We consider an intermediate form called three-address
code, which consists of a sequence of assembly-like
instructions with three operands per instruction.
Each operand can act like a register.
5 Code Optimization
The machine-independent code-optimization phase attempts to
improve the intermediate code so that better target code will result.
Usually better means faster, but other objectives may be desired,
such as shorter code, or target code that consumes less power.
The optimizer can deduce that the conversion of 60 from integer to
floating point can be done once and for all at compile time, so the
inttofloat operation can be eliminated by replacing the integer 60 by
the floating-point number 60.0.
6 Code Generation
The code generator takes as input an intermediate representation
of the source program and maps it into the target language.
The intermediate instructions are translated into sequences of
machine instructions that perform the same task.
A crucial aspect of code generation is the judicious assignment of
registers to hold variables. For example, using registers R1 and R2,
the intermediate code might get translated into the machine code
Symbol-Table Management
The symbol table is a data structure
which stores information about all
variables/identifiers.
The data structure should be designed to allow the
compiler
to find the record for each name quickly and
to store or retrieve data from that record quickly.
Lexical and Syntax Analysis will fill up symbol table and
Code optimizer and Code generator make use of this
symbol table
Agenda UNIT I : PART I
01
2. Structure of a
Compiler
Analysis - Synthesis
Phases of a Compiler
…. Thank you
Agenda
01 UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of
Programming
Languages
4. The Science of building a
Compiler
5. Applications of Compiler
Technology
6. Programming Language Basics
Agenda
01 UNIT I : PART IV
Evolution of
Programming
Languages
The Move to Higher-level
Languages
Impacts on Compilers
The Evolution of Programming Languages
The first electronic computers appeared in the 1940's and were programmed in machine
language by sequences of 0's and 1's that explicitly told the computer what operations to
execute and in what order.
The operations themselves were very low level:
move data from one location to another,
add the contents of two registers,
compare two values, and
so on.
Needless to say, this kind of programming was slow, tedious, and error prone. And once
written, the programs were hard to understand and modify.
1. The Move to Higher-level Languages
2. Impacts on Compilers
1. The Move to Higher-level Languages
The first step towards more people-friendly programming languages was the development of
mnemonic assembly languages in the early 1950's.
Initially, the instructions in an assembly language were just mnemonic representations
of machine instructions.
Later, macro instructions were added to assembly languages so that a programmer
could define parameterized shorthands for frequently used sequences of machine
instructions.
A major step towards higher-level languages was made in the latter half of the 1950's with
the development of FORTRAN for scientific computation, COBOL for business data
processing, and LISP for symbolic computation.
Today, there are thousands of programming languages with innovative features to
help make programming easier, more natural, and more robust. They can be classified
in a variety of ways.
1. The Move to Higher-level Languages
1. Classification is by generation.
First-generation languages are the machine languages
Second-generation the assembly languages
Third-generation the higher-level languages like FORTRAN, COBOL, Lisp, C,
C++, C#, and Java
Fourth-generation languages are languages designed for specific applications like
NOMAD for report generation, SQL for database queries, and Postscript for text
formatting.
1. The Move to Higher-level Languages
2. An object-oriented language is one that supports object-oriented programming, a
programming style in which a program consists of a collection of objects that interact
with one another.
Simula67 and Smalltalk are the earliest major object-oriented languages.
Languages such as C++, C#, Java, and Ruby are more recent object-oriented
languages.
3. Scripting languages are interpreted languages with high-level operators designed for
"gluing together" computations.
These computations were originally called "scripts." Awk, JavaScript, Perl, PHP,
Python, Ruby, and Tel are popular examples of scripting languages.
Programs written in scripting languages are often much shorter than equivalent
programs written in languages like C.
1. The Move to Higher-level Languages
2. An object-oriented language is one that supports object-oriented programming, a
programming style in which a program consists of a collection of objects that interact
with one another.
Simula67 and Smalltalk are the earliest major object-oriented languages.
Languages such as C++, C#, Java, and Ruby are more recent object-oriented
languages.
3. Scripting languages are interpreted languages with high-level operators designed for
"gluing together" computations.
These computations were originally called "scripts." Awk, JavaScript, Perl, PHP,
Python, Ruby, and Tel are popular examples of scripting languages.
Programs written in scripting languages are often much shorter than equivalent
programs written in languages like C.
2. Impacts on Compilers
Since the design of programming languages and compilers are intimately related, the
advances in programming languages placed new demands on compiler writers.
They had to devise algorithms and representations to translate and support the
new language features.
A compiler must translate correctly the potentially infinite set of programs that
could be written in the source language.
The problem of generating the optimal target code from a source program is
undecidable in general; thus, compiler writers must evaluate tradeoffs about what
problems to tackle and what heuristics to use to approach the problem of
generating efficient code.
Agenda
01 UNIT I : PART IV
Evolution of
Programming
Languages
The Move to Higher-level
Languages
Impacts on Compilers
… Thank You
Agenda
01 UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of Programming
Languages
4. The Science of
building a Compiler
5. Applications of Compiler
Technology
6. Programming Language Basics
Agenda
01 UNIT I : PART I
The Science of
building a Compiler
Modelling in Compiler Design
and Implementation
The Science of Code
Optimization
The Science of Building a Compiler
The study of compilers is mainly
a study of how we design the right mathematical models and
choose the right algorithms,
while balancing the need for generality and power against simplicity and efficiency.
Some of most fundamental models are
finite-state machines and
regular expressions.
These models are useful for describing the lexical units of programs (keywords,
identifiers, and such) and for describing the algorithms used by the compiler to
recognize those units.
Also among the most fundamental models are context-free grammars, used to
describe the syntactic structure of programming languages such as the nesting of
parentheses or control constructs.
1 Modeling in Compiler Design and Implementation
A compiler must accept all source programs that conform to the
specification of the language.
Any transformation performed by the compiler while translating a
source program must preserve the meaning of the program being
compiled.
2 The Science of Code Optimization
The term "optimization" in compiler design refers to the attempts that a
compiler makes to produce code that is more efficient than the obvious code.
Compiler optimizations must meet the following design objectives:
1. The optimization must be correct, that is, preserve the meaning of the
compiled program,
2. The optimization must improve the performance of many programs,
3. The compilation time must be kept reasonable, and
4. The engineering effort required must be manageable.
Agenda
01 UNIT I : PART I
The Science of
building a Compiler
Modelling in Compiler Design
and Implementation
The Science of Code
Optimization
… Thank You
Agenda
01 UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of Programming
Languages
4. The Science of building a
Compiler
5. Applications of
Compiler
Technology
6. Programming Language Basics
Agenda
01 UNIT I : PART I
Applications of
Compiler
Technology
Implementation of High-Level
Programming Language
Optimizations for Computer
Architectures
Design of New Computer
Architectures
Program Translations
Software Productivity Tools
Applications of Compiler Technology
Compiler design is not only about compilers. Additionally, compiler design
impacts several other areas of computer science.
The most important interactions and applications of the technology include:
1. Implementation of High-Level Programming Languages
2. Optimizations for Computer Architectures
3. Design of New Computer Architectures
4. Program Translations
5. Software Productivity Tools
1. Implementation of High-Level Programming Language
A high-level programming language defines a programming abstraction: the programmer
expresses an algorithm using the language, and the compiler must translate that program to
the target language.
Generally, higher-level programming languages are easier to program in, but are less
efficient, that is, the target programs run more slowly.
Programmers using a low-level language have more control over a computation and
can, in principle, produce more efficient code.
Unfortunately, lower-level programs are harder to write and — worse still — less
portable, more prone to errors, and harder to maintain.
Optimizing compilers include techniques to improve the performance of generated code,
thus offsetting the inefficiency introduced by high-level abstractions.
2 Optimizations for Computer Architectures
The rapid evolution of computer architectures has also led to an insatiable demand for new
compiler technology.
Almost all high-performance systems take advantage of the same two basic techniques:
1. Parallelism can be found at several levels:
at the instruction level, where multiple operations are executed simultaneously and
at the processor level, where different threads of the same application are run on
different processors.
2. Memory hierarchies are a response to the basic limitation that we can build very fast
storage or very large storage, but not storage that is both fast and large.
3 Design of New Computer Architectures
In the early days of computer architecture design, compilers were developed
after the machines were built. That has changed.
Since programming in high level languages is the norm, the performance of a
computer system is determined
not by its raw speed but also by how well compilers can exploit its
features.
Thus, in modern computer architecture development, compilers are developed in
the processor-design stage, and compiled code, running on simulators, is used to
evaluate the proposed architectural features.
4 Program Translations
While we normally think of compiling as a translation from a high-level language to
the machine level, the same technology can be applied to translate between different
kinds of languages.
The following are some of the important applications of program-translation
techniques.
i. Binary Translation
ii. Database Query Interpreters
iii. Hardware Synthesis
iv. Compiled Simulation
4 Program Translations
i. Binary Translation
Compiler technology can be used to translate the binary code for one machine to that
of another, allowing a machine to run programs originally compiled for another
instruction set.
Binary translation technology has been used by various computer companies to
increase the availability of software for their machines.
ii. Database Query Interpreters
Besides specifying software and hardware, languages are useful in many other
applications. For example, query languages, especially SQL (Structured Query
Language), are used to search databases.
Database queries consist of predicates containing relational and boolean operators.
They can be interpreted or compiled into commands to search a database for
records satisfying that predicate.
4 Program Translations
iii. Hardware Synthesis
Not only is most software written in high-level languages; even hardware designs are mostly
described in high-level hardware description languages like Verilog and VHDL (Very high-
speed integrated circuit Hardware Description Language).
Hardware designs are typically described at the register transfer level (RTL), where
variables represent registers and expressions represent combinational logic.
Hardware-synthesis tools translate RTL descriptions automatically into gates, which
are then mapped to transistors and eventually to a physical layout.
Unlike compilers for programming languages, these tools often take hours optimizing the
circuit. Techniques to translate designs at higher levels, such as the behavior or functional
level, also exist.
4 Program Translations
iv. Compiled Simulation
Simulation is a general technique used in many scientific and engineering disciplines to
understand a phenomenon or to validate a design.
Inputs to a simulator usually include the description of the design and specific input
parameters for that particular simulation run.
Simulations can be very expensive.
5 Software Productivity Tools
Errors are rampant in programs;
• errors may crash a system,
• produce wrong results,
• render a system vulnerable to security attacks, or
• even lead to catastrophic failures in critical systems.
Testing is the primary technique for locating errors in programs.
An interesting and promising complementary approach is to use data-flow analysis to locate
errors statically (that is, before the program is run).
Dataflow analysis can find errors along all the possible execution paths, and not just
those exercised by the input data sets, as in the case of program testing.
Many of the data-flow-analysis techniques, originally developed for compiler optimizations,
can be used to create tools that assist programmers in their software engineering tasks.
5 Software Productivity Tools
The problem of finding all program errors is undecidable.
A data-flow analysis may be designed to warn the programmers of all possible
statements violating a particular category of errors.
• Type Checking
Type checking is an effective and well-established technique to catch
inconsistencies in programs.
It can be used to catch errors, for example, where an operation is applied to the
wrong type of object, or if parameters passed to a procedure do not match the
signature of the procedure.
Program analysis can go beyond finding type errors by analyzing the flow of data
through a program. For example, if a pointer is assigned null and then immediately
dereferenced, the program is clearly in error.
5 Software Productivity Tools
• Bounds Checking
It is easier to make mistakes when programming in a lower-level language than a
higher-level one.
For example, many security breaches in systems are caused by buffer overflows in programs
written in C. Because C does not have array bounds checks, it is up to the user to
ensure that the arrays are not accessed out of bounds.
Failing to check that the data supplied by the user can overflow a buffer, the program may
be tricked into storing user data outside of the buffer.
An attacker can manipulate the input data that causes the program to misbehave and
compromise the security of the system. Techniques have been developed to find buffer
overflows in programs, but with limited success.
5 Software Productivity Tools
• Memory-Management Tools
Garbage collection is another excellent example of the trade off between
efficiency and a combination of ease of programming and software
reliability.
Automatic memory management obliterates all memory-management errors
(e.g., "memory leaks"), which are a major source of problems in C and C++
programs.
Various tools have been developed to help programmers find memory management
errors. For example, Purify is a widely used tool that dynamically catches
memory management errors as they occur. Tools that help identify some of
these problems statically have also been developed.
Agenda
01 UNIT I : PART I
Applications of
Compiler
Technology
Implementation of High-Level
Programming Language
Optimizations for Computer
Architectures
Design of New Computer
Architectures
Program Translations
Software Productivity Tools
… Thank You
Agenda
01 UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of Programming
Languages
4. The Science of building a
Compiler
5. Applications of Compiler
Technology
6. Programming
Language Basics
Agenda
01 UNIT I : PART I
Programming Language
Basics
The Static/Dynamic Distinction
Environments and States
Static Scope and Block
Structure
Explicit Access Control
Dynamic Scope
Parameter Passing Mechanisms
Aliasing
1 The Static/Dynamic Distinction
An important issue that we face when designing a compiler for a language is
what decisions can the compiler make about a program.
1. If a language uses a policy that allows the compiler to decide an issue,
then we say that the language uses a static policy or that the issue can be
decided at compile time.
2. On the other hand, a policy that only allows a decision to be made when
we execute the program is said to be a dynamic policy or to require a
decision at run time.
Most languages, such as C and Java, use static scope.
2 Environments and States
Another important distinction we must make when discussing programming
languages is whether changes occurring as the program runs affect
the values of data elements or
interpretation of names for that data
For example, the execution of an assignment such as x = y +1 changes the value
denoted by the name x.
More specifically, the assignment changes the value in whatever location is
denoted by x.
It may be less clear that the location denoted by x can change at run time.
3 Static Scope and Block Structure
Most languages, including C and its family, use static scope.
The scope rules for C are based on program structure;
• the scope of a declaration is determined implicitly by where the
declaration appears in the program.
Later languages, such as C++, Java, and C#, also provide explicit control
over scopes through the use of keywords like
1. public
2. private
3. protected.
C uses braces { and } to delimit a block.
4 Explicit Access Control
Classes and structures introduce a new scope for their members.
If p is an object of a class with a field (member) x, then the use of x in p.x refers to field
x in the class definition.
5 Dynamic Scope
Technically, any scoping policy is dynamic if it is based on factor(s) that
can be known only when the program executes.
The term dynamic scope, however, usually refers to the following policy:
• a use of a name x refers to the declaration of x in the most recently called
procedure with such a declaration.
Dynamic scoping of this type appears only in special situations.
6 Parameter Passing Mechanisms
All programming languages have a notion of a procedure, but they can differ in how
these procedures get their arguments.
The great majority of languages use either
call-by-value
call-by-reference
or both.
7 Aliasing
There is an interesting consequence of call-by-reference parameter passing or its
simulation, as in Java, where references to objects are passed by value.
It is possible that two formal parameters can refer to the same location; such
variables are said to be aliases of one another.
As a result, any two variables, which may appear to take their values from two
distinct formal parameters, can become aliases of each other, as well.
Agenda
01 UNIT I : PART I
Programming Language
Basics
The Static/Dynamic Distinction
Environments and States
Static Scope and Block
Structure
Explicit Access Control
Dynamic Scope
Parameter Passing Mechanisms
Aliasing
…. Thank You
Agenda 01 END OF
UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of Programming
Languages
4. The Science of building a
Compiler
5. Applications of Compiler
Technology
6. Programming Language
Basics
….. THANK YOU