Compilers
Introduction, Structure, and Applications
Based on "Compilers: Principles, Techniques, and Tools" (Chapter 1)
Language Processors
Definition
A compiler is a specialized program that translates a source program written in a high-level language into an equivalent target program in another language.
Error Reporting
An essential function is to identify and report any errors detected in the source program during the translation process.
Primary Goal
The core objective is to transform human-readable code into machine-executable instructions while ensuring functional equivalence.
Performance
Compiled programs typically execute much faster than interpreted ones because the translation happens once, prior to execution.
Compilers vs. Interpreters
Compilers
Translate source code into target machine code once. They prioritize execution speed and efficiency, and the resulting program runs independently of the compiler.
Interpreters
Execute the source program directly, statement by statement. They prioritize development flexibility and provide superior error diagnostics during execution.
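To make the contrast concrete, here is a minimal Python sketch, assuming a toy expression tree encoded as nested tuples (the encoding and the names `interpret` and `compile_expr` are hypothetical). The interpreter walks the tree every time it evaluates; the "compiler" translates the tree once into a Python closure, which stands in for target code that can then run repeatedly without revisiting the tree.

```python
import operator

# Hypothetical toy AST: numbers, variable names (str), or (op, left, right) tuples.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def interpret(node, env):
    """Interpreter: re-walks the tree on every evaluation."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    return OPS[op](interpret(left, env), interpret(right, env))

def compile_expr(node):
    """'Compiler': walks the tree once and returns a closure to run later."""
    if isinstance(node, (int, float)):
        return lambda env: node
    if isinstance(node, str):
        return lambda env: env[node]
    op, left, right = node
    f, l, r = OPS[op], compile_expr(left), compile_expr(right)
    return lambda env: f(l(env), r(env))

expr = ("+", "x", ("*", "y", 2))   # x + y * 2
program = compile_expr(expr)      # translation happens once, up front

for x in range(3):                # the compiled program is reused for each input
    env = {"x": x, "y": 10}
    print(interpret(expr, env), program(env))
```

Both paths compute the same values; the difference is that `program` no longer inspects the tree when it runs, which is the source of the compiled version's speed advantage.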
Hybrid Systems
Modern environments like Java compile source code into intermediate bytecode, which is then interpreted or JIT-compiled by a virtual machine.
Key Trade-offs
Compilation offers high performance for production, while interpretation allows for faster debugging and cross-platform portability.
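The hybrid scheme can be sketched in the same spirit: a front end compiles an expression into a flat list of stack-machine instructions, and a small virtual machine interprets that list. The instruction names (`PUSH_CONST`, `LOAD`, `BINOP`) and function names are hypothetical and are not JVM bytecodes; a real VM would also JIT-compile frequently executed code, which this sketch omits.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def to_bytecode(node, code=None):
    """Front end: compile a tuple AST into a flat list of stack instructions."""
    if code is None:
        code = []
    if isinstance(node, (int, float)):
        code.append(("PUSH_CONST", node))
    elif isinstance(node, str):
        code.append(("LOAD", node))
    else:
        op, left, right = node
        to_bytecode(left, code)     # operands are pushed first...
        to_bytecode(right, code)
        code.append(("BINOP", op))  # ...then the operator pops both
    return code

def run(code, env):
    """Virtual machine: interprets the bytecode using an operand stack."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH_CONST":
            stack.append(arg)
        elif instr == "LOAD":
            stack.append(env[arg])
        elif instr == "BINOP":
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
        else:
            raise ValueError(f"unknown instruction {instr}")
    return stack.pop()

bytecode = to_bytecode(("+", "x", ("*", "y", 2)))
print(bytecode)
print(run(bytecode, {"x": 1, "y": 10}))
```

The bytecode list is independent of any particular processor, so the same list can be shipped anywhere the VM runs, which is the portability argument made above.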
The Language-Processing System
1 Preprocessors
Handles macro expansion, file inclusion, and language extensions
before compilation begins.
2 Compilers
Translates preprocessed source into assembly language, providing a
bridge to machine code.
3 Assemblers
Converts assembly language into relocatable machine code (binary
instructions).
4 Linkers & Loaders
Linkers resolve external references, such as code in one object file referring to a location in another; loaders place the executable into memory for execution.
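As an illustration of the first stage, the following Python sketch expands only object-like macros of the form `#define NAME replacement`, in the style of the C preprocessor. The `preprocess` function is hypothetical and heavily simplified: it ignores function-like macros, file inclusion, conditional compilation, recursive expansion, and string literals.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def preprocess(source: str) -> str:
    """Expand simple object-like macros; '#define' lines are consumed, not emitted."""
    macros = {}
    output = []
    for line in source.splitlines():
        parts = line.strip().split(maxsplit=2)
        if parts and parts[0] == "#define":
            if len(parts) < 2:
                raise ValueError(f"malformed directive: {line!r}")
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            continue
        # Substitute whole identifiers only, so RATE does not match inside RATE2.
        output.append(IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(output)

src = """#define RATE 60
position = initial + rate * RATE;"""
print(preprocess(src))  # position = initial + rate * 60;
```

The output is ordinary source text with the macro replaced, which is exactly what the compiler proper then receives.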
The Two-Part Structure of a Compiler
Analysis (Front End)
Breaks the source program into constituent pieces and imposes a grammatical structure on them. It creates an intermediate representation and manages the symbol table.
Largely independent of the target machine, focusing on the rules of the source language.
Synthesis (Back End)
Constructs the desired target program from the intermediate representation provided by the analysis phase.
Tailored to the architecture of the specific target processor so that the generated code runs efficiently.
Analysis Phase: Lexical & Syntax Analysis
Lexical Analysis (Scanning)
Reads the stream of characters and groups them into meaningful sequences called lexemes.
For each lexeme, outputs a token of the form (token-name, attribute-value).
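A minimal lexical-analyzer sketch in Python, assuming a tiny token set (identifiers, numbers, and single-character operators) and a symbol table modeled as a dict from name to 1-based index; both are assumptions for illustration, as is the input statement `position = initial + rate * 60`. For identifiers, the attribute value is the symbol-table index, which is why identifiers can later be written as id1, id2, id3 in the intermediate code.

```python
import re

# Hypothetical token specification: (token-name, regex). Earlier entries win on ties.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/()]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text, symtab):
    """Group characters into lexemes and emit (token-name, attribute-value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "id":
            # attribute-value = the identifier's symbol-table index (1-based)
            if lexeme not in symtab:
                symtab[lexeme] = len(symtab) + 1
            tokens.append(("id", symtab[lexeme]))
        elif kind == "number":
            tokens.append(("number", float(lexeme) if "." in lexeme else int(lexeme)))
        else:
            tokens.append((lexeme, None))  # operators carry no attribute
    return tokens

symtab = {}
tokens = tokenize("position = initial + rate * 60", symtab)
print(tokens)
print(symtab)  # {'position': 1, 'initial': 2, 'rate': 3}
```

A production lexer would store richer entries (type, scope, location) in the symbol table; the index-only dict keeps the sketch focused on lexeme grouping.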
Syntax Analysis (Parsing)
Uses the tokens to create a tree-like intermediate representation, the syntax tree, that depicts the grammatical structure of the token stream.
Guided by the formal grammar of the programming language.
Ensures structural rules and operator precedence are followed.
Figure: The Front-End Phases of a Compiler
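A recursive-descent parser is one common way to build the syntax tree. The sketch below assumes a small hypothetical grammar for assignments and arithmetic, with one method per rule; placing `*` and `/` in a deeper rule than `+` and `-` is what gives multiplication higher precedence. Tokens use the (token-name, attribute-value) format, and tree nodes are nested tuples.

```python
class Parser:
    """Recursive-descent parser for a hypothetical grammar:
         assign -> id '=' expr
         expr   -> term (('+' | '-') term)*
         term   -> factor (('*' | '/') factor)*
         factor -> id | number | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, name):
        if self.peek() != name:
            raise SyntaxError(f"expected {name!r}, found {self.peek()!r}")
        return self.advance()

    def assign(self):
        target = self.expect("id")
        self.expect("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):      # loop gives left associativity
            op = self.advance()[0]
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()[0]
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() in ("id", "number"):
            return self.advance()
        if self.peek() == "(":
            self.advance()
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {self.peek()!r}")

# Tokens for: position = initial + rate * 60
tokens = [("id", 1), ("=", None), ("id", 2), ("+", None),
          ("id", 3), ("*", None), ("number", 60)]
tree = Parser(tokens).assign()
print(tree)
```

The printed tree nests `*` below `+`, so `rate * 60` is grouped first, as operator precedence requires.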
Analysis Phase: Semantic Analysis
Consistency Checking
Uses the syntax tree and symbol table to ensure the program adheres to language
rules.
Type Checking
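Verifies that each operator is applied to operands of the expected data types, reporting a type error otherwise.
Coercions & Conversions
Performs automatic type conversions (e.g., int to float) where the language permits them, such as when an integer operand meets a floating-point one.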
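The following sketch combines type checking and coercion, assuming (hypothetically) that `position`, `initial`, and `rate` are declared as floats, that only `int` and `float` types exist, and that an int operand is widened whenever it meets a float. Where a coercion applies, the checker inserts an explicit `inttofloat` node, the same operation used in the three-address code on the intermediate-code slide.

```python
def check(node, symtab):
    """Type-check node; return (annotated_node, type).

    Hypothetical rules: only 'int' and 'float' exist, and an int operand
    meeting a float operand is widened with an explicit inttofloat node.
    """
    kind = node[0]
    if kind == "number":
        return node, ("float" if isinstance(node[1], float) else "int")
    if kind == "id":
        if node[1] not in symtab:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return node, symtab[node[1]]
    if kind == "=":
        target, target_type = check(node[1], symtab)
        value, value_type = check(node[2], symtab)
        if target_type == "float" and value_type == "int":
            value = ("inttofloat", value)        # coerce on assignment
        elif target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return ("=", target, value), target_type
    if kind in ("+", "-", "*", "/"):
        left, lt = check(node[1], symtab)
        right, rt = check(node[2], symtab)
        if lt == "int" and rt == "float":        # widen the int side
            left, lt = ("inttofloat", left), "float"
        elif lt == "float" and rt == "int":
            right, rt = ("inttofloat", right), "float"
        return (kind, left, right), lt
    raise ValueError(f"unknown node kind {kind!r}")

symtab = {"position": "float", "initial": "float", "rate": "float"}
tree = ("=", ("id", "position"),
        ("+", ("id", "initial"), ("*", ("id", "rate"), ("number", 60))))
typed, result_type = check(tree, symtab)
print(typed)
print(result_type)  # float
```

The symbol-table lookup in the `id` case is where semantic analysis relies on information gathered earlier; an undeclared name is reported as an error rather than silently accepted.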
Phases of a Compiler: From Analysis to Synthesis
Information Gathering
Collects and stores type information in the symbol table for later phases.
Intermediate Code Generation
The Abstract Machine
Compilers generate an explicit low-level representation that acts as a program for an abstract machine. This representation bridges the gap between high-level source and low-level target code.
Key Properties
An effective intermediate representation (IR) must be easy to produce from the analysis phase and easy to translate into the final target machine language.
Three-Address Code
A common IR in which each instruction has at most three operands. It resembles assembly but remains machine-independent.
    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3
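Three-address code like the listing above can be produced by a post-order walk of the annotated syntax tree that allocates a fresh temporary for every interior node. This Python sketch assumes tuple-shaped tree nodes and uses the names id1, id2, id3 directly for the identifiers; the class name `TACGen` is hypothetical.

```python
class TACGen:
    """Emit three-address code: at most one operator per instruction."""

    def __init__(self):
        self.code = []
        self.temps = 0

    def new_temp(self):
        self.temps += 1
        return f"t{self.temps}"

    def gen(self, node):
        """Emit code for node; return the address (name, constant, or temp) of its value."""
        kind = node[0]
        if kind in ("id", "number"):
            return str(node[1])            # leaves need no instruction
        if kind == "inttofloat":
            arg = self.gen(node[1])
            t = self.new_temp()
            self.code.append(f"{t} = inttofloat({arg})")
            return t
        if kind == "=":
            value = self.gen(node[2])
            target = node[1][1]
            self.code.append(f"{target} = {value}")
            return target
        op, left, right = node             # binary operator
        l_addr = self.gen(left)
        r_addr = self.gen(right)
        t = self.new_temp()
        self.code.append(f"{t} = {l_addr} {op} {r_addr}")
        return t

tree = ("=", ("id", "id1"),
        ("+", ("id", "id2"), ("*", ("id", "id3"), ("inttofloat", ("number", 60)))))
gen = TACGen()
gen.gen(tree)
print("\n".join(gen.code))
```

Because every interior node gets its own temporary, the output is easy to produce but contains redundancy (such as the final copy into id1) that a later optimization pass can remove.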
Synthesis Phase: Optimization & Code Gen
Code Optimization
Attempts to improve the intermediate code so that the final target program runs faster, is shorter, or consumes less power.
Techniques: eliminating redundant calculations, constant folding (e.g., replacing inttofloat(60) with the constant 60.0 at compile time), and loop transformations.
Code Generation
The final phase maps the optimized intermediate representation into the specific target machine language.
Resource Management
A critical aspect is the judicious assignment of variables to CPU registers to maximize execution efficiency and minimize memory access.
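The sketch below applies two of these ideas to the three-address code from the intermediate-code slide, written as (dest, op, arg1, arg2) quadruples: it folds `inttofloat(60)` into the constant 60.0 at compile time and merges the final copy `id1 = t3` into the instruction that computed `t3`. It then emits instructions for a hypothetical register machine with unlimited registers. The mnemonics LDF, MULF, ADDF, and STF are modeled on the textbook's example, while the function names and the `LDF R, #const` form are assumptions. A real code generator must also cope with a finite register set, which is what makes register assignment hard.

```python
import re

OPCODES = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}

def is_const(a):
    return a is not None and re.fullmatch(r"\d+(?:\.\d+)?", a) is not None

def is_temp(a):
    # Assumes user variables never look like t1, t2, ...
    return a is not None and re.fullmatch(r"t\d+", a) is not None

def optimize(code):
    """Constant-fold inttofloat(const) and merge a trailing copy of a temp."""
    folded = {}                             # temp -> constant text it was folded to
    out = []
    for dest, op, a1, a2 in code:
        a1, a2 = folded.get(a1, a1), folded.get(a2, a2)
        if op == "inttofloat" and is_const(a1):
            folded[dest] = str(float(a1))   # conversion done at compile time
            continue
        if op == "copy" and is_temp(a1) and out and out[-1][0] == a1:
            # Assumes each temp is used once, so renaming its definition is safe.
            out[-1] = (dest,) + out[-1][1:]
            continue
        out.append((dest, op, a1, a2))
    return out

def codegen(code):
    """Naive code generation (binary operators only): a fresh register per load."""
    asm, temp_reg = [], {}
    next_reg = 1

    def to_reg(a):
        nonlocal next_reg
        if is_temp(a):                      # temps already live in registers
            return temp_reg[a]
        r = f"R{next_reg}"
        next_reg += 1
        asm.append(f"LDF {r}, #{a}" if is_const(a) else f"LDF {r}, {a}")
        return r

    for dest, op, a1, a2 in code:
        r = to_reg(a1)
        src2 = f"#{a2}" if is_const(a2) else to_reg(a2)
        asm.append(f"{OPCODES[op]} {r}, {r}, {src2}")
        if is_temp(dest):
            temp_reg[dest] = r              # result stays in the register
        else:
            asm.append(f"STF {dest}, {r}")  # named variables go back to memory
    return asm

tac = [("t1", "inttofloat", "60", None),
       ("t2", "*", "id3", "t1"),
       ("t3", "+", "id2", "t2"),
       ("id1", "copy", "t3", None)]
opt = optimize(tac)
print(opt)  # [('t2', '*', 'id3', '60.0'), ('id1', '+', 'id2', 't2')]
print("\n".join(codegen(opt)))
```

Four IR instructions shrink to two, and the target code keeps the intermediate product in register R1 instead of storing it to memory.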
Symbol Table & Error Management
Symbol Table
Centralized Data Storage
A central data structure that stores information about every identifier (variable, function, etc.) found in the source program.
Attribute Recording
Records vital attributes such as type, scope, and memory location, which are accessed and updated across all compiler phases.
Error Handling
Detection & Recovery
Each phase must detect errors and provide informative messages while attempting to recover and continue the compilation process.
Phase Integration
The error handler interacts with every phase, ensuring robust operation and consistent reporting of syntax, semantic, and (where detectable) logical issues.
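A minimal sketch of both ideas, assuming a symbol table organized as a stack of scopes (each a dict from name to attributes) and an error reporter that records diagnostics instead of stopping at the first one. The class names and the `(line, kind, name, type)` statement encoding are hypothetical stand-ins for the output of earlier phases.

```python
class SymbolTable:
    """Stack of scopes; each scope maps name -> attribute dict."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attrs):
        scope = self.scopes[-1]
        if name in scope:
            return False                    # redeclaration in the same scope
        scope[name] = dict(attrs, scope_level=len(self.scopes) - 1)
        return True

    def lookup(self, name):
        for scope in reversed(self.scopes): # innermost declaration wins
            if name in scope:
                return scope[name]
        return None

class ErrorReporter:
    """Collects diagnostics so a phase can report an error and keep going."""

    def __init__(self):
        self.errors = []

    def error(self, line, msg):
        self.errors.append(f"line {line}: {msg}")

def check_uses(statements, table, errors):
    """Hypothetical checker over (line, kind, name, type) tuples."""
    for line, kind, name, typ in statements:
        if kind == "decl" and not table.declare(name, type=typ):
            errors.error(line, f"redeclaration of '{name}'")
        elif kind == "use" and table.lookup(name) is None:
            errors.error(line, f"undeclared identifier '{name}'")
        elif kind == "enter":
            table.enter_scope()
        elif kind == "exit":
            table.exit_scope()

stmts = [
    (1, "decl", "rate", "float"),
    (2, "enter", None, None),
    (3, "decl", "rate", "int"),     # shadows the outer rate: allowed
    (4, "use", "rate", None),
    (5, "exit", None, None),
    (6, "use", "initial", None),    # error: never declared
    (7, "decl", "rate", "float"),   # error: already declared in this scope
    (8, "use", "rate", None),
]
table, errors = SymbolTable(), ErrorReporter()
check_uses(stmts, table, errors)
for msg in errors.errors:
    print(msg)
```

Both errors are reported in a single pass because the checker records each problem and moves on, which is the recovery behavior described above.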
Applications of Compiler Technology
Language Implementation
High-Level Languages: Translating abstract syntax into efficient machine code.
Object Orientation: Optimizing virtual method dispatches and data abstraction.
Architecture Optimization
Parallelism: Exploiting instruction-level and processor-level parallelism.
Memory Hierarchies: Managing caches and registers to hide memory latency.
New Designs: Driving the development of RISC and VLIW architectures.
Productivity & Translation
Software Tools: Static analysis for bug detection and security.
Program Translation: Binary translation and HDL synthesis for hardware.