AKPERAN ORSHI POLYTECHNIC, YANDEV
DPEARTMENT OF COMPUTER SCIENCE
LECTURE NOTE
COM 413: COMPILER CONSTRUCTION
1
Lecture Note 1: Introduction to Compiler Construction
Introduction
A compiler is a special program that translates a source program written in a high-level
programming language (like C, Java, or Python) into a lower-level language (such as assembly or
machine code) so that it can be executed by a computer.
Definitions
Source Program: The original code written in a high-level language.
Target Program: The output of the compiler, typically in machine code or an
intermediate form.
Compilation: The process of translating source code into target code.
Types of Language Translators
1. Assembler – Converts assembly language to machine code.
2. Compiler – Converts high-level language into machine code.
3. Interpreter – Directly executes instructions in a high-level language without compiling
into machine code.
4. Preprocessor – Processes directives before compilation begins (e.g., #include in C).
Differences between Compiler and Interpreter
Feature Compiler Interpreter
Translation Translates the whole program at once Translates and executes line by line
Speed Faster execution after compilation Slower due to line-by-line execution
Errors All errors shown after compilation Errors shown one at a time
2
Feature Compiler Interpreter
Output Creates a separate executable file No separate file created
Why Learn Compiler Construction?
Understand how programming languages work.
Write more efficient code.
Create domain-specific languages.
Useful in designing software development tools.
General Structure of a Compiler
A typical compiler consists of two main parts:
Front End:
Lexical Analysis
Syntax Analysis
Semantic Analysis
Back End:
Intermediate Code Generation
Code Optimization
Code Generation
Each phase performs a specific task in the translation process.
Sample Questions
1. Define a compiler.
2. List and explain the major differences between an interpreter and a compiler.
3. What are the main phases of a compiler?
Lecture Note 2: Structure and Phases of a Compiler
3
2.0 Introduction
A compiler operates in multiple stages (phases), with each stage performing a specific role in
the process of translating source code to machine code. Understanding these phases helps in
debugging, optimization, and compiler design.
Major Phases of a Compiler
1. Lexical Analysis
Also called the scanner.
Breaks the source code into tokens (keywords, identifiers, symbols).
Removes whitespace and comments.
Produces a stream of tokens as output.
2. Syntax Analysis
Also called the parser.
Takes the token stream and builds a parse tree using grammar rules.
Ensures that the code follows the syntactical rules of the programming language.
3. Semantic Analysis
Checks for meaningful constructs (e.g., type checking, scope resolution).
Uses the symbol table to store identifiers and types.
Detects errors such as undeclared variables or type mismatches.
4. Intermediate Code Generation
Converts the parse tree into a simple, intermediate representation (IR).
Easier to optimize and portable between machine architectures.
Common IR: Three-address code (TAC).
5. Code Optimization
Improves intermediate code to make it more efficient.
Reduces memory usage, improves speed.
Examples: constant folding, dead code elimination.
6. Code Generation
4
Translates the optimized IR into target machine code.
Allocates registers, generates instructions.
Produces the final executable or object file.
Supporting Components
1. Symbol Table
Stores variable names, function names, types, scopes.
Used in semantic analysis and code generation.
2. Error Handler
Detects and reports lexical, syntax, and semantic errors.
Provides meaningful error messages.
3. Preprocessor (Optional)
Processes directives like #define or #include before compilation.
Phases Grouped into Stages
Frontend Middle-end Backend
Lexical Analysis Intermediate Code Generation Code Optimization
Syntax Analysis Code Generation
Semantic Analysis
Compiler Structure Diagram
5
Sample Questions
1. List and explain the phases of a compiler.
2. What is the role of the lexical analyzer?
3. Explain the difference between syntax and semantic analysis.
4. What is an intermediate code and why is it used?
Lecture Note 3: Lexical Analysis
Introduction
Lexical analysis is the first phase of a compiler. It reads the source code and converts it into a
sequence of tokens, which are meaningful character strings such as keywords, identifiers,
operators, and symbols.
Role of the Lexical Analyzer
Reads source code line by line or character by character.
Groups characters into tokens.
Removes whitespace, comments, and irrelevant symbols.
Reports lexical errors such as invalid identifiers or illegal characters.
6
Passes token stream to the syntax analyzer.
Key Terminologies
Lexeme: Actual character sequence forming a token (e.g., int, x, =, 5)
Token: Abstract category for a lexeme
(e.g., KEYWORD, IDENTIFIER, ASSIGN_OP, NUMBER)
Pattern: Rule that defines how lexemes for a token are formed (often described using
regular expressions)
Examples of Tokens
Lexeme Token Type
If KEYWORD
count IDENTIFIER
= ASSIGN_OP
123 NUMBER
; SEMICOLON
Regular Expressions (Basic)
Used to describe patterns for lexemes.
Pattern Matches
a* zero or more a's
[a-zA-Z_][a-zA-Z0-9_]* Valid identifier
7
Pattern Matches
[0-9]+ One or more digits (numbers)
`if else`
Finite Automata (Introductory)
A Finite Automaton (FA) is used to implement regular expressions.
There are two types:
DFA – Deterministic Finite Automaton
NFA – Nondeterministic Finite Automaton
A compiler uses DFA to recognize patterns efficiently.
Lexical Errors
Lexical errors occur when the input cannot be matched to any valid token.
Examples:
Invalid variable names (e.g., 2abc)
Unrecognized symbols (e.g., @, # in code)
Tools for Lexical Analysis
Lex/Flex: Lexical analyzer generators that take regular expressions and generate C code.
Manual implementation: Can be done using switch statements or state machines.
Practical Example
C Code Snippet:
8
int a = 10;
Lexical Analyzer Output:
Token: KEYWORD (int)
Token: IDENTIFIER (a)
Token: ASSIGN_OP (=)
Token: NUMBER (10)
Token: SEMICOLON (;)
Sample Questions
1. What is the purpose of lexical analysis in a compiler?
2. Define token, lexeme, and pattern with examples.
3. List three types of lexical errors.
4. Write a regular expression for valid variable names in C.
Lecture Note 4: Syntax Analysis
Introduction
Syntax analysis (also called parsing) is the second phase of a compiler. It takes the token stream
produced by the lexical analyzer and arranges it into a tree-like structure based on the grammar
of the programming language.
Purpose of Syntax Analysis
Checks if the token sequence follows the rules of the language’s grammar.
Constructs parse trees or syntax trees.
Reports syntax errors (e.g., missing semicolons, unbalanced parentheses).
Passes structured output to the semantic analyzer.
Key Concepts
9
1. Grammar
A set of rules that define the syntax of a language.
Often written in Context-Free Grammar (CFG).
2. Context-Free Grammar (CFG)
A CFG has:
Terminals: Actual symbols of the language (e.g., if, else, +, id)
Non-terminals: Abstract symbols (e.g., expr, stmt)
Start symbol: Usually the root of the parse tree (e.g., program)
Production rules: Define how non-terminals can be replaced
Example:
E→E+T|T
T→T*F|F
F → (E) | id
Parse Tree
A parse tree represents the syntactic structure of a string according to a grammar.
Each node is a grammar symbol. Leaves are terminals.
Example:
For the expression a + b * c, the parse tree follows the rules of operator precedence.
Types of Parsers
1. Top-Down Parsing
Begins with the start symbol and tries to derive the input string.
Methods:
Recursive Descent Parser
Predictive Parser
2. Bottom-Up Parsing
10
Begins with the input and tries to reduce it to the start symbol.
Method:
Shift-Reduce Parser
LR Parser (Left-to-right, Rightmost derivation)
Leftmost and Rightmost Derivations
Leftmost Derivation: Always expand the leftmost non-terminal first.
Rightmost Derivation: Always expand the rightmost non-terminal first.
Syntax Errors
These occur when the structure of the code violates language rules.
Examples:
if (x > 5 (missing ))
int 5value; (invalid identifier)
Error Recovery Techniques
1. Panic Mode: Skip tokens until a synchronizing token is found (e.g., ;).
2. Phrase Level Recovery: Replace or insert tokens.
3. Error Productions: Add grammar rules that anticipate errors.
4. Global Correction: Try to find the minimal number of changes to fix the input.
Practical Example
Input: a + b * c
Token Stream: id + id * id
Parse Tree:
11
Sample Questions
1. What is the function of a syntax analyzer in a compiler?
2. Define context-free grammar and give an example.
3. Differentiate between top-down and bottom-up parsing.
4. Draw a parse tree for the expression: x * (y + z)
Lecture Note 5: Semantic Analysis
Introduction
Semantic analysis is the third major phase of compilation. While syntax analysis checks the
structure of a program, semantic analysis ensures the correctness of meaning based on the
rules of the programming language.
Purpose of Semantic Analysis
Checks whether declarations and statements make logical sense.
Verifies type correctness, scope rules, and variable/function usage.
Builds and uses the symbol table.
Detects semantic errors, e.g., type mismatches, undeclared variables, multiple
declarations.
Key Semantic Tasks
1. Type Checking
12
Ensures that operands in expressions are of compatible types.
E.g., int + float is invalid in some strict languages.
2. Scope Resolution
Ensures that a variable or function used is declared in the correct scope.
3. Symbol Table Construction
A data structure that stores information about identifiers:
Name
Type
Scope level
Memory location (optional)
4. Name Analysis
Ensures each identifier is declared before use.
Checks for duplicate declarations in the same scope.
Symbol Table
A symbol table is used throughout the semantic analysis phase and beyond.
Example structure:
Name Type Scope Line No.
X int Global 2
result float Local 5
Operations:
Insert new symbol
Lookup existing symbol
Delete on scope exit
13
Semantic Errors and Examples
Code Semantic Error
int x; x = "hello"; Type mismatch
y = 5; Undeclared variable
int x; int x; Redeclaration
return 5; (in void) Invalid return type
Attribute Grammar
Adds semantic rules to the grammar.
Uses attributes (like type, value) and rules to propagate information across parse trees.
Static vs Dynamic Semantics
Static Semantics: Checked at compile time (e.g., type rules, scope).
Dynamic Semantics: Related to the meaning during execution (runtime behavior), not
usually handled by the compiler.
Sample Questions
1. What is the role of the semantic analyzer?
2. List four common types of semantic errors.
3. Describe the structure and use of a symbol table.
4. Differentiate between static and dynamic semantics.
14
Lecture Note 6: Intermediate Code Generation
Introduction
Intermediate Code Generation (ICG) is the phase where the compiler converts the output of
the semantic analyzer into a simple, low-level code representation that is independent of the
target machine architecture. This code is easier to optimize and translate into final machine
code.
Purpose of Intermediate Code
Acts as a bridge between high-level source code and low-level machine code.
Simplifies code optimization and code generation.
Makes the compiler portable across different hardware platforms.
Features of Good Intermediate Code
Easy to generate and easy to translate into machine code.
Machine-independent (abstract representation).
Easy to optimize.
Forms of Intermediate Representations
1. Three-Address Code (TAC)
Each instruction has at most three operands.
Format: x = y op z
Uses temporary variables for intermediate results.
Example:
a=b+c*d
t1 = c * d
t2 = b + t1
a = t2
2. Quadruples
Each instruction is a 4-tuple: (op, arg1, arg2, result)
15
Example:
(*, c, d, t1)
(+, b, t1, t2)
(=, t2, -, a)
3. Triples
Similar to quadruples but uses position/index instead of variable names.
4. Abstract Syntax Trees (ASTs)
Tree-like structure representing operations and operands.
Useful for expression evaluation and optimization.
Generating Intermediate Code
Intermediate code is generated from the annotated parse tree.
Attributes (e.g., types, addresses) are used to determine the appropriate code.
Example: Converting an Expression
Source Code:
x = (a + b) * (c - d)
Intermediate Code (TAC):
t1 = a + b
t2 = c - d
t3 = t1 * t2
x = t3
Temporary Variables
Temporary variables (t1, t2, t3, etc.) are used to hold intermediate results.
Automatically managed by the compiler.
16
Sample Questions
1. What is the purpose of intermediate code in a compiler?
2. Write the three-address code for the expression: x = a + b * c - d.
3. Compare quadruples and triples in intermediate code representation.
4. What makes a good intermediate representation?
Lecture Note 7: Code Optimization
Introduction
Code Optimization is the phase of a compiler that improves intermediate code to make the
resulting machine code more efficient — without changing its functionality. It aims to improve:
Execution speed
Memory usage
Power consumption (for embedded systems)
Objectives of Code Optimization
Eliminate unnecessary computations.
Reduce the number of instructions.
Improve loop efficiency.
Optimize resource usage (registers, memory, CPU).
Types of Optimizations
1. Machine-Independent Optimization
Does not consider the target machine.
Works on intermediate code.
2. Machine-Dependent Optimization
17
Performed during or after code generation.
Uses specific features of the target CPU (e.g., registers, pipeline).
Common Code Optimization Techniques
Technique Description
Evaluates constant expressions at compile time. Example: x = 2
Constant Folding + 3 becomes x = 5.
Replaces variables with known constant values. Example: x = 5;
Constant Propagation y = x + 2; becomes y = 7.
Removes code that has no effect on the output. Example:
Dead Code Elimination Unused variable assignments.
Common Subexpression
Elimination Reuses previously computed expressions.
Improves loop performance by techniques like loop unrolling,
Loop Optimization invariant code motion.
Replaces expensive operations with cheaper ones. Example: x
Strength Reduction = y * 2 becomes x = y + y.
Examples of Code Optimization
Before Optimization:
x = a + b; y = a + b;
After Common Subexpression Elimination:
t = a + b; x = t; y = t;
18
Loop Invariant Code Motion (moving calculations out of the loop):
Before:
for (i = 0; i < n; i++) { y = a * b; x[i] = y + i; }
After:
y = a * b; for (i = 0; i < n; i++) { x[i] = y + i; }
Peephole Optimization
A machine-level optimization that looks at a small set ("peephole") of instructions and tries to
replace them with more efficient ones.
Example:
MOV A, B
MOV B, A
can be optimized to:
MOV A, B
Trade-Offs in Optimization
More optimization may increase compilation time.
Some optimizations may make debugging harder.
Balance is needed between compilation time, execution speed, and code readability.
Sample Questions
1. What is code optimization, and why is it important?
2. Explain dead code elimination with an example.
3. Describe two loop optimization techniques.
4. Differentiate between machine-dependent and machine-independent optimization.
19
Lecture Note 8: Code Generation
Introduction
Code Generation is the final phase of the compiler where the optimized intermediate code is
translated into target machine code or assembly language that can be executed by the
computer hardware.
Objectives of Code Generation
Produce efficient machine code.
Minimize the number of instructions.
Manage hardware resources like registers and memory.
Maintain correctness and preserve program semantics.
Input and Output
Input: Optimized intermediate code (e.g., three-address code).
Output: Machine-level code or assembly code.
Main Tasks in Code Generation
Task Description
Instruction Selection Choose appropriate machine instructions for operations.
Register Allocation Assign variables and temporaries to CPU registers.
Instruction Scheduling Arrange instructions to avoid hazards and improve pipeline usage.
Memory Management Allocate storage locations for variables.
Instruction Selection
20
Translate each intermediate code instruction into one or more machine instructions.
Example:
Intermediate code: t1 = a + b
Machine instructions (simplified):
LOAD R1, a
LOAD R2, b
ADD R3, R1, R2
STORE t1, R3
Register Allocation
Registers are limited; the compiler must decide which variables stay in registers and
which go to memory.
Strategies:
Graph Coloring: Represent variables as nodes and interference as edges; color
nodes to assign registers.
Simple heuristics: Spill variables to memory when registers run out.
Instruction Scheduling
Reorder instructions to improve CPU pipeline performance and avoid stalls.
Ensures instructions that depend on each other do not cause delays.
Handling Control Flow
Translate control flow constructs (if, loops) into jumps and branch instructions.
Example:
Intermediate code:
if (x > 0) goto L1
...
L1: ...
Machine code involves conditional jumps.
21
Example
Intermediate Code:
t1 = a + b
t2 = t1 * c
x = t2
Machine Code (Simplified):
LOAD R1, a
LOAD R2, b
ADD R3, R1, R2
LOAD R4, c
MUL R5, R3, R4
STORE x, R5
Sample Questions
1. What are the main tasks of the code generation phase?
2. Explain the importance of register allocation.
3. How is control flow handled during code generation?
4. Write machine code instructions for the intermediate code: t = a - b.
Lecture Note 9: Symbol Tables and Error Handling
Introduction
The symbol table and error handling are critical components in compiler design, supporting
semantic analysis, code generation, and improving compiler reliability.
Symbol Tables
22
Purpose
Store information about identifiers (variables, functions, constants).
Facilitate quick lookup and retrieval of information during compilation.
Information Stored
Name of identifier
Type (int, float, function, array, etc.)
Scope level (global, local)
Memory location or address
Attributes (e.g., parameters of a function, size of arrays)
Structure
Usually implemented as hash tables for fast access.
Can be stack-based to handle scope: push when entering scope, pop when leaving.
Operations
Insert: Add new identifier info.
Lookup: Search for identifier.
Delete: Remove identifiers out of scope.
Handling Scopes
Global scope: Variables/functions accessible throughout.
Local scope: Variables declared within functions or blocks.
Symbol tables may be organized in a hierarchical manner or use stack of tables.
Error Handling
Compiler Errors
Lexical Errors: Illegal tokens.
Syntax Errors: Violations of grammar rules.
23
Semantic Errors: Type mismatches, undeclared variables, etc.
Error Detection
Early detection helps prevent cascading errors.
Use lookahead tokens and error productions in grammar.
Error Recovery Techniques
Technique Description
Panic Mode Skip tokens until a known synchronizing token (like ;) is found.
Phrase Level Replace or insert tokens to fix simple errors.
Error Productions Add rules that anticipate common errors in grammar.
Global Correction Make minimal changes to the input to correct errors.
Error Reporting
Provide clear and informative error messages.
Include line numbers and token information.
Help the programmer fix mistakes quickly.
Sample Questions
1. What information is stored in a symbol table?
2. How are scopes managed in symbol tables?
3. List and explain two error recovery techniques.
4. Why is error reporting important in compiler design?
24
Lecture Note 10: Linkers and Loaders
Introduction
After code generation, the compiled program consists of one or more object
files. Linkers and Loaders are system programs that prepare these files to run on the computer
by combining, relocating, and loading them into memory.
The Linker
Purpose
Combines multiple object files into a single executable.
Resolves references between object files.
Assigns final addresses to code and data.
Types of Linking
Static Linking: Combines all needed libraries and code at compile time into one
executable.
Dynamic Linking: Links libraries at runtime, reducing executable size and allowing shared
libraries.
Linker Tasks
Symbol Resolution: Finds addresses of external symbols used in object files.
Relocation: Adjusts addresses in object files so they fit together correctly.
Library Linking: Adds code from libraries when referenced.
The Loader
Purpose
Loads the executable program into memory.
Sets up the program’s runtime environment.
Transfers control to the program’s starting point.
Loader Tasks
25
Reads executable file.
Allocates memory.
Resolves any dynamic linking (if applicable).
Initializes registers and stack.
Relocation
Adjusts address-dependent code/data so program can run at different memory
locations.
Uses relocation tables generated by the assembler/linker.
Example: Linking Process
You write two source files: main.c and utils.c.
Each is compiled into object files: main.o and utils.o.
Linker combines main.o and utils.o into [Link].
It resolves function calls from main to utils.
Sample Questions
1. What is the role of a linker in program execution?
2. Differentiate between static and dynamic linking.
3. What does a loader do before transferring control to a program?
4. Explain relocation in the context of linking and loading.
Lecture Note 11: Compiler Design Tools
Introduction
Compiler design involves many repetitive tasks such as lexical analysis and parsing. To simplify
these, various tools have been developed to automate parts of compiler construction.
26
Lexical Analyzer Generators
Lex / Flex: Tools to generate lexical analyzers (scanners) from regular expressions.
Input: Specification of tokens using regular expressions.
Output: Source code (usually C) for the lexical analyzer.
Parser Generators
Yacc / Bison: Tools to generate parsers from context-free grammar specifications.
Use grammar rules to produce parsing code.
Can generate LALR parsers (a type of efficient bottom-up parser).
Benefits of Using Compiler Tools
Saves time and effort.
Reduces human error.
Makes the compiler easier to maintain and extend.
Ensures compliance with formal grammar and token rules.
Overview of Tool Workflow
1. Write lexical specification (e.g., token definitions with regular expressions).
2. Generate lexical analyzer with Lex/Flex.
3. Write grammar specification for the parser.
4. Generate parser with Yacc/Bison.
5. Integrate lexical analyzer and parser into the compiler framework.
Other Tools
Abstract Syntax Tree (AST) generators: Tools that help build tree representations of
source code.
Intermediate code generators: Some frameworks help automate code generation.
Debuggers and Profilers: For testing and optimizing the compiler.
27
Sample Questions
1. What is the role of Lex in compiler construction?
2. How does Yacc help in building a compiler?
3. Mention two advantages of using compiler design tools.
4. Describe the typical workflow when using Lex and Yacc.
Lecture Note 12: Compiler Optimization Techniques
Introduction
Compiler optimization techniques improve the quality of the generated code by making it faster,
smaller, or more power-efficient without changing its meaning.
Levels of Optimization
Local Optimization: Applies within a basic block (a straight-line code sequence without
branches).
Global Optimization: Applies across basic blocks within a function.
Interprocedural Optimization: Applies across multiple functions or modules.
Common Optimization Techniques
Technique Description
Constant Folding Evaluate constant expressions at compile time.
Constant Propagation Replace variables with known constants.
Dead Code Elimination Remove code that never executes or affects output.
Loop Invariant Code Motion Move computations outside loops when possible.
28
Technique Description
Common Subexpression Elimination Avoid recomputing expressions by reusing values.
Strength Reduction Replace expensive operations with cheaper ones.
Inlining Replace function calls with the function body.
Data Flow Analysis
Used to gather information about the possible set of values calculated at various points
in the program.
Helps in optimizations like constant propagation, dead code elimination.
Example: Loop Optimization
Before:
for (i = 0; i < n; i++) { y = a * b; x[i] = y + i; }
After (Loop Invariant Code Motion):
y = a * b; for (i = 0; i < n; i++) { x[i] = y + i; }
Trade-offs
More optimization can increase compilation time.
Some optimizations might interfere with debugging.
Balance needed between speed, size, and compile time.
Sample Questions
1. What is the difference between local and global optimization?
29
2. Explain constant propagation with an example.
3. How does loop invariant code motion improve performance?
4. Why is data flow analysis important in optimization?
30