0% found this document useful (0 votes)
9 views30 pages

COM 413 Lecture Note

The document provides an overview of compiler construction, detailing the roles and phases involved in translating high-level programming languages into machine code. It covers definitions, types of language translators, and the differences between compilers and interpreters, as well as the specific phases of a compiler including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, and code optimization. Additionally, it includes examples, key terminologies, and sample questions for each phase to aid understanding.

Uploaded by

Samist Fabusiwa
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views30 pages

COM 413 Lecture Note

The document provides an overview of compiler construction, detailing the roles and phases involved in translating high-level programming languages into machine code. It covers definitions, types of language translators, and the differences between compilers and interpreters, as well as the specific phases of a compiler including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, and code optimization. Additionally, it includes examples, key terminologies, and sample questions for each phase to aid understanding.

Uploaded by

Samist Fabusiwa
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

AKPERAN ORSHI POLYTECHNIC, YANDEV

DPEARTMENT OF COMPUTER SCIENCE

LECTURE NOTE

COM 413: COMPILER CONSTRUCTION

1
Lecture Note 1: Introduction to Compiler Construction

Introduction

A compiler is a special program that translates a source program written in a high-level


programming language (like C, Java, or Python) into a lower-level language (such as assembly or
machine code) so that it can be executed by a computer.

Definitions

 Source Program: The original code written in a high-level language.

 Target Program: The output of the compiler, typically in machine code or an


intermediate form.

 Compilation: The process of translating source code into target code.

Types of Language Translators

1. Assembler – Converts assembly language to machine code.

2. Compiler – Converts high-level language into machine code.

3. Interpreter – Directly executes instructions in a high-level language without compiling


into machine code.

4. Preprocessor – Processes directives before compilation begins (e.g., #include in C).

Differences between Compiler and Interpreter

Feature Compiler Interpreter

Translation Translates the whole program at once Translates and executes line by line

Speed Faster execution after compilation Slower due to line-by-line execution

Errors All errors shown after compilation Errors shown one at a time

2
Feature Compiler Interpreter

Output Creates a separate executable file No separate file created

Why Learn Compiler Construction?

 Understand how programming languages work.

 Write more efficient code.

 Create domain-specific languages.

 Useful in designing software development tools.

General Structure of a Compiler

A typical compiler consists of two main parts:

 Front End:

 Lexical Analysis

 Syntax Analysis

 Semantic Analysis

 Back End:

 Intermediate Code Generation

 Code Optimization

 Code Generation

Each phase performs a specific task in the translation process.

Sample Questions

1. Define a compiler.

2. List and explain the major differences between an interpreter and a compiler.

3. What are the main phases of a compiler?

Lecture Note 2: Structure and Phases of a Compiler

3
2.0 Introduction

A compiler operates in multiple stages (phases), with each stage performing a specific role in
the process of translating source code to machine code. Understanding these phases helps in
debugging, optimization, and compiler design.

Major Phases of a Compiler

1. Lexical Analysis

 Also called the scanner.

 Breaks the source code into tokens (keywords, identifiers, symbols).

 Removes whitespace and comments.

 Produces a stream of tokens as output.

2. Syntax Analysis

 Also called the parser.

 Takes the token stream and builds a parse tree using grammar rules.

 Ensures that the code follows the syntactical rules of the programming language.

3. Semantic Analysis

 Checks for meaningful constructs (e.g., type checking, scope resolution).

 Uses the symbol table to store identifiers and types.

 Detects errors such as undeclared variables or type mismatches.

4. Intermediate Code Generation

 Converts the parse tree into a simple, intermediate representation (IR).

 Easier to optimize and portable between machine architectures.

 Common IR: Three-address code (TAC).

5. Code Optimization

 Improves intermediate code to make it more efficient.

 Reduces memory usage, improves speed.

 Examples: constant folding, dead code elimination.

6. Code Generation

4
 Translates the optimized IR into target machine code.

 Allocates registers, generates instructions.

 Produces the final executable or object file.

Supporting Components

1. Symbol Table

 Stores variable names, function names, types, scopes.

 Used in semantic analysis and code generation.

2. Error Handler

 Detects and reports lexical, syntax, and semantic errors.

 Provides meaningful error messages.

3. Preprocessor (Optional)

 Processes directives like #define or #include before compilation.

Phases Grouped into Stages

Frontend Middle-end Backend

Lexical Analysis Intermediate Code Generation Code Optimization

Syntax Analysis Code Generation

Semantic Analysis

Compiler Structure Diagram

5
Sample Questions

1. List and explain the phases of a compiler.

2. What is the role of the lexical analyzer?

3. Explain the difference between syntax and semantic analysis.

4. What is an intermediate code and why is it used?

Lecture Note 3: Lexical Analysis

Introduction

Lexical analysis is the first phase of a compiler. It reads the source code and converts it into a
sequence of tokens, which are meaningful character strings such as keywords, identifiers,
operators, and symbols.

Role of the Lexical Analyzer

 Reads source code line by line or character by character.

 Groups characters into tokens.

 Removes whitespace, comments, and irrelevant symbols.

 Reports lexical errors such as invalid identifiers or illegal characters.


6
 Passes token stream to the syntax analyzer.

Key Terminologies

 Lexeme: Actual character sequence forming a token (e.g., int, x, =, 5)

 Token: Abstract category for a lexeme


(e.g., KEYWORD, IDENTIFIER, ASSIGN_OP, NUMBER)

 Pattern: Rule that defines how lexemes for a token are formed (often described using
regular expressions)

Examples of Tokens

Lexeme Token Type

If KEYWORD

count IDENTIFIER

= ASSIGN_OP

123 NUMBER

; SEMICOLON

Regular Expressions (Basic)

Used to describe patterns for lexemes.

Pattern Matches

a* zero or more a's

[a-zA-Z_][a-zA-Z0-9_]* Valid identifier

7
Pattern Matches

[0-9]+ One or more digits (numbers)

`if else`

Finite Automata (Introductory)

 A Finite Automaton (FA) is used to implement regular expressions.

 There are two types:

 DFA – Deterministic Finite Automaton

 NFA – Nondeterministic Finite Automaton

 A compiler uses DFA to recognize patterns efficiently.

Lexical Errors

Lexical errors occur when the input cannot be matched to any valid token.
Examples:

 Invalid variable names (e.g., 2abc)

 Unrecognized symbols (e.g., @, # in code)

Tools for Lexical Analysis

 Lex/Flex: Lexical analyzer generators that take regular expressions and generate C code.

 Manual implementation: Can be done using switch statements or state machines.

Practical Example

C Code Snippet:

8
int a = 10;

Lexical Analyzer Output:

 Token: KEYWORD (int)

 Token: IDENTIFIER (a)

 Token: ASSIGN_OP (=)

 Token: NUMBER (10)

 Token: SEMICOLON (;)

Sample Questions

1. What is the purpose of lexical analysis in a compiler?

2. Define token, lexeme, and pattern with examples.

3. List three types of lexical errors.

4. Write a regular expression for valid variable names in C.

Lecture Note 4: Syntax Analysis

Introduction

Syntax analysis (also called parsing) is the second phase of a compiler. It takes the token stream
produced by the lexical analyzer and arranges it into a tree-like structure based on the grammar
of the programming language.

Purpose of Syntax Analysis

 Checks if the token sequence follows the rules of the language’s grammar.

 Constructs parse trees or syntax trees.

 Reports syntax errors (e.g., missing semicolons, unbalanced parentheses).

 Passes structured output to the semantic analyzer.

Key Concepts

9
1. Grammar

 A set of rules that define the syntax of a language.

 Often written in Context-Free Grammar (CFG).

2. Context-Free Grammar (CFG)

A CFG has:

 Terminals: Actual symbols of the language (e.g., if, else, +, id)

 Non-terminals: Abstract symbols (e.g., expr, stmt)

 Start symbol: Usually the root of the parse tree (e.g., program)

 Production rules: Define how non-terminals can be replaced

Example:

E→E+T|T

T→T*F|F

F → (E) | id

Parse Tree

A parse tree represents the syntactic structure of a string according to a grammar.


Each node is a grammar symbol. Leaves are terminals.

Example:
For the expression a + b * c, the parse tree follows the rules of operator precedence.

Types of Parsers

1. Top-Down Parsing

 Begins with the start symbol and tries to derive the input string.

 Methods:

 Recursive Descent Parser

 Predictive Parser

2. Bottom-Up Parsing

10
 Begins with the input and tries to reduce it to the start symbol.

 Method:

 Shift-Reduce Parser

 LR Parser (Left-to-right, Rightmost derivation)

Leftmost and Rightmost Derivations

 Leftmost Derivation: Always expand the leftmost non-terminal first.

 Rightmost Derivation: Always expand the rightmost non-terminal first.

Syntax Errors

These occur when the structure of the code violates language rules.

Examples:

 if (x > 5 (missing ))

 int 5value; (invalid identifier)

Error Recovery Techniques

1. Panic Mode: Skip tokens until a synchronizing token is found (e.g., ;).

2. Phrase Level Recovery: Replace or insert tokens.

3. Error Productions: Add grammar rules that anticipate errors.

4. Global Correction: Try to find the minimal number of changes to fix the input.

Practical Example

Input: a + b * c

Token Stream: id + id * id

Parse Tree:

11
Sample Questions

1. What is the function of a syntax analyzer in a compiler?

2. Define context-free grammar and give an example.

3. Differentiate between top-down and bottom-up parsing.

4. Draw a parse tree for the expression: x * (y + z)

Lecture Note 5: Semantic Analysis

Introduction

Semantic analysis is the third major phase of compilation. While syntax analysis checks the
structure of a program, semantic analysis ensures the correctness of meaning based on the
rules of the programming language.

Purpose of Semantic Analysis

 Checks whether declarations and statements make logical sense.

 Verifies type correctness, scope rules, and variable/function usage.

 Builds and uses the symbol table.

 Detects semantic errors, e.g., type mismatches, undeclared variables, multiple


declarations.

Key Semantic Tasks

1. Type Checking

12
 Ensures that operands in expressions are of compatible types.

 E.g., int + float is invalid in some strict languages.

2. Scope Resolution

 Ensures that a variable or function used is declared in the correct scope.

3. Symbol Table Construction

 A data structure that stores information about identifiers:

 Name

 Type

 Scope level

 Memory location (optional)

4. Name Analysis

 Ensures each identifier is declared before use.

 Checks for duplicate declarations in the same scope.

Symbol Table

A symbol table is used throughout the semantic analysis phase and beyond.

Example structure:

Name Type Scope Line No.

X int Global 2

result float Local 5

Operations:

 Insert new symbol

 Lookup existing symbol

 Delete on scope exit

13
Semantic Errors and Examples

Code Semantic Error

int x; x = "hello"; Type mismatch

y = 5; Undeclared variable

int x; int x; Redeclaration

return 5; (in void) Invalid return type

Attribute Grammar

 Adds semantic rules to the grammar.

 Uses attributes (like type, value) and rules to propagate information across parse trees.

Static vs Dynamic Semantics

 Static Semantics: Checked at compile time (e.g., type rules, scope).

 Dynamic Semantics: Related to the meaning during execution (runtime behavior), not
usually handled by the compiler.

Sample Questions

1. What is the role of the semantic analyzer?

2. List four common types of semantic errors.

3. Describe the structure and use of a symbol table.

4. Differentiate between static and dynamic semantics.

14
Lecture Note 6: Intermediate Code Generation

Introduction

Intermediate Code Generation (ICG) is the phase where the compiler converts the output of
the semantic analyzer into a simple, low-level code representation that is independent of the
target machine architecture. This code is easier to optimize and translate into final machine
code.

Purpose of Intermediate Code

 Acts as a bridge between high-level source code and low-level machine code.

 Simplifies code optimization and code generation.

 Makes the compiler portable across different hardware platforms.

Features of Good Intermediate Code

 Easy to generate and easy to translate into machine code.

 Machine-independent (abstract representation).

 Easy to optimize.

Forms of Intermediate Representations

1. Three-Address Code (TAC)

 Each instruction has at most three operands.

 Format: x = y op z

 Uses temporary variables for intermediate results.

Example:

a=b+c*d

t1 = c * d

t2 = b + t1

a = t2

2. Quadruples

 Each instruction is a 4-tuple: (op, arg1, arg2, result)

15
Example:

(*, c, d, t1)

(+, b, t1, t2)

(=, t2, -, a)

3. Triples

 Similar to quadruples but uses position/index instead of variable names.

4. Abstract Syntax Trees (ASTs)

 Tree-like structure representing operations and operands.

 Useful for expression evaluation and optimization.

Generating Intermediate Code

 Intermediate code is generated from the annotated parse tree.

 Attributes (e.g., types, addresses) are used to determine the appropriate code.

Example: Converting an Expression

Source Code:
x = (a + b) * (c - d)

Intermediate Code (TAC):

t1 = a + b

t2 = c - d

t3 = t1 * t2

x = t3

Temporary Variables

 Temporary variables (t1, t2, t3, etc.) are used to hold intermediate results.

 Automatically managed by the compiler.

16
Sample Questions

1. What is the purpose of intermediate code in a compiler?

2. Write the three-address code for the expression: x = a + b * c - d.

3. Compare quadruples and triples in intermediate code representation.

4. What makes a good intermediate representation?

Lecture Note 7: Code Optimization

Introduction

Code Optimization is the phase of a compiler that improves intermediate code to make the
resulting machine code more efficient — without changing its functionality. It aims to improve:

 Execution speed

 Memory usage

 Power consumption (for embedded systems)

Objectives of Code Optimization

 Eliminate unnecessary computations.

 Reduce the number of instructions.

 Improve loop efficiency.

 Optimize resource usage (registers, memory, CPU).

Types of Optimizations

1. Machine-Independent Optimization

 Does not consider the target machine.

 Works on intermediate code.

2. Machine-Dependent Optimization

17
 Performed during or after code generation.

 Uses specific features of the target CPU (e.g., registers, pipeline).

Common Code Optimization Techniques

Technique Description

Evaluates constant expressions at compile time. Example: x = 2


Constant Folding + 3 becomes x = 5.

Replaces variables with known constant values. Example: x = 5;


Constant Propagation y = x + 2; becomes y = 7.

Removes code that has no effect on the output. Example:


Dead Code Elimination Unused variable assignments.

Common Subexpression
Elimination Reuses previously computed expressions.

Improves loop performance by techniques like loop unrolling,


Loop Optimization invariant code motion.

Replaces expensive operations with cheaper ones. Example: x


Strength Reduction = y * 2 becomes x = y + y.

Examples of Code Optimization

Before Optimization:

x = a + b; y = a + b;

After Common Subexpression Elimination:

t = a + b; x = t; y = t;

18
Loop Invariant Code Motion (moving calculations out of the loop):

Before:

for (i = 0; i < n; i++) { y = a * b; x[i] = y + i; }

After:

y = a * b; for (i = 0; i < n; i++) { x[i] = y + i; }

Peephole Optimization

A machine-level optimization that looks at a small set ("peephole") of instructions and tries to
replace them with more efficient ones.

Example:

MOV A, B

MOV B, A

can be optimized to:

MOV A, B

Trade-Offs in Optimization

 More optimization may increase compilation time.

 Some optimizations may make debugging harder.

 Balance is needed between compilation time, execution speed, and code readability.

Sample Questions

1. What is code optimization, and why is it important?

2. Explain dead code elimination with an example.

3. Describe two loop optimization techniques.

4. Differentiate between machine-dependent and machine-independent optimization.

19
Lecture Note 8: Code Generation

Introduction

Code Generation is the final phase of the compiler where the optimized intermediate code is
translated into target machine code or assembly language that can be executed by the
computer hardware.

Objectives of Code Generation

 Produce efficient machine code.

 Minimize the number of instructions.

 Manage hardware resources like registers and memory.

 Maintain correctness and preserve program semantics.

Input and Output

 Input: Optimized intermediate code (e.g., three-address code).

 Output: Machine-level code or assembly code.

Main Tasks in Code Generation

Task Description

Instruction Selection Choose appropriate machine instructions for operations.

Register Allocation Assign variables and temporaries to CPU registers.

Instruction Scheduling Arrange instructions to avoid hazards and improve pipeline usage.

Memory Management Allocate storage locations for variables.

Instruction Selection

20
 Translate each intermediate code instruction into one or more machine instructions.

 Example:
Intermediate code: t1 = a + b
Machine instructions (simplified):

LOAD R1, a

LOAD R2, b

ADD R3, R1, R2

STORE t1, R3

Register Allocation

 Registers are limited; the compiler must decide which variables stay in registers and
which go to memory.

 Strategies:

 Graph Coloring: Represent variables as nodes and interference as edges; color


nodes to assign registers.

 Simple heuristics: Spill variables to memory when registers run out.

Instruction Scheduling

 Reorder instructions to improve CPU pipeline performance and avoid stalls.

 Ensures instructions that depend on each other do not cause delays.

Handling Control Flow

 Translate control flow constructs (if, loops) into jumps and branch instructions.

 Example:
Intermediate code:

if (x > 0) goto L1

...

L1: ...

Machine code involves conditional jumps.

21
Example

Intermediate Code:

t1 = a + b

t2 = t1 * c

x = t2

Machine Code (Simplified):

LOAD R1, a

LOAD R2, b

ADD R3, R1, R2

LOAD R4, c

MUL R5, R3, R4

STORE x, R5

Sample Questions

1. What are the main tasks of the code generation phase?

2. Explain the importance of register allocation.

3. How is control flow handled during code generation?

4. Write machine code instructions for the intermediate code: t = a - b.

Lecture Note 9: Symbol Tables and Error Handling

Introduction

The symbol table and error handling are critical components in compiler design, supporting
semantic analysis, code generation, and improving compiler reliability.

Symbol Tables

22
Purpose

 Store information about identifiers (variables, functions, constants).

 Facilitate quick lookup and retrieval of information during compilation.

Information Stored

 Name of identifier

 Type (int, float, function, array, etc.)

 Scope level (global, local)

 Memory location or address

 Attributes (e.g., parameters of a function, size of arrays)

Structure

 Usually implemented as hash tables for fast access.

 Can be stack-based to handle scope: push when entering scope, pop when leaving.

Operations

 Insert: Add new identifier info.

 Lookup: Search for identifier.

 Delete: Remove identifiers out of scope.

Handling Scopes

 Global scope: Variables/functions accessible throughout.

 Local scope: Variables declared within functions or blocks.

 Symbol tables may be organized in a hierarchical manner or use stack of tables.

Error Handling

Compiler Errors

 Lexical Errors: Illegal tokens.

 Syntax Errors: Violations of grammar rules.

23
 Semantic Errors: Type mismatches, undeclared variables, etc.

Error Detection

 Early detection helps prevent cascading errors.

 Use lookahead tokens and error productions in grammar.

Error Recovery Techniques

Technique Description

Panic Mode Skip tokens until a known synchronizing token (like ;) is found.

Phrase Level Replace or insert tokens to fix simple errors.

Error Productions Add rules that anticipate common errors in grammar.

Global Correction Make minimal changes to the input to correct errors.

Error Reporting

 Provide clear and informative error messages.

 Include line numbers and token information.

 Help the programmer fix mistakes quickly.

Sample Questions

1. What information is stored in a symbol table?

2. How are scopes managed in symbol tables?

3. List and explain two error recovery techniques.

4. Why is error reporting important in compiler design?

24
Lecture Note 10: Linkers and Loaders

Introduction

After code generation, the compiled program consists of one or more object
files. Linkers and Loaders are system programs that prepare these files to run on the computer
by combining, relocating, and loading them into memory.

The Linker

Purpose

 Combines multiple object files into a single executable.

 Resolves references between object files.

 Assigns final addresses to code and data.

Types of Linking

 Static Linking: Combines all needed libraries and code at compile time into one
executable.

 Dynamic Linking: Links libraries at runtime, reducing executable size and allowing shared
libraries.

Linker Tasks

 Symbol Resolution: Finds addresses of external symbols used in object files.

 Relocation: Adjusts addresses in object files so they fit together correctly.

 Library Linking: Adds code from libraries when referenced.

The Loader

Purpose

 Loads the executable program into memory.

 Sets up the program’s runtime environment.

 Transfers control to the program’s starting point.

Loader Tasks

25
 Reads executable file.

 Allocates memory.

 Resolves any dynamic linking (if applicable).

 Initializes registers and stack.

Relocation

 Adjusts address-dependent code/data so program can run at different memory


locations.

 Uses relocation tables generated by the assembler/linker.

Example: Linking Process

 You write two source files: main.c and utils.c.

 Each is compiled into object files: main.o and utils.o.

 Linker combines main.o and utils.o into [Link].

 It resolves function calls from main to utils.

Sample Questions

1. What is the role of a linker in program execution?

2. Differentiate between static and dynamic linking.

3. What does a loader do before transferring control to a program?

4. Explain relocation in the context of linking and loading.

Lecture Note 11: Compiler Design Tools

Introduction

Compiler design involves many repetitive tasks such as lexical analysis and parsing. To simplify
these, various tools have been developed to automate parts of compiler construction.

26
Lexical Analyzer Generators

 Lex / Flex: Tools to generate lexical analyzers (scanners) from regular expressions.

 Input: Specification of tokens using regular expressions.

 Output: Source code (usually C) for the lexical analyzer.

Parser Generators

 Yacc / Bison: Tools to generate parsers from context-free grammar specifications.

 Use grammar rules to produce parsing code.

 Can generate LALR parsers (a type of efficient bottom-up parser).

Benefits of Using Compiler Tools

 Saves time and effort.

 Reduces human error.

 Makes the compiler easier to maintain and extend.

 Ensures compliance with formal grammar and token rules.

Overview of Tool Workflow

1. Write lexical specification (e.g., token definitions with regular expressions).

2. Generate lexical analyzer with Lex/Flex.

3. Write grammar specification for the parser.

4. Generate parser with Yacc/Bison.

5. Integrate lexical analyzer and parser into the compiler framework.

Other Tools

 Abstract Syntax Tree (AST) generators: Tools that help build tree representations of
source code.

 Intermediate code generators: Some frameworks help automate code generation.

 Debuggers and Profilers: For testing and optimizing the compiler.

27
Sample Questions

1. What is the role of Lex in compiler construction?

2. How does Yacc help in building a compiler?

3. Mention two advantages of using compiler design tools.

4. Describe the typical workflow when using Lex and Yacc.

Lecture Note 12: Compiler Optimization Techniques

Introduction

Compiler optimization techniques improve the quality of the generated code by making it faster,
smaller, or more power-efficient without changing its meaning.

Levels of Optimization

 Local Optimization: Applies within a basic block (a straight-line code sequence without
branches).

 Global Optimization: Applies across basic blocks within a function.

 Interprocedural Optimization: Applies across multiple functions or modules.

Common Optimization Techniques

Technique Description

Constant Folding Evaluate constant expressions at compile time.

Constant Propagation Replace variables with known constants.

Dead Code Elimination Remove code that never executes or affects output.

Loop Invariant Code Motion Move computations outside loops when possible.

28
Technique Description

Common Subexpression Elimination Avoid recomputing expressions by reusing values.

Strength Reduction Replace expensive operations with cheaper ones.

Inlining Replace function calls with the function body.

Data Flow Analysis

 Used to gather information about the possible set of values calculated at various points
in the program.

 Helps in optimizations like constant propagation, dead code elimination.

Example: Loop Optimization

Before:

for (i = 0; i < n; i++) { y = a * b; x[i] = y + i; }

After (Loop Invariant Code Motion):

y = a * b; for (i = 0; i < n; i++) { x[i] = y + i; }

Trade-offs

 More optimization can increase compilation time.

 Some optimizations might interfere with debugging.

 Balance needed between speed, size, and compile time.

Sample Questions

1. What is the difference between local and global optimization?

29
2. Explain constant propagation with an example.

3. How does loop invariant code motion improve performance?

4. Why is data flow analysis important in optimization?

30

You might also like