0% found this document useful (0 votes)

5 views30 pages

Compiler Design Notes

The document provides comprehensive revision notes on compiler design, covering key topics such as compiler phases, lexical analysis, and parsing techniques. It includes historical milestones in compiler development, various types of compilers, and detailed explanations of the compilation process, symbol tables, and intermediate representations like Three-Address Code. Additionally, it discusses the role of lexical analyzers, the use of regular expressions, and the design of lexical analyzer generators.

Uploaded by

vectorsigma389

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views30 pages

Compiler Design Notes

Uploaded by

vectorsigma389

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

COMPILER DESIGN

Complete Revision Notes

UE23CS351B | PES University
Prof. Preet Kanwal
Units 1 & 2 | Full Coverage for Exams

Coverage: Compilers, Language Processing, Compiler Phases, Symbol Table, Lexical

Analysis, Lex Tool, RE-NFA-DFA, C Grammar, Parsers, LL(1), LR(0), SLR, CLR, LALR,
Syntax Directed Definitions
UNIT 1: Compilers, Lexical Analysis & Top-Down Parsers
1.1 Introduction to Compilers
Compiler: A program that reads source code in one language and translates it into an equivalent
program in another (usually lower-level) language.
Interpreter: Directly executes source program operations. Slower to execute but provides better error
diagnostics.

History of Compilers
Year Event
1951 First practical compiler by Corrado Bohm (PhD thesis)
1952 First implemented compiler: Grace Hopper's A-0 System. Hopper coined "compiler".
1954-57 FORTRAN (FORmula TRANslation) - first widely used high-level language & complete
compiler (IBM, John W. Backus)
1969-73 C language created by Ken Thompson & Dennis Ritchie alongside Unix. B compiler
bootstrapped.
1987 GCC (GNU Compiler Collection) released by Richard Stallman - initially only C

Variants of Compilers
Variant Description Example
Self-hosting Compiler written in the same language it compiles. GCC (C), Java, Python
Can compile itself.
Bootstrapping Technique for producing a self-hosting compiler. C written in B; C++ in C
Start with core compiler in another language. (Cfront)
Native Compiler Generates code for the same platform it runs on. Turbo C
Cross Compiler Generates code for a different platform than Linux/x86 building for
where it runs. Used for embedded systems. Arduino/ARM
Transpiler Source-to-source: translates between high-level Emscripten (C++ to
languages at same abstraction level. JavaScript)
Decompiler Low-level to high-level. Inherently difficult (loses Ghidra (written in C++)
info during compilation).
JIT Compiler Compiles code just-in-time during runtime. Java Virtual Machine (JVM)
Combination of AOT compiler and interpreter.
Compiler-compiler Generates parser/compiler from formal grammar Lex, YACC
description. Parser generators use BNF/EBNF.
1.2 Language Processing System
Pipeline: Source to Executable
Stage Tool Input Output GCC Command
Preprocessing Preprocessor Source .c file .i file (macros gcc -E hello.c > hello.i
(cpp) expanded)
Compilation Compiler (gcc) .i preprocessed .s assembly file gcc -S hello.i
file
Assembly Assembler (as) .s assembly .o relocatable as hello.s -o hello.o
object file
Linking Linker (ld) .o object files + executable gcc hello.o
libraries ([Link])
Loading OS Loader executable process in ./[Link]
memory

Static vs Shared Libraries

Property Static Library (.a / .lib) Shared Library (.so / .dll)
Linking Machine code copied into executable Only a small table added; code loaded
at runtime
Executable size Larger Smaller
Version Compiled-in version always runs Latest version auto-used (no
recompile needed)
Memory Each process has own copy One copy in memory shared by all
processes
Problem Larger binaries DLL Hell (early Windows)
1.3 Phases of a Compiler
The compiler works in 7 phases. All phases have access to the Symbol Table and Error Handler.
Phase Name Input Output
1 Lexical Analysis Source characters Token stream
(Scanner)
2 Syntax Analysis (Parser) Token stream Parse Tree / AST
3 Semantic Analysis Parse Tree Annotated parse tree (type-checked)
4 Intermediate Code Annotated parse tree Three-Address Code (TAC)
Generation
5 Machine-Independent TAC Optimized IR (DAG, CSE, etc.)
Optimization
6 Code Generation Optimized IR Target assembly code
7 Machine-Dependent Assembly Optimized target code
Optimization

Front End vs Back End

Front End (Analysis) Back End (Synthesis)
Phases 1-3 + IR Generation Optimization + Code Gen
Produces IR + Symbol Table Target program from IR
Complexity O(n) or O(n log n) NP-complete in general
Automation Mostly automated (lex/yacc) Much harder to automate
Focus Recognize legal programs, report errors Instruction selection, scheduling,
register allocation

IR Optimizations (Compiler Middle End)

• Constant Folding: Evaluate constant expressions at compile time. E.g., 2+3 -> 5
• Constant Propagation: Replace variables with known constant values.
• Common Subexpression Elimination (CSE): Replace identical sub-expressions with a single
variable.
• Dead Code Elimination: Remove unreachable or unused code.

Parse Tree vs AST: Parse Tree (Concrete Syntax Tree) retains all lexemes including non-
terminals and punctuation - large and complicated. AST (Abstract Syntax Tree) retains only
information needed for code generation - no punctuation, no non-terminals. All nodes are
tokens + dummy structural nodes.
1.4 Symbol Table
Definition: A data structure created and maintained by the compiler to store information about every
identifier (variable, function, type) in the source program.

Operations
Operation Description
insert(name, type, ...) Add a new identifier with its attributes
lookup(name) Retrieve attributes of an identifier
delete(name) Remove an identifier (at end of scope)
update(name, ...) Modify attributes of an existing identifier

Implementation Options
Structure Search Time Notes
Linear List O(n) Simple but slow for large programs
Hash Table O(1) average Most commonly used; fast lookup
Tree (BST) O(log n) Ordered; useful for structured scoping
Linked lists of hash tables O(1) + scoping Used for nested scopes
Each entry stores: name, type, size, scope level, offset in memory, and any other relevant attributes.
1.5 Three-Address Code (TAC / 3AC)
An intermediate representation where each instruction has at most one operator and at most three
addresses (operands).
Form: x = y op z (at most 1 operator per instruction)
Complex expressions are broken using temporary variables t0, t1, ...

TAC Example: while (number > 0) { factorial = factorial * number; --number; }

L0: if number > 0 goto L1
goto L2
L1: t1 = factorial * number
factorial = t1
t2 = number - 1
number = t2
goto L0
L2: ...

Static Single Assignment (SSA): Each variable is assigned exactly once. Simplifies data flow
analysis. Used in modern compilers like GCC and LLVM.
1.6 Lexical Analysis
Key Definitions
Term Definition Example
Token Abstract symbol representing a type of lexical unit. <ID, 3> or <keyword, while>
Structure: <token-name, attribute-value>
Lexeme Actual sequence of characters in source that "number", "factorial", "while"
matches a pattern
Pattern Rule (regex) defining valid lexemes for a token [a-zA-Z_][a-zA-Z0-9_]* for
identifiers

Common Token Classes

Token Pattern Example Lexemes
keyword Exact string match if, while, for, int, return
id [a-zA-Z_]+[a-zA-Z0-9_]* sum, avg1, factorial
number digit+(.\ digit+)?([eE][-+]?digit+)? 10001, 3.14, 1.5e3
Comparison <|>|<=|>=|==|!= <=, !=
Assign = =
ArithOp +|-|*|/ +, *
Inc_op / ++|-- ++, --
Dec_op
Punctuation (|)|{|}|[|]|;|, ;, (, )

Role of Lexer
• Primary: Convert source into token stream by grouping characters into lexemes
• Enter lexemes representing identifiers into the symbol table
• Strip out comments and whitespace
• Track line numbers; correlate errors with source line numbers
• Expand macros
Why separate from parser? (1) Simplicity - comments removed before parser; (2) Efficiency -
specialized buffering; (3) Portability - input specifics isolated to lexer.

Buffer Pairs & Sentinels

Used for efficient character reading with lookahead:
• Buffer Pairs: Two buffers of N bytes (N = disk block size = 4096). Alternately reloaded. Two
pointers: lexeme_begin (start of current lexeme) and forward (scans ahead). When forward
reaches buffer end, reload other buffer.
• Sentinels: EOF added at end of each buffer. Avoids double-checking buffer boundaries - only
one check per character (is it EOF?). If EOF in middle of buffer = end of input. If EOF at buffer
end = reload buffer.

Lexical Errors
Error Type Example
Spelling error di{...} instead of if{...}
Unmatched string printf("hi); missing closing quote
Illegal characters printf("hi")$$; -- $$ not valid
Identifier too long External/global identifiers in C cannot exceed 31 characters
Error messages must: pinpoint the error, be specific, be understandable, and not be redundant.
1.7 Lex Tool
Lex: Reads a specification file with regular expressions and generates a C routine ([Link].c) that
performs lexical analysis. YACC handles grammar-based parsing.

Structure of a .l File
%{
/* C code copied verbatim (includes, declarations) */
%}

/* Definitions section: substitution strings */

letter [a-zA-Z_]
digit [0-9]

/* Rules section: pattern { action } */

{letter}({letter}|{digit})* { printf("ID: %s\n", yytext); }
[0-9]+ { yylval = atoi(yytext); return INTEGER; }
\n { yylineno++; }
. { /* ignore */ }

/* Auxiliary functions */
int yywrap() { return 1; }
int main() { yylex(); return 0; }

Predefined Lex Variables

Variable/Function Type Description
yylex() int Main entry point. Call to invoke lexer. Returns token type.
yytext char* Pointer to matched string (lexeme)
yyleng int Length of matched string
yylval union Value associated with token (passed to parser)
yyin FILE* Input file (default: stdin)
yyout FILE* Output file (default: stdout)
yylineno int Current line number (use %option yylineno)
yywrap() int Called when input exhausted. Return 1 if done, 0 for more.
ECHO macro Writes matched string to yyout. Default action for unmatched.
BEGIN condition macro Switch start condition
INITIAL condition Default start condition

Compilation and Running

lex echo.l # generate [Link].c
gcc [Link].c # compile to [Link]
./[Link] # run
./[Link] < [Link] # redirect input

Maximal Munch Rule (Most Important!)

Maximal Munch Rule (Longest Match Rule):
1. The LONGEST match wins (among all patterns that match a prefix of input).
2. Among patterns matching the SAME length, the EARLIEST rule in the file wins.
Example: Patterns: abb->1, aba->2, a->3. String "ababa" -> "aba" matches rule 2 (length 3 > length 1
for "a"), then "b" unmatched, then "a" matches rule 3. Output: 2b3
Failure case: Patterns a*->1, ab->2, bb->3. String "aabbb": maximal munch gives "aa"->1, "bb"->3, last
"b" unmatched. Without maximal munch: a->1, ab->2, bb->3.

Regular Expressions in Lex

Pattern Matches
. Any character except newline
\n Newline
\t Tab
^ Beginning of line
$ End of line
? Zero or one copy of preceding expression
* Zero or more copies
+ One or more copies
a|b a or b (alternation)
(ab)+ One or more copies of ab
[abc] One of: a, b, c
[a-z] Any letter a through z
[^ab] Anything except a or b
[ \t\n]+ Whitespace

Start Conditions
• Mechanism for conditionally activating patterns in Lex.
• Inclusive (%s): Being in condition scn ADDS those rules to the active set.
• Exclusive (%x): Being in condition scn means ONLY those rules (prefixed with <scn>) are
active.
• Use case: multiline comments, quoted strings, indentation handling (Python).
%x COMMENT
%%
"/*" BEGIN(COMMENT);
<COMMENT>"*/" BEGIN(INITIAL);
<COMMENT>. ; /* discard comment content */
<COMMENT>\n yylineno++;
1.8 Design of Lexical Analyzer Generator: RE -> NFA -> DFA
Pipeline
Lex internally converts each pattern (RE) to NFA, combines all NFAs, then converts to DFA for efficient
simulation.
Step Process Algorithm
1 Each RE pattern -> NFA Thompson's Construction
2 Combine NFAs with new start state + epsilon Union construction
transitions to each NFA's start
3 Combined NFA -> DFA Subset Construction
4 DFA used to simulate lexer Maximal munch on input

Thompson's Construction (RE to NFA)

RE Pattern NFA Construction
Single char a (start) --a--> (accept)
epsilon (start) --eps--> (accept)
r1 | r2 (union) New start with eps to both NFA starts; both NFA accepts to new accept
r1 r2 (concat) NFA1 accept merges with NFA2 start via eps
r* (Kleene star) New start & accept; eps to inner NFA; inner accept loops back; eps to new
accept

Subset Construction (NFA to DFA)

• State in DFA = SET of NFA states (epsilon-closure of reachable states)
• Start state of DFA = epsilon-closure({NFA start state})
• DFA state is accepting if any NFA state in the set is accepting
• For conflicts: if multiple accepting NFA states, use earliest pattern rule

Lookahead Operator in Lex (/)

r1/r2: Match r1 only if followed by r2. The r2 part is NOT consumed. Example: fortran/ means match
"fortran" only if not followed by alphanumeric.
1.9 C Language Grammar
P -> S
S -> Declr; S | Assign; S | if (Cond) {S} S | while (Cond) {S} S |
if (Cond) {S} else {S} S | for (Assign; Cond; UnaryExpr) {S} S |
return E; S | lambda
Declr -> Type ListVar
Type -> int | float
ListVar -> X | ListVar, X
X -> id | Assign
Assign -> id = E
Cond -> E RelOp E
RelOp -> < | > | <= | >= | == | !=
UnaryExpr -> E++ | ++E | E-- | --E

E -> E + T | E - T | T
T -> T * F | T / F | F
F -> G ^ F | G
G -> (E) | id | num

Token Counting Example

Code: while (number > 0) { factorial = factorial * number; --number; }
Tokens (17): <while>, <(>, <ID:number>, <>>, <0>, <)>, <{>, <ID:factorial>, <=>, <ID:factorial>, <*>,
<ID:number>, <;>, <-->, <ID:number>, <;>, <}>
1.10 Parsing Overview
Error Recovery Strategies
Strategy Mechanism Notes
Panic Mode Discard input until synchronizing token Simple; may skip large portions
Recovery (;, ), }) found, then restart
Phrase Level Local corrections: insert missing ;), Must be careful not to cause further
Recovery replace erroneous token errors
Error Productions Add common errors as explicit Complicates grammar
grammar productions with informative
actions
Global Correction Find minimum edit distance from Extremely hard to implement; may not
erroneous to correct program match programmer intent

Top-Down vs Bottom-Up Parsers

Property Top-Down (LL) Bottom-Up (LR)
Direction Start symbol to terminals Terminals to start symbol
Derivation Leftmost derivation Reverse rightmost derivation
Grammar Must be left-factored and left- No such restrictions needed
requirements recursion-free
Input reading Left to right Left to right
Types RDP, LL(1)/Predictive Parser LR(0), SLR(1), LALR(1), CLR(1)
Power Less powerful More powerful (LL(1) subset LR(1))
1.11 Grammar Transformations for Top-Down Parsing
1. Left Factoring
Factor out common prefixes to avoid ambiguity in first-choice prediction.
Before: S -> iCtSex | iCtS | a
After: S -> iCtSX | a X -> eS | lambda

2. Elimination of Left Recursion

For Indirect Left Recursion: Order the non-terminals. Replace each Ai in Aj productions (j < i) with
Ai's body to convert indirect to immediate, then eliminate.
1.12 Recursive Descent Parsing (RDP)
• A top-down parser that implements one recursive procedure per non-terminal.
• Parses without backtracking if grammar is suitable.
• Power: Accepts LL(*) grammars - MORE powerful than LL(1) table-driven parsers (which
accept only LL(1)).
• Grammar must be left-factored and left-recursion-free.
• Implementation language must support recursion efficiently.

RDP vs Predictive RDP (without backtracking) Predictive/LL(1) Parser

Parser
Mechanism Mutually recursive procedures, one Explicit stack + parsing table
per non-terminal
Grammar class LL(*) - more powerful LL(1) only
Lookahead Flexible (can look ahead multiple Exactly 1 token
tokens)
Implementation Manual coding required Generated from grammar
Error recovery More flexible Panic mode with SYNC tokens
1.13 FIRST and FOLLOW Sets
FIRST Set Rules
FIRST(alpha) = set of terminals that can appear as first symbol of strings derived from
alpha
Rule 1: FIRST(t) = {t} where t is any terminal
Rule 2: If X -> lambda, add lambda to FIRST(X)
Rule 3: If X -> Y1 Y2 ... Yn:
Add FIRST(Y1) to FIRST(X)
If lambda in FIRST(Y1), add FIRST(Y2), etc.
If lambda in FIRST(Yi) for ALL i, add lambda to FIRST(X)
Key: FIRST sets never contain $

FOLLOW Set Rules

FOLLOW(A) = set of terminals that can appear immediately to right of A in some
sentential form
Rule 1: FOLLOW(S) = {$} where S is start symbol
Rule 2: If A -> alpha B beta:
Add (FIRST(beta) - lambda) to FOLLOW(B)
If lambda in FIRST(beta), also add FOLLOW(A) to FOLLOW(B)
Rule 3: If A -> alpha B (B at end of production):
Add FOLLOW(A) to FOLLOW(B)
Key: FOLLOW sets never contain lambda; always contain $ for start symbol

Worked Example: Arithmetic Grammar

E -> T E' E' -> + T E' | lambda T -> F T' T' -> * F T' | lambda F ->
id | num | (E)

FIRST(F) = {id, num, (}

FIRST(T) = FIRST(F) = {id, num, (}
FIRST(E) = FIRST(T) = {id, num, (}
FIRST(E') = {+, lambda}
FIRST(T') = {*, lambda}

FOLLOW(E) = {$, )}
FOLLOW(E') = FOLLOW(E) = {$, )}
FOLLOW(T) = (FIRST(E') - lambda) + FOLLOW(E) = {+} + {$,)} = {+, $, )}
FOLLOW(T') = FOLLOW(T) = {+, $, )}
FOLLOW(F) = (FIRST(T') - lambda) + FOLLOW(T) = {*} + {+,$,)} = {*, +, $, )}
1.14 LL(1) Predictive Parsing
LL(1) Table Construction
For each production A -> alpha:
Step 1: For each terminal a in FIRST(alpha), add A->alpha to M[A, a]
Step 2: If lambda in FIRST(alpha), for each terminal b in FOLLOW(A), add A-
>alpha to M[A, b]
Step 3: If lambda in FIRST(alpha) and $ in FOLLOW(A), add A->alpha to M[A,
$]

LL(1) Parsing Algorithm

Stack: [S, $] Input: [w, $]

if [Link] == $ and input == $: ACCEPT

When is a Grammar NOT LL(1)?

Condition Meaning Example
FIRST(alpha) & FIRST(beta) Two alternatives A->alpha|beta A -> ab | ac (need left factoring)
not empty share first terminal
FIRST(alpha) & FOLLOW(A) When A can vanish, its follow X -> eS|lambda where e in
not empty (for A- conflicts with another option FOLLOW(X)
>alpha|lambda)
Grammar not left-factored Common prefixes cause M[A,t] S -> iCtS | iCtSeS
to have multiple entries
Left recursion present Parser enters infinite loop E -> E + T | T

LL(k) Grammars
• LL(k): Choose production after reading k lookahead symbols.
• Hierarchy: LL(k) is a strict subset of LL(k+1).
• S -> aaBb | aaCb requires k=3 to distinguish (B or C) -> LL(3) not LL(1)
• Key fact: LL(1) is strictly contained in LR(1). Every LL(1) grammar is LR(1) but not vice versa.

Error Recovery in LL(1): Panic Mode with SYNC Tokens

For each non-terminal X, add SYNC tokens under elements of FOLLOW(X) in the parsing table:
if M[X, a] == blank: // syntax error
ignore input symbol a // skip it
if M[X, a] == sync:
if X is ONLY symbol on stack: ignore a // keep X, skip input
else: pop X from stack // discard X, let ancestor handle
UNIT 2: Bottom-Up Parsers & Syntax Directed Definitions
2.1 Bottom-Up Parsing Concepts
Key Definitions
Term Definition
Handle Substring at top of stack that matches RHS of a production rule; can be
reduced.
Handle Pruning Process: iteratively find handle, reduce to LHS. Repeat until start symbol.
Viable Prefix Any prefix of a right sentential form that does not extend past the rightmost
handle. The LR(0) automaton recognizes exactly viable prefixes.
Reduction Replace TOS substring (matching RHS) with LHS of production. Bottom-up
parsing is REVERSE rightmost derivation.
Rightmost Derivation Output of bottom-up parser when w in L(G)
in Reverse

Shift-Reduce Parsing Actions

Action Description
Shift Push next input symbol onto stack; advance input pointer
Reduce Pop |RHS| symbols from stack; push LHS symbol. If reducing A->alpha (|alpha|=k),
pop 2k elements (k symbols + k states), push A, push GOTO[state, A]
Accept Stack contains only start symbol S and input is $
Error No valid action; invoke error recovery

Conflicts in Shift-Reduce Parsing

Conflict Type Cause Example Resolution in
YACC
Shift-Reduce TOS matches some RHS Dangling else: if (c) if (c) s Prefer SHIFT
AND is prefix of another else s - is else for inner or (associates else
production (can shift or outer if? with innermost if)
reduce)
Reduce-Reduce TOS matches RHS of two id could reduce to expr or Use FIRST rule
different productions parameter listed in grammar
2.2 LR Parser Types
LR(k): L=left-to-right scan, R=rightmost derivation reversed, k=lookahead tokens.
Parser Lookahead for States Power Notes
Reduce
LR(0) None - reduces entire n states Weakest Reduces blindly; no grammar
row with epsilon-prod is LR(0)
SLR(1) FOLLOW(LHS) Same n states Moderate Uses same LR(0) automaton,
as LR(0) better reduce precision
LALR(1) Precise lookahead Same n states Strong YACC generates LALR(1);
(merged LR(1) states) as LR(0) practical for real languages
CLR(1)/LR(1) Exact lookahead from Can be O(2^n) Strongest Impractical for large
LR(1) items grammars; too many states

Power Hierarchy: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1)

State counts: n_LR(0) = n_SLR = n_LALR <= n_CLR
Also: LL(1) ⊂ LR(1). Every LL(1) grammar is LR(1).

LR Automaton Construction
LR(0) automaton is used by LR(0) and SLR(1). LR(1) automaton is used by LALR(1) and CLR(1).
2.3 LR(0) Parser
LR(0) Items
An LR(0) item is a production with a dot (.) on its RHS showing parsing progress.
A -> . X Y Z (nothing seen yet, expecting X next)
A -> X . Y Z (X seen, expecting Y next)
A -> X Y . Z (X, Y seen, expecting Z)
A -> X Y Z . (entire RHS seen - FINAL/KERNEL item - can REDUCE)

LR(0) Automaton Construction Algorithm

Steps to build LR(0) automaton:
1. Augment grammar: Add S' -> S as new start production
2. Number all original productions (for reduce actions)
3. Start state I0: Closure({S' -> . S})
Closure(I): If A -> alpha . B beta in I, add all B -> . gamma to I. Repeat.
Goto(I, X): Shift dot past X in all items containing . X. Take closure.
4. Repeat Goto for every symbol after every dot until no new states

LR(0) Table Construction

Situation Action
State I has transition to J on terminal ACTION[I, a] = s J (shift to state J)
a
State I has GOTO on non-terminal A GOTO[I, A] = J
to state J
State I has FINAL item A -> alpha . ACTION[I, a] = r k for ALL terminals a (entire row) - weakness
of LR(0)
State I contains S' -> S . ACTION[I, $] = accept
LR(0) Parsing Algorithm: On reduce A -> alpha (|alpha| = k): pop 2*k symbols off stack, then push A,
then push GOTO[current_state, A].

LR(0) Conflicts
Conflict Cause Example
Shift-Reduce State has both a shift item (A -> E -> T . +E and E -> T .
alpha . a beta) and a final item (B ->
gamma .)
Reduce-Reduce State has two or more final items A -> alpha . and B -> beta . both in
same state
Any grammar with Final item (A -> .) in I0 causes r/r Y -> lambda always causes LR(0) failure
epsilon productions conflict with shift items
2.4 SLR(1) Parser
Key Improvement over LR(0): Uses FOLLOW sets to restrict where reductions are placed.
SLR(1) Rule: For final item A -> alpha . in state I:
Add ACTION[I, a] = r_k ONLY for terminals a in FOLLOW(A)
(NOT the entire row as in LR(0))

SLR(1) Conflicts
Conflict Condition Solution
Shift-Reduce Shift terminal is in FOLLOW(LHS) of Not SLR(1) - need CLR/LALR
reduce
Reduce-Reduce FOLLOW(A) & FOLLOW(B) not Not SLR(1) - need CLR/LALR
empty, where A->alpha. and B-
>beta. in same state
No conflict Shift terminals and FOLLOW sets Grammar is SLR(1)
are all disjoint

Viable Prefixes
• A viable prefix is a prefix of a right sentential form that does not extend past the rightmost
handle.
• The parser stack always contains a viable prefix - never allows an invalid prefix.
• The LR(0) automaton recognizes EXACTLY the set of viable prefixes.
• This property (valid prefix property) is shared by both LL(1) and LALR(1) parsers.
2.5 CLR(1) / LR(1) Parser
LR(1) Items
An LR(1) item = [LR(0) item, lookahead set]
A -> alpha . beta , {a, b, c}

Initial item: S' -> . S , {$}

For parent item A -> alpha . B beta , a:

Child item B -> . gamma , FIRST(beta a)

LR(1) Lookahead Calculation

Lookahead Calculation Rule:
Parent: A -> alpha . B beta , a
Child: B -> . gamma , FIRST(beta a)
Where "a" is the lookahead of the parent item and "beta" is what follows B in the parent.
If beta is empty (B is at end): Child lookahead = {a} (propagated from parent)

CLR(1) Table Construction

Situation Action
Transition from I to J on terminal a ACTION[I, a] = s J
Transition from I to J on non-terminal GOTO[I, A] = J
A
Final item A -> alpha ., {b, c} in state ACTION[I, b] = r k AND ACTION[I, c] = r k (ONLY for
I lookahead tokens)
S' -> S . , {$} ACTION[I, $] = accept

CLR(1) Conflicts
Conflict Cause
Reduce-Reduce Two final items in same state with overlapping lookaheads: [A -> alpha., {a}] and
[B -> beta., {a}]
Shift-Reduce Final item [A -> alpha., {a}] and shift item [B -> gamma . a delta] in same state
Problem: CLR(1) can create O(2^n) states - impractical for large grammars. Solution: LALR(1)
2.6 LALR(1) Parser
Strategy: Build CLR(1) automaton, then MERGE states with identical LR(0) cores (same productions
with dots in same positions) but different lookaheads. Union the lookahead sets.

LALR(1) Key Properties:

- Same number of states as LR(0) automaton (n_LALR = n_LR(0))
- Better lookahead precision than SLR (uses LR(1) lookaheads, not FOLLOW
sets)
- YACC generates LALR(1) parsers
- Merging CAN introduce r/r conflicts not in CLR (but CANNOT introduce new
s/r conflicts)

LALR Conflicts
Conflict Type In LALR? Explanation
Reduce-Reduce Possible (can If merged states have final items with overlapping lookaheads
be introduced from different states: [A->a.,{a}] merged with [B->b.,{a}] gives r/r
by merging) conflict on "a"
Shift-Reduce CANNOT be S/R conflict in LALR must also exist in CLR. Merging only
introduced by unifies lookaheads, never creates new shifts.
merging

Merging Example
Suppose CLR has I4 = {A -> alpha., {=, $}} and I11 = {A -> alpha., {=}} with same LR(0) core.
After merge: I4-11 = {A -> alpha., {=, $}}. Note the merged state's transitions must also be merged.
Check: If I4 went to state X on some symbol and I11 went to state Y on same symbol, they must be X-
Y merged state.
2.7 Parser Comparison Summary
Feature LR(0) SLR(1) LALR(1) CLR(1)
Lookahead for reduce None (entire FOLLOW(A) LR(1) lookahead LR(1) lookahead
row) (precise)
Number of states n n (same as n (same as Can be >> n
LR(0)) LR(0))
Power Weakest Moderate Very strong Strongest
Practical use Theoretical only Rarely used now YACC, most Rarely (too many
parsers states)
Automaton type LR(0) LR(0) LR(1) merged LR(1)
Error detection Slowest Faster Fast Fastest
(spurious (immediate)
reductions)
Handles epsilon No Sometimes Usually Yes
grammars

Grammar Classification Reference Table

Grammar LR(0) SLR LALR CLR Notes
S->AA, A->aA|b Yes Yes Yes Yes Classic example
E->T+E|T, T->id No(s/r) Yes Yes Yes SLR resolved by
FOLLOW(E)={$}
S->A|a, A->a No(r/r) No(r/r) No(r/r) No(r/r) Ambiguous grammar - no
parser handles
S->AaAb|BbBa, A- No(r/r) No(r/r) Yes Yes SLR fails:
>lambda,B->lambda FOLLOW(A)=FOLLOW(B)
S->L=R|R, L->*R|id, R->L No(s/r) No(s/r) Yes Yes Classic SLR failure case
S->Aa|bAc|Bc|bBa, A->d, B- No(r/r) No(r/r) No(r/r) Yes LALR fails; CLR only
>d
S->dA|aB, A->bA|c, B->bB|c Yes Yes Yes Yes All parsers work
S->SS+|SS*|a Yes Yes Yes Yes Postfix notation

LL(1) vs LALR(1)
Property LL(1) LALR(1)
Parse approach Top-down, leftmost derivation Bottom-up, rightmost derivation
reversed
Stack contents Predicted (future) symbols - useful for Past symbols (what has been seen)
error repair
Error detection Earlier (no spurious reductions) May do reductions before detecting
error
Error repair Easier (stack shows expected future) Harder (stack shows past, not
expected future)
Time complexity O(n) O(n)
Property LL(1) LALR(1)
Grammar coverage Subset of LALR(1) Superset of LL(1)
2.8 Syntax Directed Definitions (SDD)
SDD = CFG + Semantic Rules. Grammar symbols have attributes; productions have associated rules
to compute attribute values.
Purpose: Semantic Analysis - compute meanings, perform type checking, build symbol table, generate
code.

Types of Attributes
Type Computed From Traversal Order Example
Synthesized Children's attributes Post-order (bottom-up) [Link] = [Link] + [Link]
(bottom-up)
Inherited Parent's and siblings' Complex (dependency- [Link] = [Link] (type
attributes based) declaration)

SDD Example: Desk Calculator

Production Semantic Rule
E -> E1 + T [Link] = [Link] + [Link]
E -> T [Link] = [Link]
T -> T1 * F [Link] = [Link] * [Link]
T -> F [Link] = [Link]
F -> num [Link] = [Link]
F -> id [Link] = [Link]

S-Attributed SDDs
S-Attributed SDD: Contains ONLY synthesized attributes. Can be evaluated by bottom-up parser in
single pass.
Evaluation: Stack-based during bottom-up parsing. On reduce A -> beta, compute [Link] from beta's
attributes on stack.

Inherited Attributes Example: Type Declarations

Production Semantic Rule
D -> T L [Link] = [Link]
T -> int [Link] = integer
T -> real [Link] = float
L -> L1, id [Link] = [Link]; addType([Link], [Link])
L -> id addType([Link], [Link])
[Link] is an inherited attribute - propagated top-down from T (which has synthesized type) to all L nodes.

Dependency Graphs
• Nodes = attribute instances in the parse tree
• Edge from c to b means b depends on c (compute c first)
• Must be a DAG (directed acyclic graph) - no cycles
• Topological sort gives valid evaluation order
To evaluate an SDD:
1. Construct parse tree for input
2. Build dependency graph (one node per attribute instance)
3. Topological sort the dependency graph
4. Evaluate semantic rules in topological order
5. Result: Annotated parse tree

SDD Examples: Binary Operations

Example Production Semantic Rule
Count 1s B -> 1 [Link] = 1
Count 1s B -> 0 [Link] = 0
Count 1s L -> L1 B [Link] = [Link] + [Link]
Binary to Decimal B -> 0/1 [Link] = 0 or 1
Binary to Decimal L -> L1 B [Link] = 2 * [Link] + [Link]
Count balanced S -> (S1) [Link] = [Link] + 1
parens
Count balanced S -> a [Link] = 0
parens
Type determination T -> [Link] [Link] = float
Type determination T -> num [Link] = integer
Type determination E -> E1 + T if [Link]==float or [Link]==float then
[Link]=float else [Link]=integer
Infix to Postfix E -> E1 + T printf("+")
Sign detection E -> -E1 if [Link]==POS then [Link]=NEG else
[Link]=POS
Sign detection E -> E1*E2 [Link] = POS if [Link]==[Link], else NEG
Quick Reference: Key Algorithms & Facts
LR Parsing Stack Operations
Shift a:
push a onto stack
push next_state onto stack
advance input

Reduce A -> alpha (|alpha| = k):

pop 2*k elements from stack (k symbols + k states)
push A onto stack
push GOTO[current_top_state, A] onto stack
DO NOT advance input

Accept: stack = [S, 1] and input = [$]

Summary: When Does Each Parser Fail?

Parser Fails When...
LR(0) (1) State has both shift item and final item (s/r conflict), OR (2) State has two or more
final items (r/r conflict), OR (3) Grammar has epsilon productions (always creates
conflicts in I0)
SLR(1) (1) Shift terminal is in FOLLOW(LHS) causing s/r conflict, OR (2) FOLLOW(A) &
FOLLOW(B) overlap for two final items in same state
LALR(1) (1) After merging CLR states, final items in merged state have overlapping
lookaheads (r/r conflict). NOTE: Merging cannot create new s/r conflicts.
CLR(1) (1) Two final items in same state with same lookahead (r/r), OR (2) Final item
lookahead matches a shift terminal (s/r). Never fails for practical deterministic
languages.

Error Detection Speed

Parser Error Detection
CLR(1) Fastest - detects error immediately with no spurious reductions
LALR(1) May perform one or a few reductions before detecting error
SLR(1) May perform more spurious reductions
LR(0) Slowest - may perform many spurious reductions (no lookahead)

Final Important Facts:

1. Ambiguous grammars: No parser (LL or LR) can handle without resolving conflicts
manually.
2. No grammar with epsilon productions is LR(0) - always causes conflict in I0.
3. YACC generates LALR(1) parsers. It resolves s/r by preferring shift, r/r by first rule.
4. In LR parsing: always keep state numbers on stack; states alternate with grammar
symbols.
5. S-attributed SDDs can be evaluated in one bottom-up pass (most practical compilers use
this).
6. Dependency graph must be acyclic (DAG) for SDD evaluation to be possible.

Compiler Design Concepts Explained
0% (1)
Compiler Design Concepts Explained
5 pages
Understanding Compiler Basics
No ratings yet
Understanding Compiler Basics
136 pages
Compiler Design: Phases and Analysis
No ratings yet
Compiler Design: Phases and Analysis
50 pages
Compiler Construction Overview
No ratings yet
Compiler Construction Overview
37 pages
Overview of Compiler Design Phases
No ratings yet
Overview of Compiler Design Phases
111 pages
Compiler Design Overview and Phases
100% (1)
Compiler Design Overview and Phases
28 pages
Compiler Design: Key Concepts Overview
No ratings yet
Compiler Design: Key Concepts Overview
5 pages
Compiler Design and Implementation Guide
No ratings yet
Compiler Design and Implementation Guide
60 pages
Compiler Design Fundamentals
No ratings yet
Compiler Design Fundamentals
31 pages
Compiler vs Interpreter Explained
No ratings yet
Compiler vs Interpreter Explained
19 pages
Compiler CIA 1
No ratings yet
Compiler CIA 1
6 pages
UNIT1 Compiler
No ratings yet
UNIT1 Compiler
195 pages
Compiler Design Complete Notes
No ratings yet
Compiler Design Complete Notes
20 pages
Compiler Concepts and Phases Explained
No ratings yet
Compiler Concepts and Phases Explained
19 pages
Compiler Design: Lexical Analysis Overview
No ratings yet
Compiler Design: Lexical Analysis Overview
95 pages
Language Translator Compiler Overview
No ratings yet
Language Translator Compiler Overview
71 pages
Compiler Design Unit1 Unit2 Notes
No ratings yet
Compiler Design Unit1 Unit2 Notes
11 pages
Compiler Design Overview and Phases
No ratings yet
Compiler Design Overview and Phases
22 pages
Introduction to Compiler Design Basics
No ratings yet
Introduction to Compiler Design Basics
115 pages
Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
19 pages
Compiler Design: Key Concepts Explained
No ratings yet
Compiler Design: Key Concepts Explained
4 pages
Introduction to Compiler Basics
No ratings yet
Introduction to Compiler Basics
29 pages
Language Processors and Compiler Phases
No ratings yet
Language Processors and Compiler Phases
14 pages
Lect Notes
100% (1)
Lect Notes
40 pages
Compiler Design - Complete Notes
No ratings yet
Compiler Design - Complete Notes
29 pages
Ch-1 Compiler Design
No ratings yet
Ch-1 Compiler Design
7 pages
Compiler Design Overview
No ratings yet
Compiler Design Overview
17 pages
Compiler Design: Phases and Tools
100% (1)
Compiler Design: Phases and Tools
36 pages
Module 1
No ratings yet
Module 1
120 pages
Compiler Design Overview and Structure
No ratings yet
Compiler Design Overview and Structure
250 pages
50XE
No ratings yet
50XE
54 pages
Compiler Design Module-I
No ratings yet
Compiler Design Module-I
49 pages
Lexical Notes
No ratings yet
Lexical Notes
38 pages
Compiler Design Overview and Phases
No ratings yet
Compiler Design Overview and Phases
18 pages
01 Compiler Design
No ratings yet
01 Compiler Design
37 pages
Understanding Compiler Design Basics
No ratings yet
Understanding Compiler Design Basics
42 pages
Compiler Design Overview and Phases
100% (1)
Compiler Design Overview and Phases
193 pages
Compiler Design Overview and Phases
No ratings yet
Compiler Design Overview and Phases
20 pages
Compiler Design and Language Processing
No ratings yet
Compiler Design and Language Processing
42 pages
Overview of Compiler Design Concepts
No ratings yet
Overview of Compiler Design Concepts
5 pages
Compiler Basics: Lexical Analysis & Structures
No ratings yet
Compiler Basics: Lexical Analysis & Structures
94 pages
SS Unit-2
No ratings yet
SS Unit-2
17 pages
Compiler Structure and Phases Explained
No ratings yet
Compiler Structure and Phases Explained
22 pages
Compiler Design and Phases Explained
No ratings yet
Compiler Design and Phases Explained
76 pages
Compiler Design Principles and Phases
No ratings yet
Compiler Design Principles and Phases
29 pages
Compiler Design Overview and Phases
No ratings yet
Compiler Design Overview and Phases
67 pages
Compiler Design Overview and Phases
No ratings yet
Compiler Design Overview and Phases
23 pages
Compiler vs Interpreter Explained
No ratings yet
Compiler vs Interpreter Explained
2 pages
COMP451 - IntroRegX - EDITED
No ratings yet
COMP451 - IntroRegX - EDITED
8 pages
Compilerdesignnotes
100% (1)
Compilerdesignnotes
210 pages
Compiler Design and Phases Overview
No ratings yet
Compiler Design and Phases Overview
49 pages
Compiler Design Overview and Types
No ratings yet
Compiler Design Overview and Types
14 pages
Compiler Phases and Functions Explained
No ratings yet
Compiler Phases and Functions Explained
40 pages
Compiler Design Unit1
No ratings yet
Compiler Design Unit1
16 pages
Compiler Detailed QA
No ratings yet
Compiler Detailed QA
49 pages
Bottom-Up Parsing Techniques Explained
No ratings yet
Bottom-Up Parsing Techniques Explained
29 pages
Foundations of Computing Concepts
No ratings yet
Foundations of Computing Concepts
50 pages
CDca 2
No ratings yet
CDca 2
10 pages
Compiler Design Concepts Explained
No ratings yet
Compiler Design Concepts Explained
31 pages
Bottom-Up Parsing in Compiler Design
No ratings yet
Bottom-Up Parsing in Compiler Design
12 pages
Compiler Stages and Functions Explained
No ratings yet
Compiler Stages and Functions Explained
53 pages
Compiler Optimization Techniques Overview
No ratings yet
Compiler Optimization Techniques Overview
37 pages
Time: 3 - N.B. Each In-Separate: Khulna University of Engineering Technology
No ratings yet
Time: 3 - N.B. Each In-Separate: Khulna University of Engineering Technology
8 pages
CLR(1) and LALR(1) Parser Overview
No ratings yet
CLR(1) and LALR(1) Parser Overview
76 pages
Understanding Handle Pruning in Parsing
No ratings yet
Understanding Handle Pruning in Parsing
95 pages
Syntax Analysis in Compiler Design
No ratings yet
Syntax Analysis in Compiler Design
36 pages
LR Parsers II The Canonical LR (1) Table Construction
No ratings yet
LR Parsers II The Canonical LR (1) Table Construction
24 pages
Compiler Design Viva Questions Guide
No ratings yet
Compiler Design Viva Questions Guide
12 pages
LR Parsing Techniques Overview
No ratings yet
LR Parsing Techniques Overview
46 pages
Compiler Design Viva Questions Guide
No ratings yet
Compiler Design Viva Questions Guide
17 pages
Understanding LALR Bottom-Up Parsing
No ratings yet
Understanding LALR Bottom-Up Parsing
21 pages
CS346 Bottom Up Parser
No ratings yet
CS346 Bottom Up Parser
64 pages
Compiler Construction Overview
100% (1)
Compiler Construction Overview
11 pages
Compiler Design: Parser Concepts & Errors
No ratings yet
Compiler Design: Parser Concepts & Errors
7 pages
Bottom-Up Parsing and SLR Parsing Techniques
No ratings yet
Bottom-Up Parsing and SLR Parsing Techniques
58 pages
Canonical LR Parser Implementation
No ratings yet
Canonical LR Parser Implementation
28 pages
Introduction to Simple LR Parsing
No ratings yet
Introduction to Simple LR Parsing
8 pages
CLR Parsing in Compiler Design
No ratings yet
CLR Parsing in Compiler Design
23 pages
Context-Sensitive Analysis Overview
No ratings yet
Context-Sensitive Analysis Overview
69 pages
Syntax Analysis in Compiler Design
No ratings yet
Syntax Analysis in Compiler Design
49 pages
Bottom-Up Parsing Techniques in Compilers
No ratings yet
Bottom-Up Parsing Techniques in Compilers
103 pages
Comprehensive Guide to Syntax Analysis
No ratings yet
Comprehensive Guide to Syntax Analysis
49 pages
LR(1) and LALR(1) Parsing Techniques
No ratings yet
LR(1) and LALR(1) Parsing Techniques
21 pages
Canonical LR(1) Parser Overview
No ratings yet
Canonical LR(1) Parser Overview
10 pages
Understanding LR Grammars and Parsing
No ratings yet
Understanding LR Grammars and Parsing
14 pages

Compiler Design Notes

Uploaded by

Compiler Design Notes

Uploaded by

COMPILER DESIGN

Complete Revision Notes

Coverage: Compilers, Language Processing, Compiler Phases, Symbol Table, Lexical

Static vs Shared Libraries

Front End vs Back End

IR Optimizations (Compiler Middle End)

TAC Example: while (number > 0) { factorial = factorial * number; --number; }

Common Token Classes

Buffer Pairs & Sentinels

/* Definitions section: substitution strings */

/* Rules section: pattern { action } */

Predefined Lex Variables

Compilation and Running

Maximal Munch Rule (Most Important!)

Regular Expressions in Lex

Thompson's Construction (RE to NFA)

Subset Construction (NFA to DFA)

Lookahead Operator in Lex (/)

Token Counting Example

Top-Down vs Bottom-Up Parsers

2. Elimination of Left Recursion

RDP vs Predictive RDP (without backtracking) Predictive/LL(1) Parser

FOLLOW Set Rules

Worked Example: Arithmetic Grammar

FIRST(F) = {id, num, (}

LL(1) Parsing Algorithm

while ([Link] != $):

if [Link] == $ and input == $: ACCEPT

When is a Grammar NOT LL(1)?

Error Recovery in LL(1): Panic Mode with SYNC Tokens

Shift-Reduce Parsing Actions

Conflicts in Shift-Reduce Parsing

Power Hierarchy: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1)

LR(0) Automaton Construction Algorithm

LR(0) Table Construction

Initial item: S' -> . S , {$}

For parent item A -> alpha . B beta , a:

LR(1) Lookahead Calculation

CLR(1) Table Construction

LALR(1) Key Properties:

Grammar Classification Reference Table

SDD Example: Desk Calculator

Inherited Attributes Example: Type Declarations

SDD Examples: Binary Operations

Reduce A -> alpha (|alpha| = k):

Accept: stack = [S, 1] and input = [$]

Summary: When Does Each Parser Fail?

Error Detection Speed

Final Important Facts:

You might also like