Theory of
Computation &
Compiler Design
Module 6 Semantic Analysis
Semantic Analysis
Semantic Analysis
Third Phase in Compiler Design
Last in the Analysis set
Catches all remaining errors
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
What is Semantic Analysis
Checks the logical information's of
the code.
Performed once after parsing,
validation of syntax
Why Semantic Analysis
Syntax analyzer alone cannot catch all errors.
Some language constructor are not context free.
Example: Identifier Declaration and use
{ wcw | w ∈ (a + b)* }
1st w represents the identifier declaration
2nd w represents a use of the identifier
How Semantic Analysis Works
It interprets symbols, types
and relations.
It checks for the meaning in the
syntax structure of the
programming code.
How Semantic Analysis Works
Static semantics — can be analyzed at
compile-time
Dynamic semantics — analyzed at
runtime
Division by zero
Array bounds checks
• Not a clear distinction or boundary
•Theory says that while some problems
can be found at compile-time, not all can
• So, must have runtime semantic checks
Beyond Syntax Caught in Semantic
Example 1
Int A; String B;
A=B+3;
B is type error
Example 2
Int x,y;
x=y+z;
z is not
declared
Purpose of Semantic Analysis Phase
compute additional information needed
for compilation that is beyond the
capabilities of Context-Free Grammars
and Standard Parsing Algorithms
It performs Static semantic analysis,
which takes place prior to execution
Type Checking
There is a level of correctness that goes
beyond grammar and syntax.
Example
extern int foo(int a,int b,int c,int d);
int fee()
{
int f[3],g[1],h,i,j,k; char *p;
foo(h,i,"ab",j,k);
k = f*i+j;
h = g[17];
printf("%s,%s\n",p,q);
p = &(i+j);
k = 3.14159;
p = p*2;
}
Type Checking
There is a level of correctness that goes
beyond grammar and syntax. The errors in the program.
Example
declared g[1], used g[17]
extern int foo(int a,int b,int c,int d);
wrong number of args to foo
int fee()
"ab" is not an int
{
int f[3],g[1],h,i,j,k; char *p; f declared as an array, used as a scalar
foo(h,i,"ab",j,k); 3.14159 is not an int
k = f*i+j; & cannot be applied to a non-atomic
h = g[17]; expression
printf("%s,%s\n",p,q); * cannot be applied to pointers
p = &(i+j); None of these are simply syntax errors. They
require a deeper analysis of the meaning
k = 3.14159; (semantics) of the program.
p = p*2;
}
Syntax Directed Definitions (SDD)
The theoretical framework for
semantic analysis uses syntax
directed definitions,
sometimes called attribute
grammars.
SDD Contd…
A syntax directed definition is an extension of
a context free grammar in which Each
grammar symbol is assigned certain attributes
Each production is augmented with semantic
rules which are used to define the values of
attributes
SDD Contd…
The attributes give meaning to the structures in
the parse tree, such as:
The data type of an expression or variable
The value of a constant expression
The number and types of parameters expected
by a function
The memory location and size of an expression
The symbol table associated with a compound
statement
Code generated to execute a statement or
expression
SDD Contd…
The process of computing the attributes of
each node is called annotating the parse tree,
which is then called an annotated or attributed
parse tree (or syntax tree).
One problem for compiler writers is that
language references do not normally provide
an attribute grammar, only the pure syntax of
the context-free grammar.
It is left to the compiler writer to construct an
attribute grammar from the English language
descriptions of the components of the
programming language.
Kinds of Attributes
Synthesized attribute — value only computed
when symbol is on left-hand side of production
– Attributes can be computed independently of
context
–S-attributed grammar has only
synthesized attributes
Inherited attribute — value computed in
productions where symbol is on right-hand side
– Attributes computed using context
– Cannot avoid these in semantic analysis
Synthesized attributes (S- Attribute)
A Synthesized attribute is an attribute of the non-terminal on the
left-hand side of a production.
Synthesized attributes represent information that is being passed
up the parse tree.
The attribute can take value only from its children (Variables in
the RHS of the production).
For eg. let’s say A -> BC is a production of a grammar, and A’s
attribute is dependent on B’s attributes or C’s attributes then it
will be synthesized attribute.
Inherited attributes (L-Attribute)
An attribute of a nonterminal on the right-hand side
of a production is called an inherited attribute.
The attribute can take value either from its parent or
from its siblings (variables in the LHS or RHS of the
production).
Synthesized attributes Example
These attributes get values from the attribute values of
their child nodes. To illustrate, assume the following
production:
S → ABC
If S is taking values from its child nodes (A,B,C), then it is
said to be a synthesized attribute, as the values of ABC
are synthesized to S.
Inherited attributes Example
In contrast to synthesized attributes, inherited
attributes can take values from parent and/or siblings.
S → ABC
A can get values from S, B and C. B can take values from
S and A. Likewise, C can take values from S, A, and B.
Conclusion
S → ABC S can take values from A, B,
and C (synthesized). A can take values
from S only. B can take values from S
and A. C can get values from S, A, and B.
L-attributed SDT No non-terminal can get values from
the sibling to its right.
Attributes in L-attributed SDTs are
S-attributed evaluated by depth-first and left-to-
SDT right parsing manner.
L-attributed SDT We may conclude that
if a definition is S-attributed, then it is
also L-attributed as L-attributed
definition encloses S-attributed
definitions.
Syntax Directed Definition
K Sathyarajasekaran VIT Chennai
Syntax Directed Definition
Syntax Directed Definition (SDD)
Its a kind of abstract specification.
It is generalization of context free
grammar in which each grammar
production X –> a is associated with it a
set of production rules of the form s =
f(b1, b2, ……bk) where s is the attribute
obtained from function f.
The attribute can be a string, number,
type or a memory location.
Semantic rules are fragments of code
which are embedded usually at the end
of production and enclosed in curly
braces ({ }).
Computation of Synthesized Attributes
Write the SDD using appropriate
semantic rules for each production
in given grammar.
The annotated parse tree is
generated and attribute values are
computed in bottom up manner.
The value obtained at root node is
the final output.
Example S Attributes SDD
Grammar Production Semantic Actions
S --> E S --> E Print([Link])
E --> E1 + T [Link]=[Link]*[Link]
E --> E1 + T
E --> T [Link]=[Link]
E --> T T --> T1 * F [Link]=[Link]*[Link]
T --> T1 * F T --> F [Link]=[Link]
F --> digit [Link]=[Link]
T --> F
F --> digit
Example S Attributes SDD
Let us assume an input string
4 * 5 + 6 for computing synthesized
attributes. The annotated parse tree
for the input string is
Computation of Inherited Attributes
Construct the SDD using semantic
actions.
The annotated parse tree is generated
and attribute values are computed in top
down manner.
Example L Attributes SDD
Grammar Production Semantic Actions
S --> T L S --> T L [Link]=[Link]
T --> int [Link]=int
T --> int
T --> float [Link]=float
T --> T --> double [Link]=double
float L --> L1, id [Link]=[Link]
Enter_type([Link] , [Link])
LT --> L1, id L --> id Entry_type([Link] , [Link])
Ldouble
--> id
Example L Attributes SDD
Let us assume an input string
int a , c
for computing inherited attributes.
The annotated parse tree for the
input string is
Symbol Table
K Sathyarajasekaran VIT Chennai
What is Symbol Table
Symbol Table is an important data
structure created and maintained by
the compiler.
In order to keep track of semantics
of variable i.e. it stores information
about scope and binding information
about names, information about
instances of various entities such as
variable and function names, classes,
objects, etc.
How Symbol table Works
It is built in lexical and syntax
analysis phases.
The information is collected by the
analysis phases of compiler and is
used by synthesis phases of
compiler to generate code.
It is used by compiler to achieve
compile time efficiency.
How it works with Analysis Set
Lexical Analysis: Creates new table entries in the table,
example like entries about token.
Syntax Analysis: Adds information regarding attribute
type, scope, dimension, line of reference, use, etc in the
table.
Semantic Analysis: Uses available information in the
table to check for semantics i.e. to verify that
expressions and assignments are semantically
correct(type checking) and update it accordingly.
How it works with Synthesis Set
Intermediate Code generation: Refers symbol table for
knowing how much and what type of run-time is
allocated and table helps in adding temporary variable
information.
Code Optimization: Uses information present in symbol
table for machine dependent optimization.
Target Code generation: Generates code by using
address information of identifier present in the table.
Items stored in Symbol table
Variable names and constants
Procedure and function names
Literal constants and strings
Compiler generated temporaries
Labels in source languages
Information used by compiler from
Symbol table:
Data type and name
Declaring procedures
Offset in storage
If structure or record then, pointer to
structure table.
For parameters, whether parameter
passing by value or by reference
Number and type of arguments passed to
function
Base Address
Operations of Symbol table
The basic operations defined on a symbol table include
Operation Function
Allocate To allocate a new empty symbol table
Free To remove all entries and free storage of symbol table
Lookup To search for a name and return pointer to its entry
Insert To insert a name in a symbol table and return a
pointer to its entry
Set_attribute To associate an attribute with a given entry
Get_attribute To get an attribute associated with a given entry
Implementation of Symbol table
Following are commonly used data structure for implementing symbol table
• List
• Linked List
• Hash Table
• Binary Search Tree
Conclusion
This symbol table data structure hierarchy is stored in
the semantic analyzer and whenever a name needs to
be searched in a symbol table, it is searched using
the following algorithm:
First a symbol will be searched in the current scope,
i.e. current symbol table.
If a name is found, then search is completed, else it
will be searched in the parent symbol table until,
Either the name is found or global symbol table has
been searched for the name.