4.
LANGUAGE DESCRIPTION
FUNDAMENTAL SYNTACTIC ANALYSIS CONCEPT ON UNDERLYING MODERN
PROGRAMMING LANGUAGE
Language description refers to the process of formally defining the syntax, semantics, and other aspects
of a programming or natural language. It involves specifying the rules and structures that govern how
the language is used and interpreted. Language descriptions are typically documented in language
specifications or standards documents, which serve as authoritative references for developers,
compilers, interpreters, and language users. These descriptions help ensure consistency and clarity in
understanding and implementing the language.
1.0 Introduction
Describing a programming language in a clear and understandable way is crucial for its success. Some
languages have faced issues with multiple slightly different versions due to vague definitions. Different
groups like evaluators, implementers, and users need to understand the language description.
Evaluators test new languages before they are finished. Implementers need precise descriptions to
understand how the language works. Users rely on language manuals to understand how to write
software. Programming languages are studied based on syntax (how statements are written) and
semantics (what they mean). Syntax is easier to describe than semantics because there's a clear way to
do it. However, there's no universal way to describe semantics yet. For example, in Java, the syntax of a
while statement is `while (boolean_expr) statement`. The semantics of this statement mean that when
the current value of the Boolean expression is true, the embedded statement is executed. Otherwise,
control continues after the while construct, and control implicitly returns to the Boolean expression to
repeat the process.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
• How to learn a syntactical analysis;
• how learn new grammar and syntax of grammar
• understand generative grammar
• to consider rules of grammar in order to define the logical meaning as well as correctness of the
sentences
3.0
3.1 Syntactic Analysis
Syntactic analysis deals with the structure of languages, whether natural (like English) or artificial (like
Java). Syntax rules determine which combinations of symbols from the language's alphabet are valid.
Syntax is like grammar in spoken languages; it dictates the specific order of words to convey meaning.
In programming, syntax refers to the precise arrangement of code elements, crucial for effective
communication with computers. Syntax analysis, performed by compilers, ensures the correctness of
program structure. This analysis can be done using lexical, concrete, and abstract syntax techniques.
Lexical syntax defines basic symbol rules, concrete syntax specifies program representation, and
abstract syntax conveys vital program information Syntax defines the structure of programs without
considering their meaning, focusing on the appearance and sequence of symbols and instructions.
In simple terms, syntactic analysis is about how words are put together to make sentences in languages,
whether they're spoken (like English) or computer-based (like Java). Just like grammar rules help us
speak correctly, syntax rules help computers understand code. It's like the recipe for a cake; if you mix
up the ingredients or the order, you won't get the cake you want. Syntax analysis checks if the code
follows the right recipe. It's important because computers are picky about how we write code.
3.1.1 Language Recognizer
The syntax analysis part of a compiler, called a language recognizer, checks if programs follow the
rules of the language it translates. Instead of testing all possible character combinations, it only needs to
check if given programs are correct. This process ensures programs are syntactically accurate. Syntax
analyzers, also known as parsers, are responsible for this task. Think of a language recognizer as a
filter; it separates correctly written sentences from those that are incorrect.
3.1.2 Language Generator
A language generator creates sentences of a language, but it's not very useful for fully understanding a
language. However, people prefer generators over recognizers because they're easier to read and
understand. Syntax checking in a compiler (a recognizer) isn't helpful for programmers because it only
works in trial-and-error mode. For instance, a programmer can only know if a statement's syntax is
correct by trying it in the compiler. In contrast, comparing a statement to the generator's structure can
often determine its correctness. Formal generation and recognition devices for the same language are
closely related, leading to formal languages.
3.1.3 Parsing
Parsing is like dissecting a sentence to understand its structure. It involves analyzing a text made of
tokens (like words) to figure out its grammar according to a set of rules. It's similar to how we break
down sentences to understand their meaning. Parsing can also refer to how sentences are split up,
especially in tricky ones that lead you down the wrong path. It's like figuring out the puzzle of a
sentence to see how all the pieces fit together.
[Link] Parse Tree
Parse trees, created from grammars, illustrate the hierarchical structure of sentences in languages. They
show how elements fit together, like branches of a tree. In a parse tree, each internal node has a non-
terminal symbol label, and each leaf has a terminal symbol label. Sub-trees represent specific parts of
the sentence. For instance, in the example statement "A = B * (A + C)," a parse tree would depict how
each component relates to the others, highlighting the structure and relationships within the statement.
For the expression "A = B * (A + C)", the parse tree would show how each part of the expression fits
together hierarchically.
=
/ \
A *
/ \
B +
/ \
A C
In this tree:
The root node is "=".
The left child of "=" is "A".
The right child of "=" is "*".
The left child of "*" is "B".
The right child of "*" is "+".
The left child of "+" is "A".
The right child of "+" is "C".
Each node represents either an operator or an operand, and the structure of the tree reflects the order of
operations and the relationship between the different parts of the expression.
[Link] Parser
A parser is a crucial part of interpreters or compilers in computing. It ensures that the syntax of a
program is correct and constructs a data structure, like a parse tree or abstract syntax tree, based on the
input tokens. Usually, it works with a lexical analyzer that breaks down input characters into tokens.
Parsers can be created manually or generated automatically by tools in some programming languages.
In simpler terms, a parser is like a grammar checker for computer programs. It makes sure that the code
you write follows the rules of the programming language. It then organizes the code into a structure
that the computer can understand. Sometimes, it works together with another tool that breaks down the
code into smaller parts. You can create a parser yourself, or some programming languages have tools
that help do it automatically.
[Link]. Types of Parsers
Parsers are tools that analyze the structure of input according to grammar rules. They work in two main
ways:
- Top-down parsing: This method starts with the start symbol and tries to build parse trees by
expanding rules from the top down. It consumes tokens from left to right and handles ambiguity by
exploring all possible options. Examples include Recursive Descent and LL parsers.
- Bottom-up parsing: In this approach, the parser starts with the input and works towards the start
symbol. It breaks down the input into basic elements and then builds up from there. LR parsers and
Shift-Reduce parsing fall into this category.
Parsing involves breaking down a sentence into symbols and analyzing each against the language's
grammar. Languages structure meaning according to their syntax, which is represented as a parse tree
in computer science or deep structure in generative grammar.
3.1.4 Syntactic Ambiquity
Syntactic ambiguity refers to sentences that can be interpreted in multiple ways due to their structure,
rather than the meanings of individual words.
3.1.5 Operator Precedence
Operator precedence dictates the order in which operations are evaluated within an expression.
Parentheses can override this order. Operations within parentheses are always performed first.
Arithmetic operators are evaluated before comparison operators, and comparison operators are
evaluated before logical operators. Within each category, operators are evaluated from left to right.
Multiplication and division are evaluated before addition and subtraction. The string concatenation
operator (&) is evaluated after arithmetic operators but before comparison operators. The "Is" operator
checks if two object references point to the same object; it doesn't compare their values.
Syntax analysis is a critical step in language implementation, regardless of the approach used. It relies
on a formal syntax description, often using context-free grammars like BNF. Syntax analyzers aim to
identify syntax errors and generate parse trees for programs. They can be either top-down or bottom-up
parsers. While parsers for all unambiguous grammars have a complexity of O(n^3), those used for
programming languages work on subclasses of unambiguous grammars with a complexity of O(n). In
essence, syntax analysis ensures that a given input string follows the rules and structure defined by the
formal grammar of the programming language.
In conclusion, syntax analysis is an essential step in language implementation, regardless of the
approach used. It is based on a formal syntax description, often using a context-free grammar like BNF.
Syntax analyzers aim to detect syntax errors in a program and generate a parse tree. They can be either
top-down or bottom-up parsers, each with its own approach to constructing parse trees. While parsers
for unambiguous grammars have a complexity of O(n^3), those used in programming language
implementation work on subclasses of unambiguous grammars and have a complexity of O(n).
Summarily, syntax analysis is a phase in compiler design where the input string is checked against the
rules and structure of a formal grammar. It verifies the syntactical structure and ensures that the input
conforms to the correct syntax of the programming language.
Assignment
1. Who are language descriptions for?
2. Describe the operation of a general language generator.
3. Describe the operation of a general language recognizer.
4. The two mathematical models of language description are generation and recognition.
Describe how each can define the syntax of a programming language
FUNDAMENTAL SEMANTIC ANALYSIS CONCEPT ON UNDERLYING MODERN
PROGRAMMING LANGUAGE
Introduction
After parsing checks that the program's structure is syntactically correct, semantic analysis goes deeper
to ensure the program makes logical sense in the programming language. Just like a proper English
sentence needs subject-verb agreement and coherent meaning, a semantically valid program must have
properly defined elements, adhere to type systems, and respect access control rules. Semantic analysis
is crucial for identifying and eliminating incorrect programs before proceeding to code generation. The
objectives include understanding operational semantics and type systems, grasping fundamental
semantic issues like variables and names, and evaluating semantic equivalence of program phrases. It's
about ensuring that the structure of the program carries meaning and coherence beyond just correct
syntax.
Semantic Analysis
Semantic analysis in programming languages explores the relationship between syntax (the structure of
code) and computation models (how code runs). It focuses on interpreting programs so that
programmers can understand them easily and predict their outcomes. One approach, called syntax-
directed semantics, maps syntactic constructs to computational models using functions. Semantic
analysis ensures that programs are semantically correct by providing task acknowledgment and
statements.
3.1.1 Operational
Operational semantics focuses on understanding a program's meaning by describing the steps needed
for its execution. Some definitions use structural operational semantics, which describes intermediate
states based on the language itself, while others use abstract machines. Operational semantics consists
of rules for evaluating expressions, statements, and programs. These rules guide the implementation of
interpreters for programming languages, making it possible to translate operational semantics into
various programming languages.
3.1.2 Denotational
Denotational semantics defines a program's meaning using abstract mathematical structures, often
involving language-specific mathematical functions.
3.1.3 Axiomatic
Axiomatic or logical semantics indirectly defines a program by applying logical axioms to its
characteristics, often used in program specification and verification.
3.2 Types of semantic analysis
Semantic analysis involves two main types: static and dynamic semantics.
3.2.1 Static semantics
Static semantics involves defining restrictions on valid program structures that are difficult or
impossible to express in standard syntactic formalisms. These rules are typically checked at compile
time in compiled languages. Examples include ensuring that identifiers are declared before use and that
labels in case statements are distinct. Static semantics often involve defining rules in a logic called a
type system and may include analyses like data flow analysis.
3.2.2 Dynamic Semantics
Dynamic semantics, on the other hand, governs how operations are performed on data during program
execution. It defines how expressions are evaluated and how control structures execute statements.
Dynamic semantics specify the behavior of a program at runtime. While natural language is often used
to specify dynamic semantics, formal semantics have been developed through academic research.
However, their application outside academia in programming language design and implementation has
been limited.
3.3 Semantic Analyzer
The semantic analyzer checks if a program follows the rules of the language by using the syntax tree
and a symbol table. It collects type information and stores it in either the syntax tree or the symbol
table. This information is then used by the compiler during intermediate code generation.
3.4 Semantic Errors
Some of the semantics errors that the semantic analyzer is expected to recognize:
Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
3.5 Functions of Semantic Analysis
3.5.1 Type Checking Ensures that data types are used in a way consistent with their definition.
3.5.2 Label Checking A program should contain labels references.
3.5.3 Flow Control Check Keeps a check that control structures are used in a proper manner (example:
no break statement outside a loop)
3.6 Basic semantic issues of variables, names, and special words in programming languages.
3.6.1 Variables
Variables in programming represent data, which can range from simple values to complex ones. They
can change their value based on conditions. When creating a variable, its data type must be declared
because different types of data are used differently in programs. Programming languages define data
types differently, from simple values like age to complex records like a student's performance over a
year.
Variables are symbolic names for quantities or information, allowing them to be used independently of
the information they represent. Compilers replace variables' names with the actual data locations,
though the data stored there may change during program execution.
For example, programming languages differentiate between integers, non-integers (decimals), and
characters. Different levels of data types exist, forming a hierarchy of complexity:
- Elementary level: Includes basic types like integers, decimals, booleans, and characters, supported by
nearly all languages. These data types can be manipulated by common operators like +, -, *, or /. The
compiler translates these operators into machine instructions, such as fixed-point and floating-point
operations.
Structured level: High-level programming languages support structured types, which are built upon
simple types. There are static structures like arrays, records, and sets, and dynamic structures like lists
and trees. Dynamic structures can change in size and shape during program execution.
Abstract level: Programmer-defined abstract data types consist of data objects with declared operations.
The internal representation of these data types is hidden from users to prevent uncontrolled
manipulation of the data objects, following the concept of encapsulation.
3.6.2 NamingConventions
In programming, variables and constants usually have multi-character names like "COST" or "total".
Single-character names like "i", "j", and "k" are commonly used for auxiliary variables, especially for
array indexes. Naming conventions, enforced at the language level, dictate the format of valid
identifiers. For instance, variable names typically can't start with a digit or contain whitespace. The
allowance of punctuation marks in variable names varies across languages; many only permit
underscores. Some languages use sigils to denote variable types, and case-sensitivity of variable names
also differs. Certain forms of variable names may be reserved for internal use, and some languages
require specific case usage for certain entities.
3.6.3 Binding
Binding refers to how a variable is created and used within a program. There are two types: Dynamic
and Static binding.
[Link] Dynamic Binding
Dynamic Binding, also called Dynamic Dispatch, maps a message to a specific code sequence at
runtime, allowing for flexibility when the appropriate method cannot be determined at compile-time. It
can change during program execution.
[Link] Static Binding
Static Binding occurs before runtime and remains constant throughout program execution.
3.6.4 Scope
Scope defines where in the program a variable can be used, while extent (or lifetime) describes when a
variable has a meaningful value during program execution. Scope is a lexical aspect of a variable, and
most languages define specific scopes for variables, which can differ within a program. Scope can be
static or dynamic.
[Link] Static Scope
Static Scope refers to the block of code where a variable is defined, excluding any nested blocks where
the variable is re-declared. It can be determined by examining the program's text and remains constant
regardless of the order of procedure calls during execution.
[Link] Dynamic Scope
Dynamic Scope, on the other hand, extends to all procedures called after its declaration during program
execution until a procedure re-declares the variable.
3.6.5 Referencing
The referencing environment consists of variables that can be accessed within a program. In a statically
scoped language, functions can only reference variables in the static referencing environment. While a
function in such a language has dynamic ancestors (its callers), it cannot access variables declared in
those ancestors.
Difference between syntax and semantics
1. Syntax refers to the structure of a program written in a programming language, while semantics
describes the relationship between the program's meaning and the computational model.
2. Syntactic errors are detected at compile time, while semantic errors are found at runtime and are
harder to identify.
3. For example, in C++, declaring a variable "s" as "int s;" is syntactically correct. However, initializing
it with a non-integer value like "Seven" is semantically incorrect because "Seven" does not represent an
integer.
4. In relation to interpretation, syntax must have meaning, while semantics is associated with the
meaning of a syntactic representation.
SELF EXERCISE ASSIGNMENT
1. List some error sematic errors that you know
2. What is the main function of semantic analyzer
3. Itemize function of sematic analysis?
4. In a tabular form state out the difference between syntax and semantics
Conclusion:
In conclusion, we introduced three methods of semantic description: operational, denotational, and
axiomatic. Operational semantics describes language constructs based on their effects on an ideal
machine. Denotational semantics represents meanings using mathematical objects and recursive
functions. Axiomatic semantics, based on formal logic, is used for proving program correctness.
We discussed static and dynamic semantics, various types of semantic errors, and explored binding and
scope, including their types. Additionally, we analyzed fundamental semantic issues related to
variables, names, and special words in programming languages.