0% found this document useful (0 votes)
8 views33 pages

Syntax and Semantics

The document discusses the fundamental concepts of syntax and semantics in programming languages, emphasizing the distinction between program structure and execution. It covers topics such as grammars, parsing, and the roles of compilers and interpreters, along with examples of language constructs. The importance of understanding both syntactic and semantic rules is highlighted, as they dictate the legality and meaning of programs.

Uploaded by

nnthalu
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views33 pages

Syntax and Semantics

The document discusses the fundamental concepts of syntax and semantics in programming languages, emphasizing the distinction between program structure and execution. It covers topics such as grammars, parsing, and the roles of compilers and interpreters, along with examples of language constructs. The importance of understanding both syntactic and semantic rules is highlighted, as they dictate the legality and meaning of programs.

Uploaded by

nnthalu
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Syntax and Semantics

IN3040 – Programming Languages


Eyvind W. Axelsen - eyvinda@[Link]

Slides adapted from previous years’ slides made


by Birger Møller-Pedersen - birger@[Link]
Outline
• Program != program execution

• Compiler/interpreter
• This is not a compiler course
• …but some basic knowledge of language constructs is needed

• Syntax
§ Grammars
§ BNF
§ Scanning/Parsing
Program != program execution
Program Program
Machine
execution
Syntax Semantics

int x = 1;
• Important topics for this
course: procedure a() {
int x;
• Understand design tradeoffs b();
in PL design }
• Understand how programs
are executed procedure b() {
Which x are we writing to?
x = 2;
• Understand how languages }
are implemented (though
this is not a compiler course) a();
print x; What will be printed?
Syntax != Semantics
A description of a programming language consists of two main components:
• Syntactic rules
• What form does a legal program have.
• Semantic rules:
• Which programs are meaningful?
• What do the sentences (of meaningful programs) in the language mean?

• Static semantics: rules that may be checked before the execution of the program, e.g.:
• All variables must be declared.
• Declaration and use of variables coincide (type check).
• Different languages have different rules!
• Dynamic semantics: What shall happen during the execution of the program?
• Operational semantics, that is a semantics that describes the behavior of an (idealized) abstract machine
performing a program,
• or, mapping to something else (but well-known and well-defined) - denotational semantics.
Syntax
matters!

==
Compiler/interpreter
Program
(source) Scanner

• A compiler translates a
program to another tokens Parser

language
• Typically a machine
language Parse/
Static
(Abstract)
Semantic
• Or a language for a virtual Syntax Tree
Checker
machine
• Do you know any such
Abstract
languges? Syntax Tree Code Machine/
Generator Byte Code
decorated

§ An interpreter reads a program and


simulates its operations. Interpreter
Virtual
Machine

§ Both are (or can be) based upon an Machine


abstract syntax tree representation of the
program
Interpreters
§ You will make one in the obligs!

§ An interpreter is a program that will


– Parse the source code of a given language (the parser)
§ We skip this step in the oblig
– Execute the instructions/sentences one by one, at runtime
§ (without first compiling to machine code)

§ Examples of (typically) interpreted languages:


– Python, Ruby, Perl, Basic, JavaScript, ...
– The latter currently ”rules the web” – interesting development with transpilers

§ Why interpreters?
– Allows very dynamic languages that can do things a compiler cannot check
– Very quick write à run cycle (no compiling, just run the thing)
– Relatively easy to implement (as you will see for yourselves!)

§ Why not?
– Execution speed
– Compiler feedback
Syntax

Imagine a very simple language, that allows us to write simple numeric expressions, e.g.

1+2
42 + 42
11 – 9 + 2

etc.

How do we express the grammar (the syntax) for this language?


Syntax: described by BNF grammars
production, rule

e ::= n
e ::= e+e Terminals are
e ::= e-e nonterminal found in the
n ::= d program text
n ::= nd
Non-terminals
d ::= 0 terminal are not
d ::= 1
...
d ::= 9 terminal

metasymbol
meta-language
Extended BNF
• In Extended BNF (eBNF) we can use the following metasymbols on the
righthand side:
| alternatives

[…] optionality (0 or 1 time)

* zero or more times (from regular expressions – alternatively {...})


+ one or more times (from regular expressions)

(…) grouping symbols (sometimes {...} is used)

e ::= n | e + e | e - e
Grammar from previous slide expressed n ::= d | nd
more concisely with eBNF d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
e
Derivation of sentences Parse tree for 10 - 15 + 12

• The possible sentences in a language defined


by a BNF-grammar are those that emerge by e - e
following this procedure:
1. Start with the start symbol (e).
2. For each nonterminal symbol (e, n, d) n e + e
exchange this with one of the
alternatives on the right hand side of n d
the production defining this n n
nonterminal.
3. Repeat §2 until only terminal symbols d 0 n d n d
remain.
1 d 5 d 2
• This is called a derivation from the start
symbol to a sentence, represented by a e
parse tree / (concrete) syntax tree 1 1
• Removing unnecessary derivations and
nodes gives an abstract syntax tree e - e

e ::= n | e + e | e - e
n ::= d | nd 10 e + e
d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Abstract syntax tree
15 12 Relevant for oblig!
Only one possible production? e ::= n | e + e | e - e
n ::= d | nd
10 - 15 + 12 = ? d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

e e

e - e e + e

10 e + e e - e 12

15 12 10 15

10 – (15 + 12) = (10 – 15) + 12 =


10 – 27 = -5 + 12 =
-17 7
Unambiguous/ Ambiguous Grammars
• If every sentence in the language can be derived by one and only one parse tree,
then the grammar is unambiguous, otherwise it is ambiguous.

e ::= 0 | 1 | e + e | e - e | e * e

• Ambiguity handled by associativity and precedence rules

e
e
1–1+1

e e e e

e e e e

1 - 1 + 1 1 - 1 + 1

Which is «correct»?
1 – 1 * 1 Which is «correct»?

e
e

e e e e

e e e e

1 - 1 +
* 1 1 - 1 *+ 1
A somewhat more interesting language

s ::= v := e | s ; s | if b then s | if b then s else s


v ::= x |y |z
e ::= v |0 |1|2|3|4
b ::= e=e
if b1 then if b2 then s1 else s2
s is s2 executed?

b1 = true
if b1 then s b2 = false
yes
no
if b2 then s1 then
else s1
s2

s b1 = false
b2 = true

if b1 then s else s2 no
yes
17

Parsing matters!
if b2 then s1
x:=1; y:=2; if x=y then y:=3
s –> s;s
s s –> assign (x := 1)
s –> s;s
s –> assign (y := 2)
s s s –> if b then s

v := e s ::= v := e | s ; s |
if b then s | if b then s else s
s s
v ::= x | y | z
x 1
e ::= v | 0 | 1 | 2 | 3 | 4
b ::= e = e
v := e if b then s

y 2
v := e

e = e
y 3

x y
x:=1; y:=2; if x=y
s
then y:=3 s –>
s –>
s;s
s;s
s –> assign (x := 1)
s –> assign (y := 2)
s –> if b then s
s s

s ::= v := e | s ; s |
s s if b then s | if b then s else s
v ::= x | y | z
s e ::= v | 0 | 1 | 2 | 3 | 4
b ::= e = e
v := e

if b then s
y 2

v := e
v := e
e = e
y 3 19

x 1
x y
Alternatives to EBNF grammars for describing
syntactic program structure
• Syntax diagrams
• Metamodels
• Automata/State Machines
Syntax diagram
- Older textbooks and reference manuals
often had this kind of notation for syntax
- «Jernbanediagram»

exp
exp + term

term

term
term * num

num

Corresponding EBNF:
exp ::= exp + term | term
term ::= term * num | num
Metamodels
Object model representing the program (not the execution)

statement ::= assignment | if-then-else | while-do

statement

assignment if-then-else while-do


Why metamodels for language descriptions?
• Inspired by abstract syntax trees in terms of object
structures, interchange formats between tools

• Growing interest in domain specific languages, often


with a mixture of text and graphics

• Meta models often include name, binding and type


information in addition to the pure abstract syntax tree
• «annotated syntax tree»
• Very useful when constructing type checkers, compilers etc
Metamodel levels
(details of this slide is not on the curriuculum, no need to worry J)

Model conformance: M2 → M3, M1 → M2

M3
Class Attribute
Meta-metamodel
(MOF)
• •• ••• • •• • ••

M2 Class Method If-Stmt

Metamodel
(Java)

• •• ••• • •• • ••
M1
class Car process() if( … ) { … }
Model
(A program)

M0
Runtime Runtime Runtime
Runtime objects [Class] [Method] [If-Stmt]
(A program)
Automata/State Machines
§ Transitions marked with terminals, one start state
and a number of stop states
§ Recognizes a string in the language if the terminals
represent a valid sequence of transitions ending up
in a stop state upon reading the last symbol
§ Typically used for the part
of the grammar that
recognizes the smallest
elements (tokens), called
the scanner
Parsing – using the grammar
• To check that a sentence (or a program) is syntactically correct, that is to
construct the corresponding syntax tree.
• Like we have seen some slides back

• In general we would like to construct the tree by reading the sentence


once, from left to right.

• Example grammar

exp ::= exp + term | term


term ::= term * num | num
Top-down parsing
The parse tree is constructed downwards, that is
we start with the start symbol and try to derive the
actual sentence by selecting appropriate rules:

exp ::= exp + term | term exp

term ::= term * num | num


exp term

term

num * num + num


Bottom-up parsing
The tree is constructed upwards. Starts by finding part of
the sentence that corresponds to the right hand side of a
production and reduces this part of the sentence to the
corresponding nonterminal. The goal is to reduce until the
start symbol.

exp ::= exp + term | term exp

term ::= term * num | num


exp term

term

num * num + num


LL(1)-parsing – a practical approach
• LL(1)-parsing is a top-down strategy with a left derivation from the start symbol
(the leftmost symbol).
• A common approach to parsing that is simple and efficient
• The number 1 is for 1 single lookahead-token (from the scanner)

• Recursive descent – LL(k)


§ To each nonterminal there is a method.
§ The method takes care of the rule for for this nonterminal, and may call other methods.
• For each terminal in the right hand side: Check that the next token (from the scanner) is this terminal.
• For each nonterminal in the right hand side: Call the corresponding method.
§ When the method is called, the scanner shall have as its next token the first token of the
corresponding rule.
§ When the method is finished, the scanner shall have as its next token the first token after the
sentence.
Example – recursive descent parser
program ::= stmtList
stmtList ::= stmt +
stmt ::= input | output | assignment
input ::= ? variable
output ::= ! variable
assignment ::= variable = variable operator operand
operator ::= + | -
operand ::= variable | number
variable ::= v digit
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
number ::= digit + void assignment() {
variable();
readToken('=');
variable();
operator();
operand();
}
Example – recursive descent parser
program ::= stmtList
stmtList ::= stmt +
stmt ::= input | output | assignment
input ::= ? variable
output ::= ! variable
assignment ::= variable = variable operator operand
operator ::= + | -
operand ::= variable | number
variable ::= v digit
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
number ::= digit + void stmt() {
if (checkToken('v')) {
assignment(); }
else if (checkToken('?')) {
input(); }
else if (checkToken('!')) {
output(); }
else error();
}
Summary
• A programming language has syntactic and semantic rules

• The syntactic rules control the (textual) shape of legal programs


• A parser recognizes program text, and can transform a program text into a parse
tree, and an abstract syntax tree
• A grammar is ambigous if there are more than one way to parse its sentences

• The semantic rules control which textually legal programs have meaning,
and what this meaning is
• What a program will do at runtime cannot be determined by its syntax alone
• It depends on the semantics!
• More on this later
• Next week: Logic
programming with Volker

• Until then: Have a great


weekend!

Image credit: ESA/Hubble & NASA, J. Bally;


Acknowledgment: M. H. Özsaraç

You might also like