Basics of
Formal Languages and Computability
By
Vijay Ganesh
Last Class:
The Structure (Stages/Phases) of a Compiler
Semantic
Input Lexical Tokens
Parser
AST Analysis (Type
Program Analyzer Checking,…)
Three Intermediate
Address Representation (IR)
Output Code Format Code
Program Generation Optimizations
Costa Busch - LSU
Today’s Lecture
• Formal Languages
• Computability
• Turing machines
• Concept of Undecidability
• Undecidability of program analysis
Costa Busch - LSU
Formal Languages
Language: a possibly infinite set of finite-length strings
over a finite alphabet
String: a sequence (concatenation) of symbols
from the alphabet of the language
Alphabet: Finite set of symbols/characters, e.g.,
Σ = {a,b,…,z} or {0,1}
Example:
Alphabet: Σ = {a,b}
Strings: ε, a, b, aaa, ababa, abbb,….
We represent the set of all strings over Σ as Σ*
Language: A subset of strings of Σ*
Costa Busch - LSU
More Examples of Formal Languages
• The language over unary alphabet {a}: {ε, a, aa,
aaa,…}
• Finite Languages: The cardinality of such language is a
finite number, e.g., The set of all numbers less than
100
• Most languages we study have infinite cardinality: e.g.,
the set of even numbers
• We will study classes of formal languages such as
regular, context-free and context-sensitive languages
that are crucial for understanding compiler
construction
Costa Busch - LSU
Typical Operations on Strings
Consider a binary alphabet Σ = {a,b}
Operation name Properties Example
Concatenation(s1,s2) Non-commutative, a.b ≠b.a
associative a.(b.c) = (a.b).c
ε.a = a. ε = a
Prefix(s1) Returns the prefix ab is a prefix of abba
(Similarly define suffix) ba is a suffix of abba
Reverse(s1) Reverse(Reverse(S)) = S Reverse of ab is ba
Length(s1) Computes the number Length(ab) =2
of chars in a string Length(ε) = 0
Costa Busch - LSU
Typical Operations on Languages
Operation name Properties Representation
Union(L1, L2) Set union of the L1 U L2
corresponding sets of
strings
Intersection(L1, L2) Set intersection of the L1 L2
corresponding sets of
strings
Difference(L1,L2) Set difference of the L1 – L2
corresponding sets of
strings
Complement(L1) Set complement of L1 Σ* - L1
Costa Busch - LSU
Typical Operations on Languages
Operation name Properties Representation
Reverse(L) Reverse all strings in L {a, aab, abab}R = {a, baa,
baba}
Concatenation(L1,L2) The set of strings L1 = {a, aa,aaa,…}
which has a prefix from L2 = {b,bb,bbb,…}
L1 and a suffix from L2 L1.L2 = {ab,abb,…aab,…}
Kleene Star(L) denoted The union of (L)0 U (L)1 U (L)2 …
as L* concatenations of
strings in L1
Complement(L) Set complement of L1 Σ* - L1
Costa Busch - LSU
More on Reverse of a Language
R R
Definition: L = {w : w Î L}
More Examples:
n n
L = {a b : n ³ 0}
R n n
L = {b a : n ³ 0}
Costa Busch - LSU
More on Concatenation of Two Languages
Definition: L1L2 = {xy : x Î L1, y Î L2 }
Example: {a, ab, ba}{b, aa}
= {ab, aaa, abb, abaa, bab, baaa}
Costa Busch - LSU
N-ary Concatenation of a Language
Definition: n
L =# $!
LL" L
n
{a, b} = {a, b}{a, b}{a, b} =
3
{aaa, aab, aba, abb, baa, bab, bba, bbb}
0
Special case: L = {l}
0
{a , bba , aaa } = {l}
Costa Busch - LSU
Example
n n
L = {a b : n ³ 0}
2 n n m m
L = {a b a b : n, m ³ 0}
2
aabbaaabbbÎ L
Costa Busch - LSU
Star-Closure (Kleene *)
All strings that can be constructed from L
Definition: 0 1 2
L* = L " L " L !
Example:
ìl , ü
ïa, bb, ï
ï ï
{a, bb}* = í ý
ï aa , abb, bba , bbbb , ï
ïîaaa, aabb, abba, abbbb,!ïþ
Costa Busch - LSU
Positive Closure
Definition: L = L " L "!
+ 1 2
ìa, bb, ü
+ ï ï
{a, bb} = íaa, abb, bba, bbbb, ý
ïaaa, aabb, abba, abbbb,!ï
î þ
Costa Busch - LSU
The * Operation on alphabets
S * : the set of all possible strings from
alphabet S
S = {a, b}
S* = {l , a, b, aa, ab, ba, bb, aaa, aab,!}
Costa Busch - LSU
The + Operation on alphabets
+ : the set of all possible strings from
S
alphabet S except l
S = {a, b}
S* = {l , a, b, aa, ab, ba, bb, aaa, aab,!}
+
S = S * -l
+
S = {a, b, aa, ab, ba, bb, aaa, aab,!}
Costa Busch - LSU
Two special languages
Language with
Empty language empty string
{} {l }
Size of a language (number of elements):
| {} |= 0
| {l} |= 1
| {a , aa , ab }|= 3
| { l, aa , bb , abba , baba } |= 5
Costa Busch - LSU
Connection between
Formal Languages and Computation
Languages are used to describe computational
problems:
For example, consider the PRIMES problem is
“given a natural number X decide whether X is a
PRIME?”
• Very famous problem that was recently
shown to be in the complexity class P.
Costa Busch - LSU
Connection between
Formal Languages and Computation
All Computational problems can be described as Language or Set
membership problems, i.e., “given a string X, and a language S,
does there exist an algorithm to decide whether X belong to S?”
The PRIMES problem can be equivalently written as a set
membership problem over the language of strings over digits:
• Alphabet: {0,1,2,…,9}
• Example Strings: 0, 100,..
• Language of PRIMES: {2,3,5,7,…}
Question: How can the compilation problem be described using
the paradigm of language membership?
Costa Busch - LSU
Languages, Problems and Their Solutions
Problem S: Computational problems can be described as Language
or Set membership problems, i.e., “given a string X, and a
language S, does there exist an algorithm/function/method to
decide whether X belongs to S?”
(Side Note: We can always reduce optimization problems to membership/decision
problems.)
Solution to S: The algorithm C that takes as input any X and
correctly decides whether X belongs to S is said to be a solution
to the problem S.
However, we need to be a bit more precise about what we mean
by an algorithm C, and the possible outputs C can produce for a
problem S.
Costa Busch - LSU
Languages and Turing Machines
Enter Alan Turing:
• Proposes the Turing Machine (TM)
• Establishes the existence of an Universal Turing Machine
• Any computable function (or method or algorithm) can be
implemented as a “program” on TM
Solution to S in terms of Turing Machines: We say computational
problem S has a solution if there exists a Turing Machine P that
correctly decides for any string X whether X belongs to S.
Informally, a Turing Machine (TM) is a finite-state machine with
a potentially infinite read-write tape
Costa Busch - LSU
A Turing Machine
Tape
...... ......
Read-Write head
Control Unit
• Control unit is a finite
state machine
• Q set of finite states
• q0 initial state
• F final set of states
• Σ finite alphabet
• Transition function:
Q x Σ èQ x Σ x {L,R}
Costa Busch - LSU
Languages and Turing Machines
Any computable function (or method or algorithm) can be implemented
as a “program” on TM
Solution to S in terms of Turing Machines: We say computational
problem S has a solution if there exists a Turing Machine P that
correctly decides for any string X whether X belongs to S.
We need some more precision in our description of the output of P:
•P can correctly say X in S and HALT
•P can correctly say X is not in S and HALT
•P can loop forever
•(P can also have “bugs” and produce incorrect results. We will ignore
this case for now.)
Costa Busch - LSU
Decidability of Languages
Definition of Turing-acceptable languages: We
say a language L is Turing-Acceptable if there
exists a Turing Machine P that given any w
determines if w belongs to L. We say P accepts L.
More precisely, for any string :
wÎ L P halts in an accept state
wÏ L P halts in a non-accept
state or does not terminate
Costa Busch - LSU
Decidability of Languages
Definition of decidable languages: We say a
language L is decidable if there exists a Turing
Machine P that given any w determines whether
w belongs to L.
More precisely, for any string :
wÎ L P halts in an accept state
wÏ L P halts in a non-accept state
Side Note: Languages that are not decidable are called
undecidable.
Costa Busch - LSU
Connection between
Turing-acceptable and decidable languages
Theorem: Every decidable language is Turing-
acceptable.
Proof: Follows directly from the definitions of
decidable and Turing-acceptable languages.
However, not every Turing-acceptable language is
decidable. For example, the halting problem.
(Side note: Turing-acceptable languages are also called recursively-
enumerable.)
Costa Busch - LSU
Non Turing-Acceptable L
Turing-Acceptable L
Decidable
Costa Busch - LSU
A Turing Machine
Tape
...... ......
Read-Write head
Control Unit
Costa Busch - LSU
The Tape
No boundaries -- infinite length
...... ......
Read-Write head
The head moves Left or Right
Costa Busch - LSU
...... ......
Read-Write head
The head at each transition (time step):
1. Reads a symbol
2. Writes a symbol
3. Moves Left or Right
Costa Busch - LSU
Example:
Time 0
...... a b a c ......
Time 1
...... a b k c ......
1. Reads a
2. Writes k
3. Moves Left
Costa Busch - LSU
Time 1
...... a b k c ......
Time 2
...... a f k c ......
1. Reads b
2. Writes f
3. Moves Right
Costa Busch - LSU
The Input String
Input string Blank symbol
...... à à a b a c à à à ......
head
Head starts at the leftmost position
of the input string
Costa Busch - LSU
States & Transitions
Read Write
Move Left
q1 a ® b, L q2
Move Right
q1 a ® b, R q2
Costa Busch - LSU
Example:
Time 1
...... à à a b a c à à à ......
q1
current state
q1 a ® b, R q2
Costa Busch - LSU
Time 1
...... à à a b a c à à à ......
q1
Time 2
...... à à a b b c à à à ......
q2
q1 a ® b, R q2
Costa Busch - LSU
Example:
Time 1
...... à à a b a c à à à ......
q1
Time 2
...... à à a b b c à à à ......
q2
q1 a ® b, L q2
Costa Busch - LSU
Example:
Time 1
...... à à a b a c à à à ......
q1
Time 2
...... à à a b b c g à à ......
q2
q1 à ® g, R q2
Costa Busch - LSU
Determinism
Turing Machines as originally defined
are deterministic
Allowed Not Allowed
a ® b, R q2 a ® b, R q2
q1 q1
q3 a ® d, L q3
b ® d, L
No lambda transitions allowed
Costa Busch - LSU
Partial Transition Function
Example:
...... à à a b a c à à à ......
q1
a ® b, R q2 Allowed:
q1 No transition
for input symbol c
b ® d, L q3
Costa Busch - LSU
Halting
The machine halts in a state if there is
no transition to follow or out of the state
Costa Busch - LSU
Halting Example 1:
...... à à a b a c à à à ......
q1
q1 No transition from q1
HALT!!!
Costa Busch - LSU
Halting Example 2:
...... à à a b a c à à à ......
q1
a ® b, R q2
No possible transition
q1 from q1 and symbol c
b ® d, L q3 HALT!!!
Costa Busch - LSU
Accepting States
q1 q2 Allowed
q1 q2 Not Allowed
•The machine accepts and halts
Costa Busch - LSU
Acceptance
If machine halts
Accept Input
string in an accept state
If machine halts
in a non-accept state
Reject Input or
string
If machine enters
an infinite loop
Costa Busch - LSU
Decidable Languages
Recall that:
A language A is decidable,
if there is a Turing machine M (decider)
that accepts the language A and
halts on every input string
Decision
Turing Machine M On Halt:
YES
Accept
Input
string Decider for A NO
Reject
Costa Busch - LSU
Recall: A computational problem is decidable
if the corresponding language is decidable
We also say that the problem is solvable
Costa Busch - LSU
Halting Problem
Input: •Turing Machine M
•String w
Question: Does M halt while
processing input string w ?
Corresponding language:
HALTTM = { M ,w : M is a Turing machine that
halts on input string w }
Costa Busch - LSU
The Diagonalization proof
Theorem: HALTTM is undecidable
(The halting problem is unsolvable)
Proof:
Basic idea:
Assume for contradiction that
the halting problem is decidable;
we will obtain a contradiction
using a diagonalization technique
Costa Busch - LSU
Suppose that HALTTM is decidable
Input
string Decider
for HALTTM
M ,w
M YES M halts on w
H
w NO doesn’t
M w
halt on
Costa Busch - LSU
Looking inside H
Decider for HALTTM
H qaccept YES
Input string:
M ,w q0 M halts on w?
qreject NO
Costa Busch - LSU
Construct machine H¢ :
H¢ Loop forever
H qaccept YES qa qb
M ,w q0 M halts on w?
qreject NO
If M halts on input w Then Loop Forever
Else Halt
Costa Busch - LSU
Construct machine F :
F
Copy M M, M
M
on tape
H¢
If M halts on input M
Then loop forever
Else halt
Costa Busch - LSU
Run F with input itself
F
Copy <F> <F,F>
<F>
on tape
H¢
If (F halts on input <F>)
Then F loops forever on input <F>
Else (F does not halt <F>) F halts on input <F>
CONTRADICTION!!!
END OF PROOF
Costa Busch - LSU
We have shown:
Undecidable HALTTM
Decidable
Costa Busch - LSU
We can actually show:
Turing-Acceptable HALTTM
Decidable
Costa Busch - LSU
HALTTM is Turing-Acceptable
Turing machine that accepts HALTTM :
1. Run M on input w
M ,w 2. If M halts on w
then accept M,w
Costa Busch - LSU
Static Program Analysis is Undecidable
• We can show that static analysis is Turing-acceptable,
And consequently so is code optimization in general.
• Informal Proof by reduction:
Let say that static analysis is decidable. It
immediately follows that we can easily analyze any
program and determines if it halts on an input. If so
we have solved the halting problem.
CONTRADICTION!!
Costa Busch - LSU
What was Covered in Today’s Lecture
• Formal Languages
– Definition of alphabets, strings, and languages
– Basic operation on strings and languages
• Computability
– What is a computational decision problem?
– We defined decision problems as membership over formal languages
– Introduced the idea of algorithm, and Turing Machines
• Concept of Undecidability
– The Halting problem
– Diagonalization proof of the undecidability of the Halting problem
• Undecidability of program analysis
Costa Busch - LSU