Automata and Formal Languages
Dr. Sankhadeep Chatterjee
Regular Expressions
• Regular expressions are an algebraic way to describe
languages.
• They describe exactly the regular languages.
• If E is a regular expression, then L(E) is the language it
defines.
• We’ll describe RE’s and their languages recursively.
The Operators of Regular Expressions
• The Union of two languages L and M, denoted 𝐿 ∪ 𝑀, is the set
of strings that are in either L or M, or both.
• For example, if 𝐿 = {001, 10, 111} and 𝑀 = 𝜖, 001 then 𝐿 ∪ 𝑀 =
{𝜖, 001,10, 111}
• The Concatenation of languages L and M is the set of strings
that can be formed by taking any string in L and concatenating
it with any string in M.
• For example, if 𝐿 = {001, 10, 111} and 𝑀 = 𝜖, 001 then 𝐿. 𝑀 =
{001, 10, 111, 001001, 10001, 111001}
The Operators of Regular Expressions
• The Closure (or star or Kleene closure) of a language L is denoted L*
represents the set of those strings that can be formed by taking any
number of strings from L.
• If L = {0, 1}, then L* denotes all strings of 0’s and 1’s.
• Note: Formally, 𝐿∗ = ≥𝑖ڂ0 𝐿𝑖 which is an infinite set
• Can you give an example of a language whose closure is not infinite?
• 𝜙 ∗ = {𝜖} as 𝜙 0 = {𝜖} and 𝜙 𝑖 is empty for 𝑖 ≥ 1
Building Regular Expressions
• Basis 1: If ‘a’ is any symbol, then a is a RE, and L(a) = {a}.
• Note: {a} is the language containing one string, and that string is of length 1.
• Basis 2: ε is a RE, and L(ε) = {ε}.
• Basis 3: ∅ is a RE, and L(∅) = ∅.
Building Regular Expressions
• Induction 1: If E1 and E2 are regular expressions, then E1+E2 is a
regular expression, and L(E1+E2) = L(E1)L(E2).
• Induction 2: If E1 and E2 are regular expressions, then E1E2 is a regular
expression, and L(E1E2) = L(E1)L(E2).
• Concatenation : the set of strings wx such that w is in L(E1) and x is in L(E2).
• Induction 3: If E is a RE, then E* is a RE, and L(E*) = (L(E))*.
• Closure, or “Kleene closure” = set of strings w1w2…wn, for some n > 0, where
each wi is in L(E).
• Note: when n=0, the string is ε.
Precedence of Operators
• Parentheses may be used wherever needed to influence the
grouping of operators.
• Order of precedence is * (highest), then concatenation, then +
(lowest).
• Examples:
• L(01+0) = {01, 0}.
• L(0(1+0)) = {01, 00}.
Regular expressions
• Identities & Annihilators
• ∅ is the identity for +.
• R + ∅ = R.
• ε is the identity for concatenation.
• εR = Rε = R.
• ∅ is the annihilator for concatenation.
• ∅R = R ∅ = ∅.
• Examples of Language of Regular Expressions:
• L(01) = {01}.
• L(0*) = {ε, 0, 00, 000,… }.
• L((0+10)*(ε+1)) = all strings of 0’s and 1’s without two consecutive 1’s.
Regular expressions
1. Ø+R=R 8. (R*)* = R*
2. ØR = RØ = Ø 9. ε + RR* = R*
3. εR = R ε = R 10. (PQ)*P = P(QP)*
4. ε* = ε 11. (P+Q)* = (P*Q*)* = (P*+Q*)*
5. R+R=R 12. (P+Q)R = PR + QR
6. R*R* = R* 13. R(P+Q) = RP + RQ
7. RR* = R*R
Examples
• L = {w | length of w = 2} over 𝚺 = {𝒂, 𝒃}
• 𝑎 + 𝑏 . (𝑎 + 𝑏)
• L = {w | length of w ≥ 2} over 𝚺 = {𝒂, 𝒃}
• 𝑎 + 𝑏 . 𝑎 + 𝑏 . (𝑎 + 𝑏)∗
• L = {w | length of w is even} over 𝚺 = {𝒂, 𝒃}
• ( 𝑎 + 𝑏 . (𝑎 + 𝑏))∗
• L = {w | no. of a’s in w = 2} over 𝚺 = {𝒂, 𝒃}
• 𝑏 ∗ 𝑎𝑏 ∗ 𝑎𝑏 ∗
• L = {w | no. of a’s in w ≥ 2} over 𝚺 = {𝒂, 𝒃}
• 𝑏 ∗ 𝑎𝑏 ∗ 𝑎(𝑎 + 𝑏)∗
• L = {w | no. of a’s in w is even} over 𝚺 = {𝒂, 𝒃}
• (𝑏 ∗ 𝑎𝑏 ∗ 𝑎𝑏 ∗ )∗ +𝑏 ∗
Examples
• Language = {w | w contains a single '1' over {0,1}}
• RE: 0∗10∗ (any number of 0s, followed by one 1, then any number of 0s)
• Language = {w | w contains '1' at every odd position and
total length is odd}
• RE: 1((0+1)1)∗
• Logic: Starts with 1 (first odd position), then repeats pairs where the
second symbol in the pair (the next odd position) is always 1
Regular Expression to 𝜖-NFA
a
• Symbol a:
ε
• ε:
• ∅:
RE to ε-NFA: Induction 1 – Union
For E1
ε ε
ε ε
For E2
For E1 E 2
RE to ε-NFA: Induction 2 – Concatenation
ε
For E1 For E2
For E1E2
RE to ε-NFA: Induction 3 – Closure
ε ε
For E
For E*
RE to ε-NFA: Example
• Convert the regular expression (0+1)*1(0+1) to an 𝜖-NFA
RE to ε-NFA: Example
• Convert the regular expression (0+1)*1(0+1) to an 𝜖-NFA
Arden’s Theorem
• Let P and Q be two regular expressions. If P does not contain null
string then R = Q + RP has a unique solution R = QP*
• Proof:
R = Q + (Q + RP)P [After putting the value R = Q + RP]
= Q + QP + RPP
Putting the value of R recursively again and again−
R = Q + QP + QP2 + QP3…..
R = Q (ε + P + P2 + P3 + …. )
R = QP* [As P* represents (ε + P + P2 + P3 + ….) ]
Hence, proved.
Converting DFA to Regular Expression
Step 1: Construct algebraic equations for each state 𝑞𝑖 as follows:
𝒒𝒊 = 𝒒𝒋 . 𝒂 such that 𝛿 𝑞𝑗 , 𝑎 = 𝑞𝑖
Step 2: Add 𝜖 if 𝑞𝑖 is start state
Step 3: Solve the equations using Arden’s Theorem to get the solution
for final state
𝑞1 = 𝑞1 1 + 𝜖 𝑞1 = 𝑞1 1 + 𝜖 𝑞2 = 𝑞2 1 + 𝑞2 0 + 𝑞1 0
𝑞2 = 𝑞2 1 + 𝑞2 0 + 𝑞1 0 𝑞1 = 𝜖 + 𝑞1 1 𝑞2 = 𝑞2 1 + 0 + 1∗ 0
By Arden’s Theorem: 𝑞2 = 1∗ 0 + 𝑞2 1 + 0
By Arden’s Theorem:
𝑅 = 𝑞1
𝑄=𝜖 𝑅 = 𝑞2
𝑃=1 𝑄 = 1∗ 0
𝑃 = (1 + 0)
𝑞1 = 𝑅 = 𝜖. 1∗ = 1∗ 𝑞2 = 𝑅 = 1∗ 0 1 + 0 ∗
Examples
(01)*0 (b+aa)a*