CHAPTER TWO
Regular Expression (RE) and Language
• In previous lectures, we have described the languages in terms of machine like description-finite automata
(DFA or NFA). Now we switch our attention to an algebraic description of languages, called regular
expression.
• Regular Expression are those algebraic expressions used for representing regular languages,
• Regular Expressions offer a declarative way to express the strings we want to accept
• Let Σ be an alphabet, the regular expression over the alphabet Σ are defined inductively as follows;
Basic steps:
• Φ is a regular expression representing empty language.
• Є is a regular expression representing the language of empty strings. i.e.{ Є}
• if “a’’ is a symbol in Σ, then ‘a’ is a regular expression representing the language {a}.
Now the following operations over basic regular expression define the complex regular expression as:
• if “r‘ and “s‘ are the regular expressions representing the language L(r) and L(s) then
• r U s is a regular expression denoting the language L(r) U L(s).
• r.s is a regular expression denoting the language L(r).L(s).
• r* is a regular expression denoting the language (L(r))*.
Regular operator:
Basically, there are three operators that are used to generate the languages that are regular,
Union (U / | /+): If L1 and L2 are any two regular languages then
L1UL2 ={s | s ε L1, or s ε L2 }
For Example: L1 = {00, 11}, L2 = (Є, 10}
L1UL2 = {Є, 00, 11, 10}
Concatenation (.): If L1 and L2 are any two regular languages then,
L1.L2 = {l1.l2|l1 ε L1 and l2 ε L2}
For examples: L1 = {00, 11} and L2 = {Є, 10}
L1.L2={00,11,0010,1110}
L2.L1={1000,1011,00,11}
Example: Let's take a simple language:
Note : So L1.L2 !=L2.L1 𝐿={𝑎,𝑏}
Kleene Closure (*): L={a,b}Now compute Kleene Closure L*:
L0={𝜀} (empty string)
If L is any regular Language then, L1=𝐿={𝑎,𝑏}
L* = Li =L0 UL1UL2U…………. L2=𝐿.𝐿={𝑎𝑎,𝑎𝑏,𝑏𝑎,𝑏𝑏}
L3=𝐿.𝐿.𝐿={𝑎𝑎𝑎,𝑎𝑎𝑏,𝑎𝑏𝑎,𝑎𝑏𝑏,𝑏𝑎𝑎,𝑏𝑎𝑏,𝑏𝑏𝑎,𝑏𝑏𝑏} and so on …
So,
• L∗={ε,a,b,aa,ab,ba,bb,aaa,aab,aba,abb,baa,bab,bba,bbb,… }
Note: In words: Kleene Closure means “zero or more repetitions of strings from LLL”.
Precedence of regular operator:
1. The star operator is of highest precedence. i.e it applies to its left well-formed RE.
2. Next precedence is taken by concatenation operator.
3. Finally, unions are taken.
Regular language Regular set
∅ {}
∈ {∈}
a* {∈, a, aa, aaa .....}
a+ b {a, b}
a.b {ab}
a* + ba {∈, a, aa, aaa,...... , ba}
(𝑎𝑏)∗ {ε,ab,abab,ababab,…}.
(a+b)∗ {ε,a,b,aa,ab,ba,bb,aaa,…}.
Basic identity rules
𝑅∪∅=𝑅
𝑅⋅𝜀=𝜀⋅𝑅=𝑅
𝑅⋅∅=∅⋅𝑅=∅
𝑅∪𝑅=𝑅 (idempotence)
∅∗={𝜀} (the star of empty set is {ε})
Star / closure rules
R∗=ε∪RR∗ (unfolding property)
(R∗)∗=R∗
Example 1 : Write a RE for the set of string that either start with 01 and end with 01
L = {01, 0101, 01001, 01101, 011101, 0100111, 0101, 11101, 0001……. }
So, RE for first part is 01(0+1)*
RE for second part is (0+1)*01
Finally, RE = 01(0+1)* + (0+1)* 01
Example 2: Write a RE for the set of string that have at least two consecutive 0’s or 1’s
L= {00, 11, 001001, 1100, ……. }
RE = (0+1)*00(0+1)* + (0+1)*11(0+1)*
Example 3: Write a RE for the set of string that have at least two consecutive 0’s and 1’s
L= {0011, 1100, 10001011, 110010011……. }
RE = (0+1)*00(0+1)* 11(0+1)* + (0+1)*11(0+1)* 00(0+1)*
Application of regular languages:
Validation: Determining that a string complies with a set of formatting constraints. Like email address validation,
password validation etc.
Search and Selection: Identifying a subset of items from a larger set on the basis of a pattern match.
Tokenization: Converting a sequence of characters into words, tokens (like keywords, identifiers) for later interpretation.
Exercise :
Consider Σ = {0, 1}, then some regular expressions over Σ are ;
0*10* is RE that represents language {w|w contains a single 1}
Σ*1Σ* is RE for language{w|w contains at least single 1}
Σ*001 Σ* = {w|w contains the string 001 as substring}
(Σ Σ)* or ((0+1)*.(0+1)*) is RE for {w|w is string of even length}
1*(01*01*)* is RE for {w|w is string containing even number of zeros}
0*10*10*10* is RE for {w|w is a string with exactly three 1‘s}
string that have substring either 001 or 100, the regular expression is
(1+0)*.001.(1+0)*+(1+0)*.(100).(1+0)*
For strings that have at most two 0‘s with in it, the regular expression is
1*.(0+Є).1*.(0+Є).1* For the strings ending with 11, the regular expression is (1+0)*.(11)+
Finite Automata and Regular expression
The regular expression approach for describing language is fundamentally different from the
finite automaton approach. However, these two notations turn out to represent exactly the same
set of languages, which we call regular languages. In order to show that the RE define the same
class of language as Finite automata, we must show that:
1) Any language define by one of these finite automata is also defined by RE.
2) Every language defined by RE is also defined by any of these finite automata.
Conversion from DFA to Regular Expression ( DFA to RE)
Arden‟s Theorem
Let p and q be the regular expressions over the alphabet Σ, if p does not contain
any empty string then r = q + rp has a unique solution r = qp*.
Proof:
Here, r = q + rp ......................... (i)
Let us put the value of r = q + rp on the right hand side of the relation (i), so;
r = q + (q + rp)p
r = q + qp + rp2 .................................... (ii)
Again putting value of r = q + rp in relation (ii), we get;
r = q + qp + (q +rp) p2
r = q+ qp + qp2 + qp3………………
Continuing in the same way, we will get as;
r = q + qp + qp2 + qp3………………..
r = q(Є + p + p2 +p3+…………………..
Thus r = qp* Proved.
Use of Arden‟s rule to find the regular expression for DFA:
To convert the given DFA into a regular expression, here are some of the
assumptions regarding the transition system:
- The transition diagram should not have the Є-transitions.
- There must be only one initial state.
- The vertices or the states in the DFA are as;
q1,q2,… ................. qn (Any qi is final state)
-Wij denotes the regular expression representing the set of labels of the edjes from qi to qj.
Thus we can write expressions as;
q1=q1w11+q2w12+q3w31+… .................... qnwn1+Є
q2=q1w12+q2w22+q3w32+… .................... +qnwn2
q3=q1w13+q2w23+q3w33+… .................... +qnwn3
…………………………………………………
qn=q1w1n+q2wn2+q3wn3+………………………qnwnn
Solving these equations for qi in terms of wij gives the regular expression equivalent to given DFA.
Use of Arden‟s rule to find the regular expression for DFA:
Let the equations are:
q1=q21+q30+ Є… ........ (i)
q2=q10… ....................... (ii)
q3=q11… ......................... (iii)
q4=q20+q31+q40+ q41……(iv)
Putting the values of q2 and q3 in (i)
q1=q101+q110+ Є
i.e.q1=q1(01+10)+ Є
i.e.q1= Є+q1(01+10) (since r = q+rp)
i.e. q1= Є(01+10)* (using Arden‘s rule)
Since, q1 is final state, the final regular expression for the DFA is
Є(01+10)* = (01+10)*
RE
q1 = q1a+q3a+E
q2 = q1b+ q2b+q3a
q3 = q2a
q2 = q1b+ q2b+q3a ……substitute q3 value
=q1b+ q2b+(q2a)b
q2 = q1b+ q2(b+ab)
R Q R P
then apply Arden theorem R =QP*
By applying Arden theorem
q2 = q1b(b+ab)*
Then substitute for final state
q1 = q1a+q3a+E
= q1a+ (q2a) a +E
= q1a+ q1b(b+ab)* aa +E
q1 =q1(a+b(b+ab)* aa ) +E
q1 =E +q1(a+b(b+ab)* aa ) by applying Arden theorem R =QP*
R Q R P
=E (a+b(b+ab)* aa ) *
RE =(a+b(b+ab)* aa ) *
Assignment