Regular Expression and Finite
Automata
A language denoted by L over an alphabet ∑ is subset
of all the strings that can be formed out of ∑.
Simply, language is subset of kleen closure over an
alphabet ∑ :L ∑* (set of strings chosen from ∑*
defines language).
Regular Expression is a formula for representing a
language in terms of a form combined using 3
operations union, concatenation and kleen closure.
Regular Expression and Finite
Automata
A regular expression is a rule that describe a whole set
of strings belongs to certain language.
Any regular expression is recursively defined as:
Empty state (φ), empty string(ε) symbol of input
alphabet (∑) are regular expression
Let R1 and R₂ be regular expression then union of R1 and
R2 denoted by (R1 + R2) is also Regular Expression.
Let R1 and R₂ be regular expression then concatenation
of R1 and R2 denoted by (R1 . R2) is also Regular
Expression.
Let R be regular expression then kleen closure of R
denoted by R* is also Regular Expression.
Regular Expression and Finite
Automata
Regular expression are the language generator and
finite automata are the language acceptor.
The regular expression approach for describing regular
language is fundamentally different from the finite
automata approach.
However, these two notations represent exactly the
same set of languages, called regular languages.
A language is said to be regular language if and only if a
finite automata recognizes it.
Regular Expression
Example of regular expression, Assume that ∑ = {a, b, c}
Zero or more: a* means “zero or more a’s”,
To say “zero or more ab’s,” i.e., {ε, ab, abab, …..} you need to say
(ab)*.
One or more: Since a* means “zero or more a’s”, you can use
aa* (or equivalently a*a) to mean “one or more a’s”.
Similarly to describe ‘one or more ab’s”, that is {ab, abab,
ababab, ……}, you can use ab(ab)*.
Zero or one: It can be described as an optional ‘a’ with (a + ε).
Any string at all: To describe any string at all (with ∑ = { a, b, c}
you can use (a + b + c)*.
Regular Expression
Any non-empty string: This is written any character from
∑ = {a, b, c} followed by any string at all:
(a + b + c) (a + b + c)*
Any string not containing ..........: To describe any string at
all that does not contain an ‘a’ (with ∑ ={a, b, c}), you can
use (b + c)*.
Any string containing exactly one ........: To describe any
string that contains exactly one ‘a’ (with ∑ ={a, b, c}) put
“any string not containing an a”, on either side of the ‘a’
like: (b + c)* a (b + c)* or a(b+c)* or (b+c)*a
Regular Expression
Example:
(0+1)*=
0*+1*=
(01)*=
0* .1*=
1.1*=
Regular Expression
(0+1)*={ ∈,0,1,00,101,1101,00101,…………}
0*+1*={ ∈,0,1,00,11,111,0000,…………}
(01)*={∈,01,0101,010101,……………}
0* .1*= { ∈,0,1,00,11,000,01,001,0111……}
1.1*={1,11,111,11111,……………}
Regular Expression
Precedence of Regular-Expression Operators
The star Operator (* ) is of highest precedence.
Next in precedence comes the concatenation (.) or dot
operator.
Finally, unions (+) are grouped with their operands.
For example:
The expression 10*+0 is grouped (1(0)*)+0.
Regular Language
The regular languages are those languages that can be
constructed from the three set operations:
Union
Concatenation
Kleen star.
Let ∑ be an alphabet. The class of “regular languages”
over ∑ is defined inductively as follows:
ф is a regular language so as empty string {ε} is regular
language.
For each symbol σ ε Σ. {σ} is a regular language.
If L1, L2, L3 are languages, then so is L₁ L2 L3 …….
are also regular.
Regular Language
If L1, L2 languages, then so is L₁ o L2 o L3 o ………... are
also regular
If L is a regular language, then so is L*.
Nothing else is a regular language unless its construction
follows from the above rules.
Some Questions
Write a regular expression for the language. L: {w ∈ {a, b}*:
w has even number of 'a's followed by odd number of ‘b’ s
Write a regular expression for the set of strings over ∑{0, 1 }
with exactly two 0's.
Write regular expressions for the language which generates
strings of even length over the alphabet ∑{a, b}.
Describe as simply as possible in English the language
corresponding to the regular expression a*b(a*ba*b)*a*
Write a regular expression for the language in which strings
start and end with same symbol over alphabet ∑{a, b }.
Write a regular expression for the language in which strings
start and end with different symbol over alphabet ∑{a, b }.
Some Questions
Q1. Answer: (aa)*(bb)*b
Q2. Answer: 1* 0 1*0 1*
Q3. Answer: (aa+ab+ba+bb)*
Algebraic Law for RL
Commutative law for union: L + M = M + L
Associative law for union: (L + M) + N = L + (M + N)
Associative law for concatenation: (LM)N = L(MN) ,
Note that there is no commutative law for
Concatenation, i.e. LM ≠ ML
The identity for union is: L + ф = ф + L = L
The identity for concatenation is: L. ε = ε. L = L
Left distributive law: L(M + N) = LM + LN
Right distributive law: (M + N)L = LM + LN
Idempotent law: L + L = L
Algebraic Law for RE
Laws Involving Closure
(L*)* = L* – i.e. closing an already closed expression does
not change the language
ф*=ε
ε*=ε
L+ = L. L* = L*L
Conversion of Regular Expression To
Finite Automata
We can convert the finite automata to its equivalent
regular expression and vice versa.
We have already discussed that DFA and two kinds of
NFA with & without ε-transition which accept the
same class of languages.
In order to show that the regular expressions define
the same class, we must show that:
Every language defined by one of these automata is also
defined by a regular expression. For this proof, we can
assume the language is accepted by some DFA.
Conversion of Regular Expression To
Finite Automata
Every language defined by a regular expression is
defined by one of these automata. For this part of proof,
the easiest is to show that there is an NFA with ε-
transitions accepting the same language.
ε-NFA NFA
RE DFA
Fig: Plan for showing the equivalence of four different notations for regular language.
Conversion of FA to Regular
Expression
We use Arden’s Theorem to convert the FA to RE.
Let P and Q be the two regular expression such that P
does not contain the empty string (ε) then regular
expression R of the form R=Q+RP has the unique
solution as:
R=QP*
Conversion of FA to Regular
Expression
Find RE for the given DFA
b
q1 q2
a b a
q3 q4 a, b
b
Conversion of FA to Regular
Expression
Here the equation for each in terms of incoming
transitions are:
q1=ε+q2b+q3a-----------(i) {incoming coming from no where so ε}
q2=q1a-------------------(ii)
q3=q1b-------------------(iii)
q4=q2a+q3b+q4a+q4b----(iv)
Conversion of FA to Regular
Expression
Taking equation (i) since it is final state
q1=ε+q2b+q3a-----------(i)
Substituting value of q2 andq3 from (ii) and (iii)
q1=ε+q1ab+q1ba
q1=ε+q1(ab+ba)---------(v)
Equation (v) is of the form R=Q+RP, so we can write it in
the form R=QP* according to Arden’s Theorem.
i.e. q1=ε(ab+ba)*
q1=(ab+ba)*, [according to algebraic laws of
regular expression ε.R=R]
Conversion of FA to Regular
Expression
Since q1 is the final state and we have derived the
equation of q1 hence q1=(ab+ba)* is the required regular
expression.
so required RE=(ab+ba)*
Conversion of FA to Regular
Expression
Obtain the RE for the given FA
a
a a
b b b q4
q1 q2 q3
a
Conversion of FA to Regular
Expression
Here the equation for each in terms of incoming
transitions are:
q1=ε+q1a+q2a+q3a-----------(i)
q2=q1b-------------------(ii)
q3=q2b-------------------(iii)
q4=q3b+q4a+q4b----(iv)
Conversion of FA to Regular
Expression
Taking equation (i) since q1 it is final state
q1= ε+q1a+q2a+q3a
Substituting value of q2 andq3 from (ii) and (iii)
q1= ε+q1a+q1ba+q2ba
q1= ε+q1a+q1ba+q1bba
q1=ε+q1(a+ba+bba)---------(v)
Equation (v) is of the form R=Q+RP, so we can write it in the
form R=QP* according to Arden’s Theorem.
i.e. q1=ε(a+ba+bba)*
q1=(a+ba+bba)*, [according to algebraic laws of regular
expression ε.R=R]
Conversion of FA to Regular
Expression
Taking equation (ii) since q2 is also final state
q2=q1b
q2=(a+ba+bba)*b
Taking equation (iii) since q3 is also final state
q3=q2b
q3=(a+ba+bba)*bb
We have derived the expression for all the final states so
the regular expression is union of all the obtained expression
i.e.
q1 q 2 q3
RE=(a+ba+bba)* (a+ba+bba)*b (a+ba+bba)*bb
Closure Properties of Regular
Language
Closure refers to some operation on a language, resulting in
a new language that is of the same “type” as originally
operated on i.e. regular.
Those operations on regular languages that are certain to
result in regular language are known as closure qualities.
The class of language accepted by finite automata is closed
under:
Union
Concatenation
Kleen Closure
Complement
Intersection
Closure Properties of Regular
Language (Union)
Let R1 and R2 are the regular language then the union
of R1 and R2 denoted by R1+R2 or R1 R2 is also
regular.
To show that R1+R2 are regular language we need to
design a non deterministic automata which can
recognize the language (R1+R2).
For this let us consider the non deterministic automata
M1 which can recognize R1 i.e. string generated by R1
will be accepted by M1.
M1 can be defined as M1= (Q1, ∑1, δ1, q1 , F1) such
that R1=L(M1)
Closure Properties of Regular
Language (Union)
Also consider another non deterministic automata M2
which can recognize R2 i.e. string generated by R2 will
be accepted by M2.
M2 can be defined as: M2= (Q2, ∑2, δ2, q2 , F2) such
that R2=L(M2)
We now construct a non deterministic automata such
that L(M)=L(M1)+L(M2).
Closure Properties of Regular
Language (Union)
A simple non deterministic automata can be
constructed easily as:
q1 qi qf q2 qh qs
M1 M2
ε
q2 qh qs
q0
ε q1 qi qf
M
Closure Properties of Regular
Language (Union)
Simply newly constructed automata M uses non-
determinism to guess whether input is in L(M1) or in
L(M2).
The automata M then processes the string exactly as
the corresponding automata either M1 or M2 so it
follows L(M)=L(M1)+L(M2).
The formal proof can be given as:
From these two NFAs, we will construct an NFA
M = (Q, ∑, δ, q0, F), such that L(M) = L(M1) + L(M2)
Closure Properties of Regular
Language (Union)
Different tuple of M can be defined as:
Q: Q1 Q2 q0
∑ : ∑1 ∑2 {ε}
q0 : q0
F: Set of Final States (F Q)
δ : δ1 δ2 δ (q0,ε) → q1 δ (q0,ε) → q2
Closure Properties of Regular
Language (Union)
That is M beings any computation by non
deterministically choosing to enter either initial sate of
M1, or the initial state of M₂.
M then reproduces either M1, or M2, thereafter.
Hence M accepts the string w if and only if M1, accepts w
or M2 accepts w and L(M)=L(M1)+L(M2).
Hence, regular properties are closed under union.
Steps in Union of R1 and R2
Create a new start state.
Make a ε-transition from the new start state to each of
the original start states.
Closure Properties of Regular
Language (Concatenation)
Let R1 and R2 are the regular language then the
concatenation of R1 and R2 denoted by R1.R2 is also
regular.
To show that R1.R2 are regular language we need to
design a non deterministic automata which can
recognize the language (R1.R2).
For this let us consider the non deterministic automata
M1 which can recognize R1 i.e. string generated by R1
will be accepted by M1.
M1 can be defined as M1= (Q1, ∑1, δ1, q1 , F1) such
that R1=L(M1)
Closure Properties of Regular
Language (Concatenation)
Also consider another non deterministic automata M2
which can recognize R2 i.e. string generated by R2 will
be accepted by M2.
M2 can be defined as: M2= (Q2, ∑2, δ2, q2 , F2) such
that R2=L(M2)
We now construct a non deterministic automata such
that L(M)=L(M1).L(M2).
Closure Properties of Regular
Language (Concatenation)
A simple non deterministic automata can be
constructed easily as:
q1 qi qf
q2 qh qs
qa
M2
qb
M1
Closure Properties of Regular
z
Language (Concatenation)
q1 qi qf ε
qa q2 qh qs
ε
ε
qb
M
Closure Properties of Regular
Language (Concatenation)
Simply newly constructed automata M starts from M1,
after consuming all the string generated by R1 it
reaches to the one of the final states of M1.
Then using non-determinism the automata reaches to
the starting state of M2.
The automata M then processes the string exactly as
the corresponding automata of M2 making
L(M)=L(M1).L(M2).
The formal proof can be given as:
From these two NFAs, we will construct an NFA
M = (Q, ∑, δ, q0, F), such that L(M) = L(M1).L(M2)
Closure Properties of Regular
Language (Concatenation)
Different tuple of M can be defined as:
Q: Q1 Q2
∑ : ∑1 ∑2 {ε }
q0: q1
F: Set of Final States (F Q2)
δ : δ1 δ2 δ (qf, ε) → q2 where qf Є F1
That is M beings from starting state of M1 and transits
according to transition function of M1 until it reaches
the final state of M1.
Closure Properties of Regular
Language (Concatenation)
Then from any of the final state of M1 the automata by
non determinism enters to the initial state of M₂.
M then reproduces M2, thereafter.
Hence M accepts the string w if and only if M2
accepts w and L(M)=L(M1).L(M2).
Hence, regular properties are closed under
concatenation.
Steps in Concatenation of R1 and R2
Put a ε-transition from each final state of M1 to the
initial state of M2.
Make the original final states of M1 as non final.
Closure Properties of Regular
Language (Kleen Closure)
Let R1 be the regular language then the kleen closure
of R1 denoted by R1* also regular.
To show that R1* is regular language we need to design
a non deterministic automata which can recognize the
language R1*.
For this let us consider the non deterministic automata
M1 which can recognize R1 i.e. string generated by R1
will be accepted by M1.
M1 can be defined as M1= (Q1, ∑1, δ1, q1 , F1) such
that R1=L(M1)
Closure Properties of Regular
Language (Kleen Closure)
We now construct a non deterministic automata such
that L(M)=R1*.
q1 q2 qf
qa
qb
M1
Closure Properties of Regular
a
Language (Kleen Closure)
ε
ε
qs ε q1 q2 qf
qa ε qr
ε
ε qb ε
M
Closure Properties of Regular
Language (Kleen Closure)
Simply newly constructed automata M begins from new
starting state (which is also the final state as it may accept ε
string) and uses non-determinism to reach the starting
state of M1.
The automata M then processes the string exactly as the
corresponding automata of M1.
For the repetition of the string the automata uses the non-
determinism from any of the final state of M1 to the initial
state of M1 making L(M)=R1*.
The automata finally uses non-determinism to reach any of
the final state of M from the original final state of M1.
Closure Properties of Regular
Language (Kleen Closure)
The formal proof can be given as:
From the original NFA, we will construct new NFA
M = (Q, ∑, δ, q0, F), such that L(M) = R1*
Different tuple of M can be defined as:
Q: Q1 {qs} {qr}
∑: ∑1 {ε}
q0: qs
F: {qs ,qr}
δ : δ1 δ (qi, ε) → q1 δ (qs, ε) → q1 where qi Є F1
Closure Properties of Regular
Language (Kleen Closure)
Steps in Kleene Closure of R1
Make a new start state (which is also the final state as it
may accept ε string); connect it to the original start state
with a ε-transition.
Make a new final state; connect the original final state
(which becomes non final) to it with ε -transitions.
For the repetition of the string use the ε -transitions
from any of the final state of M1 to the original initial
state i.e. of M1.
Closure Properties of Regular
Language (Complement)
Let R1 be the regular language then the complement of
R1 denoted by 𝑅1 also regular.
To show that 𝑅1 is regular language we need to design a
non deterministic automata which can recognize the
language 𝑅1 .
For this let us consider the non deterministic automata
M1 which can recognize 𝑅1 i.e. string generated by 𝑅1
will be accepted by M1.
M1 can be defined as M1= (Q1, ∑1, δ1, q1 , F1) such
that 𝑅1 =L(M1)
Closure Properties of Regular
Language (Complement)
We now construct a non deterministic automata such
that L(M)=𝑅1 .
q1 qi qf q1 qi qf
qa qa
qb qb
M1 M
Closure Properties of Regular
Language (Complement)
The formal proof can be given as:
From the original NFA, we will construct new NFA
M = (Q, ∑, δ, q0, F), such that L(M) = 𝑅1
Different tuple of M can be defined as:
Q: Q1
∑: ∑1
q0: q1
F: {Q-F1}
δ : δ1
Closure Properties of Regular
Language (Complement)
Steps in Complement of R1
Make the non final states of original automata as a final state
and vice versa.
Intersection Properties of Regular Language
R1 𝑹𝟐=𝑹𝟏 +𝑹𝟐 using De-Morgan’s Law
If R1 and R2 is regular then 𝑅1 and 𝑅2 are also regular using
complement properties.
If 𝑅1 and 𝑅2 are regular then 𝑅1 +𝑅2 is also regular by union
properties.
𝑅1 +𝑅2 is regular then its complement 𝑅1 +𝑅2 is also regular
from complement properties.
𝑅1 +𝑅2 is equivalent to R1 𝑅2 so, R1 𝑅2 is also regular.