Regular Expressions
• Highlights:
– A regular expression specifies a language, and it does so precisely.
– Regular expressions are very intuitive.
– Regular expressions are very useful in a variety of contexts.
– Given a regular expression, an NFA-ε can be constructed from it automatically.
– Thus an NFA, a DFA, and a corresponding program can all be constructed automatically as well!
Two Operations
• Concatenation:
– x = 010
– y = 1101
– xy = 010 1101
• Language Concatenation: L1L2 = {xy | x is in L1 and y is in L2}
– L1 = {01, 00}
– L2 = {11, 010}
– L1L2 = {01 11, 01 010, 00 11, 00 010}
• Language Union:
– L1 = {01, 00}
– L2 = {01, 11, 010}
– L1 ∪ L2 = {01, 00, 11, 010}
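The two language operations above can be sketched with Python sets. This is only an illustration: the function name `concat` and the variable names are assumptions made here, and strings stand for words over {0, 1}.

```python
def concat(l1, l2):
    """Language concatenation: every x in l1 followed by every y in l2."""
    return {x + y for x in l1 for y in l2}

# Languages from the concatenation example.
L1 = {"01", "00"}
L2 = {"11", "010"}
print(sorted(concat(L1, L2)))

# Languages from the union example; set union drops the duplicate "01".
L2_union = {"01", "11", "010"}
print(sorted(L1 | L2_union))
```

Union is plain set union, so a string that occurs in both languages appears only once in the result.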
Operations on Languages
• Let L, L1, L2 be subsets of Σ*
• Concatenation: L1L2 = {xy | x is in L1 and y is in L2}
• Concatenating a language with itself:
L^0 = {ε}
L^i = L L^{i-1}, for all i >= 1
Kleene closure
Say L = L^1 = {a, abc, ba}, over Σ = {a, b, c}
Then L^2 = {aa, aabc, aba, abca, abcabc, abcba, baa, baabc, baba}
L^3 = {a, abc, ba} L^2
…
But L^0 = {ε}
Kleene closure of L: L* = L^0 ∪ L^1 ∪ L^2 ∪ L^3 ∪ …
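A small sketch of the powers L^i, assuming languages are represented as Python sets of strings. Since L* is infinite whenever L contains a non-empty string, the helper `star_up_to` (a name invented here) only collects L^0 through a chosen maximum power.

```python
def concat(l1, l2):
    """Language concatenation of two finite languages."""
    return {x + y for x in l1 for y in l2}

def power(lang, i):
    """L^0 = {ε}; L^i = L L^(i-1). The empty Python string plays the role of ε."""
    result = {""}
    for _ in range(i):
        result = concat(lang, result)
    return result

def star_up_to(lang, max_power):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ ... ∪ L^max_power."""
    out = set()
    for i in range(max_power + 1):
        out |= power(lang, i)
    return out

L = {"a", "abc", "ba"}
print(sorted(power(L, 2)))        # the nine strings of L^2 listed above
print(len(power(L, 3)))
print(sorted(star_up_to(L, 1)))   # ε together with the strings of L
```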
Operations on Languages
• Let L, L1, L2 be subsets of Σ*
• Concatenation: L1L2 = {xy | x is in L1 and y is in L2}
• Union is set union of L1 and L2
• Kleene Closure: L* = ∪_{i>=0} L^i = L^0 ∪ L^1 ∪ L^2 ∪ …
• Positive Closure: L+ = ∪_{i>=1} L^i = L^1 ∪ L^2 ∪ …
• Question: Does L+ contain ε?
Definition of a Regular Expression
• Let Σ be an alphabet. The regular expressions over Σ are:
– Ø      Represents the empty set { }
– ε      Represents the set {ε}
– a      Represents the set {a}, for any symbol a in Σ
Let r and s be regular expressions that represent the sets R and S, respectively.
– r+s    Represents the set R ∪ S    (precedence 3, lowest)
– rs     Represents the set RS       (precedence 2)
– r*     Represents the set R*       (highest precedence)
– (r)    Represents the set R        (not an operator, rather provides precedence)
• If r is a regular expression, then L(r) is used to denote the corresponding language.
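The definition is recursive, so L(r) can be computed by recursion on the structure of r. The sketch below assumes a nested-tuple encoding of regular expressions chosen only for this illustration, and it returns just the strings of L(r) whose length is at most n, because L(r) itself may be infinite.

```python
def lang(r, n):
    """Strings of length <= n in L(r).

    r is a nested tuple (an encoding assumed for this sketch):
    ("empty",) for Ø, ("eps",) for ε, ("sym", a) for a symbol a,
    ("+", r, s) for r+s, (".", r, s) for rs, ("*", r) for r*.
    """
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "+":
        return lang(r[1], n) | lang(r[2], n)
    if kind == ".":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "*":
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending base strings until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

# 01*: a 0 followed by any number of 1's
zero_then_ones = (".", ("sym", "0"), ("*", ("sym", "1")))
print(sorted(lang(zero_then_ones, 3)))
```

The precedence rules of the definition do not show up in the code because the tuple nesting already fixes the grouping.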
• Examples: Let Σ = {0, 1}
– (0 + 1)*                      All strings of 0’s and 1’s
– 01*                           0 followed by any number of 1’s
– 0(0 + 1)*                     All strings of 0’s and 1’s, beginning with a 0
– (0 + 1)*1                     All strings of 0’s and 1’s, ending with a 1
– (0 + 1)*0(0 + 1)*             All strings of 0’s and 1’s containing at least one 0
– (0 + 1)*0(0 + 1)*0(0 + 1)*    All strings of 0’s and 1’s containing at least two 0’s
– (0 + 1)*01*01*                All strings of 0’s and 1’s containing at least two 0’s
– (1 + 01*0)*                   All strings of 0’s and 1’s containing an even number of 0’s
– 1*(01*01*)*                   All strings of 0’s and 1’s containing an even number of 0’s
– (1*01*0)*1*                   All strings of 0’s and 1’s containing an even number of 0’s
– (0+1)* = (0*1*)*              Both denote any string, i.e. Σ*
Σ = {0, 1} in all cases here.
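Several of the examples above claim that different expressions denote the same language. A bounded sanity check is possible with Python's `re` module, where the + of these slides is written `|`. The helper names below are assumptions of this sketch, and agreement on all strings up to a fixed length is evidence, not a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def same_language(p1, p2, alphabet="01", max_len=8):
    """True if p1 and p2 accept exactly the same strings up to max_len."""
    return all(bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s))
               for s in strings(alphabet, max_len))

# The slide's + becomes | in Python's regular expression syntax.
even_zeros = ["(1|01*0)*", "1*(01*01*)*", "(1*01*0)*1*"]
at_least_two_zeros = ["(0|1)*0(0|1)*0(0|1)*", "(0|1)*01*01*"]

print(all(same_language(even_zeros[0], p) for p in even_zeros))
print(same_language(*at_least_two_zeros))
print(same_language("(0|1)*", "(0*1*)*"))
```

`re.fullmatch` is used rather than `re.search` because a regular expression in the sense of these slides must describe the whole string.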
• Question: Is there a unique minimum regular expression for a given language?
• Identities:
1. Øu = uØ = Ø          Like multiplying by 0
2. εu = uε = u          Like multiplying by 1
3. Ø* = ε               [L* = ∪_{i>=0} L^i = L^0 ∪ L^1 ∪ L^2 ∪ … = {ε} when L = Ø]
4. ε* = ε
5. u+v = v+u
6. u+Ø = u
7. u+u = u
8. u* = (u*)*
9. u(v+w) = uv+uw       [which operation is hidden before parenthesis?]
10. (u+v)w = uw+vw
11. (uv)*u = u(vu)*     [note: you have to have a single u, at start or end]
12. (u+v)* = (u*+v)*
           = u*(u+v)*
           = (u+vu*)*
           = (u*v*)*
           = u*(vu*)*
           = (u*v)*u*
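The identities can be spot-checked on concrete languages. The sketch below works directly with sets of strings truncated at length N, which is safe because every piece of a string of length at most N also has length at most N. The sample languages U, V, W and the helper names are hypothetical, and a passing check on one sample is an illustration, not a proof of the identity.

```python
def cat(A, B, n):
    """Concatenation AB, keeping only strings of length <= n."""
    return {x + y for x in A for y in B if len(x + y) <= n}

def star(A, n):
    """Kleene closure A*, keeping only strings of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, A, n) - result
        result |= frontier
    return result

N = 6
U, V, W = {"ab"}, {"b", "ba"}, {"a"}   # hypothetical sample languages

checks = {
    "Ø* = ε": star(set(), N) == {""},
    "u(v+w) = uv+uw": cat(U, V | W, N) == cat(U, V, N) | cat(U, W, N),
    "(uv)*u = u(vu)*":
        cat(star(cat(U, V, N), N), U, N) == cat(U, star(cat(V, U, N), N), N),
    "(u+v)* = (u*+v)*": star(U | V, N) == star(star(U, N) | V, N),
    "(u+v)* = (u*v*)*":
        star(U | V, N) == star(cat(star(U, N), star(V, N), N), N),
    "(u+v)* = u*(vu*)*":
        star(U | V, N) == cat(star(U, N), star(cat(V, star(U, N), N), N), N),
}
for name, ok in checks.items():
    print(name, ok)
```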
Equivalence of Regular Expressions and NFA-εs
• Note: Throughout the following, keep in mind that a string is accepted by an NFA-ε if there exists ANY path labeled by that string from the start state to any final state.
• Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M such that L(M) = L(r). Furthermore, M has exactly one final state with no transitions out of it.
• Proof: (by induction on the number of operators, denoted by OP(r), in r).
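Basis: OP(r) = 0
Then r is either Ø, ε, or a, for some symbol a in Σ
For Ø: [Figure: start state q0 and final state qf, with no transitions]
For ε: [Figure: a single state qf that is both the start state and the final state]
For a: [Figure: start state q0, final state qf, and one transition q0 --a--> qf]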
Inductive Hypothesis: Suppose there exists a k >= 0 such that for any regular expression r where 0 <= OP(r) <= k, there exists an NFA-ε M such that L(M) = L(r). Furthermore, suppose that M has exactly one final state, with no transitions out of it.
Inductive Step: Let r be a regular expression with k + 1 operators (OP(r) = k + 1), where k + 1 >= 1.
Case 1) r = r1 + r2
Since OP(r) = k + 1, it follows that 0 <= OP(r1), OP(r2) <= k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.
Construct M as:
[Figure: new start state q0 with ε-transitions to q1 (start of M1) and q2 (start of M2); ε-transitions from f1 (final state of M1) and f2 (final state of M2) to the new final state qf]
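Case 2) r = r1r2
Since OP(r) = k + 1, it follows that 0 <= OP(r1), OP(r2) <= k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.
Construct M as:
[Figure: M1 (start q1, final f1) followed by M2 (start q2, final f2), joined by an ε-transition from f1 to q2; q1 is the start state of M and f2 is its final state]
Case 3) r = r1*
Since OP(r) = k + 1, it follows that 0 <= OP(r1) <= k. By the inductive hypothesis there exists an NFA-ε machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state.
Construct M as:
[Figure: new start state q0 and new final state qf around M1 (start q1, final f1), with ε-transitions q0 → q1, f1 → qf, f1 → q1, and q0 → qf]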
• Example:
r = 0(0+1)*
r = r1r2
r1 = 0
r2 = (0+1)*
r2 = r3*
r3 = 0+1
r3 = r4 + r5
r4 = 0
r5 = 1
The NFA-ε is built bottom-up, one subexpression at a time:
– r5 = 1: q0 --1--> q1
– r4 = 0: q2 --0--> q3
– r3 = r4 + r5: new start state q4 with ε-transitions to q2 and q0; ε-transitions from q3 and q1 to the new final state q5
– r2 = r3*: new start state q6 and new final state qf, with ε-transitions q6 → q4, q5 → qf, q5 → q4, and q6 → qf
– r1 = 0: q8 --0--> q9
– r = r1r2: ε-transition q9 → q6; the start state is q8 and the final state is qf
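The construction from the proof of Lemma 1 can be sketched as a short recursive program. The nested-tuple encoding of regular expressions, the function names `build` and `accepts`, and the integer state numbering are assumptions of this sketch, so the state names differ from the q0 … qf labels used in the example.

```python
from collections import defaultdict
from itertools import count, product

def build(r):
    """Build an NFA-ε for r; returns (start, final, eps, delta).

    r is a nested tuple: ("empty",), ("eps",), ("sym", a),
    ("+", r, s), (".", r, s), ("*", r).
    """
    fresh = count()              # supplies new state numbers
    eps = defaultdict(set)       # state -> states reachable by one ε-move
    delta = defaultdict(set)     # (state, symbol) -> next states

    def go(r):
        kind = r[0]
        if kind == "empty":      # two states, no transitions
            return next(fresh), next(fresh)
        if kind == "eps":        # one state, both start and final
            q = next(fresh)
            return q, q
        if kind == "sym":
            s, f = next(fresh), next(fresh)
            delta[s, r[1]].add(f)
            return s, f
        if kind == "+":          # Case 1: new start and final around M1, M2
            s1, f1 = go(r[1])
            s2, f2 = go(r[2])
            s, f = next(fresh), next(fresh)
            eps[s] |= {s1, s2}
            eps[f1].add(f)
            eps[f2].add(f)
            return s, f
        if kind == ".":          # Case 2: ε from final of M1 to start of M2
            s1, f1 = go(r[1])
            s2, f2 = go(r[2])
            eps[f1].add(s2)
            return s1, f2
        if kind == "*":          # Case 3: loop back and bypass
            s1, f1 = go(r[1])
            s, f = next(fresh), next(fresh)
            eps[s] |= {s1, f}
            eps[f1] |= {s1, f}
            return s, f
        raise ValueError(f"unknown node: {kind}")

    start, final = go(r)
    return start, final, eps, delta

def accepts(nfa, w):
    """Simulate the NFA-ε on w by tracking the set of reachable states."""
    start, final, eps, delta = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p in eps.get(q, ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    current = closure({start})
    for a in w:
        current = closure({p for q in current for p in delta.get((q, a), ())})
    return final in current

# r = 0(0+1)*, the example above
r = (".", ("sym", "0"), ("*", ("+", ("sym", "0"), ("sym", "1"))))
nfa = build(r)
words = ["".join(t) for n in range(5) for t in product("01", repeat=n)]
print(all(accepts(nfa, w) == w.startswith("0") for w in words))
```

Tracking a set of states is one way to honour the rule that a string is accepted if any path leads to the final state.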
Definitions Required to Convert a DFA to a Regular Expression
• Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn}, and define:
R_{i,j} = { x | x is in Σ* and δ(qi, x) = qj }
R_{i,j} is the set of all strings that define a path in M from qi to qj.
• Note that states have been numbered starting at 1, not 0!
• Example:
[Figure: DFA with states q1, q2, q3, q4, q5 and transitions labeled 0 and 1]
R_{2,3} = {0, 001, 00101, 011, …}
R_{1,4} = {01, 00101, …}
R_{3,3} = {11, 100, …}
• In words: R^k_{i,j} is the set of all the strings that define a path in M from qi to qj that passes through no state numbered greater than k.
• Definition:
R^k_{i,j} = { x | x is in Σ*, δ(qi, x) = qj, and for every u with 1 <= |u| < |x| and x = uv, δ(qi, u) = qp implies p <= k }
• Note that i > k or j > k is allowed; only the intermediate states on the path from qi to qj must be numbered at most k.
• Example:
[Figure: the same five-state DFA as in the previous example]
R^4_{2,3} = {0, 1000, 011, …}
R^1_{2,3} = {0}
111 is not in R^4_{2,3} because it goes via q5
111 is not in R^1_{2,3}
101 is not in R^1_{2,3}
R^5_{2,3} = R_{2,3}; any state may be on the path now
• Observations:
1) R^n_{i,j} = R_{i,j}
2) R^{k-1}_{i,j} is a subset of R^k_{i,j}
3) L(M) = ∪_{q in F} R^n_{1,q} = ∪_{q in F} R_{1,q}
4) R^0_{i,j} = {a | δ(qi, a) = qj} (possibly Ø), if i ≠ j
   R^0_{i,j} = {a | δ(qi, a) = qj} ∪ {ε}, if i = j
   Easily computed from the DFA!
5) R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}
   Now you see the purpose of introducing k: so that we can write it as a RE
• Notes on 5:
5) R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}
• Consider paths represented by the strings in R^k_{i,j}:
[Figure: a path from qi to qj]
• If x is a string in R^k_{i,j} then no state numbered > k may be passed through when processing x, and either:
– qk is not passed through, i.e., x is in R^{k-1}_{i,j}
– qk is passed through one or more times, i.e., x is in R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j}
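The definition of R^k_{i,j} can be tried out by brute force on a small machine. The sketch below assumes a three-state DFA over {0, 1} chosen for illustration, with states numbered from 1, and enumerates only strings up to a maximum length. The function name `R` and the table `DELTA` are names introduced here.

```python
from itertools import product

# Assumed example DFA: (state, symbol) -> state, states numbered from 1.
DELTA = {(1, "0"): 2, (1, "1"): 3,
         (2, "0"): 1, (2, "1"): 3,
         (3, "0"): 2, (3, "1"): 2}

def R(k, i, j, max_len):
    """Strings x with len(x) <= max_len that take q_i to q_j while every
    intermediate state (strictly between the two endpoints) is numbered <= k."""
    found = set()
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            state, ok = i, True
            for pos, a in enumerate(symbols):
                state = DELTA[state, a]
                if pos < n - 1 and state > k:   # an intermediate state is too large
                    ok = False
                    break
            if ok and state == j:
                found.add("".join(symbols))
    return found

print(sorted(R(0, 2, 3, 4)))   # direct transitions only
print(sorted(R(1, 2, 3, 4)))   # q1 is now allowed as an intermediate state
print(R(1, 2, 3, 4) <= R(2, 2, 3, 4))   # observation 2 on this sample
```

The endpoints q_i and q_j are never tested against k, which matches the note that only intermediate states are restricted.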
• Lemma 2: Let M = (Q, Σ, δ, q1, F) be a DFA. Then there exists a regular expression r such that L(M) = L(r).
• Proof:
First we will show (by induction on k) that for all i, j, and k, where 1 <= i,j <= n and 0 <= k <= n, there exists a regular expression r such that L(r) = R^k_{i,j}.
Basis: k = 0
R^0_{i,j} contains single symbols, one for each transition from qi to qj, and possibly ε if i = j.
case 1) No transitions from qi to qj and i != j
r^0_{i,j} = Ø
case 2) At least one (m >= 1) transition from qi to qj and i != j
r^0_{i,j} = a1 + a2 + a3 + … + am, where δ(qi, ap) = qj for all 1 <= p <= m
case 3) No transitions from qi to qj and i = j
r^0_{i,j} = ε
case 4) At least one (m >= 1) transition from qi to qj and i = j
r^0_{i,j} = a1 + a2 + a3 + … + am + ε, where δ(qi, ap) = qj for all 1 <= p <= m
Inductive Hypothesis:
Suppose that R^{k-1}_{i,j} can be represented by the regular expression r^{k-1}_{i,j} for all 1 <= i,j <= n, and some k >= 1.
Inductive Step:
Consider R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}. By the inductive hypothesis there exist regular expressions r^{k-1}_{i,k}, r^{k-1}_{k,k}, r^{k-1}_{k,j}, and r^{k-1}_{i,j} generating R^{k-1}_{i,k}, R^{k-1}_{k,k}, R^{k-1}_{k,j}, and R^{k-1}_{i,j}, respectively. Thus, if we let
r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j} + r^{k-1}_{i,j}
then r^k_{i,j} is a regular expression generating R^k_{i,j}, i.e., L(r^k_{i,j}) = R^k_{i,j}.
• Finally, if F = {qj1, qj2, …, qjr}, then
r^n_{1,j1} + r^n_{1,j2} + … + r^n_{1,jr}
is a regular expression generating L(M).
• Note: not only does this prove that every language accepted by a DFA is generated by a regular expression, but it also provides an algorithm for computing such an expression!
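The proof of Lemma 2 translates almost line for line into a program. The sketch below is one possible rendering, not a definitive implementation: the tuple encoding of regular expressions, the simplifying constructors, and the three-state DFA with final states q2 and q3 are all assumptions made for illustration. The result is checked against the DFA on every string up to a small length.

```python
from itertools import product

EMPTY, EPS = ("empty",), ("eps",)

def union(r, s):                 # r + s, using u+Ø = u and u+u = u
    if r == EMPTY:
        return s
    if s == EMPTY:
        return r
    return r if r == s else ("+", r, s)

def concat(r, s):                # rs, using Øu = Ø and εu = u
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    return r if s == EPS else (".", r, s)

def star(r):                     # r*, using Ø* = ε and ε* = ε
    return EPS if r in (EMPTY, EPS) else ("*", r)

def dfa_to_regex(n, delta, finals):
    """States are 1..n, start state 1; delta maps (state, symbol) -> state."""
    r = {}
    for i in range(1, n + 1):    # basis k = 0: single symbols, plus ε if i = j
        for j in range(1, n + 1):
            e = EPS if i == j else EMPTY
            for (p, a), q in sorted(delta.items()):
                if p == i and q == j:
                    e = union(e, ("sym", a))
            r[i, j] = e
    for k in range(1, n + 1):    # inductive step: allow q_k as intermediate
        r = {(i, j): union(concat(r[i, k], concat(star(r[k, k]), r[k, j])),
                           r[i, j])
             for i in range(1, n + 1) for j in range(1, n + 1)}
    result = EMPTY
    for f in sorted(finals):     # r^n_{1,j1} + r^n_{1,j2} + ...
        result = union(result, r[1, f])
    return result

def show(r):
    kind = r[0]
    if kind in ("empty", "eps"):
        return "Ø" if kind == "empty" else "ε"
    if kind == "sym":
        return r[1]
    if kind == "+":
        return f"({show(r[1])}+{show(r[2])})"
    if kind == ".":
        return show(r[1]) + show(r[2])
    return f"({show(r[1])})*"

def lang(r, n):
    """Strings of length <= n generated by r (used only to check the result)."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "+":
        return lang(r[1], n) | lang(r[2], n)
    if kind == ".":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    base, result, frontier = lang(r[1], n), {""}, {""}
    while frontier:
        frontier = {x + y for x in frontier for y in base
                    if len(x + y) <= n} - result
        result |= frontier
    return result

def dfa_accepts(delta, finals, w):
    q = 1
    for a in w:
        q = delta[q, a]
    return q in finals

# Assumed example DFA with final states q2 and q3.
DELTA = {(1, "0"): 2, (1, "1"): 3, (2, "0"): 1, (2, "1"): 3,
         (3, "0"): 2, (3, "1"): 2}
FINALS = {2, 3}
regex = dfa_to_regex(3, DELTA, FINALS)
words = ["".join(t) for n in range(6) for t in product("01", repeat=n)]
print(show(regex))
print(lang(regex, 5) == {w for w in words if dfa_accepts(DELTA, FINALS, w)})
```

The constructors apply only a few of the identities listed earlier, so the printed expression is correct but far from minimal.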
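• Example:
[Figure: DFA with start state q1 and transitions q1 --0--> q2, q1 --1--> q3, q2 --0--> q1, q2 --1--> q3, q3 --0/1--> q2]
First table column is computed from the DFA.
              k=0      k=1      k=2
r^k_{1,1}     ε
r^k_{1,2}     0
r^k_{1,3}     1
r^k_{2,1}     0
r^k_{2,2}     ε
r^k_{2,3}     1
r^k_{3,1}     Ø
r^k_{3,2}     0+1
r^k_{3,3}     ε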
• All remaining columns are computed from the previous column using the formula.
r^1_{2,3} = r^0_{2,1} (r^0_{1,1})* r^0_{1,3} + r^0_{2,3}
          = 0 (ε)* 1 + 1
          = 01 + 1
              k=0      k=1        k=2
r^k_{1,1}     ε        ε
r^k_{1,2}     0        0
r^k_{1,3}     1        1
r^k_{2,1}     0        0
r^k_{2,2}     ε        ε + 00
r^k_{2,3}     1        1 + 01
r^k_{3,1}     Ø        Ø
r^k_{3,2}     0+1      0+1
r^k_{3,3}     ε        ε
r^2_{1,3} = r^1_{1,2} (r^1_{2,2})* r^1_{2,3} + r^1_{1,3}
          = 0 (ε + 00)* (1 + 01) + 1
          = 0*1
              k=0      k=1        k=2
r^k_{1,1}     ε        ε          (00)*
r^k_{1,2}     0        0          0(00)*
r^k_{1,3}     1        1          0*1
r^k_{2,1}     0        0          0(00)*
r^k_{2,2}     ε        ε + 00     (00)*
r^k_{2,3}     1        1 + 01     0*1
r^k_{3,1}     Ø        Ø          (0 + 1)(00)*0
r^k_{3,2}     0+1      0+1        (0 + 1)(00)*
r^k_{3,3}     ε        ε          ε + (0 + 1)0*1
• To complete the regular expression, we compute r^3_{1,2} + r^3_{1,3} from the k=2 column, since q2 and q3 are the final states.
• Theorem: Let L be a language. Then there exists a regular expression r such that L = L(r) if and only if there exists a DFA M such that L = L(M).
• Proof:
(if) Suppose there exists a DFA M such that L = L(M). Then by Lemma 2 there exists a regular expression r such that L = L(r).
(only if) Suppose there exists a regular expression r such that L = L(r). Then by Lemma 1 there exists an NFA-ε M' such that L = L(M'), and converting M' to an equivalent NFA and then to an equivalent DFA M gives L = L(M).
• Corollary: The regular expressions define the regular languages.
• Note: The conversion from a regular expression to a DFA and a program accepting L(r) is now complete, and fully automated!