PPT

```Theory of Computing
Lecture 24
MAS 714
Hartmut Klauck
Size of Automata?
• We know L is regular, if there is a DFA with a
finite number of states for L
• Minimum number of states is the MyhillNerode Index
• Questions:
– How can we find a minimal DFA for L?
– Is it unique (up to renaming states)?
– Are NFA smaller than DFA?
– Are two-way DFA smaller than DFA?
Question 4
• Consider the language
L={xi : x2{0,1}n and i2{0,1} log n and xi=1}
– n is fix, this is a finite language
• It is easy to see that there are 2n rows in the
communication matrix
– Rows labeled x contain the string on columns i=1…n
• Hence any DFA for L has 2n states.
• Can give a 2-way DFA with O(n2) states:
– go to the right end of the input, read i into the state,
move n-i steps left, accept if there is a 1
Question 3
• Consider L={xy: x,y 2 {0,1}n, x  y}
– This is a FINITE language, n is fixed
• Exercise: there is an NFA with O(n2) states for L
• DFA-size: the matrix has more than 2n distinct
rows
– DFA size is exponential
• Hence NFA can be exponentially smaller than
DFA for some languages
• Example where they are not: complement of L
2) Uniqueness
• Uniqueness of the minimal DFA (up to vertex
names) follows from the Myhill-Nerode
characterization
• The optimal DFA has exactly one state for each
set of equal rows of the comm. matrix
• Edges are also determined uniquely
1) Optimal DFA
• Theorem: For every regular L there is a unique DFA A
that has minimum size. Given a DFA M for L we can find
A in polynomial time.
• Corollary: Given DFA A,B we can decide in polynomial
time whether they compute the same language
– Minimize DFA, compare graphs (note that this is not an
instance of the (hard) graph isomorphism problem, since
we know the starting state and edge labels)
• Corollary: Given DFA A, we can decide in polynomial
time whether LA is empty
Note
• The corresponding questions for Turing
machines are undecidable
• These properties (easy to find unique
representation, easy to find small
representation, easy to check for emptiness
and easy to compare) make DFA a good datastructure for languages/Boolean functions
DFA Minimization
• All algorithms try to recover equivalence classes
in the Myhill-Nerode characterization
• I.e., try to find the smallest partition of the rows
of the communication matrix into equal rows
• Input: DFA with n states and alphabet size k
• Hopcroft: O(nk log n)
• Moore: O(n2k)
• Idea in both:
– Start from partition of states into accepting/rejecting
– Refine partition until no longer possible
Proof
• We are given a DFA M with n states and alphabet size k
• Want to find an optimal DFA A
• First: remove unreachable states
– unreachable means in the graph, cannot be reached from
q0 on any input
– Identify them with DFS from the starting state
• Equivalent states: q,q’ are equivalent, if all strings y,
starting from q or q’ lead to the same result (acc/rej)
– can assume |y|< n, otherwise y contains a loop in M
– equivalent states have the same row in the communication
matrix
A simple algorithm
• Algorithm:
– Remove unreachable states
– Build a n£n matrix D that containing blank entries
– For all rows q, and columns q’, where q2 F and q’
is not, set D[q,q’]=D[q’,q]=²
– Iterate [until nothing to do, at most n times]
• Loop over all letters a and all pairs q,q’ with D[q,q’]
blank
• If D[±(q,a),±(q’,a)] is not blank, set D[q,q’]=a
– Join states q,q’ that have D[q,q’] still blank
A simple algorithm
• D allows us to find a witness that q,q’ are not
equivalent: D[q,q’]=a, then find recursively the
witness y for ±(q,a) and ±(q’,a), witness for q,q’
is ay
– Follow ay from q and from q’, one leads to accept,
the other to reject
• Witnesses need not be longer than n-1
• If no witness exists, then q,q’ are equivalent
– by definition of equivalence
Time
• Each iteration takes time O(n2k)
• If q and q’ are not equivalent, a witness of
length no more than n-1 can be found, i.e.,
number of iterations is at most n-1
– typically much less: longest simple path in DFA
Correctness
• All q,q’ with D[q,q’] not blank are not equivalent
• Claim: algo will find a witness for non-equivalent q,q’
Induction over length of witness
– length 0: in first iteration
– Assume length k witnesses are found in k iterations
– If q,q’ are not equivalent, and there is a string ay of length k
such that ±(q,ay) accepts, ±(q’,ay) rejects, then some witness y’
for ±(q,a) and ±(q’,a) of length k-1 is found earlier
– ay’ is a witness and found in iteration k
• Claim: all equivalent states will never be declared not
equivalent
– Clear from the witness computed
Minimizing NFA?
• NFA are frequently smaller than DFA
• Can we minimize their size?
• Theorem:
– NFA-minimization is PSPACE-hard
– This remains true if the input is a DFA but we
search the smallest NFA
– Even approximation is hard
Conclusion
• DFA can be transformed into unique minimal
DFA in polynomial time
– Can then compare, decide emptiness etc.
• NFA and 2-way DFA/NFA can be exponentially
smaller, but cannot be minimized efficiently
• Limits of DFA:
– Communication method
Regular Languages (Definition 2)
• Simple recursive description of languages
• Definition: Regular expression over alphabet
§
– a2§ is a r.e.
– The empty word ² is a r.e.
– ; is a r.e. [empty set]
– R1[ R2 is regular [union] Notation: r1 | r2
– R1R2 is regular [concatenation]
– R1* is regular [Kleene]
Examples
•
•
•
•
•
•
0*10*: strings with one 1
§* 1§*: strings with at least one 1
(§§)*: even length
1*(011*)*: every 0 followed by a 1
(0*10*10*)*: even number of 1’s
0*|1*: contains only 0’s or only 1’s
Regular vs. DFA
• Definition: a language describable by a
regular expression is called regular
• Theorem: The set of regular languages is the
same set as the set of languages decidable by
DFA
• Proof:
• We show
– NFA can simulate regular expression
– Regular expression can be constructed from DFA
Regular to NFA
• Consider an NFA with ² transitions
– Edges can be labeled with ² (empty word)
• Exercise: NFA with ² transitions can be transformed into NFA,
same number of states
• R a regular expression
• Inductively find an NFA for R
• Basis cases: R=², R=a2§, R=;
– NFA is trivial (at most 3 states)
• Induction: Find NFA for Union, Concatenation,
Kleene Hull
Regular to NFA
• Union R1[ R2:
– NFA M1 and M2, starting state ²-move to start in
M1 and to start in M2
• Concatenation:
• Kleene:
DFA to regular
• Generalized DFA: edges can carry regular
generalized DFA with 1 edge
• Idea 2: dynamic programming type argument
• Ri,j,k regular expression for the set of strings
that lead from qi to qj using q0,…, qk only
DFA to regular
• R0,0,0=(a|b|c)* if there are edges labeled a,b,c
from q0 to q0
• Ri,i,0 = ² otherwise
• Ri,j,0 = a|b etc., if there are edges labeled a and b
from qi to qj
• Ri,j,k = Ri,j,k-1 | Ri,k,k-1 (Rk,k,k-1)* Rk,j,k-1
• Finally take union over all R0,i,n, where qi is
accepting
• Result: a reg. expr. for the DFA
• Works even for NFA
Conclusion
• Theoretical Computer Science
– Algorithms
– Computability
– Complexity
– Machine Models
– Formal Languages
– More: Cryptography, Information/Communication
Theory …
Topics:
• Algorithms:
–
–
–
–
–
Basic Algorithms [Sorting/Searching]
Graph Algorithms
Linear Programming
Algorithms for hard problems [Approximation]
• Other theory:
–
–
–
–
Hardness and reductions [eg. NP-completeness]
Computability [eg. Halting problem, Universality]
Time-Hierarchy [etc.]
Finite Automata
```