### ppt - Department of Computer Science and Engineering

```
CSCI 3130: Formal Languages and Automata Theory
Tutorial 5
Hung Chun Ho
Office: SHB 1026
Department of Computer Science & Engineering
Agenda
• Cocke-Younger-Kasami (CYK) algorithm
– Parsing CFG in normal form
• Pushdown Automata (PDA)
– Design
CYK Algorithm
Bottom-up Parsing for normal form
Cocke-Younger-Kasami Algorithm
• Used to parse context-free grammars in Chomsky normal form (or simply normal form)
Normal Form
Every production is of type
1) X → YZ
2) X → a
3) S → ε
(X, Y, Z are variables, a is a terminal, S is the start variable)
Example
S → AB
A → CC | a | c
B → BC | b
C → CB | BA | c
CYK Algorithm - Idea
• This is Algorithm 2 in the lecture notes (10L8.pdf)
• Idea: Bottom-up parsing
• Algorithm:
Given a string s of length N
For k = 1 to N
    For every substring of length k
        Determine which variable(s) can derive it
CYK Algorithm - Example
• CFG
S → AB
A → CC | a | c
B → BC | b
C → CB | BA | c
• Parse abbc
CYK Algorithm – Idea (1)
• Idea: We parse the substrings of abbc in this order:
• Length-1 substrings: a, b, b, c
• Length-2 substrings: ab, bb, bc
• Length-3 substrings: abb, bbc
• Length-4 substring: abbc
• Done!
CYK Algorithm – Idea (2)
• Idea: Parsing of longer substrings depends on parsing of shorter substrings
• Example: abb may be decomposed as
– ab + b
– a + bb
• If we know which variables derive ab and b (or a and bb), then we know which variables derive abb
CYK Algorithm – Substring
• Denote sub(i, j) := the substring with start index i and end index j (indices start at 1)
• Example: For abbc, sub(2,4) = bbc
• This notation is only for convenience in the following discussion.
CYK Algorithm – Table
• Each cell corresponds to a substring
• Each cell stores the variables that can derive its substring
• Rows: length of the substring; columns: start index of the substring (over the input a b b c)
• Example: the cell for length = 3 and start index = 2 is sub(2,4) = bbc
CYK Algorithm – Simulation
• Base case: length = 1
– The possible variable(s) for each symbol can be found by scanning through each production
Table:
len 1 | A    | B    | B    | A, C
      | a    | b    | b    | c
CYK Algorithm – Simulation
• Loop: length = 2
– For each substring of length 2
• Decompose it into shorter substrings
• Check the cells below it
• Let’s parse the substring ab first
CYK Algorithm – Simulation
• For sub(1,2) = ab, it can be decomposed as:
– ab = a + b = sub(1,1) + sub(2,2)
– Possible choices: AB
– Scan rules: S (from S → AB)
Table:
len 2 | S
len 1 | A    | B    | B    | A, C
      | a    | b    | b    | c
CYK Algorithm – Simulation
• For sub(2,3) = bb, it can be decomposed as:
– bb = b + b = sub(2,2) + sub(3,3)
– Possible choices: BB
– Scan rules: ∅
No suitable rules are found
⇒ No variable can derive this substring
Table:
len 2 | S    | ∅
len 1 | A    | B    | B    | A, C
      | a    | b    | b    | c
CYK Algorithm – Simulation
• For sub(3,4) = bc, it can be decomposed as:
– bc = b + c = sub(3,3) + sub(4,4)
– Possible choices: BA, BC
– Scan rules: C (from C → BA), B (from B → BC)
Table:
len 2 | S    | ∅    | B, C
len 1 | A    | B    | B    | A, C
      | a    | b    | b    | c
CYK Algorithm – Simulation
• For sub(1,3) = abb:
– abb = ab + b = sub(1,2) + sub(3,3)
– Possible choices: SB
– Scan rules: ∅
No suitable variables found yet.
But there is another way to decompose the string.
CYK Algorithm – Simulation
• For sub(1,3) = abb:
– abb = a + bb = sub(1,1) + sub(2,3)
– Possible choices: ∅
– Scan rules: not needed
No variable derives the smaller substring bb = sub(2,3)
⇒ This decomposition cannot give a parse of abb
⇒ No need to scan the rules
CYK Algorithm – Simulation
• For sub(1,3) = abb:
– abb = sub(1,1) + sub(2,3) gives no valid parsing
– abb = sub(1,2) + sub(3,3) gives no valid parsing
• Cannot parse: sub(1,3) gets ∅
Table:
len 3 | ∅
len 2 | S    | ∅    | B, C
len 1 | A    | B    | B    | A, C
      | a    | b    | b    | c
CYK Algorithm – Simulation
• For sub(2,4) = bbc:
– bbc = sub(2,2) + sub(3,4)
• Possible choices: BB, BC
⇒ Variable: B (from B → BC)
– bbc = sub(2,3) + sub(4,4)
• Possible choices: ∅
Table:
len 3 | ∅    | B
len 2 | S    | ∅    | B, C
len 1 | A    | B    | B    | A, C
      | a    | b    | b    | c
CYK Algorithm – Simulation
• Finally, for sub(1,4) = abbc:
– Possible choices:
• AB (from a + bbc), SB, SC (from ab + bc); abb + c gives none
– Variables:
• S (from S → AB)
This cell represents the original string, and it contains S
⇒ abbc is in the language
Table:
len 4 | S
len 3 | ∅    | B
len 2 | S    | ∅    | B, C
len 1 | A    | B    | B    | A, C
      | a    | b    | b    | c
CYK Algorithm – Parse Tree
• abbc is in the language!
• How to obtain the parse tree?
– Trace back the derivations:
• sub(1,4) is derived using S → AB from sub(1,1) and sub(2,4)
• sub(1,1) is derived using A → a
• sub(2,4) is derived using B → BC from sub(2,2) and sub(3,4)
• …
• So, also record which rule (and which split) was used for each variable in each cell!
CYK Algorithm – Parse Tree
• Obtained from the table by following the recorded derivations:
S
├─ A ─ a
└─ B
   ├─ B ─ b
   └─ C
      ├─ B ─ b
      └─ A ─ c
CYK Algorithm – Conclusion
• A bottom-up parsing algorithm
– Dynamic programming
– The solution of a subproblem (parsing a substring) depends on the solutions of smaller subproblems
• Before employing the CYK algorithm, convert the grammar into normal form
– Remove ε-productions
– Remove unit productions
CYK Algorithm – Detailed
D = “On input w = w1w2…wn:
  If w = ε: accept if S → ε is a rule; otherwise, reject.
  For i = 1 to n:
    For each variable A:
      Test whether A → b is a rule, where b = wi.
      If so, place A in table(i, i).
  For l = 2 to n:
    For i = 1 to n - l + 1:
      Let j = i + l - 1.
      For k = i to j - 1:
        For each rule A → BC:
          If table(i, k) contains B and table(k+1, j) contains C,
            put A in table(i, j).
  If S is in table(1, n), accept. Otherwise, reject.”
Pushdown Automata
NFA with an unbounded stack as memory
Pushdown Automata
• PDA ≈ NFA with a stack as memory
• Transition:
– NFA: depends on the input
– PDA: depends on the input and the top of the stack
• In one transition, a PDA can
– read a terminal from the input string (possibly ε)
– pop a symbol from the stack (possibly ε)
– push a symbol onto the stack (possibly ε)
• Transitions are nondeterministic
Pushdown Automata and NFA
• Accept:
– NFA: can be in an accept state after reading the entire input
– PDA: can be in an accept state after reading the entire input (the stack need not be empty)
PDA – Example 1
• Given the following language:
L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …},
Σ = {0, 1}
• Design a PDA for it
PDA – Example 1 - Idea
• Idea: The input has two sections
– First section
• All ‘0’s
– Second section
• All ‘1’s
• #‘1’ depends on #‘0’
– #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
PDA – Example 1 – Solution
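• Solution:
[State diagram (labelled states q0–q3); transition labels: ε,ε/$; 0,ε/X; ε,ε/ε; 1,X/ε; 1,X/X; 1,X/ε; ε,$/ε]
(Notation: a,b/c means read a from the input, pop b, push c; ε means read, pop or push nothing.)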
PDA – Example 1 – Explain
• Let’s try a string: w = 00111
– See the whiteboard for the simulation…
PDA – Example 1 – Explain
• The first transition, ε,ε/$, indicates the start of parsing: it pushes $ to mark the bottom of the stack
PDA – Example 1 – Explain
• The loop 0,ε/X saves information about #‘0’
• #‘X’ in stack = #‘0’
PDA – Example 1 – Explain
• The transitions that read ‘1’ (1,X/ε and 1,X/X) account for #‘1’
– #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
PDA – Example 1 – Explain
• 1,X/ε: consume one ‘X’ and read one ‘1’
PDA – Example 1 – Explain
• 1,X/X followed by 1,X/ε: consume one ‘X’ and read two ‘1’s
PDA – Example 1 – Explain
• Consume one ‘X’, and then
– read one ‘1’, or
– read two ‘1’s
• Each of the #‘0’ X’s pays for one or two ‘1’s, so #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
PDA – Example 1 – Explain
• ε,$/ε indicates the end of parsing: $ on top of the stack means every ‘X’ has been consumed
PDA – Example 2
• Given the following language:
L = { a^i b^j c^k d^l : i, j, k, l = 0, 1, …; i + k = j + l },
where the alphabet Σ = {a, b, c, d}
• Design a PDA for it
PDA – Example 2 – Idea
• Idea:
– Sequentially read (multiple) ‘a’s, ‘b’s, ‘c’s and ‘d’s
– Maintain:
• #‘a’ + #‘c’
• #‘b’ + #‘d’
– If these numbers are equal
• Accept
PDA – Example 2 – Solution
• Solution:
[State diagram (labelled states q1–q5); transition labels: ε,ε/$; a,ε/X; b,X/ε; b,$/$Y; b,Y/YY; c,$/$X; c,X/XX; c,Y/ε; d,X/ε; d,$/$Y; d,Y/YY; ε,ε/ε (three times, between stages); ε,$/ε]
PDA – Example 2 – Explain
• Solution: the diagram is annotated from start to end with the stages that read the ‘a’s, ‘b’s, ‘c’s and ‘d’s, in that order
PDA – Example 2 – Explain
• Each X in the stack = an extra ‘a’ or ‘c’ not yet cancelled by a ‘b’ or ‘d’ (a,ε/X; c,$/$X; c,X/XX)
PDA – Example 2 – Explain
• Each Y in the stack = an extra ‘b’ or ‘d’ not yet cancelled by a ‘c’ (b,$/$Y; b,Y/YY; d,$/$Y; d,Y/YY)
PDA – Example 2 – Explain
• X and Y ‘cancel’ each other (b,X/ε; c,Y/ε; d,X/ε)
• Above $, the stack contains only X’s or only Y’s
PDA – Example 2 – Explain
• $ on top (no X’s and no Y’s) means
– #a + #c = #b + #d ⇒ take ε,$/ε and accept
```
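A few helpers make the `sub(i, j)` notation and the shortest-first order from the “CYK Algorithm – Idea” and “Substring” slides concrete. The names `sub`, `substrings_by_length` and `splits` are our own. Indices are 1-based and inclusive, so `sub(w, 2, 4)` is bbc for w = abbc.

```python
def sub(s: str, i: int, j: int) -> str:
    """sub(i, j): substring of s from index i to index j (1-based, inclusive)."""
    return s[i - 1:j]


def substrings_by_length(n: int):
    """Yield (i, j) for every substring of a length-n string, shortest first."""
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            yield i, i + length - 1


def splits(s: str, i: int, j: int):
    """All ways to write sub(i, j) = sub(i, k) + sub(k+1, j) with i <= k < j."""
    return [(sub(s, i, k), sub(s, k + 1, j)) for k in range(i, j)]


w = "abbc"
print(sub(w, 2, 4))                                        # bbc
print([sub(w, i, j) for i, j in substrings_by_length(len(w))])
# ['a', 'b', 'b', 'c', 'ab', 'bb', 'bc', 'abb', 'bbc', 'abbc']
print(splits(w, 1, 3))                                     # [('a', 'bb'), ('ab', 'b')]
```

The last line lists exactly the two decompositions of abb, a + bb and ab + b, that the “Idea (2)” slide mentions.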
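The “CYK Algorithm – Detailed” pseudocode translates almost line for line into Python. This sketch assumes the example grammar is stored as two hypothetical lookup tables: `TERMINAL_RULES` for X → a and `BINARY_RULES` for X → YZ. It uses 1-based `(i, j)` keys so that `table[(i, j)]` matches table(i, j) and sub(i, j) on the slides.

```python
# Hypothetical encoding of the example grammar (normal form):
#   TERMINAL_RULES[a]    = variables X with a rule X -> a
#   BINARY_RULES[(Y, Z)] = variables X with a rule X -> YZ
TERMINAL_RULES = {"a": {"A"}, "b": {"B"}, "c": {"A", "C"}}
BINARY_RULES = {
    ("A", "B"): {"S"},
    ("C", "C"): {"A"},
    ("B", "C"): {"B"},
    ("C", "B"): {"C"},
    ("B", "A"): {"C"},
}


def cyk(w: str, start: str = "S", start_derives_empty: bool = False):
    """Return (accepted, table) where table[(i, j)] = variables deriving sub(i, j)."""
    n = len(w)
    if n == 0:                                   # w = ε: accept iff S -> ε is a rule
        return start_derives_empty, {}
    table = {}
    for i in range(1, n + 1):                    # base case: length-1 substrings
        table[(i, i)] = set(TERMINAL_RULES.get(w[i - 1], set()))
    for length in range(2, n + 1):               # l = 2 to n
        for i in range(1, n - length + 2):       # i = 1 to n - l + 1
            j = i + length - 1
            cell = set()
            for k in range(i, j):                # split into sub(i, k) + sub(k+1, j)
                for y in table[(i, k)]:
                    for z in table[(k + 1, j)]:
                        cell |= BINARY_RULES.get((y, z), set())
            table[(i, j)] = cell
    return start in table[(1, n)], table


accepted, table = cyk("abbc")
print(accepted)                 # True
print(table[(1, 4)])            # {'S'}
print(table[(1, 3)])            # set()
print(sorted(table[(3, 4)]))    # ['B', 'C']
```

The pseudocode loops over every rule A → BC. This sketch instead loops over the variable pairs found in the two smaller cells and looks each pair up in `BINARY_RULES`. Both approaches fill the same table, and the cells printed above agree with the hand simulation.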
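The “Parse Tree” slides suggest recording which rule produced each variable. One way to do that is a back-pointer dictionary keyed by `(i, j, X)`, sketched here under the same hypothetical grammar encoding. The helper names are our own. Only one derivation per variable per cell is kept, which is enough to rebuild one parse tree.

```python
# Hypothetical encoding of the example grammar, as in a plain CYK sketch.
TERMINAL_RULES = {"a": {"A"}, "b": {"B"}, "c": {"A", "C"}}
BINARY_RULES = {
    ("A", "B"): {"S"},
    ("C", "C"): {"A"},
    ("B", "C"): {"B"},
    ("C", "B"): {"C"},
    ("B", "A"): {"C"},
}


def cyk_with_backpointers(w: str):
    """CYK that records, for each (i, j, X), how X derived sub(i, j).

    back[(i, i, X)] = the terminal a      (rule X -> a)
    back[(i, j, X)] = (k, Y, Z)           (rule X -> YZ, split after position k)
    """
    n = len(w)
    cells, back = {}, {}
    for i in range(1, n + 1):
        cells[(i, i)] = set()
        for x in TERMINAL_RULES.get(w[i - 1], set()):
            cells[(i, i)].add(x)
            back[(i, i, x)] = w[i - 1]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cells[(i, j)] = set()
            for k in range(i, j):
                for y in cells[(i, k)]:
                    for z in cells[(k + 1, j)]:
                        for x in BINARY_RULES.get((y, z), set()):
                            if x not in cells[(i, j)]:   # keep only the first derivation found
                                cells[(i, j)].add(x)
                                back[(i, j, x)] = (k, y, z)
    return back


def build_tree(back: dict, i: int, j: int, x: str):
    """Follow the recorded derivations from (i, j, x) down to the terminals."""
    entry = back[(i, j, x)]
    if i == j:
        return (x, entry)                                 # leaf: X -> a
    k, y, z = entry
    return (x, build_tree(back, i, k, y), build_tree(back, k + 1, j, z))


w = "abbc"
back = cyk_with_backpointers(w)
if (1, len(w), "S") in back:
    print(build_tree(back, 1, len(w), "S"))
# ('S', ('A', 'a'), ('B', ('B', 'b'), ('C', ('B', 'b'), ('A', 'c'))))
```

The nested tuples encode the tree S → AB, A → a, B → BC, B → b, C → BA, B → b, A → c.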
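To replay the whiteboard simulation of w = 00111 from Example 1, we can simulate a nondeterministic PDA by searching over configurations (state, input position, stack). The transition list below follows the labels on the slides. The state names (`start`, `zeros`, `ones`, `mid`, `accept`) are hypothetical because the diagram layout did not survive; `mid` is the state between the two ‘1’s read for one ‘X’.

```python
from collections import deque

EPS = ""  # ε: read, pop or push nothing

# Transitions (state, read, pop, next_state, push) for Example 1.
# State names are hypothetical; "mid" sits between the two 1s read for one X.
EXAMPLE1 = [
    ("start", EPS, EPS, "zeros", "$"),   # ε,ε/$ : mark the bottom of the stack
    ("zeros", "0", EPS, "zeros", "X"),   # 0,ε/X : one X per 0
    ("zeros", EPS, EPS, "ones", EPS),    # ε,ε/ε : switch to reading 1s
    ("ones", "1", "X", "ones", EPS),     # 1,X/ε : one X pays for one 1
    ("ones", "1", "X", "mid", "X"),      # 1,X/X : first of two 1s for one X
    ("mid", "1", "X", "ones", EPS),      # 1,X/ε : second of the two 1s
    ("ones", EPS, "$", "accept", EPS),   # ε,$/ε : every X has been used
]


def pda_accepts(transitions, start, accept_states, w):
    """Breadth-first search over configurations (state, input position, stack).

    Terminates here because this PDA has no ε-cycle that keeps growing the stack.
    """
    queue = deque([(start, 0, ())])
    seen = set()
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if pos == len(w) and state in accept_states:
            return True
        for src, read, pop, dst, push in transitions:
            if src != state:
                continue
            if read and (pos == len(w) or w[pos] != read):
                continue                          # input does not match
            if pop and (not stack or stack[-1] != pop):
                continue                          # top of stack does not match
            new_stack = stack[:-1] if pop else stack
            if push:
                new_stack = new_stack + (push,)
            queue.append((dst, pos + len(read), new_stack))
    return False


for w in ["", "01", "011", "00111", "0111", "10"]:
    print(repr(w), pda_accepts(EXAMPLE1, "start", {"accept"}, w))
# prints True for '', '01', '011', '00111' and False for '0111', '10'
```

Breadth-first search explores every nondeterministic choice: after each ‘X’, the PDA reads either one ‘1’ or two. The `seen` set avoids revisiting configurations. A PDA with stack-growing ε-cycles would need an extra bound.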
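For Example 2, the letter being read determines which stage the PDA is in, so its stack can be replayed deterministically. This is a sketch of the stack discipline, not the PDA itself. It assumes the transitions are wired as the labels suggest: one stage each for ‘a’, ‘b’, ‘c’ and ‘d’, in that order. The function name `example2_accepts` is our own.

```python
def example2_accepts(w: str) -> bool:
    """Replay the stack of the Example 2 PDA on w (a deterministic sketch).

    The letter being read fixes which stage (a, b, c or d) the PDA is in,
    so its nondeterministic ε,ε/ε moves between stages can be resolved greedily.
    """
    order = "abcd"
    stage = 0
    stack = ["$"]                      # ε,ε/$ : bottom-of-stack marker
    for ch in w:
        if ch not in order or order.index(ch) < stage:
            return False               # input is not of the form a*b*c*d*
        stage = order.index(ch)
        if ch in "ac":                 # an extra a or c
            if stack[-1] == "Y":
                stack.pop()            # c,Y/ε : cancel an extra b
            else:
                stack.append("X")      # a,ε/X ; c,$/$X ; c,X/XX
        else:                          # an extra b or d
            if stack[-1] == "X":
                stack.pop()            # b,X/ε ; d,X/ε : cancel an extra a or c
            else:
                stack.append("Y")      # b,$/$Y ; b,Y/YY ; d,$/$Y ; d,Y/YY
    return stack == ["$"]              # ε,$/ε : #a + #c = #b + #d


for w in ["", "abcd", "aabbbc", "bbcc", "abc", "ba"]:
    print(repr(w), example2_accepts(w))
```

The stack never holds X’s and Y’s at the same time, which matches the slide’s claim. Finishing with only $ left is exactly the condition #a + #c = #b + #d.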