Randomized Algorithms CS648

Report
Randomized Algorithms
CS648
Lecture 11
• BPWM : The final algorithm
• Hashing – (part 1)
1
Boolean Product Witness Matrix (BPWM)
Problem: Given two Boolean matrices A and B, and their Boolean product C,
compute a matrix , such that:
 stores a witness for each (, ) with  = 1.
Conclusion of last lecture
Theorem:
Given two Boolean matrices  and , and integer ,
there is a randomized Monte Carlo algorithm to compute witnesses
for all those pairs which have  witnesses.
• The running time is O(  ).
• The error probability is at most

−
?
Questions:
1. How to compute witnesses for all pairs in O(   ) time.
2. How to convert Monte Carlo to Las Vegas ?
Randomized algorithm for Computing Witnesses for
all pairs with exactly  witnesses
//The pseudo code for sampling the (indices of) columns
Sample(, )
{  ∅;
For each  ∈ [] do:
add  to  with probability ;
return ;
}
Randomized algorithm for Computing Witnesses for
all pairs with exactly  witnesses
Compute-Witnesses(,,)
{
{  Sample(, 1/);
For each ,  ∈ []
if  ∈  then   ∙  ;
else
 0;
′   ∙ ;
For each ,  ∈ []
′
If 
is a witness for (, )
′
  
}
}
Time complexity: O( )
1
Probability of finding a witness for a single pair (, ): ??
>

Focus on a single pair (,)
Let there be  witnesses for (,) such that  ≤  < 2
Question: If each column is selected independently with  = 1/ ,
what is the probability that exactly one out of  witnesses for (,) survives ?
Answer:
=
 1
1 
1 −1

1−
1 
1−
1 2
≥
≥ 1−
1
≥ 2
= 0.135 …
>
1
8
Randomized algorithm for Computing Witnesses for
all pairs with witness count ∈ [, )
Compute-Witnesses(,,)
{
{  Sample(, 1/);
For each ,  ∈ []
if  ∈  then   ∙  ;
else
How to reduce the error

probability to <  ?
 0;
′   ∙ ;
For each ,  ∈ []
′
If 
is a witness for (, )
Repeat the entire process
  8/7  times.
′
  
}
}
Time complexity: O( )
1 7
Probability of failing to find a witness for a single pair (, ): ??< 1 − <
8 8
Randomized algorithm for Computing Witnesses for
all pairs with witness count ∈ [, )
Compute-Witnesses(,,)
{ Repeat   /  times
{  Sample(, 1/);
For each ,  ∈ []
if  ∈  then   ∙  ;
else
 0;
′   ∙ ;
For each ,  ∈ []
′
If 
is a witness for (, )
′
  
}
}
Time complexity: O(  )

< 
Probability of failing to find a witness for a single pair (, ): ??

Randomized algorithm for Computing Witnesses for
all pairs with witness count ∈ [, )
Compute-Witnesses(,,)
{ Repeat   /  times
{  Sample(, 1/);
For each ,  ∈ []
if  ∈  then   ∙  ;
else
How to compute
witnesses for all pairs
in O(   ) time ?
 0;
′   ∙ ;
For each ,  ∈ []
′
If 
is a witness for (, )
Let there be  pairs that
have exactly  witnesses.
Apply Union theorem …
′
  
}
}
Time complexity: O(  )

Prob. of failing to find a witness for any pair having witness count ∈ [, ) : ??
<  <

−
Randomized algorithm for Computing Witnesses for
all pairs
Compute-Witnesses(,)
{ For  = 1 to   
{ Repeat   /  times
{  Sample(, 1/2 );
For each ,  ∈ []
if  ∈  then   ∙  ;
else
 0;
′   ∙ ;
For each ,  ∈ []
′
If 
is a witness for (, )
′
  
}
}
}
Time complexity: O(   )
Question:
What is Prob. that witness is not found for a
given pair (,) ?
Answer: ? < 


Question:
What is Prob. that witness is not found for at
least one pair ?
Answer: < 
−
How to
If there is transform
even a single
to Laspair with
nonzero witnesses
Vegas but
algowe
? fail to find
even one, run the O( ) algo.
Expected running time = O(   )
Boolean Product Witness Matrix (BPWM)
Theorem: Given two Boolean matrices A and B, and their Boolean product C,
there exists a Las Vegas algorithm for computing a matrix , such that:
 stores a witness for each (, ) with  = 1.
The expected running time of the algorithm is O(   ).
Homework:
Show that the running time of the algo is concentrated around O(   ).
(modify the algorithm if needed)
Hashing
(part 1)
Problem Definition
•  = 1,2, … ,  called universe
•  ⊆  and  = ||
• ≪
Examples:
 = 1018 ,  = 103
Aim
Maintain a data structure for storing  to support the search query :
“Does  ∈  ?” for any given  ∈ .
Solutions
Solutions with worst case guarantees
Solution for static  :
• Array storing  in sorted order
Solution for dynamic  :
• Height Balanced Search trees (AVL trees, Red-Black trees,…)
Time per operation: O(log ),
Space: O()
Alternative:
Time per operation: O(1),
Space: O()
Solutions used in practice with no worst case guarantees
Hashing.
Hashing
• Hash table:
: an array of size .
• Hash function
 :  []
Answering a Query: “Does  ∈  ?”
1. ();
2. Search the list stored at [].
Properties of  :
•   computable in O(1) time.
• Space required by : O(1).
How many words
needed to encode  ?

0
1
⋮
⋮
−
Elements of 
Collision
Definition: Two elements ,  ∈  are
said to collide under hash function  if
  = 
Worst case time complexity of searching
an item  :
No. of elements in  colliding with .
A Discouraging fact:
No hash function can be found
which is good for all .
Proof:
At least / elements from  are
mapped to a single index in .

0
1
⋮
⋮
−
Collision
Definition: Two elements ,  ∈  are
said to collide under hash function  if
  = 
Worst case time complexity of searching
an item  :
No. of elements in  colliding with .
A Discouraging fact:
No hash function can be found
which is good for all .
Proof:
At least / elements from  are
mapped to a single index in .

0
1
⋮
⋯
⋮
−
/
Hashing
• A very popular heuristic since 1950’s
• Achieves O(1) search time in practice
• Worst case guarantee on search time: O()
Question: Can we have a hashing ensuring
• O(1) worst case guarantee on search time.
• O() space.
• Expected O() preprocessing time.
The following result gave an answer in affirmative
Michael Fredman, Janos Komlos, Endre Szemeredy. Storing a Sparse Table
with O(1) Worst Case Access Time. Journal of the ACM (Volume 31, Issue 3),
1984.
WHY DOES HASHING WORK SO
WELL IN PRACTICE ?
Why does hashing work so well in Practice ?
Question: What is the simplest hash function  :  [] ?
Answer:   =   
Hashing works so well in practice because the set  is usually a uniformly
random subset of .
Let us give a theoretical reasoning for this fact.
Why does hashing work so well in Practice ?
Let 1 , 2 , … ,  denote  elements
selected randomly uniformly from  to
form .
Question:
What is expected number of elements
colliding with 1 ?
Answer: Let 1 takes value .
P( collides with 1 ) = ??
1
2
⋮
−

+
How many possible values
can  take ?  − 1
How many possible values
can collide with  ?
 + 2
 + 3
⋮
m
Why does hashing work so well in Practice ?
Let 1 , 2 , … ,  denote  elements
selected randomly uniformly from  to
form .
Question:
What is expected number of elements
colliding with 1 ?
Answer: Let 1 takes value .
P( collides with 1 ) =


1
2
⋮
−

+
−1
−1
 + 2
Expected number of elements of 
colliding with 1 =
=


−1
−1
 + 3
⋮
( − 1)
=  1 for  = ()
m
Values which may
collide with 
under the hash
function   =
  
Why does hashing work so well in Practice ?
Conclusion
1.
  =    works so well because
for a uniformly random subset of ,
the expected number of collision at an index of  is O(1).
It is easy to fool this hash function such that it achieves O(s) search time.
(do it as a simple exercise).
This makes us think:
“How can we achieve worst case O(1) search time for a given set .”
HOW TO ACHIEVE WORST CASE
O(1) SEARCH TIME
Key idea to achieve worst case O(1) search time
Observation: Of course, no single hash function is good for every possible .
But we may strive for a hash function which is good for a given .
A promising direction:
Find out a family of hash functions H such that
• For any given , many of them are good.
The notion of goodness is captured
formally by Universal hash family in
the following slide.
• Select a function randomly from H and try for .
UNIVERSAL HASH FAMILY
Universal Hash Family
Definition: A collection  of hash-functions is said to be c-universal
if there exists a constant  such that for any ,  ∈ ,
∈    =  
≤


Question: Does there exist a Universal hash family whose hash functions have a compact
encoding?
Answer:
Yes and it is very simple too 
STATIC HASHING
WORST CASE O(1) SEARCH TIME
The Journey
One Milestone in Our Journey:
• A perfect hash function using hash table of size O( 2 )
Tools Needed:
• Universal Hash Family where  is a small constant
• Elementary Probability
Spend at least 30 minutes today to
see how the milestone can be
achieved. It is simple and beautiful 

similar documents