Randomized Algorithms
CS648 Lecture 11
• BPWM: the final algorithm
• Hashing (part 1)

Boolean Product Witness Matrix (BPWM)

Problem: Given two n × n Boolean matrices A and B, and their Boolean product C, compute a matrix W such that W[i][j] stores a witness for each pair (i, j) with C[i][j] = 1. (A witness for (i, j) is an index k with A[i][k] = 1 and B[k][j] = 1.)

Conclusion of last lecture

Theorem: Given two n × n Boolean matrices A and B, and an integer t, there is a randomized Monte Carlo algorithm to compute witnesses for all those pairs which have exactly t witnesses.
• The running time is O(n^ω log n), where O(n^ω) is the time for matrix multiplication.
• The error probability is at most 1/n^α for any desired constant α.

Questions:
1. How to compute witnesses for all pairs in O(n^ω polylog n) time?
2. How to convert the Monte Carlo algorithm into a Las Vegas algorithm?

Randomized algorithm for computing witnesses for all pairs with exactly t witnesses

// Pseudocode for sampling the (indices of) columns
Sample(n, p)
{   S ← ∅;
    For each k ∈ [n] do: add k to S with probability p;
    return S;
}

Compute-Witnesses(A, B, t)
{   S ← Sample(n, 1/t);
    For each i, k ∈ [n]
        if k ∈ S then A′[i][k] ← k · A[i][k];
        else A′[i][k] ← 0;
    C′ ← A′ · B;    // integer matrix product
    For each i, j ∈ [n]
        If C′[i][j] is a witness for (i, j)    // i.e. A[i][C′[i][j]] = B[C′[i][j]][j] = 1
            W[i][j] ← C′[i][j];
}

Time complexity: O(n^ω).
Probability of finding a witness for a single pair (i, j): > 1/8, as shown next.

Focus on a single pair (i, j)

Let there be c witnesses for (i, j) such that t ≤ c < 2t.

Question: If each column is selected independently with probability p = 1/t, what is the probability that exactly one out of the c witnesses for (i, j) survives?

Answer:
P = c · (1/t) · (1 − 1/t)^(c−1)
  ≥ (1 − 1/t)^(c−1)         [since c ≥ t]
  ≥ (1 − 1/t)^(2(t−1))      [since c < 2t]
  ≥ 1/e² = 0.135… > 1/8.

Randomized algorithm for computing witnesses for all pairs with witness count ∈ [t, 2t)

By the calculation above, a single run of Compute-Witnesses(A, B, t) fails to find a witness for a fixed pair (i, j) with probability < 1 − 1/8 = 7/8.

Question: How to reduce the error probability to < 1/n^α ?
Answer: Repeat the entire process α log_{8/7} n times.
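The single sampling round above can be sketched in Python. This is a sketch, not the lecture's code: it replaces the O(n^ω) matrix product with naive multiplication for clarity, and labels witnesses 1-based (so that 0 can mean "no witness found").

```python
def one_round_witnesses(A, B, S):
    # One sampling round: zero out the columns of A outside S and scale each
    # surviving column k by its 1-based label k+1, then take the integer
    # product with B. If exactly one witness for (i, j) survives in S, then
    # entry (i, j) of the product equals that witness's label.
    n = len(A)
    W = [[0] * n for _ in range(n)]           # 0 means "no witness found"
    for i in range(n):
        for j in range(n):
            c = sum((k + 1) * A[i][k] * B[k][j] for k in S)
            if 0 < c <= n and A[i][c - 1] == 1 and B[c - 1][j] == 1:
                W[i][j] = c                   # verified witness, stored 1-based
    return W
```

The final check also guards against rounds where several witnesses survive: the sum of several labels is accepted only if it happens to be a valid witness itself, so the output is never wrong, merely possibly incomplete.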
Compute-Witnesses(A, B, t)
{   Repeat α log_{8/7} n times
    {   S ← Sample(n, 1/t);
        For each i, k ∈ [n]
            if k ∈ S then A′[i][k] ← k · A[i][k];
            else A′[i][k] ← 0;
        C′ ← A′ · B;
        For each i, j ∈ [n]
            If C′[i][j] is a witness for (i, j)
                W[i][j] ← C′[i][j];
    }
}

Time complexity: O(n^ω log n).
Probability of failing to find a witness for a single pair (i, j): < (7/8)^(α log_{8/7} n) = 1/n^α.

How to compute witnesses for all pairs in O(n^ω polylog n) time?

Let there be up to n² pairs that have witness count in [t, 2t). Apply the union theorem:
Probability of failing to find a witness for any pair having witness count ∈ [t, 2t): < n² · (1/n^α) = 1/n^(α−2).

Randomized algorithm for computing witnesses for all pairs

Compute-Witnesses(A, B)
{   For l = 1 to log n
    {   Repeat α log_{8/7} n times
        {   S ← Sample(n, 1/2^l);
            For each i, k ∈ [n]
                if k ∈ S then A′[i][k] ← k · A[i][k];
                else A′[i][k] ← 0;
            C′ ← A′ · B;
            For each i, j ∈ [n]
                If C′[i][j] is a witness for (i, j)
                    W[i][j] ← C′[i][j];
        }
    }
}

Time complexity: O(n^ω log² n).

Question: What is the probability that a witness is not found for a given pair (i, j)?
Answer: < 1/n^α.
Question: What is the probability that a witness is not found for at least one pair?
Answer: < 1/n^(α−2), by the union bound over at most n² pairs.

How to transform to Las Vegas? If there is even a single pair with a nonzero number of witnesses for which we fail to find one, run the deterministic O(n³) algorithm.
Expected running time = O(n^ω log² n) + (1/n^(α−2)) · O(n³) = O(n^ω log² n) for a suitable constant α.

Boolean Product Witness Matrix (BPWM)

Theorem: Given two n × n Boolean matrices A and B, and their Boolean product C, there exists a Las Vegas algorithm for computing a matrix W such that W[i][j] stores a witness for each (i, j) with C[i][j] = 1. The expected running time of the algorithm is O(n^ω log² n).

Homework: Show that the running time of the algorithm is concentrated around O(n^ω log² n).
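The full doubling algorithm with its Las Vegas fallback can be sketched as follows. This is a runnable illustration under simplifying assumptions: naive O(n³) products instead of fast matrix multiplication, a fixed repetition count `alpha` standing in for α log_{8/7} n rounds, and a brute-force scan playing the role of the slow fallback algorithm.

```python
import random

def find_witnesses(A, B, alpha=20, seed=0):
    # Las Vegas sketch: for sampling probabilities 1/2, 1/4, ..., repeat the
    # sampling round; any pair still lacking a witness at the end is fixed by
    # a deterministic O(n) scan (the "run the slow algorithm" step).
    n = len(A)
    rng = random.Random(seed)
    W = [[0] * n for _ in range(n)]
    # Boolean product C (naive), so we know which pairs need a witness
    C = [[int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n)]
         for i in range(n)]
    t = 2
    while t <= 2 * n:                     # scales p = 1/2, 1/4, ..., ~1/(2n)
        for _ in range(alpha):            # stand-in for alpha * log_{8/7} n rounds
            S = [k for k in range(n) if rng.random() < 1.0 / t]
            for i in range(n):
                for j in range(n):
                    if C[i][j] and not W[i][j]:
                        c = sum((k + 1) * A[i][k] * B[k][j] for k in S)
                        if 0 < c <= n and A[i][c - 1] and B[c - 1][j]:
                            W[i][j] = c   # verified 1-based witness
        t *= 2
    # Las Vegas fallback: deterministically fix any pair the sampling missed
    for i in range(n):
        for j in range(n):
            if C[i][j] and not W[i][j]:
                W[i][j] = next(k + 1 for k in range(n) if A[i][k] and B[k][j])
    return W
```

Because the fallback is deterministic, the output is always correct; randomness only affects how much work is done, which is exactly the Las Vegas guarantee.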
(Modify the algorithm if needed.)

Hashing (part 1)

Problem definition
• U = {1, 2, …, m}, called the universe.
• S ⊆ U and n = |S|.
• n ≪ m. Example: m = 10^18, n = 10^3.

Aim: Maintain a data structure for storing S to support the search query: "Does i ∈ S?" for any given i ∈ U.

Solutions with worst case guarantees
Solution for static S:
• An array storing S in sorted order (binary search).
Solution for dynamic S:
• Height balanced search trees (AVL trees, red-black trees, …).
Time per operation: O(log n), space: O(n).
Alternative: time per operation O(1), space O(m) (a bit array indexed by U).

Solution used in practice, with no worst case guarantees: hashing.

Hashing
• Hash table T: an array of size s, indexed 0, 1, …, s − 1, each entry storing a list of elements.
• Hash function h: U → [s].
Answering a query "Does i ∈ S?":
1. t ← h(i);
2. Search the list stored at T[t].
Properties of h:
• Computable in O(1) time.
• Space required by h: O(1). (How many words are needed to encode h?)

Collision
Definition: Two elements i, j ∈ U are said to collide under hash function h if h(i) = h(j).
Worst case time complexity of searching an item i: the number of elements in S colliding with i.
A discouraging fact: no hash function can be found which is good for all S.
Proof: At least m/s elements from U are mapped to a single index in T; if S is chosen from among them, every search at that index scans all of S.

Hashing
• A very popular heuristic since the 1950s.
• Achieves O(1) search time in practice.
• Worst case guarantee on search time: O(n).
Question: Can we have hashing ensuring
• O(1) worst case guarantee on search time,
• O(n) space,
• expected O(n) preprocessing time?
The following result gave an answer in the affirmative:
Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM, 31(3), 1984.
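The two-step query procedure above (hash, then scan one list) can be sketched as a minimal chaining table; the class and method names are choices of this sketch, not of the lecture.

```python
class ChainedHashTable:
    # Minimal hashing-with-chaining sketch: T is an array of s buckets,
    # and h: U -> [s] is supplied by the caller.
    def __init__(self, s, h):
        self.s = s
        self.h = h
        self.T = [[] for _ in range(s)]

    def insert(self, x):
        bucket = self.T[self.h(x)]
        if x not in bucket:
            bucket.append(x)

    def search(self, x):
        # Step 1: t <- h(x).  Step 2: scan the list at T[t].
        # Cost = length of the list, i.e. the number of stored elements
        # colliding with x.
        return x in self.T[self.h(x)]
```

The search cost is exactly the collision count from the next slide: all stored elements that share x's bucket must be scanned.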
WHY DOES HASHING WORK SO WELL IN PRACTICE?

Question: What is the simplest hash function h: U → [s]?
Answer: h(i) = i mod s.

Hashing works so well in practice because the set S is usually (close to) a uniformly random subset of U. Let us give a theoretical reasoning for this fact.

Let x₁, x₂, …, xₙ denote elements selected uniformly at random from U to form S.

Question: What is the expected number of elements colliding with x₁?
Answer: Suppose x₁ takes the value v. Under h(i) = i mod s, the only values of U that can collide with v are v ± s, v ± 2s, … — at most m/s − 1 values other than v, while xⱼ (j ≠ 1) can take any of the m − 1 remaining values. Hence
    P(xⱼ collides with x₁) = (m/s − 1)/(m − 1) < 1/s.
Expected number of elements of S colliding with x₁
    = (n − 1) · (m/s − 1)/(m − 1) < (n − 1)/s ≤ 1 for s = n.

Conclusion
1. h(i) = i mod s works so well because, for a uniformly random subset S of U, the expected number of collisions at an index of T is O(1).
2. It is easy to fool this hash function so that it takes O(|S|) search time (do it as a simple exercise).

This makes us think: "How can we achieve worst case O(1) search time for a given set S?"

HOW TO ACHIEVE WORST CASE O(1) SEARCH TIME

Key idea to achieve worst case O(1) search time
Observation: Of course, no single hash function is good for every possible S. But we may strive for a hash function which is good for a given S.
A promising direction: Find a family H of hash functions such that
• for any given S, many of them are good (the notion of goodness is captured formally by the universal hash family on the following slide);
• select a function randomly from H and try it for S.
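The "easy to fool" exercise can be made concrete: pick an S whose elements all share one residue mod s, and every element lands in the same bucket. A small sketch (the helper name is ours):

```python
def bucket_sizes(S, s):
    # how many elements of S land in each bucket under h(i) = i mod s
    sizes = [0] * s
    for x in S:
        sizes[x % s] += 1
    return sizes

s = 10
bad = [s * k for k in range(1, 11)]        # 10, 20, ..., 100: all = 0 (mod 10)
assert bucket_sizes(bad, s)[0] == len(bad)  # one list holds all of S
```

So the O(1) expected behaviour is a property of a random S, not of the hash function: an adversarial S forces O(|S|) search time.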
UNIVERSAL HASH FAMILY

Universal Hash Family
Definition: A collection H of hash functions from U to [s] is said to be c-universal if there exists a constant c such that for any pair of distinct elements i, j ∈ U,
    P_{h ∈ H} [ h(i) = h(j) ] ≤ c/s.

Question: Does there exist a universal hash family whose hash functions have a compact encoding?
Answer: Yes, and it is very simple too.

STATIC HASHING: WORST CASE O(1) SEARCH TIME

The journey
One milestone in our journey:
• A perfect hash function using a hash table of size O(n²).
Tools needed:
• A c-universal hash family, where c is a small constant.
• Elementary probability.
Spend at least 30 minutes today to see how the milestone can be achieved. It is simple and beautiful.
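One standard compactly-encodable family is the Carter–Wegman family h_{a,b}(x) = ((a·x + b) mod p) mod s with a prime p ≥ m, a ∈ {1, …, p − 1}, b ∈ {0, …, p − 1}; whether this is exactly the family the course introduces next is an assumption of this sketch. Each function is encoded by the two words (a, b). The c-universal condition can be checked exhaustively on a toy universe:

```python
def max_collisions(p, s):
    # For every pair of distinct keys x != y in U = {0, ..., p-1}, count how
    # many functions h_{a,b}(x) = ((a*x + b) % p) % s of the family collide
    # on that pair, and return the worst count together with the family size.
    H = [(a, b) for a in range(1, p) for b in range(p)]
    worst = 0
    for x in range(p):
        for y in range(x + 1, p):
            worst = max(worst, sum(1 for (a, b) in H
                                   if ((a * x + b) % p) % s
                                   == ((a * y + b) % p) % s))
    return worst, len(H)

worst, size = max_collisions(13, 4)   # toy instance: p = 13, s = 4
assert worst <= 2 * size // 4         # c-universal condition with c = 2
```

The exhaustive check confirms that for every key pair, at most a c/s fraction of the family causes a collision, which is exactly the definition above.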