Report

CS 221 Guest lecture: Cuckoo Hashing Shannon Larson March 11, 2011 Learning Goals • Describe the cuckoo hashing principle • Analyze the space and time complexity of cuckoo hashing • Apply the insert and lookup algorithms in a cuckoo hash table • Construct the graph for a cuckoo table Remember Graphs? • A set of nodes V = 1 , … , • A set of edges = 1 , 2 , … , −1 , • Here: – = 1 , 2 , 3 , 4 , 5 – = 1 , 2 , … 3 , 5 – = {1 , 2 , 3 , 4 , 5 , 6 } Graph Cycles • A graph cycle is a path of edges such that the first and last vertices are the same 1 , 2 , 5 , 3 , 4 , 1 Recall Hashing • A hash function ℎ() – Takes the target – Hashes x to a bucket 0 … − 1 • Perfect hashing is ideal: – O(1) lookup – O(1) insert • Perfect hashing is not realistic! Cuckoo Hashing: the idea • Remember the cuckoo bird? – Shares a nest with other species… – …then kicks the other species out! • Same idea with cuckoo hashing – When we insert , we “kick out” what occupies the nest, – Then finds a new, alternate home Why is this cool? • Perfect hashing guarantees – O(1) lookup, O(1) insert • Cuckoo hashing guarantees – O(1) lookup – O(1) insert** • Other hashing strategies can’t guarantee this! • Also, it’s an option for your final project ** There’s a caveat here, but we’ll see it later Cuckoo Hashing: Two Nests • Suppose we have TWO hash tables 1 , 2 – they each have a hash function ℎ1 , ℎ2 () – we prefer 1 , but if we have to move we’ll go to 2 – if we’re in 2 and have to move, we’ll go back to 1 • This is our collision strategy for cuckoo hashing – Different from linear probing/open addressing – Different from trees Cuckoo Hashing: Example • We want to insert • There are no conflicts anywhere ℎ2 () ℎ1 () x Cuckoo Hashing : Example • Now we want to insert • There are no conflicts anywhere x y Cuckoo Hashing : Example • To insert , ℎ1 = ℎ1 () • Move to ℎ2 () y oh no! x z Cuckoo Hashing : Example • Now we insert into ℎ1 () y x NOW we’re fine! z Cuckoo Hashing : Example • The final table after inserting , , in order y x z Why two tables? • Two tables, one for each hash function • Simple to visualize, simple to implement • But, why two? • One table works just as well! • Just as simple to implement (all one table) One Table Example • Let’s insert , , again, with ℎ1 , ℎ2 () • Again, ℎ1 preferred ℎ2 () ℎ1 () x One Table Example • Now insert • No conflicts, no problem ℎ2 () x ℎ1 () y One Table Example • Now insert • But, another conflict with : ℎ1 = ℎ1 () oh no! ℎ1 () x y ℎ2 () z One Table Example • First, move to ℎ2 () ℎ1 () x y ℎ2 () z One Table Example • Now we move to ℎ1 () x z y One Table Example • Final table after inserting , , in order x z y Graph Representation • How can we represent our table? • Why not a graph? – Nodes are every possible table entry – Edges are inserted entries • This is a directed graph • Direction from current location TO alternate location Graph Example • Remember our one-table example? x 1 z 2 y 3 4 1 2 3 4 Infinite Insert • Suppose we insert something, and we end up in an infinite loop – Or, “too many” displacements – Some pre-defined maximum based on table size Example: Loops • Remember our one-table example? x 1 z 2 y 3 4 1 2 3 4 Example: Loops • Let’s insert : no conflicts still x 1 z 2 y 3 4 w 1 2 3 4 Example: Loops • Now let’s insert : displace x 1 z 2 y 3 w 4 a 1 2 3 4 Example: Loops • Now is placed, and is displaced (put in 4) a 1 x 2 y 3 w 4 z 1 2 3 4 Example: Loops • Now is placed, and is displaced (put in 3) a 1 x 2 y 3 z 4 w 1 2 3 4 Example: Loops • Notice what happens to the graph • We keep going and going and going…. 1 2 3 4 Analysis: Loops • Remember infinite loops in a new insert? • In the graph, this is a closed loop – We might forever re-do the same displacements • The probability of getting a loop increases dramatically once we’ve inserted 2 elements – N is the number of buckets (size of table) – This is from the research on cuckoo hashing Analysis: Loops • What can we do once we get a loop? – Rebuild, same size (ok solution) – Double table size (better solution) • We’ll need new hash functions for both Analysis • Lookup has O(1) time – At MOST two places to look, ever – One location per hash function • Insert has amortized O(1) time – Think of this as “in the long run” – In practice we see O(1) time insert – You’ll see amortized analysis in CPSC 320 • Remember the “grass and trees” analysis? Lookup: The Code Return the position of (either ℎ1 or ℎ2 ()) Otherwise, return false lookup(x) return T[h1(x)] = x T[h2(x)] = x or Insert: The Code Given a table (array) T and item to insert: insert(x) if lookup(x) return; pos <- h1(x); for i <- 1 to M if T[pos] empty T[pos] <- x return; swap x and T[pos]; if pos = h1(x) pos <- h2(x) else pos <- h1(x) rehash(); insert(x); end // if it’s already here, done // store h1(x) // loop at most M times // if T[pos] empty, done // put x in T[pos] // now we’re displacing // if we couldn’t stop, rehash // then insert currently displaced Analysis: Load Factor • What is load? – The average fill factor (% full) the table is • What about cuckoo hash tables? – For two hash functions, load factor ≈ 50% • Remember loops? – For three hash functions, we get ≈ 91% • That’s pretty great, actually! More hash functions • What would this look like? • We would have three tables (simple case) – One hash function per table • Or, we would have two alternates (one table) More hash functions • What would this look like? • Each entry has TWO alternates, not one • ℎ1 , ℎ2 , ℎ3 () x z y More hash functions • When something comes in new (insert) – Put it in ℎ1 () • If it’s displaced, check ℎ2 () – If that’s full, go to ℎ3 () • To lookup, we just look in ℎ1 , ℎ2 , () or ℎ3 () – Still constant time! Even better load? • Currently we’ve only put one item per bucket • What if we had two cells per bucket? x,w z y,a Even better load? • Currently we’ve only put one item per bucket • What if we had two cells per bucket? • What about collision strategies? – Round-robin (cells take turns swapping out) – FIFO (oldest resident gets kicked out) Even better load? Links & Resources • http://en.wikipedia.org/wiki/Cuckoo_hashing • http://www.ru.is/faculty/ulfar/CuckooHash.pdf • http://www.it-c.dk/people/pagh/papers/cuckooundergrad.pdf • No neat animations on the internet…yet! – Possible personal project? – Brownie points? – Pre-coop project?