### CS 221 Guest lecture: Cuckoo Hashing

Shannon Larson
March 11, 2011
Learning Goals
• Describe the cuckoo hashing principle
• Analyze the space and time complexity of
cuckoo hashing
• Apply the insert and lookup algorithms in a
cuckoo hash table
• Construct the graph for a cuckoo table
Remember Graphs?
• A set of nodes V = {v1, …, vn}
• A set of edges E = {(v1, v2), …, (vn-1, vn)}
• Here:
– V = {1, 2, 3, 4, 5}
– E = {(1, 2), …, (3, 5)}
– E = {e1, e2, e3, e4, e5, e6}
Graph Cycles
• A graph cycle is a path of edges such that the
first and last vertices are the same
– Example: 1, 2, 5, 3, 4, 1
Recall Hashing
• A hash function h(x)
– Takes the target x
– Hashes x to a bucket 0 … m − 1
• Perfect hashing is ideal:
– O(1) lookup
– O(1) insert
• Perfect hashing is not realistic!
Cuckoo Hashing: the idea
• Remember the cuckoo bird?
– Shares a nest with other species…
– …then kicks the other species out!
• Same idea with cuckoo hashing
– When we insert x, we “kick out” what occupies
the nest
– Then the displaced item finds a new, alternate home
Why is this cool?
• Perfect hashing guarantees
– O(1) lookup, O(1) insert
• Cuckoo hashing guarantees
– O(1) lookup
– O(1) insert**
• Other hashing strategies can’t guarantee this!
• Also, it’s an option for your final project
** There’s a caveat here, but we’ll see it later
Cuckoo Hashing: Two Nests
• Suppose we have TWO hash tables T1, T2
– they each have a hash function: h1(x), h2(x)
– we prefer T1, but if we have to move we’ll go to T2
– if we’re in T2 and have to move, we’ll go back to T1
• This is our collision strategy for cuckoo hashing
– Different from linear probing/open addressing
– Different from trees
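As a minimal sketch of this two-nest layout (the table size and the hash functions `h1`, `h2` below are illustrative assumptions, not the ones from the lecture):

```python
# Sketch of the two-table cuckoo layout. SIZE and the hash
# functions are illustrative choices, not from the slides.
SIZE = 7

def h1(x):
    return x % SIZE            # bucket in the preferred table T1

def h2(x):
    return (x // SIZE) % SIZE  # bucket in the fallback table T2

T1 = [None] * SIZE
T2 = [None] * SIZE

def lookup(x):
    # At most two probes, ever: one slot per table.
    return T1[h1(x)] == x or T2[h2(x)] == x
```

Note that every item has exactly two candidate homes, which is what makes the O(1) lookup guarantee possible.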
Cuckoo Hashing: Example
• We want to insert x
• There are no conflicts anywhere
[Figure: x is placed at h1(x) in T1; h2(x) in T2 stays empty]
Cuckoo Hashing: Example
• Now we want to insert y
• There are no conflicts anywhere
[Figure: x at h1(x) and y at h1(y), both in T1]
Cuckoo Hashing: Example
• To insert z, we find h1(z) = h1(x): oh no!
• Move x to h2(x)
[Figure: x is displaced from h1(x) in T1 to h2(x) in T2]
Cuckoo Hashing: Example
• Now we insert z into h1(z)
[Figure: z at h1(z) and y in T1, x in T2: NOW we’re fine!]
Cuckoo Hashing: Example
• The final table after inserting x, y, z in order
[Figure: y and z in T1, x in T2]
Why two tables?
• Two tables, one for each hash function
• Simple to visualize, simple to implement
• But, why two?
• One table works just as well!
• Just as simple to implement (all one table)
One Table Example
• Let’s insert x, y, z again, with h1(x), h2(x)
• Again, h1(x) is preferred
[Figure: x placed at h1(x); h2(x) empty]
One Table Example
• Now insert y
• No conflicts, no problem
[Figure: x at h1(x), y at h1(y)]
One Table Example
• Now insert
• But, another conflict with : ℎ1  = ℎ1 ()
oh no!
ℎ1 ()
x
y
ℎ2 ()
z
One Table Example
• First, move x to h2(x)
[Figure: x moves from h1(x) to h2(x); y stays put]
One Table Example
• Now we move z to h1(z)
[Figure: z at h1(z), x at h2(x), y unchanged]
One Table Example
• Final table after inserting x, y, z in order
[Figure: final table holds x, z, y]
Graph Representation
• How can we represent our table?
• Why not a graph?
– Nodes are every possible table entry
– Edges are inserted entries
• This is a directed graph
• Direction from current location TO alternate location
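The representation above can be sketched in a few lines (the table contents and hash functions in the test usage are hypothetical):

```python
# Build the cuckoo graph for a one-table layout:
# one node per bucket, one directed edge per stored item,
# pointing from the item's current bucket to its alternate bucket.
def cuckoo_graph(table, h1, h2):
    edges = []
    for pos, item in enumerate(table):
        if item is None:
            continue
        # The alternate of an item sitting at h1(item) is h2(item),
        # and vice versa.
        alt = h2(item) if pos == h1(item) else h1(item)
        edges.append((pos, alt))
    return edges
```

With this in hand, an insertion that would loop forever shows up as a cycle among the edges.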
Graph Example
• Remember our one-table example?
[Figure: table cells 1–4 hold x, z, y (cell 4 empty); graph nodes 1–4 with a directed edge from each item’s cell to its alternate cell]
Infinite Insert
• Suppose we insert something, and we end up
in an infinite loop
– Or, “too many” displacements
– Some pre-defined maximum based on table size
Example: Loops
• Remember our one-table example?
[Figure: table cells 1–4 hold x, z, y, with cell 4 empty; graph nodes 1–4]
Example: Loops
• Let’s insert w: no conflicts still
[Figure: cells 1–4 now hold x, z, y, w]
Example: Loops
• Now let’s insert a: displace x
[Figure: a arrives at cell 1, which holds x; cells 1–4 hold x, z, y, w]
Example: Loops
• Now x is placed, and z is displaced (put in 4)
[Figure: cells 1–4 hold a, x, y, w; z is in hand, headed for cell 4]
Example: Loops
• Now z is placed, and w is displaced (put in 3)
[Figure: cells 1–4 hold a, x, y, z; w is in hand, headed for cell 3]
Example: Loops
• Notice what happens to the graph
• We keep going and going and going….
[Figure: the graph on nodes 1–4 now contains a cycle]
Analysis: Loops
• Remember infinite loops in a new insert?
• In the graph, this is a closed loop
– We might forever re-do the same displacements
• The probability of getting a loop increases
dramatically once we’ve inserted N/2 elements
– N is the number of buckets (size of table)
– This is from the research on cuckoo hashing
Analysis: Loops
• What can we do once we get a loop?
– Rebuild, same size (ok solution)
– Double table size (better solution)
• We’ll need new hash functions for both
Analysis
• Lookup has O(1) time
– At MOST two places to look, ever
– One location per hash function
• Insert has amortized O(1) time
– Think of this as “in the long run”
– In practice we see O(1) time insert
– You’ll see amortized analysis in CPSC 320
• Remember the “grass and trees” analysis?
Lookup: The Code
Given a table (array) T and target x:
Return true if x is in the table (at either h1(x) or h2(x));
otherwise, return false

lookup(x)
  return T[h1(x)] = x or T[h2(x)] = x
Insert: The Code
Given a table (array) T and item x to insert:

insert(x)
  if lookup(x)            // if it’s already here, done
    return
  pos <- h1(x)            // store h1(x)
  for i <- 1 to M         // loop at most M times
    if T[pos] empty       // if T[pos] empty, done
      T[pos] <- x         // put x in T[pos]
      return
    swap x and T[pos]     // now we’re displacing
    if pos = h1(x)
      pos <- h2(x)
    else
      pos <- h1(x)
  rehash()                // if we couldn’t stop, rehash
  insert(x)               // then insert currently displaced
end
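The pseudocode above translates to a runnable sketch like the following (one table, doubling on rehash; the random linear hash functions and the starting size are illustrative assumptions, not from the slides):

```python
import random

class CuckooHash:
    """One-table cuckoo hash with two hash functions."""

    def __init__(self, size=8):
        self.size = size
        self._new_hashes()
        self.table = [None] * size

    def _new_hashes(self):
        # Fresh random (odd-multiplier) linear hash functions;
        # we need new ones after every rehash.
        a1 = 2 * random.randrange(10**8) + 1
        a2 = 2 * random.randrange(10**8) + 1
        b1, b2 = random.randrange(10**9), random.randrange(10**9)
        self.h1 = lambda x: (a1 * x + b1) % self.size
        self.h2 = lambda x: (a2 * x + b2) % self.size

    def lookup(self, x):
        # At most two probes: h1(x) or h2(x).
        return self.table[self.h1(x)] == x or self.table[self.h2(x)] == x

    def insert(self, x):
        if self.lookup(x):                    # already here, done
            return
        pos = self.h1(x)
        for _ in range(self.size):            # at most M = size displacements
            if self.table[pos] is None:       # empty slot, done
                self.table[pos] = x
                return
            x, self.table[pos] = self.table[pos], x   # displace occupant
            pos = self.h2(x) if pos == self.h1(x) else self.h1(x)
        self._rehash()                        # too many displacements
        self.insert(x)                        # re-insert the in-flight item

    def _rehash(self):
        old = [v for v in self.table if v is not None]
        self.size *= 2                        # double the table (better solution)
        self._new_hashes()
        self.table = [None] * self.size
        for v in old:
            self.insert(v)
```

Doubling on rehash is what keeps insert amortized O(1): the cost of the occasional rebuild is spread over the many cheap inserts that preceded it.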
Load Factor
• What is the load factor?
– The average fill factor (% full) of the table
• What about cuckoo hash tables?
– For two hash functions, load factor ≈ 50%
• Remember loops?
– For three hash functions, we get ≈ 91%
• That’s pretty great, actually!
More hash functions
• What would this look like?
• We would have three tables (simple case)
– One hash function per table
• Or, we would have two alternates (one table)
More hash functions
• What would this look like?
• Each entry has TWO alternates, not one
• h1(x), h2(x), h3(x)
[Figure: x, y, z each with three candidate locations]
More hash functions
• When something new comes in (insert)
– Put it in h1(x)
• If it’s displaced, check h2(x)
– If that’s full, go to h3(x)
• To lookup, we just look in h1(x), h2(x), or h3(x)
– Still constant time!
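A sketch of this three-hash-function variant (the table size, hash functions, and displacement policy below are illustrative assumptions):

```python
# Sketch: one-table cuckoo hashing with THREE hash functions.
# The size and hash functions are illustrative, not from the slides.
SIZE = 11
HASHES = [
    lambda x: x % SIZE,
    lambda x: (x * 7 + 3) % SIZE,
    lambda x: (x * 13 + 5) % SIZE,
]
table = [None] * SIZE

def lookup(x):
    # Still constant time: at most three probes.
    return any(table[h(x)] == x for h in HASHES)

def insert(x, max_steps=SIZE):
    for _ in range(max_steps):
        for h in HASHES:                  # try h1, then h2, then h3
            if table[h(x)] is None:
                table[h(x)] = x
                return True
        # All three candidate slots are full: displace the
        # occupant of h1(x) and keep going with that item.
        pos = HASHES[0](x)
        x, table[pos] = table[pos], x
    return False                          # would need a rehash
```

The extra alternate per item is what pushes the achievable load factor from about 50% up to about 91%.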
More cells per bucket
• Currently we’ve only put one item per bucket
• What if we had two cells per bucket?
[Figure: buckets holding the pairs x,w and y,a, plus z]
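A sketch of a table with two cells per bucket (the layout, sizes, and hash functions here are hypothetical):

```python
# Sketch: cuckoo hashing with buckets of capacity 2.
# Sizes and hash functions are illustrative assumptions.
SIZE, CAP = 8, 2
buckets = [[] for _ in range(SIZE)]

def h1(x): return x % SIZE
def h2(x): return (x // SIZE) % SIZE

def lookup(x):
    # Still O(1): at most 2 buckets x 2 cells = 4 comparisons.
    return x in buckets[h1(x)] or x in buckets[h2(x)]

def insert(x, max_steps=SIZE):
    pos = h1(x)
    for _ in range(max_steps):
        if len(buckets[pos]) < CAP:       # room in this bucket, done
            buckets[pos].append(x)
            return True
        # Bucket full: displace its oldest occupant and
        # move on to that occupant's alternate bucket.
        x, buckets[pos][0] = buckets[pos][0], x
        pos = h2(x) if pos == h1(x) else h1(x)
    return False                          # would trigger a rehash
```

Wider buckets, like extra hash functions, reduce displacements and raise the load factor the table can sustain before a loop appears.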