25-08-2017, 09:32 PM
Some Open Questions Related to Cuckoo Hashing
esa2009.pdf (Size: 120.01 KB / Downloads: 38)
Introduction
Hash-based data structures and algorithms are currently a booming industry in
the Internet, particularly for applications related to measurement, monitoring,
and security. Hash tables and related structures, such as Bloom filters and their
derivatives, are used billions of times a day, and new uses keep proliferating.
Indeed, one of the most remarkable trends of the last five years has been the
growing prevalence of hash-based algorithms and data structures in networking
and other areas. At the same time, the field of hashing, which has enjoyed a long
and rich history in computer science (see e.g., [26]), has also enjoyed something
of a theoretical renaissance. Arguably, this burst of activity began with the
demonstration of the power of multiple choices: by giving each item multiple
possible hash locations, and storing it in the least loaded, remarkably balanced
loads can be obtained, yielding quite efficient lookup schemes [4, 7, 21, 28, 35]. An
extension of this idea, cuckoo hashing, further allows items to be moved among
its multiple choices to better avoid collisions, improving memory utilization even
further.
Background : Multiple-choice Hashing and Cuckoo
Hashing
The key result behind multiple choice hashing was presented in a seminal work
by Azar, Broder, Karlin, and Upfal [4], who showed the following: suppose that n
items1 are hashed sequentially into n buckets by hashing each item d times to obtain
d choices of a bucket for each item, and placing each item in the choice with
the smallest current number of items (or load). When d = 1, which is standard
hashing, then the maximum load grows like (1+o(1))) log n/ log log n) with high
probability [20]; when d 2, the maximum load grows like log log n/ log d+O(1)
with high probability, which even for 2 choices gives a maximum load of 6 in
most practical scenarios. The upshot is that by giving items just a small amount
of choice in where they are placed, the maximum load can be greatly reduced;
the cost is that now d locations have to be checked when trying to look up the
item, which is usually a small price to pay in systems where the d locations can
be looked up in parallel.
Insertion Times for Random Walk Cuckoo Hashing
Let us consider the online setting for cuckoo hashing, where new items may
arrive to be inserted and old items may be deleted. Note that this is in contrast
to the offline setting, where all items are initially present and one merely wants
to make a lookup table, without updates to the underlying set. When there are
d > 2 choices or more than one item per bucket, the question of what to do
when inserting a new item is more subtle than in the case with two choices.
One approach is to do a breadth first search to find an augmenting path in the
underlying graph structure, looking at all paths that require one move, then
two moves, and so on.
Limited Randomness and Cuckoo Hashing
Even from the inception of cuckoo hashing, the question of how much randomness
is required was considered a worthwhile question. While assuming perfectly
random hash functions is useful for analysis, it is both theoretically and practically
unappealing, since perfectly random hash functions are not readily available.