Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract)

Dietzfelbinger, Martin; Pagh, Rasmus

doi:10.1007/978-3-540-70575-8_32

Cited by 65 publications

(89 citation statements)

References 32 publications


“…The storage space of the resulting PHFs and MPHFs is distant from the information theoretic lower bound by a factor of 1.43. The closest competitor is the algorithm by Martin and Pagh [7] but their algorithm does not work in linear time. Furthermore, the CHD algorithm can be tuned to run faster than the BPZ algorithm [2] (the fastest algorithm available in the literature so far) and to obtain more compact functions.…”

Section: Discussion (mentioning)

confidence: 99%

Hash, Displace, and Compress

Belazzougui

Botelho

Dietzfelbinger

2009

Lecture Notes in Computer Science

Self Cite


Abstract. A hash function h, i.e., a function from the set U of all keys to the range [m] = {0, …, m − 1}, is called a perfect hash function (PHF) for a subset S ⊆ U of size n ≤ m if h is 1-1 on S. The important performance parameters of a PHF are representation size, evaluation time, and construction time. In this paper, we present an algorithm that yields PHFs with representation size very close to optimal while retaining O(n) construction time and O(1) evaluation time. For example, in the case m = 2n we obtain a PHF that uses space 0.67 bits per key, and for m = 1.23n we obtain space 1.4 bits per key, which was not achievable with previously known methods. Our algorithm is inspired by several known algorithms; the main new feature is that we combine a modification of Pagh's "hash-and-displace" approach with data compression on a sequence of hash function indices. That combination makes it possible to significantly reduce space usage while retaining linear construction time and constant query time. Our algorithm can also be used for k-perfect hashing, where at most k keys may be mapped to the same value. For the analysis we assume that fully random hash functions are given for free; such assumptions can be justified and were made in previous papers.
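The hash-and-displace idea named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_phf`, `evaluate`, `_hash`), the use of seeded BLAKE2b as a stand-in for fully random hash functions, and the bucket count are all assumptions made for illustration, and the compression of the index sequence is left out entirely.

```python
import hashlib


def _hash(key: str, seed: int, modulus: int) -> int:
    """Seeded hash into [0, modulus); a stand-in for a fully random hash function."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


def build_phf(keys, m, num_buckets, max_tries=1_000_000):
    """Return one hash-function index per bucket (assumes distinct keys, len(keys) <= m)."""
    # Step 1: split the keys into buckets with a first-level hash (seed 0).
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[_hash(key, 0, num_buckets)].append(key)

    occupied = [False] * m
    index = [0] * num_buckets

    # Step 2: handle the largest buckets first, while the table is still sparse.
    for b in sorted(range(num_buckets), key=lambda i: -len(buckets[i])):
        bucket = buckets[b]
        if not bucket:
            continue
        # Step 3: try hash functions 1, 2, 3, ... until the whole bucket
        # lands on distinct, currently free slots.
        for d in range(1, max_tries):
            slots = [_hash(k, d, m) for k in bucket]
            if len(set(slots)) == len(slots) and not any(occupied[s] for s in slots):
                for s in slots:
                    occupied[s] = True
                index[b] = d
                break
        else:
            raise RuntimeError("no suitable hash function index found")
    return index


def evaluate(key, index, m):
    """Evaluate the perfect hash function: one bucket lookup, one seeded hash."""
    return _hash(key, index[_hash(key, 0, len(index))], m)


keys = [f"key{i}" for i in range(200)]
m = 2 * len(keys)                      # the m = 2n case from the abstract
index = build_phf(keys, m, num_buckets=50)
values = [evaluate(k, index, m) for k in keys]
```

Most buckets are served by a small index, so the sequence `index` is skewed toward small numbers; that skew is what a compression step on the index sequence would exploit.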


“…Upper bounds on the threshold can be found by again viewing the problem as an orientation problem on random hypergraphs, and while some additional considerations are needed, an upper bound can be calculated [5]. Lower bounds have been achieved, based on a new approach for designing dictionary and retrieval structures, based on matrix techniques [13]. (See also [33].)…”

Section: Threshold Loads For Cuckoo Hashing (mentioning)

confidence: 99%

“…Storing the vector is then sufficient to generate the value associated with each key, and further requires just d lookups into the vector. As a specific example, for the important case of d = 3, there is an upper bound of 0.9183 for the threshold load [5], and a lower bound of 0.8894 [13]. Again, however, the question of bounds for efficient algorithms in the online setting remains more open.…”

Section: Threshold Loads For Cuckoo Hashing (mentioning)

confidence: 99%

Some Open Questions Related to Cuckoo Hashing

Mitzenmacher

2009

Lecture Notes in Computer Science


“…An obvious possibility is to store a minimal perfect hash function on S and use the resulting value to index a table of rn bits. Much better theoretical solutions were made available recently [6,9,29]: essentially, it is possible to evaluate a function in constant time storing just rn + o(n) bits. Since we are interested in practical applications, however, we will use an extension of a technique developed by Majewski, Wormald, Havas and Czech [26] that has a slightly larger space usage, but has the advantage of being extremely fast, as it requires just the evaluation of three hash functions plus three accesses to memory.…”

Section: Storing Functions (mentioning)

confidence: 99%

Theory and Practise of Monotone Minimal Perfect Hashing

Belazzougui

Boldi

Pagh

et al. 2009

2009 Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments (ALENEX)

Self Cite


Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, lists of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practise, and provide a balance between access speed, ease of construction, and space usage.
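The citation statement quoted for this paper describes storing a function so that a lookup needs only three hash evaluations and three memory accesses. A minimal sketch of that style of retrieval structure follows, assuming a hypergraph-peeling construction in the spirit of the cited technique. The names (`build_retrieval`, `lookup`, `_positions`), the seeded BLAKE2b hashes, the three-segment table layout, and the table size of roughly 1.23n cells plus a little slack are illustrative assumptions, not details taken from the paper.

```python
import hashlib


def _positions(key: str, seed: int, seg: int):
    """Three table cells for a key, one per segment, so they are always distinct."""
    cells = []
    for i in range(3):
        salt = seed.to_bytes(4, "little") + bytes([i])
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
        cells.append(i * seg + int.from_bytes(digest, "little") % seg)
    return cells


def build_retrieval(mapping, max_seeds=100):
    """Build a table such that the XOR of a key's three cells equals its value."""
    n = len(mapping)
    seg = (int(1.23 * n) + 2) // 3 + 1      # about 1.23n cells in total, plus slack
    for seed in range(max_seeds):
        edges = {k: _positions(k, seed, seg) for k in mapping}
        incident = [set() for _ in range(3 * seg)]
        for k, cells in edges.items():
            for c in cells:
                incident[c].add(k)

        # Peel: repeatedly remove a key that is alone in one of its cells.
        order = []
        pending = [c for c in range(3 * seg) if len(incident[c]) == 1]
        while pending:
            c = pending.pop()
            if len(incident[c]) != 1:
                continue
            (k,) = incident[c]
            order.append((k, c))
            for q in edges[k]:
                incident[q].discard(k)
                if len(incident[q]) == 1:
                    pending.append(q)
        if len(order) != n:
            continue                         # peeling got stuck; retry with a new seed

        # Assign in reverse peeling order; each key's private cell is still zero.
        table = [0] * (3 * seg)
        for k, c in reversed(order):
            a, b, d = edges[k]
            table[c] = mapping[k] ^ table[a] ^ table[b] ^ table[d]
        return seed, seg, table
    raise RuntimeError("construction failed for all seeds")


def lookup(key, seed, seg, table):
    """Three hash evaluations plus three memory accesses."""
    a, b, c = _positions(key, seed, seg)
    return table[a] ^ table[b] ^ table[c]


mapping = {f"k{i}": i % 16 for i in range(200)}   # 4-bit values, i.e. r = 4
seed, seg, table = build_retrieval(mapping)
```

The structure stores no keys, so a lookup for a key outside the original set returns an arbitrary value rather than reporting a miss.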
