ABSTRACT: Suppose m balls are sequentially thrown into n bins, each ball going into a uniformly random bin. It is well known that the gap between the load of the most loaded bin and the average is Θ(√(m log n / n)) for large m. If each ball instead goes to the lesser loaded of two random bins, this gap dramatically reduces to Θ(log log n), independent of m. Consider a constrained setting where not all pairs of bins can be sampled: we are given a graph whose nodes correspond to bins, and the process sequentially samples an edge from the graph and places a ball in the lesser loaded of its two endpoints. We show the gap is at most O(log n / σ), where σ is the edge expansion of the graph. Our results extend naturally to the hypergraph version of this question. Our technique involves a tight analysis of what we call the "(1 + β)-choice" process for a parameter β ∈ (0, 1): each ball goes to a random bin with probability 1 − β and to the lesser loaded of two random bins with probability β. For this process we show that the gap is Θ(log n / β), irrespective of m. Moreover, the gap stays Θ(log n / β) in the weighted case for a large class of weight distributions. No non-trivial bounds were previously known in the weighted case, even for the 2-choice process.
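As a concrete illustration, here is a minimal Python simulation of the (1 + β)-choice process described above (function names and parameters are ours, not from the paper); it lets one watch the gap stay roughly flat in m, in contrast to the 1-choice process.

    import random

    def one_plus_beta_gap(m, n, beta, seed=0):
        """Throw m balls into n bins: with probability 1 - beta use one
        random bin, with probability beta use the lesser loaded of two.
        Returns the gap between the maximum load and the average m/n."""
        rng = random.Random(seed)
        loads = [0] * n
        for _ in range(m):
            if rng.random() < beta:
                i, j = rng.randrange(n), rng.randrange(n)
                loads[min(i, j, key=lambda b: loads[b])] += 1
            else:
                loads[rng.randrange(n)] += 1
        return max(loads) - m / n

    # The gap should stabilize near log(n)/beta as m grows.
    for m in (10**4, 10**5, 10**6):
        print(m, one_plus_beta_gap(m, n=100, beta=0.5))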
Cuckoo hashing holds great potential as a high-performance hashing scheme for real applications. Up to this point, the greatest drawback of cuckoo hashing appears to be that there is a polynomially small but practically significant probability that a failure occurs during the insertion of an item, requiring an expensive rehashing of all items in the table. In this paper, we show that this failure probability can be dramatically reduced by the addition of a very small constant-sized stash. We demonstrate both analytically and through simulations that stashes of size equivalent to only three or four items yield tremendous improvements, enhancing cuckoo hashing's practical viability in both hardware and software. Our analysis naturally extends previous analyses of multiple cuckoo hashing variants, and the approach may prove useful in further related schemes.
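A minimal sketch of the idea, assuming the standard two-table variant of cuckoo hashing (the class name, table size, hash stand-in, and eviction limit below are illustrative): when an insertion's eviction chain runs too long, the item is parked in a small stash instead of triggering a full rehash.

    class CuckooWithStash:
        """Two-table cuckoo hashing; insertions whose eviction chain
        exceeds max_kicks land in a small stash instead of forcing a
        rehash of the whole table."""

        def __init__(self, size=1024, stash_size=4, max_kicks=32):
            self.size = size
            self.tables = [[None] * size, [None] * size]
            self.stash = []
            self.stash_size = stash_size
            self.max_kicks = max_kicks

        def _slot(self, key, t):
            # Illustrative stand-in for two independent hash functions.
            return hash((t, key)) % self.size

        def insert(self, key):
            t = 0
            for _ in range(self.max_kicks):
                i = self._slot(key, t)
                if self.tables[t][i] is None:
                    self.tables[t][i] = key
                    return True
                self.tables[t][i], key = key, self.tables[t][i]  # evict
                t ^= 1        # the evicted item tries the other table
            if len(self.stash) < self.stash_size:
                self.stash.append(key)   # stash absorbs the rare failure
                return True
            return False                 # only now would a rehash be needed

        def lookup(self, key):
            return (any(self.tables[t][self._slot(key, t)] == key
                        for t in (0, 1))
                    or key in self.stash)

A lookup now probes two table cells plus a stash of three or four entries, which is the small constant extra cost the abstract refers to.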
We propose a new approach for constructing P2P networks, based on a dynamic decomposition of a continuous space into cells corresponding to servers. We demonstrate the power of this approach by suggesting two new P2P architectures and various algorithms for them. The first serves as a DHT (distributed hash table) and the other is a dynamic expander network. The DHT network, which we call Distance Halving, allows logarithmic routing and load while preserving constant degrees. It offers an optimal tradeoff between degree and path length in the sense that degree d guarantees a path length of O(log_d n). Another advantage over previous constructions is its relative simplicity. A major new contribution of this construction is a dynamic caching technique that maintains low load and storage even in the presence of hot spots. Our second construction builds a network that is guaranteed to be an expander. The resulting topologies are simple to maintain and implement. Their simplicity makes it easy to modify and add protocols. A small variation yields a DHT which is robust against random Byzantine faults. Finally, we show that using our approach it is possible to construct any family of constant-degree graphs in a dynamic environment, though with worse parameters. Therefore we expect that more distributed data structures could be designed and implemented in a dynamic environment.
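To give a flavor of the continuous-discrete approach, the sketch below is a simplification with a static discretization of [0, 1) into n = 2^k equal cells (the function names are ours, and the real construction handles dynamic joins and leaves). It routes along the two "distance halving" maps ℓ(y) = y/2 and r(y) = (y + 1)/2, reaching any target cell in log2(n) hops while every cell keeps constant degree.

    def halving_neighbors(i, n):
        """Out-neighbors of cell i: its images under l(y) = y/2 and
        r(y) = (y + 1)/2, with [0, 1) discretized into n = 2**k cells."""
        return {i // 2, (i + n) // 2}

    def route(s, t, n):
        """Route from cell s to cell t by feeding t's bits, least
        significant first, into the halving maps: log2(n) hops."""
        path = [s]
        for j in range(n.bit_length() - 1):   # k = log2(n) steps
            b = (t >> j) & 1
            s = (s + b * n) // 2              # follow l (b = 0) or r (b = 1)
            path.append(s)
        return path

    print(route(5, 6, n=8))   # -> [5, 2, 5, 6]: three hops for n = 8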
In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric space is related to the expansion of the metric space. Given a metric space, we look at the graph obtained by connecting every pair of points within a certain distance r. We then look at various notions of expansion in this graph, relating them to the cell-probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example, if the graph has node expansion Φ, then we show that any deterministic t-probe data structure for n points must use space S where (St/n)^t > Φ. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in well-known metric spaces such as ℓ_1, ℓ_2, and ℓ_∞ by simply computing their expansion. In the process, we strengthen and generalize our previous results [18]. Additionally, we unify the approach in [18] and the communication-complexity-based approach. Our work reduces the problem of proving cell-probe lower bounds for near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on t is weak; that is, the bound drops exponentially in t. We show a much stronger (tight) time-space tradeoff for the class of dynamic low-contention data structures. These are data structures that support updates to the data set and that do not look up any single cell too often.
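To make the stated tradeoff concrete, rearranging the deterministic bound gives the space requirement explicitly; the Φ^(1/t) factor tends to 1 as t grows, which is exactly the weak (exponentially decaying) dependence on t mentioned above:

    \left(\frac{St}{n}\right)^{t} > \Phi
    \quad\Longrightarrow\quad
    S > \frac{n}{t}\,\Phi^{1/t},
    \qquad\text{so } t = 1 \text{ forces } S > n\Phi,
    \text{ while } \Phi^{1/t} \to 1 \text{ as } t \text{ grows.}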