The goal of this article is twofold. In the first part, we survey a family of nearest neighbor algorithms that are based on the concept of locality-sensitive hashing. Many of these algorithms have already been successfully applied in a variety of practical scenarios. In the second part of this article, we describe a recently discovered hashing-based algorithm for the case where the objects are points in the d-dimensional Euclidean space. As it turns out, the performance of this algorithm is provably near-optimal in the class of the locality-sensitive hashing algorithms.
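To make the locality-sensitive hashing idea concrete, here is a minimal bit-sampling sketch for the Hamming cube (an illustration of the general concept, not any particular scheme from the survey; all names and parameters are of my choosing). A random hash that samples a few coordinates makes nearby points collide far more often than distant ones:

```python
import random

def make_hash(d, k, rng):
    """One LSH function: sample k random coordinates of a d-bit vector."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in coords)

def collision_rate(h_fns, p, q):
    """Fraction of hash functions under which p and q collide."""
    return sum(h(p) == h(q) for h in h_fns) / len(h_fns)

rng = random.Random(0)
d, k, trials = 100, 5, 2000
fns = [make_hash(d, k, rng) for _ in range(trials)]

base = [0] * d
near = base.copy(); near[:5] = [1] * 5     # Hamming distance 5
far  = base.copy(); far[:40] = [1] * 40    # Hamming distance 40

p_near = collision_rate(fns, base, near)   # expected (1 - 5/100)^5  ~ 0.77
p_far  = collision_rate(fns, base, far)    # expected (1 - 40/100)^5 ~ 0.08
```

The gap between the two collision probabilities is exactly what an LSH-based data structure exploits: hashing the dataset into buckets so that a query mostly collides with its near neighbors.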
We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an n-point dataset in a d-dimensional space, our data structure achieves query time O(d · n^(ρ+o(1))) and space O(n^(1+ρ+o(1)) + d · n), where ρ = 1/(2c^2 − 1) for the Euclidean space and approximation c > 1. For the Hamming space, we obtain an exponent of ρ = 1/(2c − 1). Our result completes the direction set forth in [5], which gave a proof of concept that data-dependent hashing can outperform classic Locality-Sensitive Hashing (LSH). In contrast to [5], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [15, 3] for all approximation factors c > 1. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
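The exponents above are simple closed forms, so the improvement is easy to check numerically. A small sketch (helper names are mine) contrasting the data-dependent Euclidean exponent 1/(2c^2 − 1) with the classic optimal LSH exponent 1/c^2:

```python
def rho_lsh_euclidean(c):
    """Classic optimal (data-independent) LSH exponent in Euclidean space."""
    return 1.0 / c**2

def rho_data_dependent_euclidean(c):
    """Data-dependent hashing exponent from this work: 1/(2c^2 - 1)."""
    return 1.0 / (2 * c**2 - 1)

def rho_data_dependent_hamming(c):
    """Hamming-space exponent from this work: 1/(2c - 1)."""
    return 1.0 / (2 * c - 1)

# For approximation c = 2 the Euclidean query exponent drops
# from 1/4 (classic LSH) to 1/7 (data-dependent hashing).
classic = rho_lsh_euclidean(2)            # 0.25
new     = rho_data_dependent_euclidean(2) # 1/7 ~ 0.1428
```

For every c > 1 we have 2c^2 − 1 > c^2, so the new exponent is strictly smaller, matching the abstract's claim of an improvement for all approximation factors.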
We give algorithms for geometric graph problems in modern parallel models such as MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a (1 + ε)-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near-linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem [9], despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, n^(1+o(1)). We note that while [33] recently developed a near-linear time algorithm for (1 + ε)-approximating EMD, our algorithm is fundamentally different and, for example, also solves the transportation (cost) problem, raised as an open question in [33]. Furthermore, our algorithm immediately gives a (1 + ε)-approximation algorithm with n^δ space in the streaming-with-sorting model with (1/δ)^O(1) passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
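For intuition about the EMD cost being computed (this illustrates the problem, not the paper's plane algorithm): between two equal-size point sets on a line, the minimum-cost perfect matching is obtained by matching points in sorted order, which a brute-force search over matchings confirms on small inputs. A toy sketch with names of my choosing:

```python
from itertools import permutations

def emd_1d(a, b):
    """EMD between equal-size 1D point sets: match in sorted order."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

def emd_brute(a, b):
    """Reference: minimum matching cost over all n! assignments."""
    n = len(a)
    return min(sum(abs(a[i] - b[p[i]]) for i in range(n))
               for p in permutations(range(n)))

# Matching 0->1, 1->2, 4->3 costs 1 + 1 + 1 = 3.
cost = emd_1d([0, 1, 4], [1, 2, 3])
```

In the plane, no such simple sorting argument exists; the difficulty of computing this matching cost at scale is exactly what makes the near-linear-time and parallel results nontrivial.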
We show tight upper and lower bounds for time-space trade-offs for the c-approximate Near Neighbor Search problem. For the d-dimensional Euclidean space and n-point datasets, we develop a data structure with space n^(1+ρ_u+o(1)) + O(dn) and query time n^(ρ_q+o(1)) + d·n^o(1) for every ρ_u, ρ_q ≥ 0 with:

c^2 √ρ_q + (c^2 − 1) √ρ_u = √(2c^2 − 1).    (0.1)

In particular, for the approximation c = 2 we get:
• Space n^1.77... and query time n^o(1), significantly improving upon known data structures that support very fast queries [IM98, KOR00];
• Space n^1.14... and query time n^0.14..., matching the optimal data-dependent Locality-Sensitive Hashing (LSH) from [AR15];
• Space n^(1+o(1)) and query time n^0.43..., making significant progress in the regime of near-linear space, which is arguably of the most interest for practice.

This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor c > 1, improving upon [Kap15]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [BDGL16] and data-dependent hashing [AINR14, AR15]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole trade-off (0.1) in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match (0.1) for ρ_q = 0, improving upon the best known lower bounds from [PTW10]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound.

(This paper merges two arXiv preprints, [Laa15c] (appeared online on November 24, 2015) and [ALRW16] (appeared online on May 9, 2016), and subsumes both. The full version containing all the proofs is available at https://arxiv.org/abs/1608.03580.)
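The three c = 2 regimes quoted above can be checked against the trade-off curve. Assuming the trade-off (0.1) has the form c^2 √ρ_q + (c^2 − 1) √ρ_u = √(2c^2 − 1) (this form is consistent with all three quoted data points), a quick sketch with helper names of my choosing:

```python
from math import sqrt

def rho_u_given_rho_q(c, rho_q):
    """Solve c^2*sqrt(rho_q) + (c^2-1)*sqrt(rho_u) = sqrt(2c^2-1) for rho_u."""
    return ((sqrt(2 * c**2 - 1) - c**2 * sqrt(rho_q)) / (c**2 - 1)) ** 2

def rho_q_given_rho_u(c, rho_u):
    """Solve the same curve for rho_q."""
    return ((sqrt(2 * c**2 - 1) - (c**2 - 1) * sqrt(rho_u)) / c**2) ** 2

c = 2.0
# Very fast queries (rho_q = 0): space exponent 1 + 7/9 = 1.777...
fast_query_space = 1 + rho_u_given_rho_q(c, 0.0)
# Balanced point (rho_q = rho_u): both equal 1/7 = 0.1428...
balanced = rho_u_given_rho_q(c, 1 / 7)
# Near-linear space (rho_u = 0): query exponent 7/16 = 0.4375
near_linear_query = rho_q_given_rho_u(c, 0.0)
```

The endpoints recover the n^1.77 / n^o(1) and n^(1+o(1)) / n^0.43 regimes, and the fixed point of the curve matches the n^1.14 / n^0.14 data-dependent LSH bound.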
To show the result for two probes, we establish and exploit a connection to locally-decodable codes.
We present a new data structure for the -approximate near neighbor problem (ANN) in the Euclidean space. For points in R , our algorithm achieves ( + log ) query time and ( 1+
We investigate the optimality of (1 + ε)-approximation algorithms obtained via the dimensionality reduction method. We show that: • Any data structure for the (1 + ε)-approximate nearest neighbor problem in Hamming space, which uses a constant number of probes to answer each query, must use n
A technique introduced by Indyk and Woodruff (STOC 2005) has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called Precision Sampling. Using this method, we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector x = (x1, x2, . . . , xn), which is useful for the following applications:
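The core idea behind Precision Sampling, tolerating per-coordinate estimates whose accuracy is allowed to degrade with a random threshold, can be illustrated with a toy sketch. This is a deliberate simplification, not the lemma as stated in the paper, and all names are mine: to estimate a sum of values in [0, 1], it suffices to know each value only up to an additive error proportional to its own random threshold u_i.

```python
import random

def precision_sampling_estimate(a, rng, noise=0.05):
    """Toy estimator for sum(a), with each a_i in [0, 1].

    Each coordinate is observed only through a crude estimate a_hat
    whose additive error scales with a random threshold u ~ U(0, 1);
    coordinates that draw a large u may be estimated very cheaply.
    Counting coordinates with a_hat >= u estimates sum(a), since
    P[u <= a_i] = a_i for an exact estimate.
    """
    count = 0
    for ai in a:
        u = rng.random()
        # crude estimate: accurate only to within +/- noise * u
        a_hat = ai + (rng.random() - 0.5) * 2 * noise * u
        if a_hat >= u:
            count += 1
    return count

rng = random.Random(42)
a = [0.3] * 100_000            # true sum = 30000.0
est = precision_sampling_estimate(a, rng)
```

The point of the design is that the accuracy demanded of each coordinate is random and usually lax, which is what makes the method cheap to implement on top of streaming sketches.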