Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms 2017
DOI: 10.1137/1.9781611974782.4
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors

Abstract: We show tight upper and lower bounds for time-space trade-offs for the c-approximate Near Neighbor Search problem. For the d-dimensional Euclidean space and n-point datasets, we develop a data structure with space n^{1+ρ_u+o(1)} + O(dn) and query time n^{ρ_q+o(1)} + dn^{o(1)} for every ρ_u, ρ_q ≥ 0 with:

c²√ρ_q + (c² − 1)√ρ_u ≥ √(2c² − 1).

In particular, for the approximation c = 2 we get:
• Space n^{1.77...} and query time n^{o(1)}, significantly improving upon known data structures that support very fast queries [IM98, KOR00];
• Space n^{1.14...} and query ti…
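The abstract's trade-off curve can be checked numerically. The short Python sketch below (an illustration only, not code from the paper; `tradeoff_rho_u` is a hypothetical helper name) solves the paper's curve c²√ρ_q + (c² − 1)√ρ_u = √(2c² − 1) for ρ_u and reproduces the two c = 2 operating points quoted above:

```python
import math

def tradeoff_rho_u(c, rho_q):
    # Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1)
    # for rho_u, the space exponent on the trade-off curve.
    rhs = math.sqrt(2 * c * c - 1) - c * c * math.sqrt(rho_q)
    return (rhs / (c * c - 1)) ** 2

# c = 2, fastest queries (rho_q = 0): space exponent 1 + rho_u = 1 + 7/9 ≈ 1.777
print(1 + tradeoff_rho_u(2, 0.0))

# c = 2, balanced point rho_u = rho_q: both equal 1/7 ≈ 0.142,
# giving space n^{1.14...} and query time n^{0.14...}
print(tradeoff_rho_u(2, 1 / 7))
```

Setting ρ_q = 0 recovers the near-linear-query regime, while equating ρ_u = ρ_q recovers the balanced point matching the optimal data-independent LSH exponent.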

Cited by 80 publications (190 citation statements) · References 29 publications (94 reference statements)
“…We show that the latter is at least as hard as the (r, c)-ANNS with n^{1/µ} points and c = O(log(n)/log(1/ε)) under the Hamming distance. Combined with the results of Panigrahy, Talwar, Wieder [32] and Andoni et al [5], we get non-trivial lower bounds in the cell-probe model with a single probe that captures an interesting class of algorithms based on adaptive coresets.…”
Section: Lower Bounds For Kde Problemsupporting
confidence: 65%
“…Depending on the feature space and distance function chosen or learned by the practitioner, different fast approximate nearest neighbor search algorithms are available. These search algorithms, both for general high-dimensional feature spaces (e.g., Gionis et al 1999; Datar et al 2004; Bawa et al 2005; Andoni and Indyk 2008; Ailon and Chazelle 2009; Muja and Lowe 2009; Boytsov and Naidan 2013; Dasgupta and Sinha 2015; Mathy et al 2015; Andoni et al 2017) and specialized to image patches (e.g., Barnes et al 2009; Ta et al 2014), can rapidly determine which data points are close to each other while parallelizing across search queries. These methods often use locality-sensitive hashing (Indyk and Motwani, 1998), which comes with a theoretical guarantee on approximation accuracy, or randomized trees (e.g., Bawa et al 2005; Muja and Lowe 2009; Dasgupta and Sinha 2015; Mathy et al 2015), which quickly prune search spaces when the trees are sufficiently balanced.…”
Section: Explaining the Popularity Of Nearest Neighbor Methodsmentioning
confidence: 99%
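The locality-sensitive hashing idea referenced in the excerpt above can be illustrated with random-hyperplane hashing (SimHash), a standard LSH family for angular distance. This is a generic sketch under that assumption, not the hashing scheme of the cited paper; the function and variable names are invented for illustration:

```python
import numpy as np

def simhash_signatures(points, n_bits, seed=0):
    # Random-hyperplane LSH: each bit records which side of a random
    # hyperplane a point falls on. Vectors with a small angle between
    # them agree on most bits with high probability, so nearby points
    # tend to collide under the hash.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, points.shape[1]))
    return (points @ planes.T > 0).astype(np.uint8)

rng = np.random.default_rng(1)
base = rng.standard_normal(16)
near = base + 0.05 * rng.standard_normal(16)   # small perturbation of base
far = rng.standard_normal(16)                  # unrelated random point
sigs = simhash_signatures(np.stack([base, near, far]), n_bits=64)

hamming = lambda a, b: int((a != b).sum())
# The near pair should differ on far fewer bits than the unrelated pair.
print(hamming(sigs[0], sigs[1]), hamming(sigs[0], sigs[2]))
```

A real index would bucket points by (bands of) these signatures and probe only the query's buckets, trading a small approximation error for sublinear query time.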
“…One such application is given in high-dimensional settings where exact nearest neighbor search is computationally expensive and Approximate Nearest Neighbor (ANN) search is often used instead to reduce this cost. Our flexible result allows us to use state-of-the-art ANN algorithms (see e.g., Andoni et al [2017, 2018]) while maintaining consistency and asymptotic normality.…”
Section: Adaptive Choice For Smentioning
confidence: 99%