2016 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata.2016.7840682
Fast nearest neighbor search through sparse random projections and voting

Cited by 34 publications (31 citation statements: 1 supporting, 30 mentioning, 0 contrasting)
References 21 publications

Citation statements, ordered by relevance:
“…Figure 2 shows the accuracy-speed trade-off for all combinations of the considered tree types and search methods on two benchmark data sets. For RP trees, the results are in line with previous experiments [6]. For each type of tree, voting outperforms the priority queue (for a given recall level, its query time is faster).…”
Section: Voting Search (supporting)
confidence: 87%
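The voting search compared in this excerpt can be sketched concretely. In the scheme described in the cited paper [6], each tree routes the query to one leaf, every point in that leaf receives a vote, and exact distances are computed only for points collecting at least a threshold number of votes. The sketch below is illustrative rather than the authors' implementation; the tree interface (`leaf_of`, `leaf_members`) and all names are assumptions.

```python
# Illustrative sketch of voting search over multiple randomized trees.
# Assumes hypothetical tree objects with leaf_of(query) -> leaf id and
# leaf_members(leaf) -> indices of data points stored in that leaf.
import numpy as np

def vote_candidates(query, trees, vote_threshold):
    """Each tree votes for every point in the leaf the query lands in;
    keep points that collect at least vote_threshold votes."""
    votes = {}
    for tree in trees:
        for idx in tree.leaf_members(tree.leaf_of(query)):
            votes[idx] = votes.get(idx, 0) + 1
    return [idx for idx, v in votes.items() if v >= vote_threshold]

def knn_with_voting(query, data, trees, k, vote_threshold):
    """Exact distances are evaluated only on the voted candidate set."""
    cand = vote_candidates(query, trees, vote_threshold)
    if not cand:
        cand = list(range(len(data)))  # degenerate case: fall back to full scan
    dists = np.linalg.norm(data[cand] - query, axis=1)
    return [cand[i] for i in np.argsort(dists)[:k]]
```

Raising the vote threshold shrinks the candidate set (faster queries, lower recall); lowering it does the opposite, which is the accuracy-speed trade-off the excerpt refers to.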
“…Our approach is based on exploiting the structure of randomized space-partitioning trees [14,13,3,6]. ANN algorithms based on randomized space-partitioning trees have been used recently for example in machine translation [5], object detection [1] and recommendation engines [17].…”
Section: Introduction (mentioning)
confidence: 99%
“…Typically, the sparsity parameter a can be chosen as 1/√d, as in [3], to obtain good accuracy. Each d-dimensional data point p ∈ X is then projected onto the sparse vector r. The dataset X is then divided into two subsets at the median of the projected values.…”
Section: A. MRPT Algorithm (mentioning)
confidence: 99%
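The quoted split step is simple enough to show directly. The sketch below is a minimal illustration rather than the cited implementation: it draws a random vector whose components are nonzero with probability a = 1/√d, projects every point onto it, and splits the set at the median projection; all names are illustrative.

```python
# Illustrative sketch: one node split of a sparse random projection tree.
import numpy as np

def sparse_rp_split(X, rng):
    """Split the point set X at the median of its projections onto a
    sparse random vector r (components nonzero with probability 1/sqrt(d))."""
    n, d = X.shape
    a = 1.0 / np.sqrt(d)                         # sparsity parameter
    nonzero = rng.random(d) < a                  # keep each component w.p. a
    r = np.where(nonzero, rng.standard_normal(d), 0.0)
    proj = X @ r                                 # project all points onto r
    split = np.median(proj)
    return X[proj <= split], X[proj > split], r, split

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 128))
left, right, r, split = sparse_rp_split(X, rng)
```

Because r has only about √d nonzero components in expectation, each projection costs O(√d) rather than O(d), which is the point of choosing a sparse vector.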
“…We consider the problem of finding the k nearest neighbors of a query point in a given high-dimensional dataset. To solve this problem efficiently, our goal is to speed up an existing algorithm [3] by parallelizing it, and to make it resilient to stragglers [4]. The k-nearest neighbor (k-NN) problem is often a first step used in a variety of real-world applications including genomics [5], personalized search [6], network security [7], and web-based recommendation systems [8].…”
Section: Introduction (mentioning)
confidence: 99%
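For contrast with the approximate search discussed throughout these excerpts, the exact k-NN computation being accelerated is just a linear scan over the dataset. A minimal brute-force baseline (illustrative, with assumed names) is below; its O(nd) per-query cost is what tree-based candidate selection avoids.

```python
# Brute-force k-NN baseline: the exact computation that randomized
# space-partitioning trees approximate. Illustrative only.
import numpy as np

def exact_knn(query, data, k):
    """Indices of the k points in data closest to query (Euclidean)."""
    dists = np.linalg.norm(data - query, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(42)
data = rng.standard_normal((10_000, 64))
query = rng.standard_normal(64)
print(exact_knn(query, data, k=10))
```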