Proceedings of the 20th International Conference on World Wide Web (WWW 2011)
DOI: 10.1145/1963405.1963487
Efficient k-nearest neighbor graph construction for generic similarity measures

Abstract: K-Nearest Neighbor Graph (K-NNG) construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Existing methods for K-NNG construction either do not scale, or are specific to certain similarity measures. We present NN-Descent, a simple yet efficient algorithm for approximate K-NNG construction with arbitrary similarity measures. Our method is based on local search, has minimal space overhead …
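
The abstract describes NN-Descent only at a high level. The sketch below illustrates the underlying local-search heuristic (a neighbor of a neighbor is likely to be a neighbor) in plain Python; it is a simplified illustration under my own naming, not the paper's optimized algorithm, which adds sampling, reverse neighbors, and incremental update tracking.

```python
import random

def nn_descent_sketch(points, k, similarity, iters=10):
    """Approximate K-NN graph by local search: a minimal sketch of the
    NN-Descent idea, not the paper's optimized algorithm.
    `similarity` is any function (a, b) -> float, larger = more similar."""
    n = len(points)
    # Start from a random K-NN graph: each node gets k random neighbors,
    # stored as (similarity, neighbor) pairs sorted best-first.
    graph = {}
    for i in range(n):
        sample = random.sample([x for x in range(n) if x != i], k)
        graph[i] = sorted(((similarity(points[i], points[j]), j)
                           for j in sample), reverse=True)
    for _ in range(iters):
        updated = 0
        for i in range(n):
            # Candidates: neighbors of neighbors.
            candidates = {c for _, j in graph[i] for _, c in graph[j]
                          if c != i}
            current = {j for _, j in graph[i]}
            for c in candidates - current:
                s = similarity(points[i], points[c])
                # Keep c only if it beats i's current worst neighbor.
                if s > graph[i][-1][0]:
                    graph[i][-1] = (s, c)
                    graph[i].sort(reverse=True)
                    updated += 1
        if updated == 0:  # converged: no neighbor list improved
            break
    return {i: [j for _, j in nbrs] for i, nbrs in graph.items()}
```

Because the only operation applied to data points is `similarity(a, b)`, the same sketch works for arbitrary similarity measures, which is the property the abstract emphasizes.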

Cited by 462 publications (481 citation statements) · References 24 publications

“…Our Constant-Size Least Popular sampling policy (LP for short) can be applied to any KNN graph construction algorithm [4,5,10]. For simplicity, we apply it to a brute force approach that compares each pair of users and keeps the k most similar for each user.…”
Section: Baseline Algorithms and Competitors
Confidence: 99%
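
For contrast with such sampling policies, the brute-force baseline described in the quote is easy to state. A minimal sketch, with my own function names and a heap-based top-k selection assumed for convenience:

```python
import heapq

def brute_force_knn_graph(users, k, similarity):
    """Exact K-NN graph: compare every pair of users and keep, for each
    user, the k most similar others. Costs O(n^2) similarity computations."""
    graph = {}
    for i, u in enumerate(users):
        scored = ((similarity(u, v), j)
                  for j, v in enumerate(users) if j != i)
        # nlargest keeps only the k best without sorting all n-1 scores.
        graph[i] = [j for _, j in heapq.nlargest(k, scored)]
    return graph
```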
“…For applications for which data freshness is more valuable than the exactness of the results, such as news recommenders, such computation time is prohibitive. To overcome these costs, most applications therefore compute an approximate KNN graph by using preindexing mechanisms [5,11] or by exploiting greedy incremental strategies [4,10] to reduce the number of similarity computations. However, it seems hard to lower even further that number.…”
Section: Introduction
Confidence: 99%
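
The preindexing idea referenced above can take many forms. Purely as an illustration, the sketch below buckets vectors by random-hyperplane signatures (an LSH scheme suited to cosine similarity) so that full similarities are computed only within buckets; this is an assumed, generic example, not the specific mechanism of references [5,11].

```python
import random
from collections import defaultdict
from itertools import combinations

def lsh_candidate_pairs(vectors, num_bits=8, seed=0):
    """Yield candidate pairs that share a random-hyperplane signature,
    so full similarity is computed only within a bucket (illustrative
    sketch, not the indexing scheme of any specific cited paper)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One random hyperplane per signature bit.
    planes = [[rng.gauss(0, 1) for _ in range(dim)]
              for _ in range(num_bits)]
    buckets = defaultdict(list)
    for idx, v in enumerate(vectors):
        sig = tuple(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0
                    for p in planes)
        buckets[sig].append(idx)
    # Only pairs sharing a bucket become candidates.
    for members in buckets.values():
        yield from combinations(members, 2)
```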
“…We use the number of candidates and the number of full similarity computations as an architecture- and programming-language-independent way to measure similarity search cost [33,45,46]. A naïve method may compute up to n(n − 1) = O(n²) similarities to solve the APSS problem.…”
Section: Performance Measures
Confidence: 99%
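
To make the quadratic cost concrete: at n = 100,000 items, n(n − 1) is just under 10^10 full similarity computations. One simple, language-level way to report this measure is to wrap the similarity function in a call counter; the helper below is my own illustrative sketch, not an instrument from the cited work.

```python
def counting(similarity):
    """Wrap a similarity function so that calls are counted;
    the wrapper's .calls attribute then reports the number of
    full similarity computations performed."""
    def counted(a, b):
        counted.calls += 1
        return similarity(a, b)
    counted.calls = 0
    return counted

# Usage sketch: pass the wrapped function to any graph-construction
# routine, then read back the cost.
#   sim = counting(cosine_similarity)
#   graph = brute_force_knn_graph(users, k=10, similarity=sim)
#   print(sim.calls)  # up to n * (n - 1) for the naive method
```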
“…Then we use the p-nearest-neighbor method to convert the similarity matrix to a p-nearest-neighbor graph [10]. In the N × N similarity matrix, w(i, j) represents the similarity between image i and image j.…”
Section: Graph Construction
Confidence: 99%
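
The conversion the quote describes amounts to keeping, for each row of the N × N matrix, the p largest off-diagonal similarities. A minimal NumPy sketch, assuming a float-valued similarity matrix and adding an optional symmetrization step that is common for p-NN graphs but not stated in the quote:

```python
import numpy as np

def pnn_graph(w, p):
    """Convert an N x N similarity matrix w (w[i, j] = similarity of
    items i and j) into a p-nearest-neighbor adjacency matrix."""
    n = w.shape[0]
    adj = np.zeros_like(w)
    for i in range(n):
        row = w[i].copy()
        row[i] = -np.inf                      # exclude self-similarity
        nbrs = np.argpartition(row, -p)[-p:]  # indices of p largest entries
        adj[i, nbrs] = w[i, nbrs]
    # Optional (assumed) symmetrization: keep an edge if either endpoint
    # selected it.
    return np.maximum(adj, adj.T)
```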