2020
DOI: 10.1109/tkde.2019.2909204

Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement

Abstract: Approximate nearest neighbor search (ANNS) is a fundamental and essential operation in applications from many domains, such as databases, machine learning, multimedia, and computer vision. Although new algorithms are continuously proposed in these domains each year, there is no comprehensive evaluation and analysis of their performance. In this paper, we conduct a comprehensive experimental evaluation of many state-of-the-art methods for approximate nearest neighbor search. Our study…

Cited by 243 publications (194 citation statements)
References: 69 publications
“…To address these challenges, Bioconductor has developed software packages that incorporate recent advances in nearest-neighbor and clustering algorithms that improve computational efficiency through approaches such as using approximate methods instead of exact methods, thereby trading an acceptable amount of accuracy for vastly improved runtimes. For example, the BiocNeighbors package [97][98][99][100] can be used to search for nearest neighbors, and then a shared nearest neighbor graph using cells as nodes can be built with the scran package [59]. Further, approximate methods have the advantage of smoothing over noise and sparsity, and thus potentially providing a better fit to the data [101].…”
Section: Clustering
confidence: 99%
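The shared nearest neighbor (SNN) construction this statement describes can be sketched in a few lines. The following Python fragment is a conceptual illustration only, not the Bioconductor API: all function names are hypothetical, and a brute-force k-NN stands in for the approximate search that BiocNeighbors would perform.

```python
import numpy as np

def knn_indices(X, k):
    # Brute-force k-NN for illustration; an approximate index
    # (e.g. Annoy or HNSW) would replace this step in practice.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def snn_graph(neighbors):
    # Link each cell to its k nearest neighbors, weighting every edge
    # by how many nearest neighbors the two cells have in common, so
    # clusters can then be found by community detection on the graph.
    sets = [set(row) for row in neighbors]
    edges = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            a, b = min(i, int(j)), max(i, int(j))
            edges[(a, b)] = len(sets[a] & sets[b])
    return edges

# Usage (hypothetical data): edges = snn_graph(knn_indices(expr_matrix, k=10))
```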
“…BiocNeighbors [97][98][99][100]: Exact and approximate methods for nearest neighbor detection that use the BiocParallel [91] framework to parallelize operations.
SC3 [102], clusterExperiment [103], SIMLR [104], mbkmeans [106], BEARscc [107], clustree [108]: Unsupervised clustering frameworks for single-cell data.
edgeR [3,62], DESeq2 [7], limma [115]: Methods developed for bulk RNA-seq differential expression that can be used in combination with methods such as zinbwave [31,116] to account for zero-inflation.
MAST [28], scDD [117], BASiCS [64,65], SCDE [118]: Methods to identify differentially expressed features using statistical models that directly model zero-inflation.
slingshot [126], TSCAN [30], monocle [123,124,127], cellTree [128]: Methods for trajectory analysis or pseudotime inference.
MAST [28], AUCell [141], scmap [77], PADOG [139], fgsea [137], goseq [138], slalom [142], scCoGAPS [143,144], EnrichmentBrowser [140]: Methods for gene set / signature enrichment analysis.
iSEE [148]: Interactive data exploration and visualization.
countsimQC [153], batchQC…”
Section: Downstream Statistical Analyses
confidence: 99%
“…An approximate neighborhood graph can be constructed substantially more efficiently [22,11]. To improve performance, one can use various graph pruning methods [20,23,13]: In particular, it is not useful to keep neighbors that are close to each other [20,13].…”
Section: Retrieval Algorithms
confidence: 99%
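The pruning rule this quote alludes to can be sketched concretely, assuming it is the occlusion-style heuristic used in graph indexes such as HNSW [20,13]: a candidate is kept only if it is closer to the base point than to any neighbor already kept, so neighbors that are close to each other are dropped. The function and its signature are illustrative, not the cited authors' code.

```python
def prune_neighbors(dist, base, candidates, max_degree):
    # Scan candidates from nearest to farthest; keep a candidate only
    # if every already-kept neighbor is farther from it than the base
    # point is. Close-together neighbors are redundant, so at most one
    # of any tight pair survives.
    kept = []
    for c in sorted(candidates, key=lambda c: dist(base, c)):
        if all(dist(c, n) > dist(base, c) for n in kept):
            kept.append(c)
        if len(kept) == max_degree:
            break
    return kept
```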
“…For a recent experimental comparison of several retrieval approaches, see [32]. Although HNSW is possibly the best retrieval method for generic distances [23,20], in our work we use a modified variant of SW-graph, where retrieval starts from a single point (which is considerably more efficient than using multiple starting points). The main advantage of HNSW over the older version of SW-graph comes from (1) the introduction of pruning heuristics and (2) the use of a single starting point during retrieval.…”
Section: Retrieval Algorithms
confidence: 99%
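Single-entry-point retrieval on an SW-graph amounts to a best-first traversal of the neighborhood graph. A minimal sketch follows, assuming `graph` maps a node id to its neighbor ids; `dist`, `ef`, and the function name are assumptions for illustration, not the authors' implementation.

```python
import heapq

def greedy_search(graph, dist, query, start, ef):
    # Best-first traversal from a single entry point: repeatedly expand
    # the closest unvisited node and stop once no remaining candidate
    # can improve the current top-ef result set.
    visited = {start}
    candidates = [(dist(query, start), start)]   # min-heap by distance
    results = [(-dist(query, start), start)]     # max-heap of the best ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # nearest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(query, nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-d, n) for d, n in results)
```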
“…We use the benchmarking system described in [4] as the starting point for our study. Different approaches to benchmarking nearest neighbor search are described in [9,10,20]. We refer to [4] for a detailed comparison between the frameworks.…”
Section: Introduction
confidence: 99%
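Benchmarking frameworks of this kind typically score an approximate method by its recall against exact ground-truth neighbors. A small sketch of that measure (names are illustrative; this is not the code of [4]):

```python
def recall_at_k(approx_ids, exact_ids, k):
    # Average fraction of the true k nearest neighbors that the
    # approximate method returned, taken over all queries.
    hits = sum(len(set(a[:k]) & set(e[:k]))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))
```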