Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/3318464.3380600

Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination

Abstract: In applications ranging from image search to recommendation systems, the problem of identifying a set of "similar" real-valued vectors to a query vector plays a critical role. However, retrieving these vectors and computing the corresponding similarity scores from a large database is computationally challenging. Approximate nearest neighbor (ANN) search relaxes the guarantee of exactness for efficiency by vector compression and/or by only searching a subset of database vectors for each query. Searching a large…
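The trade-off the abstract describes can be made concrete. Below is a minimal sketch, assuming the Faiss library and randomly generated toy data (neither is named in the excerpt), of IVF-based ANN search in which one fixed parameter, nprobe, decides how much of the database is scanned for every query; this is the kind of one-size-fits-all termination condition that a learned, per-query criterion would replace.

```python
# Minimal IVF-based ANN search sketch (assumed: Faiss; random toy data).
import numpy as np
import faiss

d, nb, nq = 64, 100_000, 10                 # dimension, database size, #queries
rng = np.random.default_rng(0)
xb = rng.random((nb, d)).astype("float32")  # database vectors
xq = rng.random((nq, d)).astype("float32")  # query vectors

nlist = 1024                                # number of inverted lists (clusters)
quantizer = faiss.IndexFlatL2(d)            # coarse quantizer over centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                             # learn cluster centroids
index.add(xb)

# nprobe fixes how many clusters are scanned for *every* query: too small and
# hard queries miss their true neighbors; too large and easy queries waste work.
index.nprobe = 16
distances, ids = index.search(xq, 10)       # top-10 approximate neighbors
```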

Cited by 29 publications (15 citation statements) · References 47 publications

Citation statements (ordered by relevance):
“…In parallel to our work, Li et al. [52] proposed a machine learning method, built on top of inverted-file (IVF [43] and IMI [5]) and k-NN graph (HNSW [57]) similarity search techniques, that addresses early termination of approximate NN queries while achieving a target recall. In contrast, our approach employs similarity search techniques based on data series indices [31] and, with a very small training set (up to 200 training queries in our experiments), provides per-query probabilistic guarantees along different dimensions: on the distance error, on whether the current answer is the exact one, and on the time needed to find the exact answer.…”
Section: Related Work
confidence: 99%
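The quoted passage contrasts this with the learned early-termination idea of Li et al. [52]. As a rough illustration of that idea only, the sketch below trains a gradient-boosted regressor to map query features to a sufficient search depth; the features, labels, and the slack/budget heuristic are simplified assumptions for illustration, not the authors' exact pipeline.

```python
# Toy sketch of learned adaptive early termination: a regressor predicts, per
# query, the search depth (e.g., the number of IVF clusters to probe) that was
# sufficient on similar training queries. Simplified assumption, not the exact
# method of Li et al. [52].
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def train_termination_model(train_queries, min_probes_needed):
    """train_queries: (n, d) query vectors; min_probes_needed: (n,) smallest
    nprobe that retrieved the exact nearest neighbor, computed offline
    against ground truth."""
    model = GradientBoostingRegressor(n_estimators=200, max_depth=4)
    model.fit(train_queries, min_probes_needed)
    return model

def adaptive_nprobe(model, query, max_probes=128, slack=1.2):
    # Over-predict slightly (slack) to hit the target recall with high
    # probability, and clamp to a hard latency budget (max_probes).
    pred = model.predict(query.reshape(1, -1))[0]
    return int(min(max(1.0, np.ceil(slack * pred)), max_probes))
```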
“…Furthermore, HNSW has been implemented on GPUs, accelerating the algorithm further [30]. Based on HNSW, Reference [20] proposes a training strategy that adaptively determines when to terminate the search, improving search speed. Although both our HSSG and HNSW algorithms use a hierarchical structure, there are still some differences between the two methods.…”
Section: Graph-based Method
confidence: 99%
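For context on the graph-based setting this quote discusses, the sketch below, assuming the hnswlib library and illustrative parameter values, shows HNSW's fixed search-time parameter ef, which bounds how many candidates are examined per query; this is the global termination knob that an adaptive strategy such as the one in Reference [20] sets per query instead.

```python
# Minimal HNSW search sketch (assumed: hnswlib; random toy data).
import numpy as np
import hnswlib

d, nb = 64, 100_000
rng = np.random.default_rng(0)
xb = rng.random((nb, d)).astype("float32")

index = hnswlib.Index(space="l2", dim=d)
index.init_index(max_elements=nb, ef_construction=200, M=16)
index.add_items(xb, np.arange(nb))

# ef bounds the candidate list during search: larger ef means more distance
# computations (higher recall, higher latency) for every query alike.
index.set_ef(64)
labels, distances = index.knn_query(xb[:10], k=10)
```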
“…For example, sparse retrieval methods (Section 3.6) often use the (weighted) inverted index to help find the top-𝐾 relevant documents efficiently, as in ad hoc search [Croft et al., 2010]. Dense retrieval methods (Section 3.7), on the other hand, have to resort to efficient similarity search methods [Aumüller et al., 2017; Johnson et al., 2017; Li et al., 2020] to find relevant documents in a continuous vector space.…”
Section: Document Retrieval
confidence: 99%
“…Searching a larger subset increases both accuracy and latency. We review some commonly used ANN methods, following closely the descriptions in Li et al. [2020] and Johnson et al. [2017].…”
Section: Approximate Nearest Neighbor Search
confidence: 99%