Proceedings 2004 VLDB Conference 2004
DOI: 10.1016/b978-012088469-8/50058-9

Top-k Query Evaluation with Probabilistic Guarantees

Abstract: Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin's threshold algorithm (TA). Since the user's goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximate variants of TA to reduce run-time costs. This paper introduces a family of approximate top-k algorithms based on probabilistic …

Cited by 80 publications (50 citation statements)
References 13 publications (17 reference statements)
“…Ayanso et al [3] analyze the common histogram construction techniques and their impact on top-k retrieval. Theobald et al [21] propose a method for probabilistic top-k queries by predicting the total score of a candidate item. In some cases, random access is limited or unavailable, NRA [12] is proposed with sequential access only.…”
Section: Related Work (mentioning)
confidence: 99%
“…The algorithm terminates when the candidate queue is empty (and a virtual document that has not yet been seen in any index list and has a $\text{bestscore} \le \sum_{i=1}^{m} \text{high}(i)$ can not qualify for the top-k either). For approximating a top-k result with low error probability [52], the conservative bestscores, with high(i) values assumed for unknown scores, can be substituted by quantiles of the score distribution in the unvisited tails of the index lists. Technically, this amounts to estimating the convolution of the unknown scores of a candidate.…”
Section: Related Work (mentioning)
confidence: 99%
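
The pruning idea in the statement above can be illustrated with a small sketch. It is not the paper's implementation: it assumes that the unknown scores of a candidate are independent across index lists, that each unvisited tail is summarized by an equi-width histogram whose bin s stands for the score s * bin_width, and that a candidate is dropped when the estimated probability of exceeding the current top-k threshold is at most epsilon. The names convolve, exceed_probability, can_prune, bin_width and epsilon are hypothetical and chosen only for this example.

```python
def convolve(p, q):
    """Distribution of the sum of two independent discrete variables.

    p[i] and q[j] are the probabilities of the values i and j (in bin units).
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def exceed_probability(partial_score, tail_histograms, bin_width, threshold):
    """Estimate P[partial_score + sum of unknown scores > threshold].

    tail_histograms holds one histogram per index list in which the
    candidate has not been seen yet (assumed independent of each other).
    """
    dist = [1.0]  # distribution of an empty sum: value 0 with probability 1
    for hist in tail_histograms:
        dist = convolve(dist, hist)
    return sum(
        prob
        for s, prob in enumerate(dist)
        if partial_score + s * bin_width > threshold
    )


def can_prune(partial_score, tail_histograms, bin_width, threshold, epsilon):
    """Probabilistic replacement for the conservative bestscore test."""
    p = exceed_probability(partial_score, tail_histograms, bin_width, threshold)
    return p <= epsilon


if __name__ == "__main__":
    # Two unvisited tails, each with scores 0.0, 0.1, 0.2 (bin_width = 0.1).
    tails = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]
    # Conservative bestscore: 0.3 + 0.2 + 0.2 = 0.7 > 0.65, so an exact
    # algorithm would have to keep this candidate.
    print(exceed_probability(0.3, tails, 0.1, 0.65))
    print(can_prune(0.3, tails, 0.1, 0.65, epsilon=0.05))
```

In this example the candidate can only beat the threshold if both unknown scores fall into the highest bin, so the estimated probability is small and the candidate is pruned for epsilon = 0.05, even though the conservative bestscore bound would keep it. Comparing this probability with epsilon is equivalent to comparing a quantile of the convolved distribution with the remaining score gap.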
“…Top-k query processing has received much attention in a variety of settings such as similarity search on multimedia data [7,24,29,30,45,46], ranked retrieval on text and semistructured documents in digital libraries and on the Web [3,6,36,40,48,52,55], network and stream monitoring [4,14], collaborative recommendation and preference queries on e-commerce product catalogs [17,31,42,56], and ranking of SQL-style query results on structured data sources in general [1,11,18]. Among the ample work on top-k query processing, the TA family of algorithms for monotonic score aggregation [25,30,46] stands out as an extremely efficient and highly versatile method.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, our preliminary work is limited to rank formulation for numerical data, while this paper reports our extension, which enables both processing and formulation of the combination of numerical and categorical data. As supporting structures for ranked retrieval, one-dimensional (e.g., sorted access [8,7,3,4,11] or inverted index [1,18,19,5]) or multi-dimensional numerical indices (e.g., R-tree [17]) have been considered. In particular, our work is closely related to [5] indexing the relevance score of each possible value by applying Bayes' Rule on prior query workload and [17] indexing multi-dimensional objects by the similarity score using an R-tree index.…”
Section: Related Work (mentioning)
confidence: 99%