Filtered document retrieval with frequency-sorted indexes

Persin, Michael; Zobel, Justin; Sacks-Davis, Ron

doi:10.1002/(sici)1097-4571(199610)47:10<749::aid-asi3>3.0.co;2-2

Cited by 87 publications

(45 citation statements)

References 5 publications

“…In the list above the first and second chunk contain one pointer, and the third one two. The final frequency-sorted index size is rarely larger than when generated the classic way [29].…”

Section: Special Index Construction For Ranked Queries (mentioning)

confidence: 99%

Using Information Retrieval techniques for supporting data mining

Kouris

Makris

Tsakalidis

2005

Data & Knowledge Engineering

“…Top-k query processing has received much attention in a variety of settings such as similarity search on multimedia data [7,24,29,30,45,46], ranked retrieval on text and semistructured documents in digital libraries and on the Web [3,6,36,40,48,52,55], network and stream monitoring [4,14] collaborative recommendation and preference queries on ecommerce product catalogs [17,31,42,56], and ranking of SQL-style query results on structured data sources in general [1,11,18]. Among the ample work on top-k query processing, the TA family of algorithms for monotonic score aggregation [25,30,46] stands out as an extremely efficient and highly versatile method.…”

Section: Related Work (mentioning)

confidence: 99%

Algebraic query optimization for distributed top-k queries

Neumann

Michel

2007

Informatik Forsch. Entw.

Distributed top-k query processing is increasingly becoming an essential functionality in a large number of emerging application classes. This paper addresses the efficient algebraic optimization of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We use a dynamic programming approach to find the optimal execution plan using compact data synopses for selectivity estimation that is the basis for our cost model. The optimized query is executed in a hierarchical way involving a small and fixed number of communication phases. We have performed experiments on real web data that show the benefits of distributed top-k query optimization both in network resource consumption and query response time.

“…By modifying the output criterion, also incremental-tunable algorithms are possible. As a specific instance of this class of algorithms, the Nosferatu algorithm (Pfeifer and Pennekamp 1997) assumes that inverted list entries are sorted by decreasing indexing weights (a similar algorithm has been described in Persin et al 1996). In this case, inverted lists are processed in parallel, and entries are read in the order of decreasing RSV increments.…”

Section: Block (mentioning)

confidence: 99%

Retrieval quality vs. effectiveness of specificity-oriented search in XML collections

Fuhr

Gövert

2006

Inf Retrieval

Content-only queries in hierarchically structured documents should retrieve the most specific document nodes which are exhaustive to the information need. For this problem, we investigate two methods of augmentation, which both yield high retrieval quality. As retrieval effectiveness, we consider the ratio of retrieval quality and response time; thus, fast approximations to the 'correct' retrieval result may yield higher effectiveness. We present a classification scheme for algorithms addressing this issue, and adopt known algorithms from standard document retrieval for XML retrieval. As a new strategy, we propose incremental-interruptible retrieval, which allows for instant presentation of the top ranking documents. We develop a new algorithm implementing this strategy and evaluate the different methods with the INEX collection.
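
The processing order described in the Fuhr and Gövert citation statement above (inverted lists sorted by decreasing weight, read in parallel, entries consumed in order of decreasing score increment) can be illustrated with a minimal sketch. This is not the Nosferatu algorithm or the method of Persin et al. as published; the function name `rank_by_decreasing_increment`, the `max_postings` cutoff, and the data layout are all assumptions made for illustration, and query weights are assumed to be positive.

```python
import heapq
from collections import defaultdict


def rank_by_decreasing_increment(index, query_weights, max_postings=None):
    """Accumulate document scores from weight-sorted inverted lists.

    index: hypothetical layout, term -> list of (doc_id, weight) pairs,
           each list already sorted by decreasing weight.
    query_weights: term -> positive query weight.
    max_postings: optional cap on postings read (an assumed stopping rule).
    """
    # One cursor per query term; the heap orders cursors by the score
    # increment (query weight * posting weight) of their next posting.
    heap = []
    for term, q_weight in query_weights.items():
        postings = index.get(term, [])
        if postings:
            _, weight = postings[0]
            heapq.heappush(heap, (-q_weight * weight, term, 0))

    scores = defaultdict(float)
    postings_read = 0
    while heap and (max_postings is None or postings_read < max_postings):
        neg_increment, term, pos = heapq.heappop(heap)
        doc_id, _ = index[term][pos]
        scores[doc_id] += -neg_increment
        postings_read += 1
        # Advance this term's cursor to its next (smaller) posting.
        if pos + 1 < len(index[term]):
            _, weight = index[term][pos + 1]
            heapq.heappush(heap, (-query_weights[term] * weight, term, pos + 1))

    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


toy_index = {
    "a": [(1, 3.0), (2, 1.0)],
    "b": [(2, 2.0), (3, 1.0)],
}
full_ranking = rank_by_decreasing_increment(toy_index, {"a": 1.0, "b": 1.0})
early_ranking = rank_by_decreasing_increment(
    toy_index, {"a": 1.0, "b": 1.0}, max_postings=2
)
```

Because the largest increments are read first, stopping early (here via the assumed `max_postings` cap) yields an approximate ranking built from the highest-impact postings, which is the trade-off between retrieval quality and response time that the abstract above discusses.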
