2009
DOI: 10.1145/1498698.1564507

An experimental investigation of set intersection algorithms for text searching

Abstract: The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison with the algorithms from the previous studies from Demaine, López-Ortiz and Munro [ALENEX 2001], and from Baeza-Yates and Salinger [SPIRE 2005]; in addition, w…
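To make the problem in the abstract concrete, here is a minimal sketch of intersecting two sorted arrays by searching the longer one with doubling (galloping) steps. It is a generic textbook-style illustration, not one of the algorithms proposed or measured in the paper; the function name `gallop_intersect` and the assumption that both inputs are sorted lists of distinct integers are ours.

```python
from bisect import bisect_left


def gallop_intersect(small, large):
    """Intersect two sorted lists of distinct ints (hypothetical helper).

    For each element of `small`, the search in `large` resumes from the
    previous match position and widens its window by doubling steps
    before finishing with a binary search.
    """
    result = []
    lo = 0
    n = len(large)
    for x in small:
        # Double the step until large[lo + step] >= x or the end is reached.
        step = 1
        while lo + step < n and large[lo + step] < x:
            step *= 2
        hi = min(lo + step + 1, n)
        # Binary search for the first position in [lo, hi) holding a value >= x.
        lo = bisect_left(large, x, lo, hi)
        if lo == n:
            break  # every remaining element of `small` exceeds all of `large`
        if large[lo] == x:
            result.append(x)
    return result
```

Resuming each search from the previous position is what ties the cost to how the elements are distributed across the two lists, rather than only to their lengths.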

Citations: cited by 50 publications (61 citation statements)
References: 11 publications
“…Experiments in [22], [23] compare several intersection algorithms and show that the complexity of intersections relies heavily on the distributions of the elements in the sets.…”
Section: Related Work (mentioning)
confidence: 99%
“…A typical way to solve a ranked intersection is to first compute a Boolean intersection, then compute the scores of all the resulting documents, and finally keep the documents with the k highest scores. This approach has triggered much research on the Boolean intersection problem [21,6,34,8,26]. This approach is, of course, suboptimal, since in principle one could use weight information to filter out documents that belong to the intersection but one can ensure will not make it to the top-k list.…”
Section: Basic Concepts (mentioning)
confidence: 99%
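The three-step approach described in the statement above (Boolean intersection, then scoring, then keeping the k best) can be sketched as follows. This is an illustrative sketch only: the function name `ranked_intersection`, the `score` callback, and the use of Python sets for the intersection step are assumptions of ours, not taken from the citing paper.

```python
import heapq


def ranked_intersection(posting_lists, score, k):
    """Return the k highest-scoring documents present in every posting list.

    posting_lists: list of lists of document ids (hypothetical input shape)
    score: callable mapping a document id to a numeric score (assumed)
    """
    if not posting_lists:
        return []
    # Step 1: Boolean intersection, starting from the shortest list.
    ordered = sorted(posting_lists, key=len)
    docs = set(ordered[0])
    for plist in ordered[1:]:
        docs &= set(plist)
    # Step 2: score every surviving document.
    # Step 3: keep the k best as (score, doc_id) pairs, highest first.
    return heapq.nlargest(k, ((score(d), d) for d in docs))
```

Every document in the intersection is scored here, including those that cannot reach the top k, which is the inefficiency the statement points out.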
“…Traditionally, the posting lists were stored on disk. With the availability of large amounts of main memory, this trend has changed to use the main memory of a cluster of machines, and many intersection algorithms have been designed for random access [21,6,34,20,35,37,8,26]. In distributed main-memory systems, usually documents are distributed across independent inverted indexes, and each index contributes with a few results to the final top-k list.…”
Section: Basic Concepts (mentioning)
confidence: 99%
“…Thus, in order to retrieve the query result we usually have to scan the entire lists. There has been extensive work on list intersection algorithms, that is applicable to inverted files [6,39,43]. The focus in these works lies in reducing the CPU cost, since they are mostly aimed at specialized systems, which answer few types of queries and can afford to have all lists in main memory.…”
Section: Query Evaluation (mentioning)
confidence: 99%
“…The former exploits parallelism between different queries, while the latter parallelizes the processing within a single query. Finally, [6] offers an experimental comparison of several popular methods of list intersection with respect to their CPU cost.…”
Section: Related Work (mentioning)
confidence: 99%