Proceedings of the 14th ACM International Conference on Information and Knowledge Management 2005
DOI: 10.1145/1099554.1099739

Fast on-line index construction by geometric partitioning

Abstract: Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for a variety of querying modes. In this paper we examine the task of on-line index construction, that is, how to build an inverted index when the underlying data must be continuously queryable, and the documents must be indexed and available for search as soon as they are inserted. When straightforward approaches are used, document insertion…

Citations: cited by 42 publications (49 citation statements)
References: 16 publications
“…Logarithmic Merge for base k = 2). The results (reported in Table 1) are consistent with earlier findings [6,1] and show that the off-line method really should not be used in a dynamic environment. Logarithmic merge exhibits indexing performance close to off-line index construction and query processing performance close to Immediate Merge.…”
Section: Results (supporting)
confidence: 90%
“…This quadratic time complexity renders Immediate Merge infeasible for text collections much larger than the available main memory. Recently, Büttcher and Clarke [1] and Lester et al. [6] have proposed merge-based update strategies that do not share this shortcoming. By allowing a controlled number of on-disk indices to exist in parallel, indexing efficiency is greatly increased, while query processing performance remains almost unchanged compared to the Immediate Merge strategy.…”
Section: Merge-based Index Maintenance (mentioning)
confidence: 99%
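The idea in the statement above, keeping a controlled number of index partitions instead of re-merging one large index on every flush, can be illustrated with a small in-memory sketch. This is not the cited authors' code: the class name `LogarithmicMergeIndex`, the `buffer_docs` parameter and the base-2 merge rule are assumptions chosen for illustration, and Python dictionaries stand in for on-disk partitions.

```python
from collections import defaultdict


def merge(older, newer):
    """Merge two partitions (term -> list of doc ids) into one."""
    out = defaultdict(list)
    for part in (older, newer):
        for term, postings in part.items():
            out[term].extend(postings)
    return {term: sorted(postings) for term, postings in out.items()}


class LogarithmicMergeIndex:
    """Toy base-2 logarithmic merge (hypothetical, for illustration only).

    At most one partition exists per generation. Flushing the buffer
    creates a generation-0 partition; a collision at generation g merges
    the two partitions into generation g + 1, like carrying in binary
    addition. The number of partitions therefore grows only
    logarithmically with the number of flushes.
    """

    def __init__(self, buffer_docs=2):
        self.buffer_docs = buffer_docs      # assumed in-memory capacity
        self.buffer = defaultdict(list)     # term -> doc ids, not yet flushed
        self.buffered = 0
        self.partitions = {}                # generation -> partition

    def add(self, doc_id, text):
        for term in set(text.lower().split()):
            self.buffer[term].append(doc_id)
        self.buffered += 1
        if self.buffered >= self.buffer_docs:
            self._flush()

    def _flush(self):
        part, gen = dict(self.buffer), 0
        self.buffer, self.buffered = defaultdict(list), 0
        while gen in self.partitions:       # cascade merges upward
            part = merge(self.partitions.pop(gen), part)
            gen += 1
        self.partitions[gen] = part

    def search(self, term):
        # A query consults the buffer and every partition, so documents
        # are searchable as soon as they are added.
        term = term.lower()
        hits = list(self.buffer.get(term, []))
        for part in self.partitions.values():
            hits.extend(part.get(term, []))
        return sorted(hits)
```

In this sketch a query touches every partition, which is the cost traded for cheaper insertion; an Immediate Merge variant would instead keep a single partition and rewrite it on every flush.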
“…Moreover, according to the observation in Figure 3, the attribute frequency distribution follows the Zipf law [37] like style. This interesting observation motivates us to explore the successful inverted index [7,22,25,33,35,38,39] used in information retrieval to manipulate the dataspaces.…”
Section: Sparse Features (mentioning)
confidence: 99%
“…Lester [22] partitioning the index for efficient on-line index. To make documents immediately accessible, the index is divided into a controlled number of partitions.…”
Section: Related Work (mentioning)
confidence: 99%
“…If query requests were permitted using the old index, recall and precision of the search engine would not be ensured since the web set has changed. Nowadays, most search engines use incremental index updating strategy [2]. This paper is related to the Internet connection among clusters.…”
Section: Introduction (mentioning)
confidence: 99%