2008
DOI: 10.1145/1386118.1386125

Efficient online index construction for text databases

Abstract: Inverted index structures are a core element of current text retrieval systems. They can be constructed quickly using offline approaches, in which one or more passes are made over a static set of input data, and, at the completion of the process, an index is available for querying. However, there are search environments in which even a small delay in timeliness cannot be tolerated, and the index must always be queryable and up to date. Here we describe and analyze a geometric partitioning …

Citations: cited by 38 publications (28 citation statements)
References: 22 publications
“…An updatable keyword search system is usually implemented with a hierarchy of indexes [24,14,21,26]. New data is accumulated in a small updatable structure that also supports concurrent queries, while the main part of the hierarchy consists of a set of read-only indexes.…”
Section: System Model (mentioning)
confidence: 99%
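
The hierarchy described in this statement can be illustrated with a minimal Python sketch. It is not taken from the cited systems: the class name `HierarchicalIndex`, the `buffer_limit` parameter, and the flush policy are all hypothetical, and concurrency control is left out entirely. It only shows new postings accumulating in a small updatable buffer, being frozen into read-only segments, and a query consulting every level.

```python
class HierarchicalIndex:
    """Toy index: one updatable in-memory buffer plus read-only segments."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit  # assumed threshold, in postings
        self.buffer = {}                  # term -> list of doc ids (updatable)
        self.buffered = 0                 # postings currently in the buffer
        self.segments = []                # frozen dicts: term -> tuple of doc ids

    def add(self, doc_id, terms):
        # Record one posting per distinct term of the document.
        for term in set(terms):
            self.buffer.setdefault(term, []).append(doc_id)
            self.buffered += 1
        if self.buffered >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Freeze the buffer into a read-only segment and start a new buffer.
        if self.buffer:
            self.segments.append({t: tuple(p) for t, p in self.buffer.items()})
            self.buffer = {}
            self.buffered = 0

    def query(self, term):
        # A query must look at every read-only segment and at the buffer.
        hits = []
        for segment in self.segments:
            hits.extend(segment.get(term, ()))
        hits.extend(self.buffer.get(term, ()))
        return sorted(hits)
```

In this sketch a document becomes visible to `query` as soon as `add` returns, because the buffer is searched together with the segments.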
“…This is reflected in many specific explanations, examples, and arguments. Nonetheless, many of the techniques are readily applicable or at least transferable to other possible application domains of B-trees, in particular to information retrieval [83], file systems [71], and "No SQL" databases and key-value stores recently popularized for web services and cloud computing [21,29].…”
Section: Purpose and Scope (mentioning)
confidence: 99%
“…The merge-based methods merge postings from memory and disk into a single file on disk. The latest related methods amortize the cost by permitting the creation of multiple inverted files on disk and merging them according to specific patterns [15,16]. Even though in-place index maintenance has linear asymptotic disk cost that is lower than the polynomial cost of merge-based methods, merge-based methods are experimentally shown to use sequential disk transfers and outperform in-place methods [17].…”
Section: Related Work (mentioning)
confidence: 99%
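
One way to picture "multiple inverted files merged according to specific patterns" is a size-tiered scheme, sketched below in Python. This is an illustration under stated assumptions, not the method of [15,16]: the class `TieredIndex`, the radix `r`, and the capacity rule `buffer_limit * r**(k + 1)` are all hypothetical. Each flushed buffer is merged into level 0, and a partition that outgrows its level is carried upward and merged with the next one.

```python
def merge_postings(a, b):
    """Merge two term -> sorted doc-id list maps into a new map."""
    merged = {term: list(postings) for term, postings in a.items()}
    for term, postings in b.items():
        merged[term] = sorted(set(merged.get(term, [])) | set(postings))
    return merged


def size(partition):
    """Number of postings stored in one partition."""
    return sum(len(postings) for postings in partition.values())


class TieredIndex:
    def __init__(self, buffer_limit=2, r=2):
        self.buffer_limit = buffer_limit
        self.r = r
        self.levels = []  # levels[k] is a partition dict, or None if empty

    def capacity(self, k):
        # Assumed rule: capacities grow geometrically with the level number.
        return self.buffer_limit * self.r ** (k + 1)

    def flush(self, buffer):
        carry, k = buffer, 0
        while True:
            if k == len(self.levels):
                self.levels.append(None)
            if self.levels[k] is not None:
                # Absorb the partition already stored at this level.
                carry = merge_postings(self.levels[k], carry)
                self.levels[k] = None
            if size(carry) <= self.capacity(k):
                self.levels[k] = carry
                return
            k += 1  # too large for this level: push the merge upward

    def sizes(self):
        return [size(p) if p is not None else 0 for p in self.levels]
```

Because capacities grow geometrically in this sketch, large partitions are rewritten rarely, while a query still has only a small number of partitions to consult.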
“…Unlike the latest methods that keep the merging cost low through balanced-tree schemes [6,11,15], in the present paper we follow the more straightforward approach of maintaining the postings on disk in fixed-size blocks. Each fixed-size block may contain the postings of a single frequent term or the posting lists of a lexicographically ordered subset of several infrequent terms.…”
Section: Introduction (mentioning)
confidence: 99%
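
The fixed-size block layout in this statement can be sketched in Python as follows. The function `pack_blocks`, the `block_size` unit (postings per block), and the rule that a term counts as frequent once it fills a whole block are assumptions made for illustration; the citing paper's actual layout is not given here. Frequent terms receive dedicated blocks, and infrequent terms are packed together in lexicographic order.

```python
def pack_blocks(index, block_size):
    """Pack a term -> postings map into blocks of at most block_size postings."""
    blocks = []
    shared, used = {}, 0  # the shared block currently being filled
    for term in sorted(index):  # visit terms in lexicographic order
        postings = index[term]
        if len(postings) >= block_size:
            # Frequent term (assumed rule): dedicated blocks, split if needed.
            for start in range(0, len(postings), block_size):
                blocks.append({term: postings[start:start + block_size]})
        else:
            # Infrequent term: open a new shared block when this one is full.
            if used + len(postings) > block_size:
                blocks.append(shared)
                shared, used = {}, 0
            shared[term] = postings
            used += len(postings)
    if shared:
        blocks.append(shared)
    return blocks
```

For example, `pack_blocks({"the": [1, 2, 3, 4, 5], "apple": [1], "bee": [2, 3]}, 4)` gives "the" two dedicated blocks and places "apple" and "bee" together in one shared block.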