Efficient online index construction for text databases

Lester, Nicholas; Moffat, Alistair; Zobel, Justin

doi:10.1145/1386118.1386125

Cited by 38 publications

(28 citation statements)

References 22 publications


“…An updatable keyword search system is usually implemented with a hierarchy of indexes [24,14,21,26]. New data is accumulated in a small updatable structure that also supports concurrent queries, while the main part of the hierarchy consists of a set of read-only indexes.…”

Section: System Model (mentioning)

confidence: 99%
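
The index hierarchy described in the statement above can be illustrated with a minimal sketch: one small updatable buffer that accepts new documents and answers queries, plus a list of read-only segments created when the buffer fills. The class name `HierarchicalIndex`, the `buffer_limit` threshold, and the whitespace tokenizer are hypothetical choices made for illustration; they are not taken from the cited papers.

```python
from collections import defaultdict


class HierarchicalIndex:
    """Toy index hierarchy: one updatable buffer plus read-only segments."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit   # hypothetical flush threshold, in postings
        self.buffer = defaultdict(list)    # updatable part: term -> [doc_id, ...]
        self.buffer_size = 0
        self.segments = []                 # read-only part: frozen term -> doc-id tuples

    def add_document(self, doc_id, text):
        # New data always goes into the small updatable structure.
        for term in sorted(set(text.lower().split())):
            self.buffer[term].append(doc_id)
            self.buffer_size += 1
        if self.buffer_size >= self.buffer_limit:
            self._flush()

    def _flush(self):
        # Freeze the buffer into an immutable segment and start a fresh buffer.
        frozen = {term: tuple(docs) for term, docs in self.buffer.items()}
        self.segments.append(frozen)
        self.buffer = defaultdict(list)
        self.buffer_size = 0

    def search(self, term):
        # A query consults every level of the hierarchy and unions the results.
        term = term.lower()
        hits = []
        for segment in self.segments:
            hits.extend(segment.get(term, ()))
        hits.extend(self.buffer.get(term, ()))
        return sorted(set(hits))


idx = HierarchicalIndex(buffer_limit=4)
idx.add_document(1, "online index construction")
idx.add_document(2, "index merge")       # pushes the buffer over the limit
idx.add_document(3, "online search")     # lands in the fresh buffer
print(idx.search("online"))
```

In this sketch a document is searchable as soon as it is added, because `search` reads the buffer as well as the frozen segments. A real system would also need concurrency control and a policy for merging segments, both of which are omitted here.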

Workload-aware indexing for keyword search in social networks

Bjørklund, Götz, Gehrke, et al. 2011

Proceedings of the 20th ACM International Conference on Information and Knowledge Management


More and more data is accumulated inside social networks. Keyword search provides a simple interface for exploring this content. However, a lot of the content is private, and a search system must enforce the privacy settings of the social network. In this paper, we present a workload-aware keyword search system with access control based on a social network. We make two technical contributions: (1) HeapUnion, a novel union operator that improves processing of search queries with access control by up to a factor of two compared to the best previous solution; and (2) highly accurate cost models that vary in sophistication and accuracy; these cost models provide input to an optimization algorithm that selects the most efficient organization of access control meta-data for a given workload. Our experimental results with real and synthetic data show that our approach outperforms previous work by up to a factor of three.

General Terms: Performance, Security



“…This is reflected in many specific explanations, examples, and arguments. Nonetheless, many of the techniques are readily applicable or at least transferable to other possible application domains of B-trees, in particular to information retrieval [83], file systems [71], and "No SQL" databases and key-value stores recently popularized for web services and cloud computing [21,29].…”

Section: Purpose and Scope (mentioning)

confidence: 99%

Modern B-Tree Techniques

Graefe 2010

FNT in Databases

105


“…The merge-based methods merge postings from memory and disk into a single file on disk. The latest related methods amortize the cost by permitting the creation of multiple inverted files on disk and merging them according to specific patterns [15,16]. Even though in-place index maintenance has linear asymptotic disk cost that is lower than the polynomial cost of merge-based methods, merge-based methods are experimentally shown to use sequential disk transfers and outperform inplace methods [17].…”

Section: Related Work (mentioning)

confidence: 99%
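
The idea of keeping several inverted files on disk and merging them by a fixed pattern can be sketched as follows. The pattern used here is a simple logarithmic scheme, in which two partitions of the same generation are merged into one of the next generation. It is chosen for illustration and is an assumption, not necessarily the exact pattern of the methods cited as [15,16]. The names `LogMergeIndex`, `flush`, and `merge_postings` are hypothetical, and plain dictionaries stand in for on-disk files.

```python
def merge_postings(older, newer):
    """Merge two term -> doc-id list maps; newer doc ids follow older ones."""
    merged = {term: list(docs) for term, docs in older.items()}
    for term, docs in newer.items():
        merged.setdefault(term, []).extend(docs)
    return merged


class LogMergeIndex:
    """Keeps several partitions and merges equal-generation neighbours."""

    def __init__(self):
        self.partitions = []  # list of (generation, postings), oldest first

    def flush(self, in_memory_postings):
        # Each flush writes the in-memory postings as a new generation-0 partition.
        self.partitions.append((0, dict(in_memory_postings)))
        # Merge while the two newest partitions share a generation.
        while (len(self.partitions) >= 2
               and self.partitions[-1][0] == self.partitions[-2][0]):
            gen, newer = self.partitions.pop()
            _, older = self.partitions.pop()
            self.partitions.append((gen + 1, merge_postings(older, newer)))

    def lookup(self, term):
        # A query must read the term's postings from every partition.
        result = []
        for _, postings in self.partitions:
            result.extend(postings.get(term, []))
        return result


index = LogMergeIndex()
for doc_id in range(5):
    index.flush({"a": [doc_id]})
print([gen for gen, _ in index.partitions])
```

The trade-off this sketch exposes is the one the statement describes: allowing more than one partition avoids rewriting the whole index on every flush, at the price of reading several partitions per query.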

“…Unlike the latest methods that keep the merging cost low through balanced-tree schemes [6,11,15], in the present paper we follow the more straightforward approach of maintaining the postings on disk in fixed-size blocks. Each fixed-size block may contain the postings of a single frequent term or the posting lists of a lexicographically ordered subset of several infrequent terms.…”

Section: Introduction (mentioning)

confidence: 99%
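
The fixed-size block layout in the statement above can be sketched as a packing routine: a frequent term gets blocks of its own, while infrequent terms are packed in lexicographic order into shared blocks. The function name `pack_into_blocks`, the capacity measured in postings, and the rule that a term counts as frequent once its list fills at least one block are all assumptions made for illustration. The statement does not specify how frequency is decided.

```python
def pack_into_blocks(postings, block_capacity):
    """Split term -> doc-id lists into fixed-capacity blocks.

    Returns (term_blocks, range_blocks): blocks holding a single frequent
    term, and blocks holding a lexicographically ordered run of infrequent terms.
    """
    term_blocks, range_blocks = [], []
    current, used = {}, 0  # the range block currently being filled
    for term in sorted(postings):
        docs = postings[term]
        if len(docs) >= block_capacity:
            # Assumed rule: a term that fills a block is "frequent" and
            # is stored alone, split over as many blocks as it needs.
            for start in range(0, len(docs), block_capacity):
                term_blocks.append({term: docs[start:start + block_capacity]})
            continue
        if used + len(docs) > block_capacity:
            # The open range block cannot take this term; close it.
            range_blocks.append(current)
            current, used = {}, 0
        current[term] = docs
        used += len(docs)
    if current:
        range_blocks.append(current)
    return term_blocks, range_blocks


postings = {
    "the": [1, 2, 3, 4, 5],
    "apple": [1],
    "bear": [2, 3],
    "cat": [4, 5],
    "dog": [1],
}
term_blocks, range_blocks = pack_into_blocks(postings, block_capacity=4)
print(term_blocks)
print(range_blocks)
```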

Low-cost management of inverted files for online full-text search

Margaritis, Anastasiadis 2009

Proceedings of the 18th ACM Conference on Information and Knowledge Management


In dynamic environments with frequent content updates, we require online full-text search that scales to large data collections and achieves low search latency. Several recent methods that support fast incremental indexing of documents typically keep on disk multiple partial index structures that they continuously update as new documents are added. However, spreading indexing information across multiple locations on disk tends to considerably decrease the search responsiveness of the system. In the present paper, we take a fresh look at the problem of online full-text search with consideration of the architectural features of modern systems. Selective Range Flush is a greedy method that we introduce to manage the index in the system by using fixed-size blocks to organize the data on disk and dynamically keep low the cost of data transfer between memory and disk. As we experimentally demonstrate with the Proteus prototype implementation that we developed, we retrieve indexing information at latency that matches the lowest achieved by existing methods. Additionally, we reduce the total building cost by 30% in comparison to methods with similar retrieval time.

