Low-cost management of inverted files for online full-text search

Margaritis, Giorgos; Anastasiadis, Stergios V.

doi:10.1145/1645953.1646012

Cited by 14 publications

(7 citation statements)

References 19 publications

(46 reference statements)

“…The accumulated postings are routinely combined with the rest of the data in a hierarchy based on geometric partitioning [8]. More advanced techniques exist [7,9], but they would not change the tradeoffs that we show. We use PForDelta [15] for compression, which has shown efficient decompression performance in recent studies [13].…”

Section: Methods (mentioning)

confidence: 93%

Search in social networks with access control

Bjørklund

Götz

Gehrke

2010

Proceedings of the 2nd International Workshop on Keyword Search on Structured Data

More and more important data is accumulated inside social networks. Limiting the flow of private information across a social network is very important, and most social networks provide sophisticated privacy settings to control this flow. Creating such extensive access control knobs makes the search for content a hard problem since each user sees a unique subset of all the data. In this work, we take a first step at integrating access control based on a social network in a search system. We describe a set of solutions to the problem, including what indexes to construct and how to filter out inaccessible results. An experimental analysis illustrates the tradeoffs of the various strategies, and we point out a set of interesting future research directions in this area.

“…The variety of cipher terms caused by key change will result in index update. For index update, Margaritis and Anastasiadis (2009) only flush selectively the terms with most posting lists in memory into disk to merge it with primary index when the memory gets full with new posting lists. Gurajada and Kumar (2009) propose a new merge-based index maintenance strategy for information retrieval systems.…”

Section: Related Work (mentioning)

confidence: 99%

Mimir: a term-distributed retrieval system for secret documents

Gao

2018

IJICT

In order to access sensitive documents shared over government, army and enterprise intranets, users rely on an indexing facility where they can quickly locate relevant documents they are allowed to access: 1) without leaking information about the remaining documents; 2) with a balanced load on the index servers. To address this problem, we propose Mimir, a distributed cipher retrieval system for sensitive documents. Mimir constructs the distributed indexes based on load balanced term distribution for better search efficiency and load balanced query. Mimir utilises encryption with random key, partial key update to protect sensitive data and improve query efficiency. Our experiments show that Mimir can effectively protect secret data and answer queries nearly as fast as an ordinary inverted index.

“…In the buffer-and-flush approach, Margaritis and Anastasiadis [12] present an interesting alternative beyond the three strategies discussed above. They make a slightly different design choice: when the in-memory buffer reaches capacity, instead of flushing the entire in-memory index, they choose to flush only a portion of the term space (a contiguous range of terms based on lexicographic sort order), performing a merge with the corresponding on-disk portions of the inverted lists.…”

Section: Related Work (mentioning)

confidence: 99%
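
The range-based flush described in the citation statement above can be illustrated with a minimal sketch. It is not the implementation of Margaritis and Anastasiadis: the class name `RangeFlushIndex`, the fixed split points in `boundaries`, the posting-count `capacity`, and the rule of flushing the range that currently holds the most buffered postings are all assumptions made for illustration, and the on-disk index is simulated with in-memory dictionaries.

```python
from bisect import bisect_right
from collections import defaultdict


class RangeFlushIndex:
    """Toy index that flushes one lexicographic term range at a time."""

    def __init__(self, boundaries, capacity):
        # Hypothetical fixed partitioning: sorted split points such as
        # ["h", "p"] give the ranges [.., "h"), ["h", "p"), ["p", ..).
        self.boundaries = list(boundaries)
        self.capacity = capacity           # max postings kept in memory
        self.buffer = defaultdict(list)    # term -> doc ids not yet flushed
        self.buffered = 0                  # number of postings in the buffer
        # Stand-in for the on-disk index: one dict of postings per range.
        self.disk = [dict() for _ in range(len(self.boundaries) + 1)]

    def _range_of(self, term):
        # Index of the lexicographic range that contains the term.
        return bisect_right(self.boundaries, term)

    def add(self, term, doc_id):
        self.buffer[term].append(doc_id)
        self.buffered += 1
        if self.buffered > self.capacity:
            self.flush_one_range()

    def flush_one_range(self):
        # Assumed policy: pick the range with the most buffered postings.
        load = defaultdict(int)
        for term, postings in self.buffer.items():
            load[self._range_of(term)] += len(postings)
        victim = max(load, key=load.get)
        # Merge only that range with its on-disk counterpart; terms in
        # every other range stay in the memory buffer.
        for term in sorted(t for t in self.buffer if self._range_of(t) == victim):
            postings = self.buffer.pop(term)
            self.disk[victim].setdefault(term, []).extend(postings)
            self.buffered -= len(postings)
        return victim

    def lookup(self, term):
        # A query sees flushed postings followed by buffered ones.
        on_disk = self.disk[self._range_of(term)].get(term, [])
        return on_disk + self.buffer.get(term, [])
```

In this sketch, calling `add` past the capacity moves only one range's terms to the simulated disk, so a single flush touches a bounded part of the index while `lookup` still returns postings from both places.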

Dynamic memory allocation policies for postings in real-time Twitter search

Asadi

Lin

Busch

2013

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

We explore a real-time Twitter search application where tweets are arriving at a rate of several thousands per second. Real-time search demands that they be indexed and searchable immediately, which leads to a number of implementation challenges. In this paper, we focus on one aspect: dynamic postings allocation policies for index structures that are completely held in main memory. The core issue can be characterized as a "Goldilocks Problem". Because memory remains today a scarce resource, an allocation policy that is too aggressive leads to inefficient utilization, while a policy that is too conservative is slow and leads to fragmented postings lists. We present a dynamic postings allocation policy that allocates memory in increasingly larger "slices" from a small number of large, fixed pools of memory. With an analytical model and experiments, we explore different settings that balance time (query evaluation speed) and space (memory utilization).