2006
DOI: 10.1007/s11280-006-0221-0
Three-Level Caching for Efficient Query Processing in Large Web Search Engines

Abstract: Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalabili…

Cited by 57 publications
(91 citation statements)
References 29 publications
“…Essentially, it is possible to cache query results [Fagni et al. 2006; Gan and Suel 2009; Markatos 2001], a portion of the inverted index [Baeza-Yates et al. 2008; Zhang et al. 2008], or a combination of both [Baeza-Yates and Jonassen 2012; Baeza-Yates and Saint-Jean 2003; Baeza-Yates et al. 2008; Long and Suel 2005; Saraiva et al. 2001]. Furthermore, caching strategies for documents and snippets are also proposed [Ceccarelli et al. 2011; Tsegay et al. 2009; Turpin et al. 2007].…”
Section: Search Results Caching
confidence: 99%
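The result-caching option mentioned in this statement can be illustrated with a minimal LRU-style query result cache; the class and all names below are hypothetical, a sketch rather than any of the cited systems' actual implementations:

```python
from collections import OrderedDict

class QueryResultCache:
    """Minimal LRU cache mapping a query string to its result page (illustrative)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: "OrderedDict[str, list]" = OrderedDict()

    def get(self, query: str):
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query: str, results: list) -> None:
        self._entries[query] = results
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

cache = QueryResultCache(capacity=2)
cache.put("web search", ["doc1", "doc7"])
cache.put("caching", ["doc3"])
cache.get("web search")                 # refresh recency of "web search"
cache.put("inverted index", ["doc9"])   # evicts "caching", the LRU entry
```

Production result caches differ in admission and eviction policy (the cited works compare several), but the lookup-before-query-processing structure is the same.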
“…With this, SDC achieved a hit ratio higher than 30% in experiments conducted on a log from AltaVista. Long and Suel [12] showed that storing precomputed results for pairs of terms that frequently co-occur in the query log further increases the hit ratio. To this end, the authors proposed placing these frequent term pairs at an intermediate caching level between the broker cache and the end-server caches.…”
Section: Related Work
confidence: 99%
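The intermediate-level idea described here, seeding a multi-term intersection from a cached frequent pair, can be sketched as follows; the pair table, index, and function names are hypothetical and chosen for illustration, not taken from the cited paper:

```python
from itertools import combinations

# Hypothetical precomputed intersections for frequent term pairs,
# mined offline from a query log (sorted pair -> sorted list of doc IDs).
pair_cache = {
    ("new", "york"): [2, 5, 9],
    ("cup", "world"): [1, 5],
}

def intersect(a, b):
    """Plain intersection of two posting lists, returned sorted."""
    return sorted(set(a) & set(b))

def resolve_query(terms, index):
    """Intersect posting lists, seeding from a cached pair when one matches."""
    terms = sorted(terms)
    result, used = None, set()
    for pair in combinations(terms, 2):   # pairs come out in sorted order
        if pair in pair_cache:
            result = pair_cache[pair]     # start from the precomputed pair
            used = set(pair)
            break
    for t in terms:
        if t in used:
            continue
        result = index[t] if result is None else intersect(result, index[t])
    return result

index = {"new": [1, 2, 5, 9], "york": [2, 4, 5, 9], "times": [2, 9]}
resolve_query(["new", "york", "times"], index)  # seeds from ("new", "york")
```

The point of the intermediate level is that the cached pair replaces the most expensive part of the intersection, so only the remaining (typically rarer, shorter) lists must be fetched from the end servers.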
“…Much prior work has been devoted to caching posting lists [3,4,5,6,10,11,13,14,15]. Zhang et al. [15] benchmark five posting list caching policies and find LFU (least frequently used: cache entries are evicted by lowest access frequency) to be superior, and that a static posting list cache achieves hit rates similar to LFU but with less computational overhead.…”
Section: Introduction
confidence: 99%
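The LFU eviction policy this statement refers to can be sketched as a minimal posting-list cache; the class and names below are illustrative assumptions, not the benchmarked implementation from Zhang et al.:

```python
from collections import Counter

class LFUPostingCache:
    """Minimal LFU cache for per-term posting lists (illustrative sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lists: dict = {}     # term -> posting list
        self.freq = Counter()     # term -> access count

    def get(self, term: str):
        self.freq[term] += 1      # count every access, hit or miss
        return self.lists.get(term)

    def put(self, term: str, postings: list) -> None:
        if term not in self.lists and len(self.lists) >= self.capacity:
            # Evict the cached term with the lowest access frequency.
            victim = min(self.lists, key=lambda t: self.freq[t])
            del self.lists[victim]
        self.lists[term] = postings

cache = LFUPostingCache(capacity=2)
cache.put("web", [1, 2, 3])
cache.put("engine", [2, 4])
cache.get("web"); cache.get("web"); cache.get("engine")
cache.put("cache", [5])  # evicts "engine" (fewer accesses than "web")
```

A static cache, by contrast, fixes its contents offline from past frequencies, which is why it can match LFU's hit rate without per-access bookkeeping.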