2006
DOI: 10.1007/s11280-006-0221-0
Three-Level Caching for Efficient Query Processing in Large Web Search Engines

Abstract: Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalabili…

Cited by 57 publications
(91 citation statements)
References 29 publications
“…Essentially, it is possible to cache query results [Fagni et al. 2006; Gan and Suel 2009; Markatos 2001], a portion of the inverted index [Baeza-Yates et al. 2008; Zhang et al. 2008], or a combination of both [Baeza-Yates and Jonassen 2012; Baeza-Yates and Saint-Jean 2003; Baeza-Yates et al. 2008; Long and Suel 2005; Saraiva et al. 2001]. Furthermore, caching strategies for documents and snippets are also proposed [Ceccarelli et al. 2011; Tsegay et al. 2009; Turpin et al. 2007].…”
Section: Search Results Caching
confidence: 99%
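The result-caching option mentioned in this statement can be illustrated with a minimal LRU-style query result cache; the class and all names below are hypothetical, a sketch rather than any of the cited systems' actual implementations:

```python
from collections import OrderedDict

class QueryResultCache:
    """Minimal LRU cache mapping a query string to its result page (illustrative)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: "OrderedDict[str, list]" = OrderedDict()

    def get(self, query: str):
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query: str, results: list) -> None:
        self._entries[query] = results
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

cache = QueryResultCache(capacity=2)
cache.put("web search", ["doc1", "doc7"])
cache.put("caching", ["doc3"])
cache.get("web search")                 # refresh recency of "web search"
cache.put("inverted index", ["doc9"])   # evicts "caching", the LRU entry
```

Production result caches differ in admission and eviction policy (the cited works compare several), but the lookup-before-query-processing structure is the same.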
“…With this, SDC achieved a hit ratio higher than 30% in experiments conducted on a log from AltaVista. Long and Suel [12] showed that storing precomputed results for pairs of terms that frequently co-occur in the query log further increases the hit ratio. To this end, the authors proposed placing these frequent term pairs at an intermediate caching level between the broker cache and the end-server caches.…”
Section: Related Work
confidence: 99%
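The intermediate-level idea described here, seeding a multi-term intersection from a cached frequent pair, can be sketched as follows; the pair table, index, and function names are hypothetical and chosen for illustration, not taken from the cited paper:

```python
from itertools import combinations

# Hypothetical precomputed intersections for frequent term pairs,
# mined offline from a query log (sorted pair -> sorted list of doc IDs).
pair_cache = {
    ("new", "york"): [2, 5, 9],
    ("cup", "world"): [1, 5],
}

def intersect(a, b):
    """Plain intersection of two posting lists, returned sorted."""
    return sorted(set(a) & set(b))

def resolve_query(terms, index):
    """Intersect posting lists, seeding from a cached pair when one matches."""
    terms = sorted(terms)
    result, used = None, set()
    for pair in combinations(terms, 2):   # pairs come out in sorted order
        if pair in pair_cache:
            result = pair_cache[pair]     # start from the precomputed pair
            used = set(pair)
            break
    for t in terms:
        if t in used:
            continue
        result = index[t] if result is None else intersect(result, index[t])
    return result

index = {"new": [1, 2, 5, 9], "york": [2, 4, 5, 9], "times": [2, 9]}
resolve_query(["new", "york", "times"], index)  # seeds from ("new", "york")
```

The point of the intermediate level is that the cached pair replaces the most expensive part of the intersection, so only the remaining (typically rarer, shorter) lists must be fetched from the end servers.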
“…Much prior work has been devoted to caching posting lists [3,4,5,6,10,11,13,14,15]. Zhang et al. [15] benchmark five posting list caching policies and find LFU (least frequently used: cache entries are evicted by lowest access frequency) to be superior, and that a static posting list cache achieves hit rates similar to LFU but with less computational overhead.…”
Section: Introduction
confidence: 99%
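The LFU eviction policy this statement refers to can be sketched as a minimal posting-list cache; the class and names below are illustrative assumptions, not the benchmarked implementation from Zhang et al.:

```python
from collections import Counter

class LFUPostingCache:
    """Minimal LFU cache for per-term posting lists (illustrative sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lists: dict = {}     # term -> posting list
        self.freq = Counter()     # term -> access count

    def get(self, term: str):
        self.freq[term] += 1      # count every access, hit or miss
        return self.lists.get(term)

    def put(self, term: str, postings: list) -> None:
        if term not in self.lists and len(self.lists) >= self.capacity:
            # Evict the cached term with the lowest access frequency.
            victim = min(self.lists, key=lambda t: self.freq[t])
            del self.lists[victim]
        self.lists[term] = postings

cache = LFUPostingCache(capacity=2)
cache.put("web", [1, 2, 3])
cache.put("engine", [2, 4])
cache.get("web"); cache.get("web"); cache.get("engine")
cache.put("cache", [5])  # evicts "engine" (fewer accesses than "web")
```

A static cache, by contrast, fixes its contents offline from past frequencies, which is why it can match LFU's hit rate without per-access bookkeeping.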