Proceedings of the 14th International Conference on World Wide Web (WWW '05), 2005
DOI: 10.1145/1060745.1060785
Three-level caching for efficient query processing in large Web search engines

Abstract: Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability…

Cited by 120 publications (20 citation statements)
References 39 publications
“…In modern search engines, query processing represents one of the major performance bottlenecks, so caching can help to speed up the search engine performance as well as to reduce the latency perceived by the users. Caching can be applied at different granularities, including query results [28], posting lists of query terms [39], and posting list intersections [31]. Saraiva et al. [39] proposed a two-level architecture where the front-end machine caches the results of popular queries, while the back-end machines have a cache for the posting lists of the most frequently requested terms.…”
Section: Related Work
Mentioning confidence: 99%
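The two-level architecture summarized in this excerpt can be made concrete with a short sketch. The following is a minimal illustration, not the cited systems' actual code: the names (LRUCache, answer_query, fetch_postings, rank) are hypothetical, and production engines use far more elaborate, cost-aware policies than plain LRU.

```python
# A minimal sketch of two-level caching: a front-end cache of full query
# results and a back-end cache of posting lists. LRU stands in for the
# more elaborate policies used in practice; fetch_postings and rank are
# caller-supplied stand-ins for index access and scoring.
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)          # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used


result_cache = LRUCache(capacity=10_000)    # level 1: query -> top-k results
posting_cache = LRUCache(capacity=100_000)  # level 2: term  -> posting list


def answer_query(query, fetch_postings, rank):
    """Check the result cache first, then fall back to posting lists."""
    results = result_cache.get(query)
    if results is not None:
        return results                       # front-end hit: no index work
    postings = []
    for term in query.split():
        plist = posting_cache.get(term)
        if plist is None:
            plist = fetch_postings(term)     # disk/network access on a miss
            posting_cache.put(term, plist)
        postings.append(plist)
    results = rank(postings)                 # intersect and score (abstracted)
    result_cache.put(query, results)
    return results
```

A repeated identical query is answered entirely from the front-end cache, while a new query that shares terms with earlier ones still avoids fetching those terms' posting lists from disk.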
“…Altingovde, Ozcan, Cambazoglu, and Ulusoy (2011) coupled a result cache with a document id cache, which stores only document ids without snippets, further reducing the query traffic going to the backend search system. Long and Suel (2005) introduce, on top of result and posting list caching, a third level of cache where precomputed intersections of posting lists are stored. Li, Lee, Sivasubramaniam, and Giles (2007) propose a hybrid architecture involving result, posting list, and document caches.…”
Section: Related Work
Mentioning confidence: 99%
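A hedged sketch of the third level described in this excerpt: an intersection cache keyed by term pairs, sitting below the result cache and above the posting-list cache. The names (intersection_cache, intersect_sorted, postings_for) are illustrative assumptions, and the cost-aware admission and eviction policies of Long and Suel (2005) are omitted.

```python
# Illustrative third cache level: precomputed intersections of posting
# lists, keyed by an unordered term pair. Assumes posting lists are
# sorted lists of document ids; postings_for is a caller-supplied
# stand-in for the lower cache levels or the on-disk index.

intersection_cache = {}  # frozenset({t1, t2}) -> sorted doc-id list


def intersect_sorted(a, b):
    """Linear merge of two sorted doc-id lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def pair_intersection(t1, t2, postings_for):
    """Return the (possibly cached) intersection of two terms' postings."""
    key = frozenset((t1, t2))                # order-insensitive cache key
    if key not in intersection_cache:
        intersection_cache[key] = intersect_sorted(
            postings_for(t1), postings_for(t2)
        )
    return intersection_cache[key]
```

For queries with AND semantics, a node can start from a cached pair intersection and narrow it with the remaining terms' posting lists, saving most of the merge work for frequently co-occurring pairs.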
“…A further approach involves caching portions of a query (i.e., pairs of terms), as initially proposed in [14] and extended in [6]. This approach is named Intersection Caching and is implemented at the search-node level as well.…”
Section: Gabriel Tolosa, Luca Becchetti, Esteban Feuerstein, Alberto …
Mentioning confidence: 99%
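To show what "caching pairs of terms" means for longer queries, here is a small, self-contained sketch of how candidate cache keys could be enumerated; which pairs are actually materialized and how they are evicted is policy-dependent in the cited work and is not modeled here.

```python
# Hypothetical helper: every unordered term pair of a query is a
# candidate key for the intersection cache. Canonical (sorted) order
# makes "new york" and "york new" hit the same entry.
from itertools import combinations


def candidate_pair_keys(query_terms):
    return [tuple(pair) for pair in combinations(sorted(set(query_terms)), 2)]


# A three-term query yields three candidate pair keys:
print(candidate_pair_keys(["new", "york", "pizza"]))
# [('new', 'pizza'), ('new', 'york'), ('pizza', 'york')]
```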