Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search

Kim, Saehoon; He, Yuxiong; Hwang, Seung-won; Elnikety, Sameh; Choi, Seungjin

doi:10.1145/2684822.2685289

Cited by 42 publications

(47 citation statements)

References 24 publications

“…Dynamic pruning techniques such as WAND and BMW offer some relief as they offer the potential to safely skip the decompression of postings and the scoring of documents that cannot make the current top K. This makes the exact response time of a query difficult to predict, as not every posting in the postings lists will be decompressed and scored. Nevertheless recent work has considered making accurate predictions on the efficiency of a query, either in terms of absolute response time [29], or in terms of those queries with response times exceeding a threshold [19,21].…”

Section: Related Work (mentioning)

confidence: 99%

“…Efficiency predictions facilitate a number of applications for ensuring efficient yet effective retrieval - for instance, routing queries among busy replicated query shard servers [29]; selectively deploying multiple CPU cores for slow queries [19,21]; or adjusting the pruning aggressiveness or size of K for different queries [5,14,38]. Of these, the work of Tonellotto et al. [38] is among the most similar to ours, in that they vary the number of documents to be retrieved, K, as well as the pruning aggressiveness, before passing to a learning-to-rank re-ranking phase, based on the predicted execution time of the query.…”

Section: Related Work (mentioning)

confidence: 99%

“…As shown in [19,21,29,38], by using statistics derived from terms and queries it is possible to estimate the query processing time. However, since we must deal with ephemeral posting lists, most of the statistics used in prior research are not applicable.…”

Section: Predicting Complex Operator Efficiency (mentioning)

confidence: 99%

“…Indeed, while recent work in query efficiency prediction has shown the possibility of estimating the execution time of a query prior to its processing [19,21,29,38], none of the existing work has considered the execution time of queries containing query operators that generate ephemeral posting lists, such as #syn, or #1. This makes it difficult to select among query rewriting strategies that use such operators, as their likely execution time is unknown. Hence, in this work, we study the cost of scoring ephemeral posting lists, and use these observations to define accurate query efficiency predictions for advanced query operators.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient & Effective Selective Query Rewriting with Efficiency Predictions

Macdonald

Tonellotto

Ounis

2017

Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which has impacts upon the user satisfaction, if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine.
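The per-query selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rewriting` class, the `select_rewriting` function, the preference ranks, and the timing values are all hypothetical, and the tie-breaking rule (prefer the best-ranked rewriting among those predicted to fit the time budget, else fall back to the fastest) is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rewriting:
    """A candidate query rewriting (all fields are illustrative)."""
    name: str
    predicted_ms: float      # predicted execution time from some efficiency predictor
    preference_rank: int     # assumed effectiveness ordering; lower is preferred


def select_rewriting(candidates, budget_ms):
    """Pick the preferred rewriting whose predicted time fits the budget.

    Assumption: if no candidate is predicted to fit, fall back to the
    candidate with the lowest predicted execution time.
    """
    permissible = [c for c in candidates if c.predicted_ms <= budget_ms]
    if permissible:
        return min(permissible, key=lambda c: c.preference_rank)
    return min(candidates, key=lambda c: c.predicted_ms)


# Hypothetical candidates for a single query.
CANDIDATES = [
    Rewriting("original", predicted_ms=40.0, preference_rank=3),
    Rewriting("expansion", predicted_ms=80.0, preference_rank=2),
    Rewriting("proximity", predicted_ms=120.0, preference_rank=1),
]
```

With a 100 ms budget the sketch picks the expansion rewriting, because the proximity rewriting is predicted to exceed the allowed time.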

“…Moreover, reducing each server's tail latency is critical when a request spans several servers and responses are aggregated from these servers. In this case, the slower servers typically dominate the response time [22].…”

Section: Introduction (mentioning)

confidence: 99%
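The point made in the statement above, that the slowest server dominates an aggregated response, can be made concrete with a small sketch. The function names and the default 1% slow-response probability are illustrative assumptions, and the probability formula additionally assumes that servers are slow independently of one another.

```python
def fanout_latency_ms(server_latencies_ms):
    """Latency of an aggregated response: it must wait for the slowest server."""
    return max(server_latencies_ms)


def prob_any_server_slow(num_servers, per_server_slow_prob=0.01):
    """Probability that at least one of num_servers responds slowly.

    Assumes each server is slow independently with the same probability
    (for example 0.01 when "slow" means beyond its own 99th percentile).
    """
    return 1.0 - (1.0 - per_server_slow_prob) ** num_servers
```

Under these assumptions, a request that fans out to 100 servers hits at least one 99th-percentile-slow server with probability of roughly 0.63, which is why per-server tail latency matters far more than per-server mean latency.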

Few-to-Many

Haque

Eom

et al. 2015

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems

Self Cite

Interactive services, such as Web search, recommendations, games, and finance, must respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ percentile) latency. Although every server is multicore, parallelizing individual requests to reduce tail latency is challenging because (1) service demand is unknown when requests arrive; (2) blindly parallelizing all requests quickly oversubscribes hardware resources; and (3) parallelizing the numerous short requests will not improve tail latency. This paper introduces Few-to-Many (FM) incremental parallelization, which dynamically increases parallelism to reduce tail latency. FM uses request service demand profiles and hardware parallelism in an offline phase to compute a policy, represented as an interval table, which specifies when and how much software parallelism to add. At runtime, FM adds parallelism as specified by the interval table indexed by dynamic system load and request execution time progress. The longer a request executes, the more parallelism FM adds. We evaluate FM in Lucene, an open-source enterprise search engine, and in Bing, a commercial Web search engine. FM improves the 99th percentile response time up to 32% in Lucene and up to 26% in Bing, compared to prior state-of-the-art parallelization. Compared to running requests sequentially in Bing, FM improves tail latency by a factor of two. These results illustrate that incremental parallelism is a powerful tool for reducing tail latency.
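The runtime lookup described in this abstract, an interval table indexed by system load and execution-time progress, might look roughly like the sketch below. It is only an illustration of the idea: the table layout, the load bounds, the thresholds, and the names `LOAD_BOUNDS`, `INTERVAL_ROWS`, and `parallelism_degree` are hypothetical and are not taken from the paper, which computes its table offline from demand profiles.

```python
from bisect import bisect_left, bisect_right

# Hypothetical load buckets: load <= 4 uses row 0, load <= 16 uses row 1,
# anything higher uses row 2.
LOAD_BOUNDS = [4, 16]

# Hypothetical interval table. Each row lists (start_ms, degree) pairs in
# ascending start_ms order: once a request has run for start_ms, it may use
# that many threads. Higher load gets later and smaller increases.
INTERVAL_ROWS = [
    [(0.0, 1), (10.0, 2), (30.0, 4)],
    [(0.0, 1), (25.0, 2), (80.0, 3)],
    [(0.0, 1), (100.0, 2)],
]


def parallelism_degree(load, elapsed_ms):
    """Look up the degree of parallelism for a request.

    load: current number of requests in the system (assumed load metric).
    elapsed_ms: how long this request has been executing so far.
    """
    row = INTERVAL_ROWS[bisect_left(LOAD_BOUNDS, load)]
    starts = [start for start, _ in row]
    # Index of the last interval whose start is <= elapsed_ms.
    idx = max(bisect_right(starts, elapsed_ms) - 1, 0)
    return row[idx][1]
```

In this sketch every request starts sequentially, so short requests never consume extra cores, and only requests that keep running cross later thresholds and receive more threads.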

Running Time Prediction for Web Search Queries

Rojas

Gil-Costa

2016

Parallel Processing and Applied Mathematics
