Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology 2009
DOI: 10.1145/1516360.1516487

Estimating aggregates in time-constrained approximate queries in Oracle

Abstract: The concept of time-constrained SQL queries was introduced to address the problem of long-running SQL queries. A key approach adopted for supporting time-constrained SQL queries is to use sampling to reduce the amount of data that needs to be processed, thereby allowing completion of the query within the specified time constraint. However, sampling does make the query results approximate and hence requires the system to estimate the values of the expressions (especially aggregates) occurring in the select list. Th…
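The approach described in the abstract can be illustrated with a minimal sketch. Assuming a uniform Bernoulli row sample with a known sampling fraction (an assumption made here for illustration, not Oracle's actual estimator), a SUM in the select list can be estimated by scaling the sample SUM by the inverse of the sampling fraction:

import random

def approx_sum(rows, fraction):
    # Sketch only: retain each row with probability `fraction` (uniform
    # Bernoulli sampling assumed), then scale the sample SUM by 1/fraction
    # to obtain an unbiased estimate of the full SUM.
    sample = [v for v in rows if random.random() < fraction]
    return sum(sample) / fraction

# Synthetic column of 1,000,000 values; compare exact and approximate SUM.
rows = [random.gauss(100.0, 20.0) for _ in range(1_000_000)]
print("exact SUM  :", round(sum(rows)))
print("approx SUM :", round(approx_sum(rows, fraction=0.01)))

Processing only about 1% of the rows in this sketch illustrates why sampling can meet a time constraint, at the cost of an approximate answer whose error must be estimated.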


Cited by 9 publications (13 citation statements)
References 4 publications
“…Besides a few limited benefits (see Previous Approaches below), the work (both I/O and computation) performed for answering past queries is often wasted afterwards. However, in an approximate query processing context (e.g., [6,19,34,36,66,85]), one might be able to change this paradigm and reuse much of the previous work done by the database system based on the following observation: * This manuscript is an extended report of the work published in ACM SIGMOD conference 2017.…”
Section: Introduction (mentioning)
confidence: 99%
“…There has been a large body of research on using sampling to provide quick answers to database queries, on database systems [9,15,16,22,23,24,25,33,44], and data stream systems [12,31]. Approximate aggregate processing has been the focus of many of these works, which study randomized joins [24], optimal sample construction [9,16], sample reusing [44], and sampling plan in a stream setting [12,31].…”
Section: Related Work (mentioning)
confidence: 99%
“…Approximate aggregate processing has been the focus of many of these works, which study randomized joins [24], optimal sample construction [9,16], sample reusing [44], and sampling plan in a stream setting [12,31]. Most of them use statistical inequalities and the central limit theorem to model the confidence interval or variance of the approximate aggregate answers [9,16,22,23,24,44]. Recently, Pansare et al [33] develop a very sophisticated Bayesian framework to infer the confidence bounds of approximate aggregate answers.…”
Section: Related Work (mentioning)
confidence: 99%
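As a generic illustration of the CLT-based confidence intervals mentioned in the statement above (not the exact formula of any cited system), a normal-approximation interval for an AVG estimated from a uniform sample can be computed from the sample mean and sample variance:

import math
import random

def avg_with_ci(sample, z=1.96):
    # Normal-approximation (CLT) confidence interval: for large n the sample
    # mean is approximately Normal(mu, sigma^2 / n), so a ~95% interval is
    # mean +/- z * sqrt(var / n) with z = 1.96.
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
    return mean, z * math.sqrt(var / n)

population = [random.expovariate(1 / 50.0) for _ in range(1_000_000)]
sample = random.sample(population, 10_000)
mean, half_width = avg_with_ci(sample)
print(f"AVG ≈ {mean:.2f} ± {half_width:.2f} (true AVG = {sum(population) / len(population):.2f})")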
“…This situation has brought even more attention to the already-active area of Approximate Query Processing (AQP). As a critical and general approach for coping with massive datasets, sampling is widely used in databases [4,6,9,11,12,13,18], Map-Reduce systems [5,16], and data stream management systems [7,17].…”
Section: Introduction (mentioning)
confidence: 99%