Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data 2020
DOI: 10.1145/3318464.3380584

Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

Abstract: Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost model, therefore, is akin to better resource efficiency and lower operational costs. Unfortunately, the production workloads at Microsoft show that costs are very complex to model for big data systems. In this work, we investigate two key questions: (i) can we learn accurate cost…

Cited by 55 publications (37 citation statements)
References 32 publications (38 reference statements)

“…Furthermore, when the cost-based optimizer is deployed in a distributed or parallel database, the cloud environment, or the cross-platform query engines, the complexity of cost model is increasing dramatically. Moreover, even with the true cardinality, the cost estimation of a query is not linear to the running time, which may lead to a suboptimal execution plan [45,81].…”
Section: Cost Model
Citation type: mentioning
confidence: 99%
“…Due to the difficulty in collecting statistics and the needs of picking the resources in big data systems, particularly in modern cloud data services, Siddiqui et al [81] propose a learning-based cost model and integrate it into the optimizer of SCOPE [7]. They build large number of small models to predict the costs of common (sub)queries, which are extracted from the workload history.…”
Section: Cost Model Alternatives
Citation type: mentioning
confidence: 99%
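
The statement above describes building many small models, one per common (sub)query seen in the workload history. A minimal Python sketch of that idea follows. It is an illustration only and not the implementation of Siddiqui et al.: the signature strings, the single input-row feature, the straight-line fit, and the names fit_line, train_small_models, and predict_cost are all assumptions made for this example.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = a * x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # All samples share one x value: fall back to the mean runtime.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def train_small_models(history, min_samples=2):
    """Fit one small model per subexpression signature.

    history: iterable of (signature, input_rows, runtime) tuples.
    The tuple layout is hypothetical, chosen for this sketch.
    """
    grouped = defaultdict(list)
    for signature, input_rows, runtime in history:
        grouped[signature].append((input_rows, runtime))
    # Only recurring subexpressions get a model of their own.
    return {
        sig: fit_line(pts)
        for sig, pts in grouped.items()
        if len(pts) >= min_samples
    }


def predict_cost(models, signature, input_rows, fallback):
    """Use the learned model if one exists, else a default cost function."""
    if signature not in models:
        return fallback(input_rows)
    slope, intercept = models[signature]
    return slope * input_rows + intercept


# Hypothetical workload history: (signature, input rows, observed runtime).
history = [
    ("scan>filter", 100, 2.0),
    ("scan>filter", 200, 4.0),
    ("scan>filter", 300, 6.0),
    ("scan>join", 500, 9.0),  # seen once, so no dedicated model
]
models = train_small_models(history)
learned = predict_cost(models, "scan>filter", 400, fallback=lambda rows: rows * 0.1)
default = predict_cost(models, "scan>join", 500, fallback=lambda rows: rows * 0.1)
```

In this sketch, subexpressions that do not recur in the history fall back to a default cost function supplied by the caller. Whether and how the original system combines learned and default estimates is not stated in the excerpt.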
“…The different cost models associated with the big data systems were studied in research [16]. The currently available cost models depending on the cardinality measure and the learned cost model depending on operators, common sub expression were also discussed.…”
Section: A Review Of Literature
Citation type: mentioning
confidence: 99%