Dynamically optimizing queries over large scale data platforms

Karanasos, Konstantinos; Balmin, Andrey; Kutsch, M.; Özcan, Fatma; Ercegovac, Vuk; Xia, Chunyang; Jackson, Jesse

doi:10.1145/2588555.2610531

Cited by 29 publications

(26 citation statements)

References 36 publications

(46 reference statements)

“…This proposal can be seen as an elegant adaptation of [39], proposed in a parallel database system, to a cloud system. More generally, with respect to the issue of query optimization in cloud environments, the most recent and relevant proposals are described in [11,41,53,61].…”

Section: Discussion (mentioning)

confidence: 99%

Big Data Management in the Cloud: Evolution or Crossroad?

Hameurlain

Morvan

2016

Communications in Computer and Information Science

In this paper, we try to provide a synthetic and comprehensive state of the art concerning big data management in cloud environments. From this perspective, data management based on parallel and cloud (e.g. MapReduce) systems is overviewed and compared with respect to software requirements (e.g. data independence, software reuse), high performance, scalability, elasticity, and data availability. With respect to proposed cloud systems, we discuss the evolution of their data manipulation languages and we try to learn some lessons that should be exploited to ensure the viability of the next generation of large-scale data management systems for big data applications.

“…Researches on this purpose fall into two main approaches [25]. A first approach, called Single Point-based Optimization [5,17,18,20] consists in monitoring a plan execution so as to detect estimation errors and a resulting sub-optimality. This latter is corrected by interrupting the current execution and re-optimizing the remainder of the plan using up-to-date statistics.…”

Section: Preliminaries (mentioning)

confidence: 99%

“…A considerable body of literature was dedicated to find solutions to this problem. These solutions include mainly: (i) techniques for better quality of the statistical metadata [7,11,13,22,27,28], (ii) run-time techniques [5,17-20] to monitor a query execution and trigger reoptimization of the plan when a sub-optimality is detected, and (iii) compile-time strategies [1-3, 9, 12] that permit the optimizer to generate an execution plan, being aware of the imprecision of used estimates.…”

Section: Introduction (mentioning)

confidence: 99%

Handling Estimation Inaccuracy in Query Optimization

Moumen

Morvan

Hameurlain

2016

Web Technologies and Applications

Cost-based optimizers choose query execution plans using a cost model. The latter relies on the accuracy of estimated statistics. Unfortunately, compile-time estimates often differ significantly from run-time values, leading to suboptimal plan choices. In this paper, we propose a compile-time strategy, wherein the optimization process is fully aware of the estimation inaccuracy. This is ensured by the use of intervals of estimates rather than single-point estimates of error-prone parameters. These intervals serve to identify plans that provide stable performance in several run-time conditions, so-called robust plans. Our strategy relies on a probabilistic approach to decide which plan to choose to start the execution. Our experiments show that our proposal allows a considerable improvement of the ability of a query optimizer to produce a robust execution plan in case of large estimation errors.
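
The run-time approach mentioned in the citation statements above (monitor a plan's execution, detect estimation errors, then re-optimize the remainder of the plan with up-to-date statistics) can be sketched roughly as follows. This is an illustrative sketch only, not the method of the cited paper or of any citing paper: the names `Step`, `should_reoptimize`, and `run_plan`, the `execute` and `reoptimize` callbacks, and the 10x error threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    estimated_rows: int  # compile-time cardinality estimate (hypothetical field)


def should_reoptimize(estimated_rows, actual_rows, max_ratio=10.0):
    """Return True when the estimate is off by more than max_ratio in either direction."""
    est = max(estimated_rows, 1)  # clamp to 1 to avoid division by zero
    act = max(actual_rows, 1)
    return max(est / act, act / est) > max_ratio


def run_plan(steps, execute, reoptimize, max_ratio=10.0):
    """Run steps in order; after a badly estimated step, replan the remaining steps.

    execute(step) -> observed row count (assumed callback)
    reoptimize(remaining, executed) -> new list of remaining steps (assumed callback)
    """
    remaining = list(steps)
    executed = []
    while remaining:
        step = remaining.pop(0)
        actual_rows = execute(step)
        executed.append((step.name, actual_rows))
        # Only replan when there is something left to replan.
        if remaining and should_reoptimize(step.estimated_rows, actual_rows, max_ratio):
            remaining = reoptimize(remaining, executed)
    return executed
```

The threshold check is symmetric, so both over- and under-estimates trigger replanning. A real system would also need to weigh the cost of interrupting execution, which this sketch ignores.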

“…Han et al [20] and Karanasos et al [23] both present their approaches to query optimization for distributed query execution by re-optimizing during execution using accurate statistic information about the data at the current stage of query execution. As we optimize query execution by hand, [23] show that collecting information such as the selectivity of predicates before query optimization only causes minor overhead.…”

Section: Related Work (mentioning)

confidence: 99%

“…As we optimize query execution by hand, [23] show that collecting information such as the selectivity of predicates before query optimization only causes minor overhead. This accurate information is required to determine which of the strategies we use in this paper is most efficient.…”

Section: Related Work (mentioning)

confidence: 99%

Fast OLAP query execution in main memory on large data in a cluster

Weidner

Dees

Sanders

2013

2013 IEEE International Conference on Big Data

Main memory column-stores have proven to be efficient for processing analytical queries. Still, there has been much less work in the context of clusters. Using only a single machine poses several restrictions: processing power and data volume are bounded by the number of cores and the main memory fitting on one tightly coupled system. To enable the processing of larger data sets, switching to a cluster becomes necessary. In this work, we explore techniques for efficient execution of analytical SQL queries on large amounts of data in a parallel database cluster while making maximal use of the available hardware. This includes precompiled query plans for efficient CPU utilization, full parallelization on single nodes and across the cluster, and efficient inter-node communication. We implement all features in a prototype for running a subset of TPC-H benchmark queries. We evaluate our implementation using a 128-node cluster running TPC-H queries with 30,000 gigabytes of uncompressed data.
