Dynamic Query Scheduling in Parallel Data Warehouses

Märtens, Holger; Rahm, Erhard; Stöhr, Thomas

doi:10.1007/3-540-45706-2_43

Cited by 12 publications

(3 citation statements)

References 11 publications


“…As the multi‐core processors have seen a very fast evolution in the last decade and the scalability of distributed systems has also improved significantly, one can assume that the most efficient way to lower the query processing time is to parallelize their execution. Numerous studies have been performed regarding the parallel execution of queries in a database or data warehouse. Hadoop‐DB is a hybrid data warehouse environment that uses several relational database management system (PostgreSql) as data nodes and Hadoop + Hive as the execution engine.…”

Section: Related Work and Background (mentioning)

confidence: 99%

Single-scan: a fast star-join query processing algorithm

Purdilă, Pentiuc. 2015. Softw. Pract. Exper.


Summary: A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query used in data warehouses is star‐join. In this paper, we present a fast and efficient star‐join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs a single scan of the fact table, which means a significant reduction of input/output operations and computational complexity. Also, the algorithm requires only two MapReduce iterations in total: one to build the filters against dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than the existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley & Sons, Ltd.


“…This interaction is common in Multiple Query Optimization (MQO) problem and physical design in RDW (Sellis, 1988). The MQO is related to other optimization problems such as Buffer Management (BMP) (Cornell & Yu, 1989) and Query Scheduling (QSP) (Chipara, Lu & Roman, 2007; Märtens, Rahm & Stöhr, 2002), and the joint problem of BMP and QSP (Gupta, Sudarshan & Viswanathan, 2001; Tan & Lu, 1995; Gupta, Sudarshan & Viswanathan, 2001). These problems are studied in three main levels: off-line, dynamic (adaptive) and on-line.…”

Section: Related Work (mentioning)

confidence: 99%

“…The QSP has also been studied in several environments: centralized (Thomas, Diwan & Sudarshan, 2006), distributed and parallel databases (Märtens, Rahm & Stöhr, 2002). It has been proved as strongly NP-complete problem (Thomas, Diwan & Sudarshan, 2006).…”

Section: Off-line Optimization (mentioning)

confidence: 99%

A Query Beehive Algorithm for Data Warehouse Buffer Management and Query Scheduling

Kerkad, Bellatreche, Richard, et al. 2014. International Journal of Data Warehousing and Mining


Analytical queries, like those used in data warehouses and OLAP, are generally interdependent. This is due to the fact that the database is usually modeled with a denormalized star schema or its variants, where most queries pass through a large central fact table. Such interaction has been largely exploited in query optimization techniques such as materialized views. Nevertheless, such approaches usually ignore buffer management and assume queries have a fixed order and are known in advance. We believe such assumptions are too strong and thus they need to be revisited and simplified. In this paper, we study the combination of two problems: buffer management and query scheduling, in both static and dynamic scenarios. We present an NP-hardness study of the joint problem, highlighting its complexity. We then introduce a new and highly efficient algorithm inspired by a beehive. We conduct an extensive experimental evaluation on a real DBMS showing the superiority of our algorithm compared to previous ones as well as its excellent scalability.


Bibliography

2008

High‐Performance Parallel Database Processing and Grid Databases
