Commercial enterprise data warehouses are typically implemented on parallel databases because of the inherent scalability and performance limitations of a serial architecture. Queries used in such large data warehouses can contain complex predicates as well as multiple joins, and the query execution plans generated by the optimizer may be suboptimal due to mis-estimates of row cardinalities. Progressive optimization (POP) is an approach that detects cardinality estimation errors by monitoring actual cardinalities at runtime and recovers by triggering re-optimization with the measured cardinalities. However, the original POP solution assumes a serial processing architecture, and its core ideas cannot be readily applied to a parallel shared-nothing environment. Extending serial POP to a parallel environment is challenging because we need to determine when and how to trigger re-optimization based on cardinalities collected from multiple independent nodes. In this paper, we present a comprehensive and practical solution to this problem, including several novel voting schemes for deciding whether to trigger re-optimization, a mechanism to reuse local intermediate results across nodes as a partitioned materialized view, several flavors of parallel checkpoint operators, and parallel checkpoint processing methods using efficient communication protocols. This solution has been prototyped in a leading commercial parallel DBMS. We have performed extensive experiments using the TPC-H benchmark and a real-world database. Experimental results show that our solution has negligible runtime overhead and accelerates the performance of complex OLAP queries by up to a factor of 22.
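The idea of voting on re-optimization across nodes can be illustrated with a minimal sketch. The abstract does not describe the actual voting schemes, so everything below is an assumption made for illustration: the `CheckpointReport` type, the per-node acceptable range (`low`, `high`), and the two example policies `any_vote` and `majority_vote` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckpointReport:
    """Hypothetical per-node checkpoint result (not the paper's data structure)."""
    node_id: int
    actual_rows: int   # cardinality observed at the checkpoint on this node
    low: float         # assumed lower bound within which the current plan stays acceptable
    high: float        # assumed upper bound within which the current plan stays acceptable

    def out_of_range(self) -> bool:
        # A node "votes" for re-optimization when its observed cardinality
        # falls outside the range assumed at optimization time.
        return not (self.low <= self.actual_rows <= self.high)


def any_vote(reports: List[CheckpointReport]) -> bool:
    """Illustrative policy: re-optimize if at least one node reports a violation."""
    return any(r.out_of_range() for r in reports)


def majority_vote(reports: List[CheckpointReport]) -> bool:
    """Illustrative policy: re-optimize only if more than half of the nodes report a violation."""
    violations = sum(r.out_of_range() for r in reports)
    return violations * 2 > len(reports)


if __name__ == "__main__":
    reports = [
        CheckpointReport(node_id=0, actual_rows=1_000, low=500, high=2_000),
        CheckpointReport(node_id=1, actual_rows=50_000, low=500, high=2_000),
        CheckpointReport(node_id=2, actual_rows=1_200, low=500, high=2_000),
    ]
    print("any_vote:", any_vote(reports))
    print("majority_vote:", majority_vote(reports))
```

The two policies differ in how aggressively they react: a single skewed node is enough to trigger re-optimization under `any_vote`, while `majority_vote` tolerates isolated outliers.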
If presented with inaccurate statistics, even the most sophisticated query optimizers make mistakes. They may wrongly estimate the output cardinality of an operation and thus make sub-optimal plan choices based on that cardinality. Maintaining accurate statistics is hard, both because each table may need a specifically parameterized set of statistics and because statistics become outdated as the database changes. Automated Statistic Collection (ASC) is a new component in IBM DB2 UDB that, without any DBA intervention, observes and analyzes the effects of faulty statistics and, in response, triggers actions that continuously repair those statistics. In this demonstration, we will show how ASC relieves the DBA of the task of maintaining fresh, accurate statistics in several challenging scenarios. ASC is able to reconfigure the statistics collection parameters (e.g., the number of frequent values for a column, or correlations between certain column pairs) on a per-table basis. ASC can also detect and guard against outdated statistics caused by high update/insert/delete rates in volatile, dynamic databases. We will also show how ASC works from the inside: from how cardinality mis-estimations are introduced in different kinds of operators, to how this error is propagated to later operations in the plan, to how this influences plan choices inside the optimizer.
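How a mis-estimate arises and then propagates through a plan can be shown with a small worked sketch. It assumes the common textbook case in which an optimizer multiplies the selectivities of two predicates as if the columns were independent, while the columns are in fact fully correlated. The function names, the numbers, and the join fan-out are hypothetical and do not describe ASC or DB2 internals.

```python
from typing import Iterable


def estimate_with_independence(table_rows: int, selectivities: Iterable[float]) -> float:
    """Estimate output rows by multiplying selectivities (independence assumption)."""
    estimate = float(table_rows)
    for s in selectivities:
        estimate *= s
    return estimate


def propagate_through_join(input_rows: float, fan_out: float) -> float:
    """Scale a cardinality by a join fan-out; any input error is carried along unchanged in ratio."""
    return input_rows * fan_out


if __name__ == "__main__":
    table_rows = 1_000_000
    sel_make = 0.10    # hypothetical selectivity of make = 'Honda'
    sel_model = 0.01   # hypothetical selectivity of model = 'Accord'

    # Independence assumption: both selectivities are multiplied.
    estimated = estimate_with_independence(table_rows, [sel_make, sel_model])

    # Fully correlated columns: the model implies the make, so only the
    # more selective predicate actually filters rows.
    actual = estimate_with_independence(table_rows, [sel_model])

    fan_out = 5.0      # hypothetical rows matched per input row in a later join
    est_after_join = propagate_through_join(estimated, fan_out)
    act_after_join = propagate_through_join(actual, fan_out)

    print("scan: estimated", estimated, "actual", actual)
    print("join: estimated", est_after_join, "actual", act_after_join)
    print("underestimation factor after join:", act_after_join / est_after_join)
```

In this sketch the underestimate introduced at the scan survives the join with the same ratio, which is the kind of propagated error that can push an optimizer toward a plan suited to far fewer rows than actually arrive.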
The paper introduces the problem of designing dynamic workload management (WM) tools that are aware of the diversity of user classes and their diverse access patterns. Our approach should be contrasted with current WM tools and their ability to detect performance degradations in accordance with the user's preference goals. We define the problem and suggest a formal definition that supports the further development of algorithms and architectures for implementing effective on-line database tuning strategies.