Compressing Very Large Database Workloads for Continuous Online Index Selection

Kołaczkowski, Piotr

doi:10.1007/978-3-540-85654-2_71

Cited by 6 publications

(5 citation statements)

References 10 publications


“…However, the distance function strictly focuses on index selection, the entire workload must be known in advance, and sampling is performed instead of classification. Though stream-oriented, the approach in [6] does not guarantee consistency of classification results or a class. Data streams clustering approaches like [7] often employ on-line versions of the k-Means algorithm.…”

Section: Problem Description (mentioning)

confidence: 97%

Consistent on-line classification of dbs workload events

Holze

Gaidies

Ritter

2009

Proceedings of the 18th ACM Conference on Information and Knowledge Management


An important goal of self-managing databases is the autonomic adaptation of the database configuration to evolving workloads. However, the diversity of SQL statements in real-world workloads typically causes the required analysis overhead to be prohibitive for a continuous workload analysis. The workload classification presented in this paper reduces the workload analysis overhead by grouping similar workload events into classes. Our approach employs clustering techniques based upon a general distance function for DBS workload events. To be applicable for a continuous workload analysis, our workload classification specifically addresses a stream-based, lightweight operation, a controllable loss of quality, and self-management.
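The stream-based grouping described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the token-set Jaccard distance, the `threshold` value, and the names `normalize`, `jaccard_distance`, and `OnlineClassifier` are all assumptions made for illustration. Each incoming statement joins the nearest existing class if it is close enough, otherwise it starts a new class.

```python
import re


def normalize(sql: str) -> frozenset:
    """Lowercase the statement, replace literals with '?', return its token set."""
    s = sql.lower()
    s = re.sub(r"'[^']*'", "?", s)           # string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)   # numeric literals
    return frozenset(re.findall(r"[a-z_][a-z0-9_]*|[?*=<>]", s))


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Assumed distance: 1 - |a & b| / |a | b| (0.0 means identical token sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class OnlineClassifier:
    """Single-pass ("leader") clustering over a stream of workload events."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold    # hypothetical quality/size trade-off knob
        self.representatives = []     # one token set per class, never updated
        self.counts = []              # number of events assigned to each class

    def classify(self, sql: str) -> int:
        tokens = normalize(sql)
        best, best_d = None, None
        for i, rep in enumerate(self.representatives):
            d = jaccard_distance(tokens, rep)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= self.threshold:
            self.counts[best] += 1
            return best
        # No class is close enough: the event becomes a new representative.
        self.representatives.append(tokens)
        self.counts.append(1)
        return len(self.representatives) - 1


clf = OnlineClassifier(threshold=0.3)
first = clf.classify("SELECT name FROM users WHERE id = 42")
second = clf.classify("SELECT name FROM users WHERE id = 7")
third = clf.classify("UPDATE orders SET status = 'shipped' WHERE order_id = 9")
```

In this sketch a larger `threshold` yields fewer, coarser classes, which is one simple way to expose a controllable loss of quality.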


“…Offline tasks are those that do not require or do not allow processing each query separately, and can be implemented as typical batch jobs. For example, query clustering is important for workload summarization [16], but does not require real-time labeling of individual queries.…”

Section: System Architecture (mentioning)

confidence: 99%

“…Workload summarization for index recommendation: The goal [3,16] is to find a representative sample of the workload as input to further database administration, tuning, and testing tasks [3,33]. In particular, workload summarization aids index recommendation, since the recommendation process is typically quadratic in the size of the workload [3].…”

Section: Applications (mentioning)

confidence: 99%

Database-Agnostic Workload Management

Jain

Yan

Cruanes

et al.

2018

Preprint


We present a system to support generalized SQL workload analysis and management for multi-tenant and multi-database platforms. Workload analysis applications are becoming more sophisticated to support database administration, model user behavior, audit security, and route queries, but the methods rely on specialized feature engineering, and therefore must be carefully implemented and reimplemented for each SQL dialect, database system, and application. Meanwhile, the size and complexity of workloads are increasing as systems centralize in the cloud. We model workload analysis and management tasks as variations on query labeling, and propose a system design that can support general query labeling routines across multiple applications and database backends. The design relies on the use of learned vector embeddings for SQL queries as a replacement for application-specific syntactic features, reducing custom code and allowing the use of off-the-shelf machine learning algorithms for labeling. The key hypothesis, for which we provide evidence in this paper, is that these learned features can outperform conventional feature engineering on representative machine learning tasks. We present the design of a database-agnostic workload management and analytics service, describe potential applications, and show that separating workload representation from labeling tasks affords new capabilities and can outperform existing solutions for representative tasks, including workload sampling for index recommendation and user labeling for security audits.


“…Because it took too long to estimate the cost of such a large workload and then to automatically select indexes, both workloads were compressed by a method presented in [16]. The final numbers of queries after compression were 289 for MG, and 62 for WA.…”

[Figure caption caught in the excerpt: Fig. 3, final workload cost as a function of number of iterations for MG database for cold start.]

Section: Complex Workloads (mentioning)

confidence: 99%

Automatic Index Selection in RDBMS by Exploring Query Execution Plan Space

Kołaczkowski

Rybiński

2009

Advances in Data Management

Self Cite


A novel approach to solving the Index Selection Problem (ISP) is presented. In contrast to other known ISP approaches, our method searches the space of possible query execution plans, instead of searching the space of index configurations. An evolutionary algorithm is used for searching. The solution is obtained indirectly as the set of indexes used by the best query execution plans. The method has important advantages over other known algorithms: (1) it converges to the optimal solution, unlike greedy heuristics, which for performance reasons tend to reduce the space of candidate solutions, possibly discarding optimal solutions; (2) though the search space is huge and grows exponentially with the size of the input workload, searching the space of the query plans allows directing more computational power to the most costly plans, thus yielding very fast convergence to "good enough" solutions; and (3) the costly reoptimization of the workload is not needed for calculating the objective function, so several thousands of candidates can be checked in a second. The algorithm was tested on large synthetic and real-world SQL workloads to evaluate its performance and scalability.
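The idea of searching plans rather than index configurations can be sketched with a toy (1+1) evolutionary loop. This is an illustration under stated assumptions, not the algorithm from the paper: the candidate plans in `PLANS`, the per-index maintenance costs in `INDEX_COST`, and the functions `indexes_of`, `total_cost`, `mutate`, and `evolve` are all hypothetical. A candidate picks one plan per query, its cost is summed from precomputed plan costs without calling an optimizer, and the index set falls out as the union of indexes the chosen plans use.

```python
import random

# Hypothetical candidate plans per query: (plan cost, indexes the plan uses).
PLANS = {
    "q1": [(100.0, frozenset()),
           (20.0, frozenset({"idx_a"})),
           (15.0, frozenset({"idx_a", "idx_b"}))],
    "q2": [(80.0, frozenset()), (30.0, frozenset({"idx_b"}))],
    "q3": [(10.0, frozenset()), (9.0, frozenset({"idx_c"}))],
}
# Assumed maintenance cost charged once for every index that is used.
INDEX_COST = {"idx_a": 5.0, "idx_b": 25.0, "idx_c": 8.0}


def indexes_of(choice):
    """Union of the indexes used by the chosen plan of every query."""
    used = set()
    for query, plan_no in choice.items():
        used |= PLANS[query][plan_no][1]
    return used


def total_cost(choice):
    """Objective: sum of precomputed plan costs plus cost of the used indexes."""
    plan_cost = sum(PLANS[q][i][0] for q, i in choice.items())
    return plan_cost + sum(INDEX_COST[ix] for ix in indexes_of(choice))


def mutate(choice, rng):
    """Re-pick the plan of one query, favouring queries whose plan is costly."""
    queries = list(choice)
    weights = [PLANS[q][choice[q]][0] for q in queries]
    query = rng.choices(queries, weights=weights, k=1)[0]
    child = dict(choice)
    child[query] = rng.randrange(len(PLANS[query]))
    return child


def evolve(iterations=200, seed=0):
    """(1+1) evolutionary loop: keep the child unless it is worse than the parent."""
    rng = random.Random(seed)
    best = {q: 0 for q in PLANS}   # start from the index-free plan of each query
    for _ in range(iterations):
        child = mutate(best, rng)
        if total_cost(child) <= total_cost(best):
            best = child
    return best, indexes_of(best), total_cost(best)


best_choice, selected_indexes, final_cost = evolve()
```

Weighting the mutation by current plan cost is one simple way to spend more search effort on the most expensive queries. A population with crossover would be closer to a full evolutionary algorithm, but is omitted here for brevity.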

