Abstract: Automatically recycling (intermediate) results is a grand challenge for state-of-the-art databases to improve both query response time and throughput. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline, avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully materialized results…
“…Furthermore, the HashStash optimizer supports four different cases for reuse-aware operators: exact-, subsuming-, partial-, and overlapping-reuse. This is different from the existing approaches in [15,25,18], which only support the exact-reuse and subsuming-reuse cases. The exact case enables a join or aggregation operator to reuse a cached hash table which contains exactly the tuples required by the query.…”
Section: Reuse-aware Query Optimizer (mentioning)
confidence: 68%
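To make the four reuse cases concrete, here is a minimal sketch of how a reuse-aware optimizer might classify a cached hash table against an incoming query. It is simplified to single-attribute range predicates, and all names (ReuseCase, Range, classify_reuse) are illustrative placeholders, not HashStash's actual API.

```python
# Hypothetical sketch: classifying a cached hash table against a query's
# required predicate range. Simplified to one numeric attribute.
from dataclasses import dataclass
from enum import Enum

class ReuseCase(Enum):
    EXACT = "exact-reuse"              # cache holds exactly the required tuples
    SUBSUMING = "subsuming-reuse"      # cache holds a superset; filter on probe
    PARTIAL = "partial-reuse"          # cache holds a subset; top up the rest
    OVERLAPPING = "overlapping-reuse"  # ranges intersect without containment
    NONE = "no-reuse"

@dataclass(frozen=True)
class Range:
    lo: float
    hi: float

def classify_reuse(cached: Range, required: Range) -> ReuseCase:
    if cached == required:
        return ReuseCase.EXACT
    if cached.lo <= required.lo and required.hi <= cached.hi:
        return ReuseCase.SUBSUMING
    if required.lo <= cached.lo and cached.hi <= required.hi:
        return ReuseCase.PARTIAL
    if cached.lo <= required.hi and required.lo <= cached.hi:
        return ReuseCase.OVERLAPPING
    return ReuseCase.NONE

# Example: a hash table built for sales in 2019-2020; the query asks for 2020.
print(classify_reuse(Range(2019, 2020), Range(2020, 2020)))  # ReuseCase.SUBSUMING
```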
“…Reuse of Intermediates: In order to better support user sessions in DBMSs, various techniques have been developed in the past to reuse intermediates [25,15,18]. All these techniques typically require that results of individual operators are materialized into temporary tables.…”
Section: Related Work (mentioning)
confidence: 99%
“…To that end, their cost models take into account neither the peculiarities of hash tables nor hardware-dependent parameters such as CPU caches. In [15], the authors integrate reuse techniques into MonetDB, which implements an operator-at-a-time execution model that relies on full materialization of all intermediate results anyway and thus does not need to tackle the issues that result from additional materialization costs as in pipelined databases. [18] extends the work of [15] to pipelined databases and integrates the ideas into Vectorwise.…”
Section: Related Work (mentioning)
confidence: 99%
“…Motivation: Reusing intermediates in databases to speed up analytical query processing has been studied in the past [15,25,18,13,8,20,28]. These solutions typically require intermediate results of individual operators to be materialized into temporary tables to be considered for reuse in subsequent queries.…”
Reusing intermediates in databases to speed up analytical query processing has been studied in the past. Existing solutions typically require intermediate results of individual operators to be materialized into temporary tables to be considered for reuse in subsequent queries. However, these approaches are fundamentally ill-suited for modern main memory databases. The reason is that modern main memory DBMSs are typically limited by the bandwidth of the memory bus, so query execution is heavily optimized to keep tuples in the CPU caches and registers. Adding extra materialization operations into a query plan therefore not only adds traffic to the memory bus but, more importantly, destroys cache- and register-locality opportunities, resulting in high performance penalties.

In this paper we study a novel reuse model for intermediates, which caches internal physical data structures materialized during query processing (due to pipeline breakers) and externalizes them so that they become reusable for upcoming operations. We focus on hash tables, the most commonly used internal data structure in main memory databases for join and aggregation operations. As queries arrive, our reuse-aware optimizer reasons about the reuse opportunities for hash tables, employing cost models that take into account hash table statistics together with the CPU and data movement costs within the cache hierarchy. Experimental results based on our HashStash prototype demonstrate performance gains of 2× for typical analytical workloads with no additional overhead for materializing intermediates.
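As a rough illustration of the kind of decision such a reuse-aware optimizer makes, the sketch below compares the estimated cost of building a hash table from scratch against reusing a cached one. The functions and constants (build_cost, reuse_cost, the per-tuple costs) are assumptions of our own for illustration, not the paper's actual cost model.

```python
# Hypothetical sketch of a build-vs-reuse cost comparison. All per-tuple
# cost constants are illustrative placeholders.

def build_cost(n_input: int, cost_scan: float, cost_insert: float) -> float:
    """Estimated cost of building a fresh hash table from n_input base tuples."""
    return n_input * (cost_scan + cost_insert)

def reuse_cost(n_cached: int, n_missing: int,
               cost_filter: float, cost_scan: float, cost_insert: float) -> float:
    """Estimated cost of reusing a cached hash table: filter out entries the
    query does not need (subsuming case) and insert the tuples the cache is
    missing (partial case)."""
    return n_cached * cost_filter + n_missing * (cost_scan + cost_insert)

# Reuse wins when most of the required tuples are already in the cached table.
if reuse_cost(90_000, 10_000, 0.2, 1.0, 2.0) < build_cost(100_000, 1.0, 2.0):
    print("optimizer picks the reuse-aware plan")
```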
“…Caching to recycle work. Finally, we consider previous works [6,14,18,24,31] that address the problem of reusing intermediate query results, cast as a general caching problem. Our work differs substantially from those approaches in that they mainly focus on cache eviction, using past queries to decide, in an online fashion, what to keep in memory.…”
In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in redundant and wasteful processing, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub)expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Extensive experiments on a prototype implementation of our system show significant benefits of worksharing for both TPC-DS workloads and detailed micro-benchmarks.
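To illustrate the multiple-choice-knapsack formulation this abstract alludes to, here is a toy sketch: each common subexpression forms a group, each caching option is an item with a memory footprint (weight) and recomputation savings (value), and at most one option per group is picked under a memory budget. The mck helper and all numbers are hypothetical, not the paper's actual optimizer.

```python
# Hypothetical sketch: selecting which shared (sub)expressions to cache,
# phrased as a multiple-choice knapsack and solved by dynamic programming.

def mck(groups, budget):
    """groups: list of groups, each a list of (mem_cost, savings) options.
    A (0, 0) "don't cache" option is added to every group. Returns the
    maximum total savings achievable within the memory budget."""
    dp = [0.0] * (budget + 1)
    for options in groups:
        options = [(0, 0.0)] + options
        new_dp = [float("-inf")] * (budget + 1)
        for b in range(budget + 1):
            for mem, save in options:
                if mem <= b:
                    new_dp[b] = max(new_dp[b], dp[b - mem] + save)
        dp = new_dp
    return dp[budget]

# Two shared subexpressions under an 8 GB budget (memory in whole GB):
groups = [[(3, 40.0), (5, 55.0)],   # e.g., cache a filtered scan vs. a full join
          [(4, 30.0)]]              # e.g., cache an aggregate
print(mck(groups, 8))  # -> 70.0: the 3 GB option plus the aggregate fit; 5+4 GB would not
```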