Abstract: Automatically recycling (intermediate) results is a grand challenge for state-of-the-art databases to improve both query response time and throughput. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline, avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully materialized resul…
“…Such techniques help avoid recomputing identical queries, but cannot be predicted by the optimizer nor used in a canonical way. Another option, implemented in MonetDB, is to recycle intermediate results and share them with subsequent queries [16]. Recent research on work sharing, however, offers ad-hoc "collaboration" among concurrent queries, minimizing the overall work done and the number of data accesses.…”
As data analytics is used by an increasing number of applications, data analytics engines are required to execute workloads with increased concurrency, i.e., an increasing number of clients submitting queries. Data management systems designed for data analytics (a market dominated by column-stores), however, were initially optimized for single-query execution, minimizing its response time. Hence, they do not treat concurrency as a first-class citizen. In this paper, we experiment with one open-source and two commercial column-stores using the TPC-H and SSB benchmarks in a setup with an increasing number of concurrent clients submitting queries, focusing on whether the tested systems can scale up in a single-node instance. For in-memory workloads, the tested systems scale up to some degree; however, when the server is saturated they fail to fully exploit the available parallelism. Further, we highlight the unpredictable response times under high concurrency.
“…The intermediate results of the subqueries are shipped to the master server (10). Finally, the master wraps up the query execution and sends the results to the user (11).…”
Section: Architecture (mentioning)
confidence: 99%
“…Recycler. A crucial component of the Octopus architecture is the MonetDB Recycler [11]. It extends the MonetDB execution model with the capability to store and reuse intermediate results in query loads with overlapping computations.…”
Section: Architecture (mentioning)
confidence: 99%
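The recycling idea in the snippet above can be pictured as a cache keyed by an operator invocation and its arguments: a repeated invocation returns the stored intermediate instead of recomputing it. The following is a minimal sketch, not the actual MonetDB Recycler; the class, method names, and toy selection operator are all illustrative assumptions.

```python
class Recycler:
    """Toy intermediate-result cache keyed by (operator, arguments)."""

    def __init__(self):
        self.cache = {}   # (op_name, args) -> materialized intermediate
        self.hits = 0

    def execute(self, op_name, args, compute):
        key = (op_name, args)
        if key in self.cache:        # overlapping computation: reuse it
            self.hits += 1
            return self.cache[key]
        result = compute(*args)      # cache miss: compute and materialize
        self.cache[key] = result
        return result


recycler = Recycler()
# Two queries sharing a range selection over the same column:
sel = lambda lo, hi: [v for v in range(100) if lo <= v < hi]
a = recycler.execute("select", (10, 20), sel)
b = recycler.execute("select", (10, 20), sel)  # served from the cache
```

In this sketch the second call does no work: `recycler.hits` is 1 and `a` and `b` are the same materialized list. A real recycler must additionally bound the cache and invalidate entries on updates, which the toy version omits.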
“…If the set intersection of the subplans of the arguments is not empty, meaning that they all belong to at least one common subplan, the instruction is assigned to the same subplan(s) (lines 11-12). Following this general rule, the data access instructions to small query tables are replicated to all subplans.…”
Section: Distributed Plan Generation (mentioning)
confidence: 99%
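The assignment rule quoted above can be sketched in a few lines of Python. The function name is an assumption, and the replicate-when-no-common-subplan fallback is one possible interpretation of the replication rule for small-table accesses described in the snippet.

```python
def assign_subplans(arg_subplans, all_subplans):
    """Assign an instruction to subplans, given one set per argument
    listing the subplans that argument already belongs to."""
    common = set.intersection(*arg_subplans)
    if common:                     # all arguments share a subplan: join it
        return common
    # Fallback (illustrative): replicate to every subplan, as the
    # snippet describes for data accesses to small query tables.
    return set(all_subplans)


# An instruction whose two arguments live in subplans {1,2} and {2,3}
# is assigned to their common subplan {2}:
print(assign_subplans([{1, 2}, {2, 3}], [1, 2, 3]))
```

When the arguments have no subplan in common, the sketch replicates the instruction to all subplans; the paper's actual handling of that case may differ.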
“…It creates distributed execution plans and delegates subquery execution to available worker nodes, referred to as octopus tentacles. Data are shipped just-in-time (JIT) to the workers and kept in their caches using the recycler mechanism [11]. The run-time scheduler allocates subqueries on tentacles based on up-to-date status information.…”
Abstract. Distributed processing commonly requires data spread across machines using a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need-to-know policy and is reused, if possible, in subsequent queries. A bidding mechanism among the workers yields a schedule with the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
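The bidding mechanism from the abstract can be sketched as follows. The worker names, data identifiers, and cost units are hypothetical; the idea is only that each worker bids the transfer cost of the data it would still need shipped, so the subquery goes to the worker that can reuse the most previously shipped data.

```python
def schedule(subquery_data, workers):
    """Pick the worker with the cheapest bid.

    subquery_data: dict mapping a data item (e.g. a column) to its
                   shipping cost; workers: dict mapping worker name to
                   the set of items already cached on that worker."""
    def bid(cached):
        # A worker's bid = total cost of the items it still lacks.
        return sum(cost for item, cost in subquery_data.items()
                   if item not in cached)
    return min(workers, key=lambda w: bid(workers[w]))


workers = {"tentacle1": {"lineitem.l_qty"}, "tentacle2": set()}
need = {"lineitem.l_qty": 600, "orders.o_date": 150}
# tentacle1 already caches the expensive column, so it bids only 150
# while tentacle2 bids 750; tentacle1 wins the subquery.
print(schedule(need, workers))
```

This greedy per-subquery choice minimizes transfer cost for one subquery at a time; the paper's scheduler additionally folds in up-to-date worker status, which the sketch ignores.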