Proceedings of the 2017 Symposium on Cloud Computing 2017
DOI: 10.1145/3127479.3131613

A robust partitioning scheme for ad-hoc query workloads

Abstract: Data partitioning is crucial to improving query performance and several workload-based partitioning techniques have been proposed in database literature. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload a priori. Static workload-based data partitioning techniques are therefore not suitable for such settings. In this paper, we propose Amoeba, a distributed storage system that uses adaptive multi-attribute data partitioning …

Cited by 30 publications (17 citation statements)
References 24 publications
“…This is motivated by the fact that modern data analytics utilizes a large number of ad-hoc queries to do exploratory analysis. For example, in the context of building a robust partitioning scheme for ad-hoc query workloads, Shanbhag et al [38] found that after analyzing the first 80% of real-world workload traces the remaining 20% still contained 57% completely new queries.…”
Section: Efficiency Of Interleavings (mentioning)
confidence: 99%
“…Some distributed streaming systems use adaptive load-balancing that redistribute the workload based on AQWA's cost model for estimating the workload, e.g., STAR [10], Tornado [16,17], Amoeba [21,22], and PS2Stream [11]. However, these techniques are relatively slow when updating the statistics and updating the workload cost model when the statistics change.…”
Section: Related Work (mentioning)
confidence: 99%
“…SWARM can distribute the workload among all executor machines, if necessary. Amoeba [33,32] is an adaptive data partitioning scheme in relational systems. Amoeba does not consider real-time stream processing.…”
Section: Related Work (mentioning)
confidence: 99%