Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems 2014
DOI: 10.1145/2594538.2594558

Skew in parallel query processing

Abstract: We study the problem of computing a conjunctive query q in parallel, using p servers, on a large database. We consider algorithms with one round of communication, and study the complexity of the communication. We are especially interested in the case where the data is skewed, which is a major challenge for scalable parallel query processing. We establish a tight connection between the fractional edge packing of the query and the amount of communication in two cases. First, in the case when the only statisti…

Cited by 96 publications (118 citation statements)
References 22 publications (43 reference statements)
“…For these classes we conclude that it is NP-complete to decide, whether for the family F of policies associated with some given CQ Q, a CQ Q′ is parallel-correct for all distributions from F. In Section 5.2, we will see that this, in particular, holds for the families of distribution policies related to the practical Hypercube algorithm, that was previously investigated in several works [4,6,7,10,11]. In fact, we even show that this holds for a more general class of distribution policies specified in a declarative formalism.…”
mentioning
confidence: 54%
“…, k}, B_i is bucket_i(x_i, z_i), if [footnote 6: A hash function is a partial mapping from dom to a finite set whose elements are sometimes referred to as buckets.] [footnote 7: For the purpose of specification it is irrelevant whether these predicates are materialized in the database.]…”
Section: Hypercube Distribution Policies
mentioning
confidence: 99%
“…In this model, evaluation of conjunctive queries [15,7] as well as skyline queries [2] have been considered. Recently, Beame et al. [8] proved a matching upper and lower bound for the amount of communication needed to compute a full conjunctive query without self-joins in one communication round. The upper bound is provided by a randomized algorithm called Hypercube which dates back to Ganguli et al. [13] and was described by Afrati and Ullman [1] in the context of MapReduce algorithms.…”
Section: Related Work
mentioning
confidence: 99%
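
The citation statements above refer to the one-round Hypercube algorithm and to hash functions that map values to buckets. The following is a minimal sketch of that idea for the triangle query Q(x,y,z) :- R(x,y), S(y,z), T(z,x), not the paper's own formulation. The function names (`bucket`, `hypercube_triangles`), the cube shape `side × side × side`, and the toy hash are all illustrative assumptions. Each variable gets its own hash function, servers are addressed by a triple of buckets, and each tuple is replicated along the one dimension whose variable it does not contain.

```python
from collections import defaultdict


def bucket(value, side, salt):
    # Toy stand-in for an independent hash function per query variable (assumption).
    return hash((salt, value)) % side


def hypercube_triangles(R, S, T, side):
    """One communication round on side**3 servers, then a local join on each server.

    Returns the set of (x, y, z) answers and the largest number of tuples
    received by any single server (the maximum load).
    """
    servers = defaultdict(lambda: {"R": set(), "S": set(), "T": set()})

    # Shuffle: a tuple fixes two coordinates and is copied along the third.
    for (x, y) in R:
        for k in range(side):
            servers[(bucket(x, side, 0), bucket(y, side, 1), k)]["R"].add((x, y))
    for (y, z) in S:
        for i in range(side):
            servers[(i, bucket(y, side, 1), bucket(z, side, 2))]["S"].add((y, z))
    for (z, x) in T:
        for j in range(side):
            servers[(bucket(x, side, 0), j, bucket(z, side, 2))]["T"].add((z, x))

    # Local evaluation: every answer is found at the server addressed by its three buckets.
    answers = set()
    for frag in servers.values():
        for (x, y) in frag["R"]:
            for (y2, z) in frag["S"]:
                if y2 == y and (z, x) in frag["T"]:
                    answers.add((x, y, z))

    max_load = max(
        (sum(len(rel) for rel in frag.values()) for frag in servers.values()),
        default=0,
    )
    return answers, max_load
```

With `side = 1` the sketch degenerates to a single server holding everything. Larger values of `side` spread the tuples at the cost of replication. A value that occurs in very many tuples always hashes to the same bucket, so the servers sharing that bucket can receive a disproportionate load, which is the skew issue named in the abstract.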