Gaetano Geck scite author profile

¹

,

²

,

³

et al. 2015

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We study the complexity of parallel-correctness for conjunctive queries as well as transferability of parallelcorrectness between queries. We also investigate the complexity of transferability for certain families of distribution policies, including, for instance, the Hypercube distribution.

Parallel-Correctness and Transferability for Conjunctive Queries

¹

,

²

,

³

et al. 2017

J. ACM

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We study the complexity of parallel-correctness for conjunctive queries as well as transferability of parallelcorrectness between queries. We also investigate the complexity of transferability for certain families of distribution policies, including, for instance, the Hypercube distribution.

Data partitioning for single-round multi-join evaluation in massively parallel systems

¹

,

²

,

³

et al. 2016

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We provide a semantical characterization for when conjunctive queries (and extensions thereof) are parallel-correct and give matching complexity bounds for the associated decision problem.Motivated by scenarios for workload optimization, we further consider the problem of parallelcorrectness transfer from a query Q to a query Q , that is, whether Q is parallel-correct for all distribution policies for which Q is parallel-correct. In this case, Q can always be evaluated after Q without repartitioning the data. We provide a semantical characterization for parallel-correctness transfer and provide matching complexity bounds for the associated decision problem for conjunctive queries (and extensions). Finally, we investigate restrictions of queries and families of distribution policies with better complexities, including, for instance, the Hypercube distributions. * The original version of this article was published in PODS 2015 as [2].

Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation

¹

,

ACM Trans. Comput. Logic

²

,

Neven

³

et al. 2019

Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communicationfree way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query, also referred to as parallel-correctness. This paper extends the study of the complexity of parallel-correctness and its constituents, parallel-soundness and parallelcompleteness, to unions of conjunctive queries with and without negation. As a by-product it is shown that the containment problem for conjunctive queries with negation is coNEXPTIME-complete.

Reasoning on data partitioning for single-round multi-join evaluation in massively parallel systems

¹

,

²

,

³

et al. 2017

Evaluating queries over massive amounts of data is a major challenge in the big data era. Modern massively parallel systems, like e.g. Spark, organize query answering as a sequence of rounds each consisting of a distinct communication phase followed by a computation phase. The communication phase redistributes data over the available servers, while in the subsequent computation phase each server performs the actual computation on its local data. There is a growing interest in single-round algorithms for evaluating multiway joins where data is first reshuffled over the servers and then evaluated in a parallel but communication-free way. As the amount of communication induced by a reshuffling of the data is a dominating cost in such systems, we introduce a framework for reasoning about data partitioning to detect when we can avoid the data reshuffling step. Specifically, we formalize the decision problems parallel-correctness and transfer of parallel-correctness, provide semantical characterizations, and obtain tight complexity bounds.