Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015
DOI: 10.1145/2723372.2749437

Cost-based Fault-tolerance for Parallel Data Processing

Abstract: To deal with mid-query failures in parallel data engines (PDEs), different fault-tolerance schemes are implemented today: (1) fault-tolerance in parallel databases is typically implemented in a coarse-grained manner by restarting a query completely when a mid-query failure occurs, and (2) modern MapReduce-style PDEs implement a fine-grained fault-tolerance scheme, which either materializes intermediate results or implements a lineage model to recover from mid-query failures. However, neither of these s…

Cited by 20 publications (15 citation statements)
References 16 publications
“…• materi-all the Greenplum system that materializes all candidate operators of SIFT. • simulated-XDB an simulated-XDB system [6] on Greenplum. It regards all operators as checkpointing candidates.…”
Section: Methods (mentioning)
confidence: 99%
“…Typical MPP databases model a query plan as a directed acyclic graph [6,10]. Each vertex in this figure corresponds to an operator, which is regarded as the unit of query processing.…”
Section: Parallel Query Processing (mentioning)
confidence: 99%
“…In parallel data flow system FTOps [24], it showcases an optimizer that finds the best fault-tolerance strategy for each operator in a query plan. Also, a cost-based recovery [25] is proposed to select which intermediates should be materialized. These works are supplements to ours, but they focus on general workloads rather than iterative graph processing.…”
Section: A Checkpointing (mentioning)
confidence: 99%