2015 IEEE International Conference on Big Data (Big Data) 2015
DOI: 10.1109/bigdata.2015.7363770

Chronos: Failure-aware scheduling in shared Hadoop clusters

Abstract: Hadoop emerged as the de facto state-of-the-art system for MapReduce-based data analytics. The reliability of Hadoop systems depends in part on how well they handle failures. Currently, Hadoop handles machine failures by re-executing all the tasks of the failed machines (i.e., executing recovery tasks). Unfortunately, this elegant solution is entirely entrusted to the core of Hadoop and hidden from Hadoop schedulers. The unawareness of failures therefore may prevent Hadoop schedulers from operating correctly t…

Cited by 15 publications (9 citation statements)
References 11 publications (5 reference statements)
“…Relationship to previous work. This paper extends our previous contribution introduced in a previous paper [8] by providing more detailed descriptions and more thorough experiments. In particular, we have substantially extended two sections: While Section 2 gives an overview of MapReduce, Hadoop, scheduling in Hadoop and its fault-tolerance mechanism, Section 8 discusses related works on scheduling, failure recovery, task preemption and data-aware task scheduling in MapReduce.…”
Section: Introduction
Citation type: supporting
confidence: 58%
“…Almost all the surveyed schedulers in this paper have advantages in terms of fairness and completion time compared to the default Hadoop scheduling policy. Interestingly, FRESH [10] and COSHH-hybrid [8] have the potential to become a native part of Hadoop, replacing FIFO and Fair sharing, as well as Chronos [11] which holds a lot of promise while still needing further testing. When it comes to large enterprise environments, LsPS [15] represents a promising approach as it delivered unprecedented performance and user control in a scalable and dynamic cluster, vastly improving upon default schedulers.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…2) Chronos: Instead of creating a totally new "default" scheduler from scratch, a different approach for enhancing the native Hadoop scheduler is proposed by the authors of Chronos: Failure-Aware Scheduling in Shared Hadoop Clusters [11]. The authors argue that the performance of Hadoop systems in part depends on how failures are handled.…”
Section: LsPS (Leveraging Size Patterns Scheduler)
Citation type: mentioning
confidence: 99%
“…In case of hardware or software failures, the affected hardware that might be prone to failure can be avoided in scheduling to avoid any further failures. Chronos (Yildiz et al 2015) is a Hadoop-based failure-aware scheduler that uses pre-emption on failed jobs. Then it recovers from failure by reallocating the failed jobs with pre-empted resources to meet the SLA objectives.…”
Section: Failure/anomaly Detection and Mitigation
Citation type: mentioning
confidence: 99%