2013 IEEE International Conference on Big Data
DOI: 10.1109/bigdata.2013.6691554

HFSP: Size-based scheduling for Hadoop

Abstract: Size-based scheduling with aging has long been recognized as an effective approach to guarantee fairness and near-optimal system response times. We present HFSP, a scheduler introducing this technique to a real, multi-server, complex and widely used system such as Hadoop. Size-based scheduling requires a priori job size information, which is not available in Hadoop: HFSP builds such knowledge by estimating it on-line during job execution. Our experiments, which are based on realistic workloads gen…
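As a rough illustration of the idea in the abstract (an on-line size estimate feeding a size-based policy), the following Python sketch estimates a job's remaining size from the tasks that have already completed and then serves the job with the smallest estimate. The Job class, its fields, and the initial guess are hypothetical and are not taken from the HFSP implementation.

    # Illustrative sketch only: not the HFSP implementation.
    class Job:
        def __init__(self, job_id, num_tasks):
            self.job_id = job_id
            self.num_tasks = num_tasks
            self.completed_durations = []  # durations of tasks finished so far

        def record_completed_task(self, duration):
            # Each finished task refines the on-line size estimate.
            self.completed_durations.append(duration)

        def estimated_remaining_size(self):
            # Remaining size ~= (average observed task duration) x (tasks left).
            remaining = self.num_tasks - len(self.completed_durations)
            if not self.completed_durations:
                return float(remaining)  # arbitrary guess before any sample exists
            avg = sum(self.completed_durations) / len(self.completed_durations)
            return remaining * avg

    def pick_next_job(pending_jobs):
        # Size-based policy: serve the job with the smallest estimated remaining work.
        return min(pending_jobs, key=lambda j: j.estimated_remaining_size())

In this reading, the estimate improves as more tasks of a job complete, which is what lets a size-based policy operate without a priori job sizes.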

Cited by 29 publications (18 citation statements)
References 27 publications
“…Although these policies have been analyzed for distributed-server systems [14], [15], supercomputing workloads [25], and cloud compute-intensive workloads [10], a realistic investigation of such policies in datacenters for MapReduce workloads is currently missing. In particular, size-based scheduling has been employed in Hadoop [23] with adaptations of two policies: Shortest-Remaining-Processing-Time (SRPT) and Fair Sojourn Protocol (FSP). However, these approaches have rather limited applicability in large-scale datacenters as they require either accurate estimations of job sizes [22] or periodic simulations of queued jobs in a virtually fair system [11], [23].…”
Section: E. Improvements From Tyrex (mentioning)
confidence: 99%
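The SRPT policy named in this excerpt can be summarized in a few lines. The sketch below assumes that each job's exact remaining work is known, which, as the excerpt points out, is precisely the information Hadoop does not provide; the dictionary fields are invented for illustration.

    # SRPT sketch: assumes exact remaining sizes are known (not the case in Hadoop).
    def srpt_select(jobs, now):
        # Serve the runnable job with the least remaining work; a newly arrived
        # smaller job therefore preempts the one currently being served.
        runnable = [j for j in jobs if j["arrival"] <= now and j["remaining"] > 0]
        if not runnable:
            return None
        return min(runnable, key=lambda j: j["remaining"])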
“…In particular, size-based scheduling has been employed in Hadoop [23] with adaptations of two policies: Shortest-Remaining-Processing-Time (SRPT) and Fair Sojourn Protocol (FSP). However, these approaches have rather limited applicability in large-scale datacenters as they require either accurate estimations of job sizes [22] or periodic simulations of queued jobs in a virtually fair system [11], [23]. The main idea behind FSP is to extend the SRPT policy with a job aging function which virtually decreases the sizes of the waiting jobs, thus avoiding starvation of the large jobs.…”
Section: E. Improvements From Tyrex (mentioning)
confidence: 99%
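One way to read the aging idea described in this excerpt is as a virtual processor-sharing simulation: every waiting job's virtual remaining size shrinks by an equal share of the capacity, so a large job that waits long enough eventually becomes the smallest virtual job and cannot starve. The sketch below is an interpretation under that assumption, not the FSP/HFSP algorithm as specified in [11], [23]; the tick-based loop and field names are invented for illustration.

    # Hypothetical aging sketch; the fields and the tick loop are illustrative only.
    def age_waiting_jobs(waiting_jobs, capacity, dt):
        # Virtual processor sharing: each waiting job's virtual remaining size
        # shrinks by an equal share of the cluster capacity per tick, so large
        # jobs eventually reach the front of the queue instead of starving.
        if not waiting_jobs:
            return
        share = capacity * dt / len(waiting_jobs)
        for job in waiting_jobs:
            job["virtual_remaining"] = max(0.0, job["virtual_remaining"] - share)

    def fsp_like_select(waiting_jobs):
        # Serve jobs in order of virtual completion: smallest virtual size first.
        return min(waiting_jobs, key=lambda j: j["virtual_remaining"]) if waiting_jobs else None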
“…Robust approaches to deal with uncertainty are widely used in MapReduce systems [13], [20], in Hadoop [25], [22], in databases [15] and in web servers [3]. The HFSP and FLEX schedulers provide robustness in scheduling against uncertain job sizes [25], [17]. Canon and Jeannot [2] analyzed the correlation between various metrics used to measure robustness and provided scheduling heuristics that optimize both makespan and robustness for scheduling task graphs on heterogeneous systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…Does not consider user-specified goal. [20] HFSP: avoids job starvation, guarantees short response time.…”
Section: Heterogeneous MapReduce Scheduling Techniques (mentioning)
confidence: 99%