Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures 2011
DOI: 10.1145/1989493.1989540
On scheduling in map-reduce and flow-shops

Abstract: The map-reduce paradigm is now standard in industry and academia for processing large-scale data. In this work, we formalize job scheduling in map-reduce as a novel generalization of the two-stage classical flexible flow shop (FFS) problem: instead of a single task at each stage, a job now consists of a set of tasks per stage. For this generalization, we consider the problem of minimizing the total flowtime and give an efficient 12-approximation in the offline setting and an online (1 + …) algorithm. Motivated by map-reduce, …
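To make the two-stage model concrete, here is a minimal Python sketch (our own illustration, not the paper's 12-approximation): each job releases a set of map tasks and a set of reduce tasks, the reduce tasks of a job wait for all of its map tasks, every task is placed greedily on the least-loaded machine of its stage, and the reported objective is the total flowtime. The greedy rule and the example job and machine parameters are assumed purely for illustration.

def schedule_total_flowtime(jobs, num_map_machines, num_reduce_machines):
    """Toy model of the two-stage flexible-flow-shop view of map-reduce.

    `jobs` is a list of (release_time, map_task_sizes, reduce_task_sizes).
    Jobs are handled in release order; within a job, every task goes to the
    currently least-loaded machine of its stage, and no reduce task of a job
    starts before all of that job's map tasks have finished.  Returns the
    total flowtime, i.e. the sum over jobs of (completion time - release time).
    Illustrative heuristic only, not the algorithm from the paper.
    """
    map_loads = [0.0] * num_map_machines        # next free time of each map machine
    reduce_loads = [0.0] * num_reduce_machines  # next free time of each reduce machine
    total_flowtime = 0.0

    for release, map_tasks, reduce_tasks in sorted(jobs):
        # Stage 1: place each map task on the least-loaded map machine.
        map_finish = release
        for p in map_tasks:
            i = min(range(num_map_machines), key=lambda k: map_loads[k])
            start = max(map_loads[i], release)
            map_loads[i] = start + p
            map_finish = max(map_finish, map_loads[i])

        # Stage 2: reduce tasks may only start after all of the job's map tasks.
        job_finish = map_finish
        for p in reduce_tasks:
            i = min(range(num_reduce_machines), key=lambda k: reduce_loads[k])
            start = max(reduce_loads[i], map_finish)
            reduce_loads[i] = start + p
            job_finish = max(job_finish, reduce_loads[i])

        total_flowtime += job_finish - release

    return total_flowtime

# Example: two jobs, each with a set of map tasks and a set of reduce tasks.
jobs = [
    (0.0, [2.0, 3.0, 1.0], [4.0]),   # job released at t = 0
    (1.0, [1.0, 1.0], [2.0, 2.0]),   # job released at t = 1
]
print(schedule_total_flowtime(jobs, num_map_machines=2, num_reduce_machines=2))  # 14.0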

Cited by 92 publications (87 citation statements) · References 36 publications
“…Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. The problem of map-reduce scheduling is formalized in [3] by abstracting the above requirements and desiderata in scheduling terms. In particular, we focus on multiple-task, multiple-machine, two-stage non-migratory scheduling with precedence constraints; these constraints exist between each map task and reduce task of a job. We consider a subset of the [4] production workload that consists of MapReduce jobs with no dependencies.…”
Section: Existing System
confidence: 99%
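The precedence structure mentioned in this citation (every reduce task of a job must wait for every map task of the same job) can be written down directly; the sketch below only illustrates that constraint set, with hypothetical job and task names.

def mapreduce_precedences(jobs):
    """Build the map->reduce precedence pairs described above: within each job,
    every reduce task must wait for every map task of the same job.
    `jobs` maps a job id to (map_task_ids, reduce_task_ids).  Purely illustrative.
    """
    edges = []
    for job_id, (map_tasks, reduce_tasks) in jobs.items():
        for m in map_tasks:
            for r in reduce_tasks:
                edges.append((m, r))   # m must finish before r may start
    return edges

# Example: one job with two map tasks and two reduce tasks.
print(mapreduce_precedences({"job1": (["m1", "m2"], ["r1", "r2"])}))
# [('m1', 'r1'), ('m1', 'r2'), ('m2', 'r1'), ('m2', 'r2')]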
“…Note that the particular form of the phase overlapping in MapReduce makes data processing different from the traditional multiple stage processing models in manufacturing and communication networks [16,24,38]. In particular, in a typical tandem queueing model [38] (or flowshop model [16,24]) a job must complete processing at one station before moving to the following station. In contrast, in MapReduce a job may have tasks that start the shuffle phase before all the tasks complete the map phase.…”
Section: Summary Of Contributions
confidence: 99%
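A small sketch of the distinction drawn here, assuming a simplified model in which the second stage may begin as soon as the first map task finishes (MapReduce-style overlap) versus only after the last map task finishes (tandem-queue/flow-shop style):

def second_stage_earliest_start(map_task_finish_times, overlap):
    """Earliest time the second stage of a job may begin.

    In a classical tandem-queue / flow-shop model (overlap=False) the job must
    finish all first-stage work before moving on, so the second stage waits for
    the last map task.  In the MapReduce-style overlapping model sketched here
    (overlap=True), shuffling may begin as soon as the first map task produces
    output.  Illustrative only; real systems overlap more gradually.
    """
    return min(map_task_finish_times) if overlap else max(map_task_finish_times)

finish_times = [3.0, 5.0, 9.0]
print(second_stage_earliest_start(finish_times, overlap=False))  # 9.0: flow-shop style
print(second_stage_earliest_start(finish_times, overlap=True))   # 3.0: MapReduce-style overlap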
“…To provide context for this result, it is useful to consider the related literature from the flowshop scheduling community, e.g., [10,16,19,24]. In particular, the flowshop literature considers traditional tandem queues (which do not allow overlapping), assumes that all jobs arrive at time zero, and tends to focus on a different performance metric: makespan.…”
Section: Offline Scheduling
confidence: 99%
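For reference, the two objectives contrasted here can be computed from release and completion times as follows; this helper is illustrative and not taken from any of the cited works.

def makespan_and_flowtime(release_times, completion_times):
    """Contrast the two objectives mentioned above (illustrative helper).

    Makespan is the time the last job finishes; total flowtime sums, over all
    jobs, the time each job spends in the system.  With all jobs released at
    time zero, as in much of the flow-shop literature, total flowtime reduces
    to the sum of completion times.
    """
    makespan = max(completion_times)
    flowtime = sum(c - r for r, c in zip(release_times, completion_times))
    return makespan, flowtime

print(makespan_and_flowtime([0, 0, 0], [4, 7, 9]))  # (9, 20)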
“…Hammoud et al. propose center-of-gravity reduce task scheduling, which aims to lower MapReduce network traffic [25]; it models the distribution of a reduce task's input as a mass distribution and assigns reduce tasks accordingly to save network cost, so it can be viewed as data locality in the reduce phase, which is not a consideration in our study but is left for future work. Benjamin Moseley et al. study the scheduling problem in MapReduce and flow shops [26], formalizing job scheduling in map-reduce as a novel generalization of the two-stage classical flexible flow shop (FFS) problem: instead of a single task at each stage, a job now consists of a set of tasks per stage. He et al. develop a new MapReduce scheduling technique named Matchmaking to enhance the data locality of map tasks [27]; it is an efficient scheduling technique, motivated by improving the data locality of map tasks on top of well-known, widely used Hadoop schedulers such as FIFO and the Hadoop Fair Scheduler.…”
Section: Related Work
confidence: 99%
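As a rough, heavily simplified sketch of the locality idea attributed to [25] (not the authors' actual center-of-gravity computation), one could place each reduce task on the node that already stores the largest share of its intermediate input:

def place_reduce_task(input_bytes_per_node):
    """Greedy simplification of the reduce-side locality idea described above:
    put the reduce task on the node that already holds the most of its
    intermediate input, so the least data crosses the network.  The scheme in
    [25] weighs nodes more carefully; this rule is our own simplification.

    `input_bytes_per_node` maps a node name to the bytes of this task's input
    stored there.
    """
    return max(input_bytes_per_node, key=input_bytes_per_node.get)

print(place_reduce_task({"node-a": 120, "node-b": 900, "node-c": 300}))  # node-b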