Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures 2011
DOI: 10.1145/1989493.1989540
On scheduling in map-reduce and flow-shops

Abstract: The map-reduce paradigm is now standard in industry and academia for processing large-scale data. In this work, we formalize job scheduling in map-reduce as a novel generalization of the two-stage classical flexible flow shop (FFS) problem: instead of a single task at each stage, a job now consists of a set of tasks per stage. For this generalization, we consider the problem of minimizing the total flowtime and give an efficient 12-approximation in the offline setting and an online (1 + …) algorithm. Motivated by map-reduce, …
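To make the two-stage model concrete, here is a minimal Python sketch (our own illustration, not the paper's 12-approximation): each job releases a set of map tasks and a set of reduce tasks, the reduce tasks of a job wait for all of its map tasks, every task is placed greedily on the least-loaded machine of its stage, and the reported objective is the total flowtime. The greedy rule and the example job and machine parameters are assumed purely for illustration.

def schedule_total_flowtime(jobs, num_map_machines, num_reduce_machines):
    """Toy model of the two-stage flexible-flow-shop view of map-reduce.

    `jobs` is a list of (release_time, map_task_sizes, reduce_task_sizes).
    Jobs are handled in release order; within a job, every task goes to the
    currently least-loaded machine of its stage, and no reduce task of a job
    starts before all of that job's map tasks have finished.  Returns the
    total flowtime, i.e. the sum over jobs of (completion time - release time).
    Illustrative heuristic only, not the algorithm from the paper.
    """
    map_loads = [0.0] * num_map_machines        # next free time of each map machine
    reduce_loads = [0.0] * num_reduce_machines  # next free time of each reduce machine
    total_flowtime = 0.0

    for release, map_tasks, reduce_tasks in sorted(jobs):
        # Stage 1: place each map task on the least-loaded map machine.
        map_finish = release
        for p in map_tasks:
            i = min(range(num_map_machines), key=lambda k: map_loads[k])
            start = max(map_loads[i], release)
            map_loads[i] = start + p
            map_finish = max(map_finish, map_loads[i])

        # Stage 2: reduce tasks may only start after all of the job's map tasks.
        job_finish = map_finish
        for p in reduce_tasks:
            i = min(range(num_reduce_machines), key=lambda k: reduce_loads[k])
            start = max(reduce_loads[i], map_finish)
            reduce_loads[i] = start + p
            job_finish = max(job_finish, reduce_loads[i])

        total_flowtime += job_finish - release

    return total_flowtime

# Example: two jobs, each with a set of map tasks and a set of reduce tasks.
jobs = [
    (0.0, [2.0, 3.0, 1.0], [4.0]),   # job released at t = 0
    (1.0, [1.0, 1.0], [2.0, 2.0]),   # job released at t = 1
]
print(schedule_total_flowtime(jobs, num_map_machines=2, num_reduce_machines=2))  # 14.0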

Cited by 92 publications (87 citation statements) · References 36 publications
“…Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. The problem of map-reduce scheduling is formalized in [3] by abstracting the above requirements and desiderata in scheduling terms. In particular, we focus on multiple-task, multiple-machine, two-stage non-migratory scheduling with precedence constraints; these constraints exist between each map task and reduce task of a job. We consider a subset of the [4] production workload that consists of MapReduce jobs with no dependencies.…”
Section: Existing System
confidence: 99%
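The precedence structure mentioned in this citation (every reduce task of a job must wait for every map task of the same job) can be written down directly; the sketch below only illustrates that constraint set, with hypothetical job and task names.

def mapreduce_precedences(jobs):
    """Build the map->reduce precedence pairs described above: within each job,
    every reduce task must wait for every map task of the same job.
    `jobs` maps a job id to (map_task_ids, reduce_task_ids).  Purely illustrative.
    """
    edges = []
    for job_id, (map_tasks, reduce_tasks) in jobs.items():
        for m in map_tasks:
            for r in reduce_tasks:
                edges.append((m, r))   # m must finish before r may start
    return edges

# Example: one job with two map tasks and two reduce tasks.
print(mapreduce_precedences({"job1": (["m1", "m2"], ["r1", "r2"])}))
# [('m1', 'r1'), ('m1', 'r2'), ('m2', 'r1'), ('m2', 'r2')]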
“…Note that the particular form of the phase overlapping in MapReduce makes data processing different from the traditional multiple stage processing models in manufacturing and communication networks [16,24,38]. In particular, in a typical tandem queueing model [38] (or flowshop model [16,24]) a job must complete processing at one station before moving to the following station. In contrast, in MapReduce a job may have tasks that start the shuffle phase before all the tasks complete the map phase.…”
Section: Summary Of Contributions
confidence: 99%
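A small sketch of the distinction drawn here, assuming a simplified model in which the second stage may begin as soon as the first map task finishes (MapReduce-style overlap) versus only after the last map task finishes (tandem-queue/flow-shop style):

def second_stage_earliest_start(map_task_finish_times, overlap):
    """Earliest time the second stage of a job may begin.

    In a classical tandem-queue / flow-shop model (overlap=False) the job must
    finish all first-stage work before moving on, so the second stage waits for
    the last map task.  In the MapReduce-style overlapping model sketched here
    (overlap=True), shuffling may begin as soon as the first map task produces
    output.  Illustrative only; real systems overlap more gradually.
    """
    return min(map_task_finish_times) if overlap else max(map_task_finish_times)

finish_times = [3.0, 5.0, 9.0]
print(second_stage_earliest_start(finish_times, overlap=False))  # 9.0: flow-shop style
print(second_stage_earliest_start(finish_times, overlap=True))   # 3.0: MapReduce-style overlap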
“…To provide context for this result, it is useful to consider the related literature from the flowshop scheduling community, e.g., [10,16,19,24]. In particular, the flowshop literature considers traditional tandem queues (which do not allow overlapping), assumes that all jobs arrive at time zero, and tends to focus on a different performance metric: makespan.…”
Section: Offline Scheduling
confidence: 99%
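For reference, the two objectives contrasted here can be computed from release and completion times as follows; this helper is illustrative and not taken from any of the cited works.

def makespan_and_flowtime(release_times, completion_times):
    """Contrast the two objectives mentioned above (illustrative helper).

    Makespan is the time the last job finishes; total flowtime sums, over all
    jobs, the time each job spends in the system.  With all jobs released at
    time zero, as in much of the flow-shop literature, total flowtime reduces
    to the sum of completion times.
    """
    makespan = max(completion_times)
    flowtime = sum(c - r for r, c in zip(release_times, completion_times))
    return makespan, flowtime

print(makespan_and_flowtime([0, 0, 0], [4, 7, 9]))  # (9, 20)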
“…Hammoud et al. propose center-of-gravity reduce task scheduling, which aims to lower MapReduce network traffic [25]; it models the distribution of a reduce task's input as a mass distribution and assigns reduce tasks accordingly to save network cost, so it can be viewed as data locality in the reduce phase, which is not a consideration in our study but is left for future work. Benjamin Moseley et al. study the scheduling problem in MapReduce and flow shops [26], formalizing job scheduling in map-reduce as a novel generalization of the two-stage classical flexible flow shop (FFS) problem: instead of a single task at each stage, a job now consists of a set of tasks per stage. He et al. develop a new MapReduce scheduling technique named Matchmaking to enhance the data locality of map tasks [27]; it is an efficient scheduling technique, motivated by improving the data locality of map tasks on top of well-known, widely used Hadoop schedulers such as FIFO and the Hadoop Fair Scheduler.…”
Section: Related Work
confidence: 99%
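As a rough, heavily simplified sketch of the locality idea attributed to [25] (not the authors' actual center-of-gravity computation), one could place each reduce task on the node that already stores the largest share of its intermediate input:

def place_reduce_task(input_bytes_per_node):
    """Greedy simplification of the reduce-side locality idea described above:
    put the reduce task on the node that already holds the most of its
    intermediate input, so the least data crosses the network.  The scheme in
    [25] weighs nodes more carefully; this rule is our own simplification.

    `input_bytes_per_node` maps a node name to the bytes of this task's input
    stored there.
    """
    return max(input_bytes_per_node, key=input_bytes_per_node.get)

print(place_reduce_task({"node-a": 120, "node-b": 900, "node-c": 300}))  # node-b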