2016
DOI: 10.1016/j.asoc.2015.04.039

Scheduling algorithm based on prefetching in MapReduce clusters

Abstract: Due to cluster resource competition and task scheduling policy, some map tasks are assigned to nodes without input data, which causes significant data access delay. Data locality is becoming one of the most critical factors to affect performance of MapReduce clusters. As machines in MapReduce clusters have large memory capacities, which are often underutilized, in-memory prefetching input data is an effective way to improve data locality. However, it is still posing serious challenges to cluster designers on w…

Cited by 25 publications (11 citation statements)
References 19 publications
“…As data are transferred in the copy and shuffle stages, they have the most significant impact at execution time. There is an assigned weight for each stage, which is the ratio of the stage's execution time to the total execution time [28,29,32,33,34,35,36]. It is also possible to calculate errors by comparing the estimated weights by a real one [15].…”
Section: Introduction (mentioning)
confidence: 99%
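
The stage weighting described in the statement above can be illustrated with a minimal sketch. The stage names, the timings, and the function names `stage_weights` and `weight_errors` are assumptions made for illustration; they do not come from the citing paper, and the absolute difference used as the error measure is one plausible choice rather than the paper's definition.

```python
def stage_weights(stage_times):
    """Return each stage's share of the total execution time.

    stage_times maps a stage name to its execution time (any consistent unit).
    """
    total = sum(stage_times.values())
    if total <= 0:
        raise ValueError("total execution time must be positive")
    return {stage: t / total for stage, t in stage_times.items()}


def weight_errors(estimated, actual):
    """Absolute difference between estimated and measured weights per stage.

    Assumption: the error is taken as |estimated - actual|; the source only
    says that estimated weights are compared with real ones.
    """
    return {stage: abs(estimated[stage] - actual[stage]) for stage in actual}


# Hypothetical timings in seconds for one job.
measured = stage_weights({"copy": 30.0, "shuffle": 50.0, "reduce": 20.0})
estimated = {"copy": 0.25, "shuffle": 0.55, "reduce": 0.20}
errors = weight_errors(estimated, measured)
```

By construction the weights sum to 1, so a stage that dominates data transfer shows up directly as the largest weight.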
“…Job scheduling in the big data platform is crucial to the optimization of the platform performance. In order to improve the efficiency of the job execution and optimize the performance of the platform, the researches [8,9,10] propose data placement strategy and job scheduling algorithm based on the minimum data transmission time to reduce the data transmission time and improve the efficiency of the job execution. However, the revenue is not considered in the algorithms.…”
Section: Related Work (mentioning)
confidence: 99%
“…A set of jobs E j is found to be executed at T time and the required resources of E j do not conflict with pre-allocated resources (lines 2-8). If E j is not null, the job is selected that makes the waste resource rate minimize and the optimal start time of the job is T (lines 10-16). If E j is null, T is set to the start time of the next time period in P R (line 18).…”
Section: SAS Algorithm (mentioning)
confidence: 99%
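
The selection step quoted above can be sketched roughly as follows. This is not the citing paper's algorithm: the `Job` and `Reservation` types, the single scalar resource, the use of reservation start and end times as period boundaries, and the definition of the waste rate as unused free capacity divided by total capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    demand: int    # resource units required (hypothetical model)
    duration: int  # run time in abstract time units


@dataclass(frozen=True)
class Reservation:
    start: int     # inclusive
    end: int       # exclusive
    reserved: int  # pre-allocated resource units


def min_free(capacity, reservations, start, end):
    """Smallest free capacity anywhere in [start, end)."""
    def used(t):
        return sum(r.reserved for r in reservations if r.start <= t < r.end)
    # Usage can only rise where a reservation starts, so those points suffice.
    points = [start] + [r.start for r in reservations if start < r.start < end]
    return min(capacity - used(t) for t in points)


def select_job(jobs, capacity, reservations, t):
    """Return (job, start_time), or (None, t) if no job can ever be placed."""
    boundaries = sorted({r.start for r in reservations} | {r.end for r in reservations})
    while True:
        # Jobs whose demand does not conflict with the pre-allocated resources.
        feasible = [
            (min_free(capacity, reservations, t, t + j.duration), j) for j in jobs
        ]
        feasible = [(free, j) for free, j in feasible if j.demand <= free]
        if feasible:
            # Assumed waste rate: free capacity left unused by the job.
            _, best = min(feasible, key=lambda fj: (fj[0] - fj[1].demand) / capacity)
            return best, t
        later = [b for b in boundaries if b > t]
        if not later:
            return None, t
        t = later[0]  # move to the start of the next period
```

For example, with a capacity of 10 and 8 units reserved over [0, 5), a job that needs 9 units cannot start at time 0, so the search advances to time 5, where that job leaves the least capacity unused and is chosen.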
“…A research work in [22] proposed HPSO (High-Performance Scheduling Optimizer) that enhances the scheduling algorithm based on pre-fetching of data needed to map tasks. This pre-fetching reduces the time needed to transfer data over the network and hence improves the performance.…”
Section: Related Work (mentioning)
confidence: 99%