2021
DOI: 10.1109/access.2020.3040719
Scheduling Spark Tasks With Data Skew and Deadline Constraints

Abstract: Data skew has an essential impact on the performance of big data processing. Spark task scheduling with data skew and deadline constraints is considered to minimize the total rental cost in this paper. A modified scheduling architecture is developed in terms of the unique characteristics of the considered problem. A mathematical model is constructed, and a Spark task scheduling algorithm is proposed considering both the data skew and deadline constraints. The algorithm consists of three components: stage seque…

Cited by 5 publications (1 citation statement)
References 12 publications
“…For improving performance, data locality is a key factor considered by the task scheduling of Spark stages [10]. The task scheduling determines the executor on which node the task runs and the data locality refers to scheduling task close to data, so that the communication overload can be reduced [19], [15]. In particular, in the map stage, the taskScheduler uses the delay scheduling algorithm [34] that tries to assign the map task to the node which stores the data block, and in the reduce stage, the taskScheduler assigns the reduce task to one of the nodes that holds more intermediate data to the task, thus to minimize the data transfer volume.…”
Section: Introduction
confidence: 99%
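The cited statement describes two locality heuristics: map tasks are delayed until they can run on the node storing their input block, and reduce tasks are placed on whichever node already holds the most of their intermediate (shuffle) data. The reduce-side heuristic can be sketched as a simple greedy placement. This is a hypothetical illustration, not Spark's actual TaskScheduler API; the function name `place_reduce_tasks` and the `intermediate_data` mapping are assumptions made for the example.

```python
def place_reduce_tasks(intermediate_data):
    """Assign each reduce task to the node holding the most of its
    intermediate (shuffle) data, so the least data crosses the network.

    intermediate_data: dict mapping task id -> {node: bytes of that
    task's shuffle input already resident on the node}. Hypothetical
    input format for this sketch.
    """
    placement = {}
    for task, per_node in intermediate_data.items():
        # Greedy locality choice: the node with the largest local share
        # of the task's input has to fetch the least remotely.
        placement[task] = max(per_node, key=per_node.get)
    return placement

if __name__ == "__main__":
    shuffle_layout = {
        "reduce0": {"nodeA": 300, "nodeB": 120},  # most data on nodeA
        "reduce1": {"nodeA": 50, "nodeB": 400},   # most data on nodeB
    }
    print(place_reduce_tasks(shuffle_layout))
    # {'reduce0': 'nodeA', 'reduce1': 'nodeB'}
```

A real scheduler would additionally weigh executor load and free slots against locality; Spark's delay scheduling does this for map tasks by waiting a bounded time for a local slot before relaxing the locality level.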