HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework

Gandomi, Abolfazl; Reshadi, Midia; Movaghar, Ali; Khademzadeh, Ahmad

doi:10.1186/s40537-019-0253-9

Cited by 26 publications

(22 citation statements)

References 24 publications

(30 reference statements)

“…These job schedulers can be classified as to whether they consider data locality at the Map task level, data locality at the Reduce task level, or data locality at the job level (both Map and Reduce tasks) [8]. For instance, Hybrid scheduling MapReduce priority (HybSMRP) [9] was presented by Ghandomi et al in 2019 and is a hybrid scheduler that combines dynamic job priority and data localization. It determines job priority based on three parameters: running time, job size, and waiting time.…”
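The three-parameter priority described in the statement above can be illustrated with a minimal sketch. The source names only the three inputs (running time, job size, waiting time); the linear formula, the weights `w_wait`, `w_run`, `w_size`, and the `Job` and `order_jobs` names below are hypothetical and are not taken from the HybSMRP paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    running_time: float  # seconds (the source does not say whether elapsed or estimated)
    job_size: float      # input size, e.g. in MB
    waiting_time: float  # seconds spent waiting in the queue


def priority(job, w_wait=1.0, w_run=1.0, w_size=1.0):
    """Hypothetical score: waiting raises priority; running time and size lower it."""
    return (w_wait * job.waiting_time
            - w_run * job.running_time
            - w_size * job.job_size)


def order_jobs(jobs):
    """Return jobs sorted from highest to lowest hypothetical priority."""
    return sorted(jobs, key=priority, reverse=True)
```

Because the score depends on waiting time, recomputing it at each scheduling round lets a long-waiting job overtake newer ones, which is one plausible reading of "dynamic" priority; the actual HybSMRP rule may differ.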

Section: Related Work (mentioning)

confidence: 99%

CLQLMRS: improving cache locality in MapReduce job scheduling using Q-learning

et al. 2022

Self Cite

Scheduling of MapReduce jobs is an integral part of Hadoop and effective job scheduling has a direct impact on Hadoop performance. Data locality is one of the most important factors to be considered in order to improve efficiency, as it affects data transmission through the system. A number of researchers have suggested approaches for improving data locality, but few have considered cache locality. In this paper, we present a state-of-the-art job scheduler, CLQLMRS (Cache Locality with Q-Learning in MapReduce Scheduler) for improving both data locality and cache locality using reinforcement learning. The proposed algorithm is evaluated by various experiments in a heterogeneous environment. Experimental results show significantly decreased execution time compared with FIFO, Delay, and the Adaptive Cache Local scheduler.

“…The dynamics and heterogeneity of computing nodes in larger networks were not discussed by their paper. Gandomi et al [19] combined two existing techniques: dynamic job prioritizing and data localization to form a hybrid scheduling algorithm, aiming at increasing data locality rate, and decreasing completion time. The proposed schedulers were evaluated on a Hadoop cluster of one master node and 20 slave nodes, which had homogeneous architecture with stable and fast network connections.…”

Section: Related Work (mentioning)

confidence: 99%

The Performance Optimization of Big Data Processing by Adaptive MapReduce Workflow

Tang

2022

IEEE Access

This paper addresses MapReduce big data processing by volunteer computing in dynamic and opportunistic environments. It conducts a series of simulations to explore how the overall performance of volunteer overlays responds to different workloads of big data problems. The simulations reveal optimization points in overlay size, beyond which adding more volunteers brings little benefit to overall performance. Based on these optimization points, the paper proposes a bootstrapping protocol that can adapt volunteers into variable-size overlays, enabling a workflow of single-round or multiple-round MapReduce, with a single overlay or multiple overlays for each round. The variable overlays aim to create an adaptive workflow during MapReduce processing, so that the optimization points can be reached. As another benefit, unnecessary computing capacity can be released during computing once the optimization points are reached. The case study shows a few optimized workflows formed by the proposed bootstrapping protocol to process the big data cases. The workflows lead to the optimization points and dynamically balance the workload at the same time. The experimental results demonstrate that the optimization strategies achieved either 36% or 71% higher performance than the plain MapReduce workflow and minimized the use of computing resources by releasing 12.5% to 75% of volunteers during computing, whereas the original plain MapReduce must hold all the volunteers until the end of computing. The extensibility of the simulation parameterization to more diverse real-world applications has been clarified.

“…A hybrid scheduling algorithm, HybSMRP, is proposed in [7] to improve data local execution and job latency. Authors proposed two techniques to achieve their objectives: dynamic priority and localization ID.…”

Section: Literature Survey (mentioning)

confidence: 99%

Handling Non-Local Executions to Improve MapReduce Performance Using Ant Colony Optimization

et al. 2021

Improving the performance of the MapReduce scheduler is a primary objective, especially in a heterogeneous virtual cloud environment. A map task is assigned an input split (IS), which consists of one or more data blocks. When a map task is assigned more than one data block, non-local execution is performed. In classical MapReduce scheduling schemes, data blocks are copied over the network to the node where the map task is running. This increases job latency and consumes more network bandwidth within and between racks in the cloud data center. Considering this situation, we propose a methodology, "improving data locality using ant colony optimization (IDLACO)", to minimize the number of non-local executions and virtual network bandwidth consumption when an IS consists of more than one data block. First, IDLACO determines an optimal list of data blocks for each map task of a job to perform a non-local execution, reducing job latency and virtual network consumption. Then, the target virtual machine to execute the map task is determined on the basis of its heterogeneous performance. Finally, if a set of data blocks is transferred to the same node for repeated job execution, those data blocks are temporarily cached in the target virtual machine. The performance of IDLACO is analysed and compared with the fair scheduler and the Holistic scheduler on parameters such as the number of non-local executions, average map task latency, job latency, and the amount of bandwidth consumed for a MapReduce job. Results show that the proposed IDLACO significantly outperforms the classical fair scheduler and the Holistic scheduler.
