Improved fair Scheduling Algorithm for Hadoop Clustering

Sneha, Sneha; Sebastian, Shoney

doi:10.13005/ojcst/10.01.26

Cited by 2 publications

(1 citation statement)

References 10 publications


“…When the dataset size is sufficiently large, the name node processes scan lead to failure, or high over head may be accrued. To overcome these issues of the conventional fair scheduler in Hadoop, the authors in [28] proposed an improved fair scheduling algorithm for clustering user jobs. The advantage of the improved fair scheduling scheme is its efficiency in producing throughput for datasets of variable size; however, the disadvantages are that long jobs can slow the algorithm and cause overloading issues at a node.…”

Section: Problem Statement (mentioning)

confidence: 99%

A Multi-Optimization Technique for Improvement of Hadoop Performance with a Dynamic Job Execution Method Based on Artificial Neural Network

et al. 2020


The improvement of Hadoop performance has received considerable attention from researchers in cloud computing fields. Most studies have focused on improving the performance of a Hadoop cluster. Notably, various parameters are required to configure Hadoop and must be adjusted to improve performance. This paper proposes a mechanism to improve Hadoop, schedule jobs, and allocate and utilize resources. Specifically, we present an improved ant colony optimization method to schedule jobs according to the job size and the time expected for execution. Priority is given to the job with the minimum data size and minimum response time. The resource usage and running jobs per data node are predicted using an artificial neural network, and job activity and resource usage are monitored using the resource manager. Moreover, we enhance the Hadoop name node performance by adding an aggregator node to the default HDFS framework architecture. The changes involve four entities: the name node, secondary name node, aggregator nodes, and data nodes, where the aggregator node is responsible for assigning the jobs among the data nodes, and the name node keeps track of only the aggregator nodes. We test the overall scheme on Amazon EC2 and S3, and show the results of throughput and CPU response time for different data sizes. Finally, we show that the proposed approach achieves significant improvement compared to native Hadoop and other approaches.
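The priority rule stated in the abstract (smallest data size and smallest response time first) can be illustrated with a small sketch. This is not the paper's ant colony optimization method; it only shows one possible reading of the rule, assuming a lexicographic ordering in which data size is compared first and expected execution time breaks ties. The `Job` class, its field names, and the sample jobs are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; the field names are illustrative assumptions.
    name: str
    data_size_mb: float     # input data size
    expected_time_s: float  # estimated execution time


def schedule_order(jobs):
    """Return job names ordered by data size, then by expected time.

    Assumption: the two criteria are combined lexicographically; the
    abstract does not say how they are weighted against each other.
    """
    # The index i keeps the ordering stable when both keys are equal.
    heap = [(j.data_size_mb, j.expected_time_s, i, j.name)
            for i, j in enumerate(jobs)]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap)[3])
    return order


jobs = [
    Job("wordcount", 512, 40),
    Job("grep", 64, 12),
    Job("sort", 512, 25),
]
print(schedule_order(jobs))  # ['grep', 'sort', 'wordcount']
```

Under this assumed ordering, the small `grep` job runs first, and `sort` precedes `wordcount` because both have the same input size but `sort` has the shorter expected time.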


SMOSA: Spider monkey optimization‐based scheduling algorithm for heterogeneous Hadoop

Zhang, Guan, et al. 2021

Concurrency and Computation


Hadoop is a typical framework for processing big data. Task scheduling algorithms have a significant impact on the processing performance of Hadoop clusters. Existing scheduling algorithms of Hadoop fail to consider the performance differences between nodes in heterogeneous Hadoop clusters, causing problems such as uneven task allocation and low resource utilization. Aiming to solve this problem, we propose a spider monkey optimization-based scheduling algorithm (SMOSA) for heterogeneous Hadoop. First, the cluster heartbeat mechanism is used to obtain information such as memories and CPUs of nodes to comprehensively consider the actual load capacity of each node. Then, the spider monkey optimization algorithm is adopted to find the optimal mapping relationship between tasks and resources by taking the task completion time as the objective function and updating the position of the spider monkey. Finally, we calculate the remaining rate of node hardware resources, and according to the task type, the node with the higher remaining rate of resource will give priority to the task. Data are compressed for I/O type tasks to reduce disk operations and increase the speed of task execution. Experimental results show that, compared with existing scheduling algorithms, the SMOSA can effectively reduce task execution time and can significantly improve scheduling efficiency and task execution speed especially in heterogeneous Hadoop clusters. For different types of tasks, the execution time can be reduced by up to 19%.
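The final step described in the abstract, choosing a node by its remaining resource rate according to task type, might look roughly like the sketch below. It does not implement spider monkey optimization. It assumes the remaining rate is (total - used) / total for the resource that matches the task type, and that only CPU and memory are tracked. The `Node` class, the task type names, and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical view of what a heartbeat might report for a node.
    name: str
    cpu_total: float
    cpu_used: float
    mem_total: float
    mem_used: float

    def remaining_rate(self, resource):
        """Fraction of the given resource ("cpu" or "mem") still free."""
        total = getattr(self, f"{resource}_total")
        used = getattr(self, f"{resource}_used")
        return (total - used) / total


def pick_node(nodes, task_type):
    """Pick the node with the highest remaining rate for the task's resource.

    Assumption: each task type maps to exactly one resource.
    """
    resource = {"cpu": "cpu", "memory": "mem"}[task_type]
    return max(nodes, key=lambda n: n.remaining_rate(resource))


nodes = [
    Node("n1", cpu_total=8, cpu_used=6, mem_total=32, mem_used=8),
    Node("n2", cpu_total=16, cpu_used=4, mem_total=16, mem_used=12),
]
print(pick_node(nodes, "cpu").name)     # n2
print(pick_node(nodes, "memory").name)  # n1
```

Here `n2` has 75% of its CPU free and `n1` has 75% of its memory free, so a CPU-type task goes to `n2` and a memory-type task goes to `n1`.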
