2024
DOI: 10.22266/ijies2024.0430.46

Integrated Data, Task and Resource Management to Speed Up Processing Small Files in Hadoop Cluster

Abstract: Strong demand for business intelligence applications over large volumes of enterprise data has driven rapid adoption of high-performance data analytics. Hadoop-based high-performance computing environments are optimized for large files: data-centric execution, which places computation close to the data, yields high performance on such files. For small files, however, performance drops and overhead grows because of the way Hadoop handles files. The resource allocation and scheduling policies of Hadoop have…
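The small-files bottleneck the abstract describes is commonly mitigated by packing many small files into a single container file, so HDFS stores one large file instead of thousands of tiny ones. The following is a minimal sketch of that generic workaround using Hadoop's public SequenceFile API; it is not the paper's integrated data/task/resource scheme (which the truncated abstract does not detail), and the class name SmallFilePacker and the command-line paths are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]); // directory of small files (hypothetical)
        Path packed = new Path(args[1]);   // output SequenceFile (hypothetical)

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (!status.isFile()) {
                    continue;
                }
                // Small file assumed: the whole content fits in one buffer.
                byte[] buf = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    in.readFully(buf);
                }
                // key = original file name, value = raw file contents
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(buf));
            }
        }
    }
}
```

Because the packed output is one large HDFS file, the NameNode tracks far fewer objects and a MapReduce job reads one sequential input instead of opening many tiny splits.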

Cited by 2 publications (2 citation statements)
References 27 publications

“…Unlike traditional methods, Hadoop allows for the flexible movement of computation, primarily MapReduce jobs, to the location of the data, managed by a Hadoop Distributed File System (HDFS). Consequently, efficient data placement within compute nodes becomes essential for effective big data processing [5]. Hadoop's default approach to data locality relies heavily on the physical proximity of data to computation nodes, which may not always guarantee optimal performance.…”
Section: Introduction
confidence: 99%
“…Unlike traditional methods, Hadoop allows for flexible movement of computation, primarily MapReduce jobs, to the location of the data, managed by the Hadoop Distributed File System (HDFS). Consequently, efficient data placement within compute nodes becomes essential for effective Big Data processing [6]. Hadoop's default approach to data locality relies heavily on the physical proximity of data to computation nodes, which may not always guarantee optimal performance.…”
Section: Introduction
confidence: 99%
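Both citing statements hinge on the same mechanism: HDFS exposes where each block's replicas live, and the scheduler uses that information to run map tasks on or near the nodes that already hold their input splits. As an illustrative sketch only (standard Hadoop API usage, not the cited paper's method; the class name BlockLocality is hypothetical), the replica hosts behind that locality decision can be inspected like this:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path(args[0]));

        // One BlockLocation per HDFS block; getHosts() lists the
        // DataNodes holding a replica. This is the placement data the
        // scheduler consults when it tries to run a map task on (or
        // near) a node that already stores the task's input split.
        BlockLocation[] locs =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation loc : locs) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    loc.getOffset(), loc.getLength(),
                    String.join(",", loc.getHosts()));
        }
    }
}
```

For a file made of many small objects, each object occupies its own block with its own replica set, which is why the default proximity-based locality the quoted passages mention degrades as file counts grow.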