2024
DOI: 10.22266/ijies2024.0430.46

Integrated Data, Task and Resource Management to Speed Up Processing Small Files in Hadoop Cluster

Abstract: Strong demand for business intelligence applications over large volumes of enterprise data has driven rapid adoption of high-performance data analytics. Hadoop-based high-performance computing environments are optimized for large files: data-centric execution, which places computation close to the data, yields high performance on such files. For small files, however, performance drops and overhead grows because of the way Hadoop handles files. The resource allocation and scheduling policies of Hadoop have…
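The small-files bottleneck the abstract describes is commonly mitigated by packing many small files into a single container file, so HDFS stores one large file instead of thousands of tiny ones. The following is a minimal sketch of that generic workaround using Hadoop's public SequenceFile API; it is not the paper's integrated data/task/resource scheme (which the truncated abstract does not detail), and the class name SmallFilePacker and the command-line paths are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]); // directory of small files (hypothetical)
        Path packed = new Path(args[1]);   // output SequenceFile (hypothetical)

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (!status.isFile()) {
                    continue;
                }
                // Small file assumed: the whole content fits in one buffer.
                byte[] buf = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    in.readFully(buf);
                }
                // key = original file name, value = raw file contents
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(buf));
            }
        }
    }
}
```

Because the packed output is one large HDFS file, the NameNode tracks far fewer objects and a MapReduce job reads one sequential input instead of opening many tiny splits.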

Cited by 2 publications (2 citation statements)
References 27 publications

“…Unlike traditional methods, Hadoop allows for the flexible movement of computation, primarily MapReduce jobs, to the location of the data, managed by a Hadoop Distributed File System (HDFS). Consequently, efficient data placement within compute nodes becomes essential for effective big data processing [5]. Hadoop's default approach to data locality relies heavily on the physical proximity of data to computation nodes, which may not always guarantee optimal performance.…”
Section: Introduction
confidence: 99%
“…Unlike traditional methods, Hadoop allows for flexible movement of computation, primarily MapReduce jobs, to the location of the data, managed by the Hadoop Distributed File System (HDFS). Consequently, efficient data placement within compute nodes becomes essential for effective Big Data processing [6]. Hadoop's default approach to data locality relies heavily on the physical proximity of data to computation nodes, which may not always guarantee optimal performance.…”
Section: Introduction
confidence: 99%
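Both citing statements hinge on the same mechanism: HDFS exposes where each block's replicas live, and the scheduler uses that information to run map tasks on or near the nodes that already hold their input splits. As an illustrative sketch only (standard Hadoop API usage, not the cited paper's method; the class name BlockLocality is hypothetical), the replica hosts behind that locality decision can be inspected like this:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path(args[0]));

        // One BlockLocation per HDFS block; getHosts() lists the
        // DataNodes holding a replica. This is the placement data the
        // scheduler consults when it tries to run a map task on (or
        // near) a node that already stores the task's input split.
        BlockLocation[] locs =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation loc : locs) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    loc.getOffset(), loc.getLength(),
                    String.join(",", loc.getHosts()));
        }
    }
}
```

For a file made of many small objects, each object occupies its own block with its own replica set, which is why the default proximity-based locality the quoted passages mention degrades as file counts grow.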