2020
DOI: 10.2478/cait-2020-0056

Performance Optimization System for Hadoop and Spark Frameworks

Abstract: The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and the transfer time between disk and memory, but requires additional processing. Finding an optimal trade-off is therefore a challenge: a high compression factor may underload the Input/Output subsystem but overload the processor. The paper aims to present a…
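
The trade-off described in the abstract is, in practice, a codec choice per job. As a minimal sketch only (not the paper's own system; the HDFS paths and codec choices below are illustrative assumptions), the same data can be written with a faster, lower-ratio codec or a slower, higher-ratio one in PySpark:

    from pyspark.sql import SparkSession

    # Illustration of the compression trade-off: snappy costs little CPU but
    # compresses less, while gzip shrinks I/O further at the price of more CPU time.
    spark = (
        SparkSession.builder
        .appName("compression-tradeoff-sketch")
        .config("spark.io.compression.codec", "lz4")   # codec for Spark's internal (shuffle) data
        .getOrCreate()
    )

    df = spark.read.csv("hdfs:///data/input.csv", header=True)  # hypothetical path

    # Higher compression factor, heavier processor load:
    df.write.option("compression", "gzip").parquet("hdfs:///data/out_gzip")

    # Lower compression factor, lighter processor load:
    df.write.option("compression", "snappy").parquet("hdfs:///data/out_snappy")

    spark.stop()

Comparing job duration and CPU utilization for the two outputs is the kind of measurement an optimizer over this knob would rely on.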

Cited by 10 publications (8 citation statements). References 25 publications (19 reference statements).

Citation statements (ordered by relevance):
“…Where fluxnum is a function implementing the numerical flux and N is the array of the normal vectors of the cells defined in (3).…”
Section: Transport Simulation (mentioning)
Confidence: 99%
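
The excerpt only names a numerical-flux routine and the array N of cell normals; purely as an illustration of what such a routine computes (the signature, the advection velocity, and every name below are assumptions, not taken from the citing work), a first-order upwind flux for scalar advection could look like this:

    import numpy as np

    def fluxnum(u_left, u_right, normal, velocity):
        # First-order upwind flux for scalar advection (illustrative sketch only).
        vn = np.dot(velocity, normal)      # velocity component along the face normal
        return vn * u_left if vn >= 0.0 else vn * u_right

    # N: assumed array of unit normal vectors of the cell faces (shape: faces x 2).
    N = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(fluxnum(1.0, 0.2, N[0], velocity=np.array([2.0, 0.5])))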
“…The use of data compression in the field of high-performance computing has already been explored in several works [3,5]. Besides memory saving, the use of data compression can also be motivated by the need to transfer data between the CPU and the GPU efficiently.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Data compression techniques can reduce storage usage and the number of I/O operations, improving processing performance. Recent studies [14,15] show that compression methods combined with HPC can significantly enhance the performance of Big Data workflows. One of the optimal satellite image formats is Cloud Optimized GeoTIFF (COG) [16], which provides essential advantages over traditional formats such as NetCDF [17].…”
Section: Background and Motivation (mentioning)
Confidence: 99%
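
The excerpt credits COG with advantages over traditional formats; the practical one is that a COG is internally tiled and compressed, so a reader can fetch only the window it needs. A small sketch, assuming the rasterio library and a hypothetical file name:

    import rasterio
    from rasterio.windows import Window

    # A Cloud Optimized GeoTIFF is internally tiled, so only the tiles that
    # intersect the requested window are read (or fetched over HTTP for remote files).
    with rasterio.open("scene_cog.tif") as src:               # hypothetical path
        print(src.profile.get("compress"), src.block_shapes)  # codec and internal tile sizes
        # Read a 512 x 512 window of band 1: Window(col_off, row_off, width, height).
        patch = src.read(1, window=Window(0, 0, 512, 512))
        print(patch.shape)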
“…The article presents a codeless, performance-efficient decision-making system mixing the approaches mentioned above to enhance our previous studies [8].…”
Section: Related Work (mentioning)
Confidence: 99%
“…Before processing the data, the framework decompresses it if the input file in HDFS is compressed. Our recent studies [7,8,9] show that the average memory usage for selected scientific workflows is 13-17% for Hadoop jobs and 20-40% for Spark jobs, which falls far short of fully utilizing the RAM of the HDFS nodes. Therefore, using the free RAM space may boost the performance of HDFS processing.…”
Section: Introduction (mentioning)
Confidence: 99%
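
The decompression step mentioned in the excerpt is transparent in Hadoop and Spark: the input format picks a codec from the file extension and decompresses while records are read. A minimal sketch (the HDFS path is a hypothetical placeholder):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-decompress-sketch").getOrCreate()

    # Hadoop input formats select a codec from the extension (.gz, .bz2, ...) and
    # decompress transparently, so no explicit decompression call is needed here.
    lines = spark.read.text("hdfs:///logs/events.log.gz")
    print(lines.count())

    # Note: a plain .gz file is not splittable, so it is read by a single task;
    # splittable codecs (e.g., bzip2) avoid that bottleneck on large inputs.
    spark.stop()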