In-Memory Caching Orchestration for Hadoop

Kwak, Jaewon; Hwang, Eunji; Yoo, Tae-kyung; Nam, Beomseok; Choi, Young-ri

doi:10.1109/ccgrid.2016.73

Cited by 7 publications

(1 citation statement)

References 4 publications


“…3) Grep is a mix of CPU-bound and I/O-bound operations that searches for a substring in a text file. In addition, the cache affinity feature [15] determines how to utilize the benefit of cached data in each application such that it can be classified into three categories based on this feature: low cache affinity (Sort), medium cache affinity (WordCount), and high cache affinity (Grep).…”

Section: Methods (mentioning)

confidence: 99%
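The quoted passage sorts applications into low, medium, and high cache affinity. Below is a minimal sketch of how such a three-way classification could be computed. It assumes that cache affinity is approximated by the speedup a job gets when its input is already in memory. The metric, the thresholds `LOW_MAX` and `HIGH_MIN`, and the names `JobTiming`, `cache_speedup`, and `affinity_class` are hypothetical illustrations and are not taken from [15] or from the cited paper.

```python
from dataclasses import dataclass

# Hypothetical thresholds on the cached-vs-uncached speedup ratio;
# they are illustrative only and do not come from [15].
LOW_MAX = 1.2
HIGH_MIN = 2.0


@dataclass
class JobTiming:
    name: str
    uncached_secs: float  # runtime when input blocks are read from disk
    cached_secs: float    # runtime when input blocks are already in memory


def cache_speedup(t: JobTiming) -> float:
    """Ratio of uncached to cached runtime; values above 1 mean caching helped."""
    if t.cached_secs <= 0:
        raise ValueError("cached_secs must be positive")
    return t.uncached_secs / t.cached_secs


def affinity_class(t: JobTiming) -> str:
    """Bin a job into 'low', 'medium', or 'high' cache affinity (assumed rule)."""
    s = cache_speedup(t)
    if s < LOW_MAX:
        return "low"
    if s < HIGH_MIN:
        return "medium"
    return "high"
```

Using a ratio rather than an absolute time difference keeps the classification independent of input size, which is one plausible reason to define affinity this way. The real feature in [15] may be measured differently.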

Smart Data Prefetching Using KNN to Improve Hadoop Performance

Ghazali; Down

2023

Preprint


Hadoop is an open-source framework that enables the parallel processing of large data sets across a cluster of machines. It faces several challenges that can degrade performance, such as I/O overhead, network data transmission, and high data access time. In recent years, researchers have explored prefetching as a way to reduce data access time and thus address these problems. Nevertheless, several issues must be considered to optimize the prefetching mechanism: launching the prefetch at an appropriate time, to avoid conflicts with other operations and minimize waiting time; determining the amount of data to prefetch, to avoid overload and underload; and placing the prefetched data where it can be accessed efficiently when required. In this paper, we propose a smart prefetch mechanism that consists of three phases designed to address these issues. First, we enhance the task progress rate to calculate the optimal time for triggering prefetch operations. Second, we use K-Nearest Neighbor (KNN) clustering to identify which data blocks should be prefetched in each round. Third, we use the data locality feature to determine where the prefetched data is placed. Our experimental results show that the proposed smart prefetch mechanism improves job execution time by an average of 28.33% by increasing the rate of local tasks.


Section: Methods (mentioning)

confidence: 99%
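The abstract above describes three phases: when to trigger a prefetch, which blocks to prefetch, and where to place them. The following is a minimal sketch of one way those phases could fit together. It is not Ghazali and Down's implementation. The trigger rule, the block feature vectors, the labels (1 = "likely accessed soon"), the cache-placement policy, and every function name are hypothetical. The abstract says "KNN clustering," but this sketch uses the standard supervised form of KNN, which takes a majority vote among the k nearest labelled neighbours.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Optional, Sequence

Sample = tuple[tuple[float, ...], int]  # (block feature vector, label)


def should_trigger_prefetch(progress: float, elapsed_secs: float,
                            prefetch_secs: float) -> bool:
    """Phase 1 (assumed rule): fire when the running task's estimated
    remaining time is no longer than the time needed to prefetch."""
    if not 0.0 < progress <= 1.0 or elapsed_secs <= 0:
        return False
    rate = progress / elapsed_secs            # progress per second
    remaining = (1.0 - progress) / rate       # estimated seconds left
    return remaining <= prefetch_secs


def knn_predict(train: Sequence[Sample], query: tuple[float, ...],
                k: int = 3) -> int:
    """Phase 2: majority label among the k nearest training blocks
    (Euclidean distance). Use an odd k to avoid ties with binary labels."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def blocks_to_prefetch(candidates: dict[str, tuple[float, ...]],
                       train: Sequence[Sample], k: int = 3) -> list[str]:
    """Return the (sorted) IDs of candidate blocks predicted as label 1."""
    return sorted(b for b, feat in candidates.items()
                  if knn_predict(train, feat, k) == 1)


def choose_cache_node(replica_nodes: Sequence[str],
                      free_cache_mb: dict[str, float],
                      block_mb: float) -> Optional[str]:
    """Phase 3 (assumed policy): cache the block on a node that already
    holds a replica, so loading it needs no network transfer; prefer the
    node with the most free cache. Returns None if no replica node fits."""
    fits = [n for n in replica_nodes if free_cache_mb.get(n, 0.0) >= block_mb]
    return max(fits, key=lambda n: free_cache_mb[n]) if fits else None
```

Keeping the three phases as separate functions mirrors the abstract's decomposition. Each decision can then be tuned or replaced independently, for example by swapping in a different progress estimator without touching block selection.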