2015 IEEE International Advance Computing Conference (IACC) 2015
DOI: 10.1109/iadcc.2015.7154903

A novel approach for efficient handling of small files in HDFS

Abstract: The Hadoop Distributed File System (HDFS) is a representative cloud storage platform with scalable, reliable, and low-cost storage capability. It is designed to handle large files and therefore suffers a performance penalty when handling a huge number of small files. Further, it does not consider the correlation between files to provide a prefetching mechanism, which would be useful for improving access efficiency. In this paper, we propose a novel approach to handle small files in HDFS. The proposed approach combines the…

Cited by 18 publications (7 citation statements)
References 9 publications
“…Their approach reduces memory consumption on the NameNode, but it didn't show how much the read and write performance is impacted. [16][17][18] Y. Zhang et al. proposed merging related small files according to the WebGIS application, which improved the storage efficiency and HDFS metadata management; however, the results are limited by the scene. D. Dev et al.…”
Section: Small File Problem (mentioning)
confidence: 99%
“…In this respect, discovering knowledge out of huge volumes of data can be performed more efficiently, based on the high-performance computing of cloud [21]. In addition, file correlations have become an increasingly important consideration for performance enhancement in various contexts, such as file systems [22-26], distributed systems [27-30], and cloud systems [3, 18, 31-36]…”
Section: Introduction (mentioning)
confidence: 99%
“…In order to meet the characteristics of WebGIS accessing patterns, Liu et al. [8] proposed WebGIS-based HDFS storage to support popular web applications, but their method is sensitive to small file I/O performance, and the accessing time of their method is still relatively long. Being aware of the quality of service of the DataNodes, Cloud storage from HDFS [9, 10] can increase the storage utilization ratio and improve storage performance. This makes the Hadoop distributed computing model more suitable to unstable wide area network environments.…”
Section: Introduction (mentioning)
confidence: 99%