2014
DOI: 10.1155/2014/561340

Design and Implementation of File Deduplication Framework on HDFS

Abstract: File systems are designed to control how files are stored and retrieved. Without knowing the context and semantics of file contents, file systems often contain duplicate copies, resulting in redundant consumption of storage space and network bandwidth. Seeking deduplication technologies to reduce cost and increase storage efficiency has been a complex and challenging issue for enterprises. To solve this problem, researchers have proposed in-line or offline solutions for primary storage or backup systems at…

Cited by 9 publications (6 citation statements)
References 11 publications
“…These issues include the identification of a trustworthy service provider, privacy of customer data, security of customer information, dependability of the services, scalability, integration, non-transparent nature, poor identification of feedback, weak service-level agreements, etc. [15]. Trust management techniques are classified into four categories: policy-based trust, reputation-based trust, recommendation-based trust, and prediction-based trust.…”
Section: Cloud Trust Management (mentioning)
confidence: 99%
“…Hence, it eliminates duplicate chunks of data even if the corresponding files are not identical. Performing file-level deduplication, Xu et al. [9] designed a file deduplication framework on the Hadoop system, where the Secure Hash Algorithm 2 (SHA-2) [16] was used to map each whole file to a digest. It therefore eliminates duplicate copies of the same file rather than duplicate chunks.…”
Section: Data Deduplication (mentioning)
confidence: 99%
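The file-level approach quoted above amounts to a content-addressed index: compute a SHA-2 digest over the whole file and store the bytes only when that digest has not been seen before, while every logical file name still maps to its content. The following is a minimal in-memory sketch of that idea; the `DedupStore` class and its methods are illustrative assumptions, not the actual API of the HDFS framework in [9].

```python
import hashlib


def file_digest(data: bytes) -> str:
    # Whole-file fingerprint using SHA-256, a member of the SHA-2 family
    # referenced by the cited framework.
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Toy file-level deduplication index: one stored copy per unique digest."""

    def __init__(self) -> None:
        self.blocks = {}  # digest -> file content, stored once per digest
        self.names = {}   # logical file name -> digest

    def put(self, name: str, data: bytes) -> bool:
        """Store a file; return True if its content was new, False if deduplicated."""
        digest = file_digest(data)
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = data  # physical copy only for unseen content
        self.names[name] = digest       # name mapping is always recorded
        return is_new

    def get(self, name: str) -> bytes:
        """Resolve a logical name back to its (shared) content."""
        return self.blocks[self.names[name]]


store = DedupStore()
store.put("a.txt", b"hello world")
store.put("copy-of-a.txt", b"hello world")  # duplicate content, not re-stored
store.put("b.txt", b"other data")
print(len(store.blocks))  # 2 unique contents backing 3 logical files
```

Because the digest covers the entire file, a single changed byte yields a different digest and the file is stored again in full; that is the trade-off, noted in the quote, between file-level and chunk-level deduplication.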
“…Moreover, most of the files transmitted in enterprises are images or documents of small size. Therefore, how to efficiently reduce the redundant cost of storage space, especially when dealing with small files, has become a complex and challenging issue for enterprises [9]. Secondly, elasticity is also an essential factor for a private cloud storage system.…”
Section: Introduction (mentioning)
confidence: 99%
“…If consumers and businesses have Internet access, they can directly access their personal files from any corner of the world without installation. This technology enables productive computing by incorporating data storage, processing, and bandwidth [1]. In the cloud the data is always roaming, and in such a case data privacy and tamper-resistance are not guaranteed.…”
Section: Introduction (mentioning)
confidence: 99%