2015 International Conference on Pervasive Computing (ICPC)
DOI: 10.1109/pervasive.2015.7087116
A survey and comparative study of data deduplication techniques

Cited by 21 publications (5 citation statements); references 5 publications.
“…The size/length of the output of a hash function does not depend on the length of the input. A hash can be regarded as a “signature” for a given text [49, 50, 51]. One of the major applications of hash functions lies in the field of multimedia broadcast networks, as a content identifier [49, 50, 51].…”
Section: Proposed Distributed Architecture and Clustering Algorithm
confidence: 99%
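
A minimal sketch of that fixed-output-size property, using SHA-256 from Python's standard hashlib (the specific hash function is an illustrative assumption; the quoted passage does not name one):

```python
import hashlib

# Inputs of very different lengths all yield a digest of the same size:
# SHA-256 always produces 32 bytes (64 hex characters).
for text in [b"a", b"hello world", b"x" * 1_000_000]:
    digest = hashlib.sha256(text).hexdigest()
    print(f"input length {len(text):>9} -> digest length {len(digest)}")
```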
“…A hash can be regarded as a “signature” for a given text [49, 50, 51]. One of the major applications of hash functions lies in the field of multimedia broadcast networks, as a content identifier [49, 50, 51]. The hash function aids the network by providing content identification, making it easy to determine which content was broadcast, at what time, and by which station.…”
Section: Proposed Distributed Architecture and Clustering Algorithm
confidence: 99%
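
As a hedged illustration of that content-identification role, the sketch below keys a broadcast log on the content's digest; the registry layout, the register_broadcast helper, and the station name are all invented for this example, not taken from the cited work:

```python
import hashlib
from datetime import datetime, timezone

# Maps a content ID (digest) to broadcast metadata. Identical content
# re-broadcast anywhere hashes to the same ID, so it is recognized
# without comparing the full payload byte by byte.
registry: dict[str, dict] = {}

def register_broadcast(content: bytes, station: str) -> str:
    content_id = hashlib.sha256(content).hexdigest()
    registry.setdefault(content_id, {
        "station": station,
        "first_seen": datetime.now(timezone.utc).isoformat(),
    })
    return content_id

cid = register_broadcast(b"<audio frame bytes>", station="WXYZ")
print(cid[:16], registry[cid])
```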
“…Although integrating data deduplication with file migration can improve slow-tier space utilization and potentially reduce migration cost, it also brings performance issues such as high compute and memory resource utilization, high latency, and low throughput [29][30][31]. The chunking process, chunk ID generation, and chunk ID searches in the indexing table are time-consuming.…”
Section: Deduplication and Challenges of Integrating Deduplication Wi…
confidence: 99%
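
The sketch below shows where those per-chunk costs arise in a simplified deduplicator (an assumed toy design with fixed-size chunking and an in-memory index; production systems typically use content-defined chunking and persistent indexes):

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this toy example

def deduplicate(data: bytes, index: dict[str, bytes]) -> list[str]:
    """Split data into chunks and store each unique chunk once.

    Each chunk costs one hash computation (chunk ID generation) and
    one index lookup (chunk ID search) -- the per-chunk overheads the
    passage above calls time-consuming.
    """
    recipe = []  # ordered chunk IDs needed to reassemble the data
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()  # chunk ID generation
        if chunk_id not in index:                     # index table search
            index[chunk_id] = chunk                   # store new chunk once
        recipe.append(chunk_id)
    return recipe

index: dict[str, bytes] = {}
recipe = deduplicate(b"ABCD" * 4096, index)
print(f"{len(recipe)} chunk references, {len(index)} unique chunk(s) stored")
```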
“…However, a full content-based hashing calculation may incur a high computation cost [3]. A compromise is a partial content-based hashing calculation, which gives users a faster response at the cost of some deduplication accuracy [4, 5]. This paper examines how to design and implement various file deduplication schemes for space saving.…”
Section: Introduction
confidence: 99%
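
To make that trade-off concrete, here is a hedged sketch contrasting the two approaches; SAMPLE_BYTES and the prefix-plus-length scheme are assumptions for illustration, not the scheme used in the cited papers:

```python
import hashlib

SAMPLE_BYTES = 64 * 1024  # assumed prefix size for partial hashing

def full_hash(data: bytes) -> str:
    # Exact but costly: every byte is read and hashed.
    return hashlib.sha256(data).hexdigest()

def partial_hash(data: bytes) -> str:
    # Cheaper approximation: hash only a prefix plus the total length.
    # Files sharing the same prefix and length are wrongly treated as
    # duplicates -- the accuracy sacrifice the passage refers to.
    prefix = data[:SAMPLE_BYTES]
    return hashlib.sha256(prefix + len(data).to_bytes(8, "big")).hexdigest()

big_file = b"\x00" * (10 * 1024 * 1024)  # 10 MiB of zeros
print("full:   ", full_hash(big_file)[:16])
print("partial:", partial_hash(big_file)[:16])
```

Partial hashing reads only SAMPLE_BYTES per file instead of the whole content, which is where the faster response comes from.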