2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)
DOI: 10.1109/mascot.2009.5366623
Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

Abstract: Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of indi…

Cited by 257 publications (143 citation statements) | References 19 publications
“…Thus only a single instance of the file is saved, and subsequent copies are replaced with a pointer to the original file. Block de-duplication [6], [7] divides files into fixed-size or variable-size blocks. With fixed-size chunking, a file is partitioned into fixed-size chunks, for example 8KB or 16KB each.…”
Section: F. De-duplication Techniques
confidence: 99%
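
A minimal sketch of the fixed-size variant described in this statement, assuming an in-memory dict as the chunk store and SHA-1 fingerprints (illustrative choices, not taken from the cited papers):

    import hashlib

    CHUNK_SIZE = 8 * 1024  # 8 KB, matching the example block size above

    def dedupe_fixed_size(path, store):
        # Split the file into fixed-size chunks and keep each unique chunk once.
        # `store` is a hypothetical dict mapping fingerprint -> chunk bytes; the
        # returned recipe is the ordered fingerprint list (the "pointers").
        recipe = []
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                fp = hashlib.sha1(chunk).hexdigest()
                if fp not in store:   # first occurrence: store the data
                    store[fp] = chunk
                recipe.append(fp)     # subsequent copies become pointers only
        return recipe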
“…Extreme Binning [8] is a file-similarity-based cluster deduplication scheme. It can easily route similar data to the same deduplication node by extracting similarity characteristics from backup streams, but it often suffers from a low duplicate elimination ratio when data streams lack detectable similarity.…”
Section: Cluster Deduplication Techniques
confidence: 99%
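
The similarity extraction this statement refers to can be sketched as below. Extreme Binning's representative is the file's minimum chunk fingerprint; the modulo-hash node assignment shown here is an illustrative assumption, not the paper's exact bin placement:

    import hashlib

    def representative_id(chunk_fps):
        # By Broder's theorem, two files that share most of their chunks are
        # very likely to share their minimum chunk fingerprint, so the minimum
        # acts as a similarity characteristic for the whole file.
        return min(chunk_fps)

    def route(rep_id, num_nodes):
        # Illustrative stateless assignment: hash the representative ID to one
        # node, so similar files are always deduplicated by the same node.
        return int(hashlib.sha1(rep_id.encode()).hexdigest(), 16) % num_nodes

Because routing uses only the representative, a node never consults another node's index, which is what makes the scheme scale; the quoted caveat also follows: dissimilar streams rarely share representatives, so little cross-stream duplication is caught.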
“…Source inline data deduplication is favored in industry and academia because it can immediately identify and eliminate duplicates in datasets at the source of data generation, and hence significantly reduce physical storage capacity requirements and save network bandwidth during data transfer. To satisfy the scalable capacity and performance requirements of Big Data protection, cluster deduplication [6,7,8,9,11,12] has been proposed to provide high deduplication throughput for massive backup data. It comprises inter-node data assignment from backup clients to multiple deduplication nodes by a data routing scheme, and independent intra-node deduplication within individual nodes.…”
Section: Introduction
confidence: 99%
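
A compact sketch of the two phases named in this statement, inter-node routing followed by independent intra-node deduplication; the DedupNode class and the min-fingerprint routing key are illustrative assumptions:

    import hashlib

    class DedupNode:
        # Hypothetical deduplication node; each keeps its own independent index.
        def __init__(self):
            self.index = set()    # fingerprints of chunks stored on this node
            self.recipes = {}     # file path -> ordered fingerprint list

    def backup(files, nodes):
        # Phase 1: a data routing scheme assigns each file to one node.
        # Phase 2: that node deduplicates against only its local index.
        for path, fps in files:  # `files` yields (path, fingerprint list) pairs
            rep = min(fps)       # similarity-based routing key
            node = nodes[int(hashlib.sha1(rep.encode()).hexdigest(), 16)
                         % len(nodes)]
            node.index.update(fps)    # a set ignores already-stored chunks
            node.recipes[path] = fps  # duplicates persist only as pointers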
“…Scalable storage. Extreme Binning [3], HydraFS [27] and DeDe [7] are scalable storage systems that support data deduplication. Extreme Binning exploits file similarity rather than chunk locality.…”
Section: Related Work
confidence: 99%