2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)
DOI: 10.1109/mascot.2009.5366623
Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

Abstract: Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of indi…

Cited by 257 publications (143 citation statements) | References 19 publications
“…Thus only a single instance of the file is saved, and subsequent copies are replaced with a pointer to the original file. Block de-duplication [6], [7] divides files into fixed-size or variable-size blocks. With fixed-size chunking, a file is partitioned into fixed-size chunks, for example 8KB or 16KB each.…”
Section: F. De-duplication Techniques
confidence: 99%
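
A minimal sketch of the fixed-size variant described in this statement, assuming an in-memory dict as the chunk store and SHA-1 fingerprints (illustrative choices, not taken from the cited papers):

    import hashlib

    CHUNK_SIZE = 8 * 1024  # 8 KB, matching the example block size above

    def dedupe_fixed_size(path, store):
        # Split the file into fixed-size chunks and keep each unique chunk once.
        # `store` is a hypothetical dict mapping fingerprint -> chunk bytes; the
        # returned recipe is the ordered fingerprint list (the "pointers").
        recipe = []
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                fp = hashlib.sha1(chunk).hexdigest()
                if fp not in store:   # first occurrence: store the data
                    store[fp] = chunk
                recipe.append(fp)     # subsequent copies become pointers only
        return recipe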
“…Extreme Binning [8] is a file-similarity-based cluster deduplication scheme. It can easily route similar data to the same deduplication node by extracting similarity characteristics from backup streams, but it often suffers from a low duplicate elimination ratio when data streams lack detectable similarity.…”
Section: Cluster Deduplication Techniques
confidence: 99%
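
The similarity extraction this statement refers to can be sketched as below. Extreme Binning's representative is the file's minimum chunk fingerprint; the modulo-hash node assignment shown here is an illustrative assumption, not the paper's exact bin placement:

    import hashlib

    def representative_id(chunk_fps):
        # By Broder's theorem, two files that share most of their chunks are
        # very likely to share their minimum chunk fingerprint, so the minimum
        # acts as a similarity characteristic for the whole file.
        return min(chunk_fps)

    def route(rep_id, num_nodes):
        # Illustrative stateless assignment: hash the representative ID to one
        # node, so similar files are always deduplicated by the same node.
        return int(hashlib.sha1(rep_id.encode()).hexdigest(), 16) % num_nodes

Because routing uses only the representative, a node never consults another node's index, which is what makes the scheme scale; the quoted caveat also follows: dissimilar streams rarely share representatives, so little cross-stream duplication is caught.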
“…Source inline data deduplication is favored in industry and academia because it can immediately identify and eliminate duplicates in datasets at the source of data generation, and hence significantly reduce physical storage capacity requirements and save network bandwidth during data transfer. To satisfy the scalable capacity and performance requirements of Big Data protection, cluster deduplication [6,7,8,9,11,12] has been proposed to provide high deduplication throughput for massive backup data. It comprises inter-node data assignment from backup clients to multiple deduplication nodes by a data routing scheme, and independent intra-node deduplication within individual nodes.…”
Section: Introduction
confidence: 99%
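
A compact sketch of the two phases named in this statement, inter-node routing followed by independent intra-node deduplication; the DedupNode class and the min-fingerprint routing key are illustrative assumptions:

    import hashlib

    class DedupNode:
        # Hypothetical deduplication node; each keeps its own independent index.
        def __init__(self):
            self.index = set()    # fingerprints of chunks stored on this node
            self.recipes = {}     # file path -> ordered fingerprint list

    def backup(files, nodes):
        # Phase 1: a data routing scheme assigns each file to one node.
        # Phase 2: that node deduplicates against only its local index.
        for path, fps in files:  # `files` yields (path, fingerprint list) pairs
            rep = min(fps)       # similarity-based routing key
            node = nodes[int(hashlib.sha1(rep.encode()).hexdigest(), 16)
                         % len(nodes)]
            node.index.update(fps)    # a set ignores already-stored chunks
            node.recipes[path] = fps  # duplicates persist only as pointers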
“…Scalable storage. Extreme Binning [3], HydraFS [27] and DeDe [7] are scalable storage systems that support data deduplication. Extreme Binning exploits file similarity rather than chunk locality.…”
Section: Related Work
confidence: 99%