Proceedings of the Tenth European Conference on Computer Systems 2015
DOI: 10.1145/2741948.2741952
Deriving and comparing deduplication techniques using a model-based classification

Abstract: Data deduplication has been a hot research topic, and a large number of systems have been developed. These systems are usually seen as an inherently linked set of characteristics. However, a detailed analysis shows independent concepts that can be used in other systems. In this work, we perform this analysis on the main representatives of deduplication systems. We embed the results in a model, which shows two as-yet unexplored combinations of characteristics. In addition, the model enables a comprehensive evaluation…

Cited by 6 publications (2 citation statements); references 23 publications (25 reference statements).
“…A feature of backup workloads is strong locality, which has been used to tackle the challenges facing deduplication-based backup storage. [27][28][29] Here, locality refers to the fact that a current backup stream tends to have patterns that correspond to an earlier backup stream. For example, in order to accelerate the process of detecting duplicate chunks, the system groups consecutive unique chunks into fixed-sized containers to preserve their…” (Figure 1: An example of data deduplication.)
Section: Background Of Data Reduction Techniques (mentioning)
confidence: 99%
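The container technique described in the quoted passage can be sketched as follows. This is a minimal illustration, not the cited systems' implementation; the container size, function name, and data structures are invented for the example. Consecutive unique chunks are packed into fixed-size containers in arrival order, so chunks that were written together stay together on disk, preserving the stream's locality.

```python
import hashlib

CONTAINER_SIZE = 4  # chunks per container; real systems size containers in megabytes


def deduplicate(chunks):
    """Index unique chunks and pack them into fixed-size containers
    in arrival order, preserving the locality of the backup stream."""
    index = {}          # fingerprint -> (container_id, slot)
    containers = [[]]   # each container holds up to CONTAINER_SIZE unique chunks
    recipe = []         # fingerprints that reconstruct the original stream
    for chunk in chunks:
        fp = hashlib.sha1(chunk).hexdigest()
        if fp not in index:
            if len(containers[-1]) == CONTAINER_SIZE:
                containers.append([])      # current container is full; open a new one
            containers[-1].append(chunk)
            index[fp] = (len(containers) - 1, len(containers[-1]) - 1)
        recipe.append(fp)
    return index, containers, recipe
```

Because a later backup stream repeats long runs of an earlier one, looking up one fingerprint and prefetching its whole container tends to bring the next many lookups into memory at once.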
“…A feature of backup workloads is strong locality, which has been used to tackle the challenges facing deduplication-based backup storage. [27][28][29] Here locality refers to the fact that current backup stream tend to have patterns that correspond to an earlier backup stream. For example, in order to accelerate the process of detecting duplicate chunks, the system groups consecutive unique chunks into fixed-sized containers to preserve their F I G U R E 1 An example of data deduplication.…”
Section: Background Of Data Reduction Techniquesmentioning
confidence: 99%
“…However, we vary the number of processes used in Section V-C. Table I shows the different sizes of the checkpoints. c) Deduplication: We analyzed each checkpoint with the FS-C deduplication tool suite [49], which has already been applied in several deduplication studies [50], [51]. We chose fixed-size chunking and content-defined chunking (CDC) as chunking methods.…”
Section: Deduplication Of Checkpoints (mentioning)
confidence: 99%
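Content-defined chunking, one of the two methods the quoted study chose, can be sketched as follows. This is a toy version, not FS-C's implementation: it uses a simple sliding-window byte sum as the rolling hash, and the window size, divisor, and size bounds are invented parameters. The key property is that chunk boundaries depend on content rather than on byte offsets, so an insertion early in the stream only disturbs nearby chunks, unlike fixed-size chunking.

```python
from collections import deque


def cdc_chunks(data, window=16, divisor=64, min_size=32, max_size=256):
    """Split data at content-defined boundaries using a toy rolling hash
    (sum of the last `window` bytes). A boundary is declared when the hash
    hits a fixed pattern; min/max sizes bound the resulting chunk lengths."""
    chunks, start, h = [], 0, 0
    win = deque(maxlen=window)
    for i, b in enumerate(data):
        if len(win) == window:
            h -= win[0]            # slide the window: drop the oldest byte
        win.append(b)
        h += b
        size = i - start + 1
        at_boundary = size >= min_size and h % divisor == 0
        if at_boundary or size >= max_size:
            chunks.append(bytes(data[start:i + 1]))
            start = i + 1
            win.clear()
            h = 0
    if start < len(data):
        chunks.append(bytes(data[start:]))  # trailing partial chunk
    return chunks
```

Production CDC implementations use a stronger rolling hash (e.g. Rabin fingerprints) for the same boundary-detection idea; fixed-size chunking, by contrast, simply cuts every N bytes and is faster but shift-sensitive.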