2012
DOI: 10.1145/2385603.2385606

WAN-optimized replication of backup datasets using stream-informed delta compression

Abstract: Replicating data off-site is critical for disaster recovery reasons, but the current approach of transferring tapes is cumbersome and error-prone. Replicating across a wide area network (WAN) is a promising alternative, but fast network connections are expensive or impractical in many remote locations, so improved compression is needed to make WAN replication truly practical. We present a new technique for replicating backup datasets across a WAN that not only eliminates duplicate regions of files (deduplication)…

Cited by 95 publications (69 citation statements)
References 19 publications
“…of these approaches for resemblance detection incur high overheads of computation and categorization. Shilane et al. proposed a stream-informed delta compression (SIDC) approach, used in a WAN environment to reduce the transmission of similar data and thus speed up data replication [9]. The approach is super-feature based and augments block-level deduplication by detecting resemblance only among the non-duplicate blocks held in a cache that preserves the backup stream locality.…”
Section: Resemblance Detection Based Data Reduction
confidence: 99%
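The super-feature idea in the statement above can be illustrated with a short sketch. The code below is a hypothetical, minimal illustration rather than the authors' implementation: it derives several features from a chunk by taking the maximum of randomized linear hashes over rolling windows, groups the features into super-features, and treats any non-duplicate chunk that shares a super-feature with a cached chunk as a delta-compression candidate. The function names, window size, and feature/super-feature counts are all assumptions made for the example.

```python
import hashlib
import random

random.seed(42)                 # fixed seed so the sketch is deterministic

FEATURES_PER_SF = 4             # features grouped into one super-feature (assumed)
NUM_SUPER_FEATURES = 3          # super-features computed per chunk (assumed)
WINDOW = 32                     # rolling-window size in bytes (assumed)
MOD = 1 << 32

# One random (multiplier, addend) pair per feature, so every feature is an
# independent "maximum over all windows" value computed from the same chunk.
_PARAMS = [(random.randrange(1, MOD) | 1, random.randrange(MOD))
           for _ in range(FEATURES_PER_SF * NUM_SUPER_FEATURES)]

def features(chunk: bytes) -> list[int]:
    """Compute one feature per (m, a) pair: max of (m*h + a) over window hashes."""
    window_hashes = [hash(chunk[i:i + WINDOW]) & (MOD - 1)
                     for i in range(max(1, len(chunk) - WINDOW + 1))]
    return [max((m * h + a) % MOD for h in window_hashes) for m, a in _PARAMS]

def super_features(chunk: bytes) -> list[int]:
    """Hash each group of FEATURES_PER_SF features down to one super-feature."""
    feats = features(chunk)
    sfs = []
    for i in range(NUM_SUPER_FEATURES):
        group = feats[i * FEATURES_PER_SF:(i + 1) * FEATURES_PER_SF]
        digest = hashlib.sha1(b"".join(f.to_bytes(4, "big") for f in group)).digest()
        sfs.append(int.from_bytes(digest[:4], "big"))
    return sfs

# Cache of super-features from recently seen, non-duplicate chunks.
sf_cache: dict[int, bytes] = {}

def find_delta_base(chunk: bytes) -> bytes | None:
    """Return a similar cached chunk to use as a delta base, if any SF matches."""
    base = None
    for sf in super_features(chunk):
        if base is None and sf in sf_cache:
            base = sf_cache[sf]
        sf_cache[sf] = chunk            # remember this chunk in stream order
    return base
```

A real system would bound the cache and load it in units aligned with the backup stream, which is the "stream-informed" aspect the quoted statement refers to.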
“…Data deduplication is an efficient data reduction approach that not only reduces storage space [4], [5], [6], [7], [8], [9], [10] by eliminating duplicate data but also minimizes the transmission of redundant data in low-bandwidth network environments [11], [12], [13], [14]. In general, a chunk-level data deduplication scheme splits the data blocks of a data stream (e.g., backup files, databases, and virtual machine images) into multiple data chunks that are each uniquely identified and duplicate-detected by a secure SHA-1 or MD5 hash signature (also called a fingerprint) [5], [11].…”
Section: Introduction
confidence: 99%
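As a concrete illustration of the chunk-level scheme this statement describes, here is a minimal sketch under simplifying assumptions: content-defined chunking is approximated with a crude rolling byte condition, and duplicates are detected through an in-memory SHA-1 fingerprint index. The chunking parameters and the in-memory index are illustrative choices, not the design of any cited system.

```python
import hashlib

AVG_MASK = (1 << 12) - 1          # ~4 KiB average chunk size (assumed)
MIN_CHUNK, MAX_CHUNK = 1024, 16384

def chunk_stream(data: bytes):
    """Very simplified content-defined chunking: cut when a rolling byte value
    satisfies a boundary condition, constrained to [MIN_CHUNK, MAX_CHUNK]."""
    start, rolling = 0, 0
    for i, b in enumerate(data):
        rolling = ((rolling << 1) + b) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (rolling & AVG_MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]

fingerprint_index: set[bytes] = set()   # fingerprints of chunks already stored

def deduplicate(data: bytes):
    """Return (unique_chunks, duplicate_count) for one backup stream."""
    unique, dupes = [], 0
    for chunk in chunk_stream(data):
        fp = hashlib.sha1(chunk).digest()   # the chunk's fingerprint
        if fp in fingerprint_index:
            dupes += 1                      # duplicate: store only a reference
        else:
            fingerprint_index.add(fp)
            unique.append(chunk)
    return unique, dupes
```

Feeding two backups of the same data through deduplicate() would store the chunks once and count the second pass almost entirely as duplicates.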
“…While data deduplication has been widely deployed in storage systems for space savings, fingerprint-based deduplication approaches have an inherent drawback: they often fail to detect similar chunks that are largely identical except for a few modified bytes, because their secure hash digests will be totally different even if only one byte of a data chunk is changed [4], [5], [12], [15], [16]. This becomes a big challenge when applying data deduplication to storage datasets and workloads with frequently modified data, which demands an effective and efficient way to eliminate redundancy among frequently modified, and thus similar, data.…”
Section: Introduction
confidence: 99%
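The drawback described here is easy to demonstrate: changing a single byte of a chunk yields a completely unrelated SHA-1 fingerprint, so an exact-match fingerprint index cannot relate two nearly identical chunks. A small illustrative sketch (not taken from the cited papers):

```python
import hashlib

chunk_a = b"A" * 4096                        # original 4 KiB chunk
chunk_b = b"A" * 2048 + b"B" + b"A" * 2047   # same chunk with one byte changed

fp_a = hashlib.sha1(chunk_a).hexdigest()
fp_b = hashlib.sha1(chunk_b).hexdigest()

# The chunks differ in 1 of 4096 bytes, yet their fingerprints share nothing,
# so exact-match deduplication treats chunk_b as entirely new data.
print(fp_a)
print(fp_b)
print("identical fingerprints?", fp_a == fp_b)   # False
```

This is exactly the gap that resemblance detection and delta compression are meant to close.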