2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2010)
DOI: 10.1109/mascots.2010.37

Frequency Based Chunking for Data De-Duplication

Cited by 56 publications (18 citation statements)
References 7 publications
“…Samuel et al. [11] presented the design of a system for composing and enforcing context-aware disclosure rules for preserving privacy and security of multimedia big data systems. Lu et al. [12] proposed a frequency-based chunking algorithm, which explicitly considers the frequency information of data segments during the chunking process. Yu et al. [13] presented the leap-based CDC algorithm and added a secondary condition to it in order to reduce the computing overhead and maintain the same deduplication ratio.…”
Section: Related Work
confidence: 99%
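The content-defined chunking (CDC) idea with a secondary cut condition, as mentioned in the statement above, can be illustrated with a small Python sketch. The gear-style rolling hash, the random table, the masks, and the size thresholds below are illustrative assumptions, not the exact algorithms of Lu et al. [12] or Yu et al. [13]: a strict condition sets the target average chunk size, and a weaker secondary condition takes over once a chunk grows past a "normal" size so oversized chunks are cut sooner.

import random

random.seed(42)
GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random table for the rolling hash

MIN_CHUNK = 2 * 1024       # never cut before this many bytes
NORMAL = 8 * 1024          # past this size, switch to the relaxed condition
MAX_CHUNK = 64 * 1024      # hard upper bound on chunk size
STRICT_MASK = 0x1FFF       # primary cut condition
RELAXED_MASK = 0x03FF      # secondary, easier-to-hit cut condition

def cdc_chunks(data: bytes):
    """Yield content-defined chunks of `data` using a gear rolling hash."""
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        mask = STRICT_MASK if length < NORMAL else RELAXED_MASK
        if (h & mask) == 0 or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0      # reset the hash at each chunk boundary
    if start < len(data):
        yield data[start:]           # trailing bytes form the last chunk

For example, list(cdc_chunks(open("input.bin", "rb").read())) returns the chunk list for a file; the boundaries depend only on local content, so shared regions of two files tend to produce identical chunks.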
“…On the other hand, the building-up algorithm divides the stream into small chunks that are then composed when the deduplication gain is not affected. Moreover, a variant of the breaking-apart algorithm can be combined with a statistical chunk frequency estimation algorithm, further dividing large chunks that contain smaller chunks appearing frequently in the data stream and consequently allowing higher space savings [Lu et al. 2010].…”
Section: Granularity
confidence: 99%
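As a rough illustration of how a breaking-apart pass might combine with a statistical chunk frequency estimate, the Python sketch below first counts fixed-size sub-chunks and then re-cuts large chunks around sub-chunks that appear often. The fixed sub-chunk size, the frequency threshold, and the two-pass structure are assumptions for illustration, not the algorithm of Lu et al. [2010].

from collections import Counter

SUB = 1024          # fixed-size sub-chunk used for frequency estimation
FREQ_THRESHOLD = 4  # a sub-chunk seen at least this often counts as "frequent"
LARGE = 16 * 1024   # only chunks at least this big are reconsidered

def estimate_frequencies(chunks):
    """First pass: count how often each fixed-size sub-chunk appears."""
    counts = Counter()
    for c in chunks:
        for off in range(0, len(c) - SUB + 1, SUB):
            counts[c[off:off + SUB]] += 1
    return counts

def break_apart(chunks, counts):
    """Second pass: cut large chunks at the boundaries of frequent sub-chunks."""
    for c in chunks:
        if len(c) < LARGE:
            yield c
            continue
        cuts, pos = [0], 0
        while pos + SUB <= len(c):
            if counts[c[pos:pos + SUB]] >= FREQ_THRESHOLD:
                if pos > cuts[-1]:
                    cuts.append(pos)          # end the preceding piece
                cuts.append(pos + SUB)        # isolate the frequent sub-chunk
            pos += SUB
        if cuts[-1] < len(c):
            cuts.append(len(c))
        for a, b in zip(cuts, cuts[1:]):
            yield c[a:b]

A typical pipeline under these assumptions would be: chunks = list(cdc_chunks(data)); counts = estimate_frequencies(chunks); final = list(break_apart(chunks, counts)), so that frequent pieces inside large chunks can be deduplicated on their own.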
“…Data compression reduces the file size by eliminating redundant data contained in a document, while data deduplication identifies duplicate data elements, such as an entire file [13,14] and data block [15][16][17][18][19][20][21][22][23], and eliminates both intra-file and inter-file data redundancy, hence reducing the data to be transferred or stored. When multiple instances of the same data element are detected, only one single copy of the data element is transferred or stored.…”
Section: B. Data Deduplication
confidence: 99%
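A minimal sketch of the whole-file flavour of this idea, assuming an in-memory dictionary stands in for the storage back end: every file is fingerprinted, identical contents are stored once, and each path keeps only a reference to that single copy.

import hashlib

store = {}     # fingerprint -> file content (single stored copy)
catalog = {}   # file path -> fingerprint (reference to the copy)

def add_file(path: str, content: bytes):
    fp = hashlib.sha256(content).hexdigest()
    if fp not in store:      # first instance: store the data once
        store[fp] = content
    catalog[path] = fp       # later instances: keep only a reference

def read_file(path: str) -> bytes:
    return store[catalog[path]]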
“…The redundant data element is replaced with a reference or pointer to the unique data copy. Based on the algorithm granularity, data deduplication algorithms can be classified into three categories: whole file hashing [13,14], sub-file hashing [15][16][17][18][19][20][21][22][23], and delta encoding [24]. Traditional data de-duplication operates at the application layer, such as object caching, to eliminate redundant data transfers.…”
Section: B. Data Deduplication
confidence: 99%
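The sub-file (block-level) case can be sketched the same way, here with fixed 4 KiB blocks purely for brevity (a content-defined chunker such as the one sketched earlier could supply variable-size chunks instead): each chunk is fingerprinted, stored at most once, and the file itself becomes a recipe of references to the unique copies. The in-memory index is again an assumption for illustration.

import hashlib

BLOCK = 4096
chunk_store = {}   # fingerprint -> chunk bytes (unique copies only)

def dedup_blocks(data: bytes):
    """Return the file as a recipe: a list of chunk fingerprints."""
    recipe = []
    for off in range(0, len(data), BLOCK):
        chunk = data[off:off + BLOCK]
        fp = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(fp, chunk)   # duplicate chunks are not re-stored
        recipe.append(fp)                   # reference (pointer) to the unique copy
    return recipe

def rebuild(recipe):
    return b"".join(chunk_store[fp] for fp in recipe)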