2011
DOI: 10.1109/TC.2010.263

Efficient Deduplication Techniques for Modern Backup Operation

Cited by 77 publications (39 citation statements)
References 22 publications
“…It is also frequently used to calculate key values in recent big data and cloud computing environments [3,4,5]. It is critical to develop high-speed SHA-1 hardware, since the performance of the entire system depends on the speed of key creation, especially in a key-value store environment [6,7,8].…”
Section: Introduction (mentioning)
confidence: 99%
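The role SHA-1 plays here is easy to picture: in a deduplicating key-value store, each chunk's key is its content hash, so duplicate detection reduces to a key lookup. A minimal Python sketch; the store and function names are illustrative, not taken from the cited papers:

```python
import hashlib

store = {}  # hypothetical in-memory key-value store

def put_chunk(chunk: bytes) -> str:
    """Use the chunk's SHA-1 digest as its key-value store key.

    Identical chunks hash to the same key, so a duplicate is
    detected by a lookup and never written a second time; key
    creation speed therefore bounds overall system throughput,
    which is the motivation for fast SHA-1 hardware.
    """
    key = hashlib.sha1(chunk).hexdigest()
    if key not in store:
        store[key] = chunk
    return key
```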
“…Nowadays, researchers suggest setting the expected chunk size by rule of thumb. For example, 4 KB [17] or 8 KB [3,7,18] are considered reasonable by some researchers; Symantec Storage Foundation 7.0 (Mountain View, CA, USA) recommends a chunk size of 16 KB or higher [19]; IBM (Armonk, NY, USA) mentioned the average chunk size for most deduplicated files is about 100 KB [20]. However, these expected chunk sizes lack either theoretical proof or experimental evaluation.…”
Section: Related Work (mentioning)
confidence: 99%
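For context on where these figures come from: in mask-based CDC, the expected chunk size is determined by the width of the boundary mask, so each rule of thumb above corresponds to a choice of mask bits. A small sketch, assuming a uniformly distributed rolling hash:

```python
# A cut-point is declared when rolling_hash & mask == 0 with
# mask = 2**k - 1; under a uniform hash this fires with
# probability 2**-k per position, so the expected chunk size
# is roughly 2**k bytes (before min/max size clamping).
def mask_bits(expected_size: int) -> int:
    """Mask width k for a power-of-two expected chunk size."""
    k = expected_size.bit_length() - 1
    assert 1 << k == expected_size, "use a power-of-two size"
    return k

print(mask_bits(4 * 1024))   # 12 -> 4 KB chunks [17]
print(mask_bits(8 * 1024))   # 13 -> 8 KB chunks [3,7,18]
print(mask_bits(16 * 1024))  # 14 -> 16 KB chunks [19]
```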
“…At present, deduplication is widely used in secondary storage systems such as backup or archival systems [1][2][3][4], and is also gradually being used in primary storage systems such as file systems [5,6]. Content-defined chunking (CDC) [7] can achieve high duplicate elimination ratios (DERs) and is therefore the most widely used data chunking algorithm.…”
Section: Introduction (mentioning)
confidence: 99%
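To make the CDC idea concrete, here is a minimal Gear-style chunker in Python. The random table, 13-bit mask (about 8 KB expected chunks), and size bounds are illustrative choices, not the parameters of [7], which is based on Rabin fingerprints:

```python
import hashlib

# Deterministic per-byte random table (stands in for the
# tables used by Gear/Rabin rolling hashes).
GEAR = [int.from_bytes(hashlib.sha1(bytes([b])).digest()[:4], "big")
        for b in range(256)]

def cdc_chunks(data: bytes, mask=(1 << 13) - 1,
               min_size=2048, max_size=65536):
    """Yield content-defined chunks of data.

    The rolling hash is updated per byte, and a boundary is
    declared when its low bits are all zero, so cut-points
    depend only on local content: an insertion or deletion
    shifts at most the chunks around it, leaving the rest
    byte-identical and therefore deduplicable.
    """
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]
```

This locality of boundaries is what gives CDC its high DER compared with fixed-size chunking, where a single inserted byte shifts every later chunk boundary.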
“…Data de-duplication [4]-[6] can operate at the whole-file, block (chunk), or bit level. Whole-file de-duplication, or Single Instance Storage (SIS) [3], computes the hash value of the entire file, which serves as the file index.…”
Section: F. De-duplication Techniques (mentioning)
confidence: 99%
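The whole-file variant the excerpt describes is the simplest of the three levels: hash the entire file once and use the digest as the file index. A minimal sketch, with a hypothetical index layout:

```python
import hashlib

file_index = {}  # hypothetical index: whole-file SHA-1 -> stored data

def store_file(data: bytes) -> str:
    """SIS-style whole-file de-duplication.

    The hash of the entire file is the file index, so two
    byte-identical files are stored once. A single changed
    byte, however, forces a full second copy, which is why
    the finer block- and bit-level schemes exist.
    """
    digest = hashlib.sha1(data).hexdigest()
    if digest not in file_index:
        file_index[digest] = data  # first copy stores the payload
    return digest                  # later copies hit the index only
```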
“…This method of detecting duplicates is file-level de-duplication. Extreme Binning [4] uses this approach by dividing the chunk index into two tiers, namely the primary index and the bin [4]. The primary index contains the representative chunk ID, the whole-file hash, and a pointer to the bin.…”
Section: E. File Level De-duplication - Extreme Binning (mentioning)
confidence: 99%
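The two-tier layout described above can be sketched as follows. The choice of the minimum chunk hash as the representative chunk ID follows the usual description of Extreme Binning [4]; the surrounding names and in-memory structures are illustrative:

```python
import hashlib

primary_index = {}  # representative chunk ID -> (whole-file hash, bin)

def backup_file(chunks: list) -> None:
    """Two-tier lookup in the style of Extreme Binning [4].

    Only the small primary index (representative chunk ID,
    whole-file hash, bin pointer) must stay in RAM; the full
    per-chunk index lives in the bin and is consulted only
    when the representative ID of an incoming file matches.
    """
    hashes = [hashlib.sha1(c).hexdigest() for c in chunks]
    rep_id = min(hashes)  # representative chunk ID
    file_hash = hashlib.sha1(b"".join(chunks)).hexdigest()

    entry = primary_index.get(rep_id)
    if entry and entry[0] == file_hash:
        return  # identical file already stored: index hit only
    bin_ = entry[1] if entry else {}  # bin: chunk hash -> chunk
    for h, c in zip(hashes, chunks):
        bin_.setdefault(h, c)         # deduplicate within the bin
    primary_index[rep_id] = (file_hash, bin_)
```

Keeping only one representative entry per file in RAM is the design point: similar files share a representative chunk and land in the same bin, so most duplicate chunks are found with a single bin fetch rather than a disk lookup per chunk.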