2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DOI: 10.1109/ipdps.2010.5470468

DEBAR: A scalable high-performance de-duplication storage system for backup and archiving

Abstract: We present DEBAR, a scalable and high-performance de-duplication storage system for backup and archiving, to overcome the throughput and scalability limitations of the state-of-the-art data de-duplication schemes, including the Data Domain De-duplication File System (DDFS). DEBAR uses a two-phase de-duplication scheme (TPDS) that exploits memory cache and disk index properties to judiciously turn the notoriously random and small disk I/Os of fingerprint lookups and updates into large sequential disk I/Os, henc…
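The batching idea the abstract describes can be illustrated with a minimal sketch. This is not the actual TPDS/DEBAR implementation (the function names and the list-based "disk index" are stand-ins): phase 1 buffers fingerprints in memory, phase 2 sorts the batch and merges it against a sorted on-disk index in a single sequential pass, replacing many small random index lookups with one large sequential I/O.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    # Content-based fingerprint of a data chunk.
    return hashlib.sha1(chunk).hexdigest()

def two_phase_dedup(chunks, disk_index):
    """disk_index: sorted list of known fingerprints (stand-in for the on-disk index)."""
    # Phase 1: compute and buffer fingerprints in memory; no index I/O yet.
    batch = sorted({fingerprint(c) for c in chunks})
    # Phase 2: one sequential merge of the sorted batch against the sorted index.
    new, i = [], 0
    for f in batch:
        while i < len(disk_index) and disk_index[i] < f:
            i += 1
        if i >= len(disk_index) or disk_index[i] != f:
            new.append(f)  # fingerprint not in the index: chunk must be stored
    return new

chunks = [b"alpha", b"beta", b"alpha"]          # "alpha" is an intra-batch duplicate
index = sorted([fingerprint(b"beta")])          # "beta" is already stored
new_fps = two_phase_dedup(chunks, index)        # only "alpha" needs storing
```

Because both the batch and the index are sorted, the merge touches the index strictly in order, which is what makes the phase-2 scan sequential.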

Cited by 56 publications (35 citation statements)
References 12 publications (19 reference statements)
“…Tianming Yang et al. developed the DEBAR system with a distributed fingerprint index structure arranged by binary sequence [6]. To overcome the bottleneck of large index lookups, Bhagwat et al. proposed Extreme Binning, which splits the index by file and selects one fingerprint to represent each file in memory [7].…”
Section: Related Work (mentioning)
confidence: 99%
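The Extreme Binning idea cited above can be sketched as follows. This is an illustrative toy with hypothetical names, not the cited implementation: only one representative fingerprint per file is kept in the in-memory primary index, and the full per-file "bin" of chunk fingerprints is consulted only when the representative matches.

```python
import hashlib

def fp(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()

primary_index = {}  # representative fingerprint -> bin (set of chunk fingerprints)

def backup_file(chunks):
    """Deduplicate one file's chunks; returns how many chunks were actually stored."""
    fps = [fp(c) for c in chunks]
    rep = min(fps)                          # deterministic representative fingerprint
    bin_ = primary_index.setdefault(rep, set())
    new = [f for f in fps if f not in bin_]
    bin_.update(new)                        # record newly stored chunks in the bin
    return len(new)

stored_first = backup_file([b"a", b"b", b"c"])   # first backup stores all chunks
stored_again = backup_file([b"a", b"b", b"c"])   # identical file stores nothing
```

Choosing the representative deterministically (here, the minimum fingerprint) means identical files always hit the same bin, so full duplicates are caught without keeping every fingerprint in RAM.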
“…Due to the reduction in storage space after deduplication, more disks are freed and can be put in a power-saving state (standby, spin-down, or off), thus reducing the overall energy consumption of the backup system.

P_reduced = [(P_maxIO − P_standby) · T_iosave · N_active + (P_standby − P_idle) · T_backup · S_dup] / S_energy_unit   (6)

The first half of Eq. (6) represents the energy consumption of disks occupied by duplicate data during the regular time for I/O access. T_iosave represents the time overhead for transferring the data and N_active denotes the number of working disks.…”
Section: B. Energy-Aware Backup Strategy (mentioning)
confidence: 99%
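A small numeric sketch of the cited energy formula, assuming the garbled original reads P_reduced = [(P_maxIO − P_standby)·T_iosave·N_active + (P_standby − P_idle)·T_backup·S_dup] / S_energy_unit. All numbers below are made-up placeholders, not values from the paper.

```python
def p_reduced(p_max_io, p_standby, p_idle,
              t_iosave, n_active, t_backup, s_dup, s_energy_unit):
    # First term: energy of disks serving duplicate-data I/O during regular access.
    io_term = (p_max_io - p_standby) * t_iosave * n_active
    # Second term: energy difference over the backup window, weighted by the
    # deduplicated share of the dataset.
    standby_term = (p_standby - p_idle) * t_backup * s_dup
    return (io_term + standby_term) / s_energy_unit

# Placeholder values for illustration only.
saving = p_reduced(p_max_io=10.0, p_standby=2.0, p_idle=1.0,
                   t_iosave=100.0, n_active=4, t_backup=3600.0,
                   s_dup=0.5, s_energy_unit=1.0)
# io_term = 8 * 100 * 4 = 3200; standby_term = 1 * 3600 * 0.5 = 1800
```

The split into the two terms mirrors the quoted explanation: one part for I/O-time energy of duplicate data, one for the backup-window state difference.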
“…Finally, distinct nodes may handle distinct tasks. For instance, whereas some nodes partition data and compute signatures, other nodes query and update indexes, parallelizing the deduplication process even further [Yang et al. 2010a, 2010b].…”
Section: Scope (mentioning)
confidence: 99%
“…As archival and backup storage have overlapping requirements, some solutions address both [Yang et al. 2010b]. In fact, these two storage environments have common assumptions regarding data immutability and allow trading off latency for throughput.…”
Section: Backup and Archival Storage (mentioning)
confidence: 99%
“…Source inline data deduplication is favored in industry and academia because it can immediately identify and eliminate duplicates in datasets at the source of data generation, significantly reducing physical storage capacity requirements and saving network bandwidth during data transfer. To satisfy the scalable capacity and performance requirements of Big Data protection, cluster deduplication [6,7,8,9,11,12] has been proposed to provide high deduplication throughput for massive backup data. It combines inter-node data assignment from backup clients to multiple deduplication nodes via a data-routing scheme with independent intra-node deduplication at individual nodes.…”
Section: Introduction (mentioning)
confidence: 99%
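The two-step structure described above (inter-node routing, then intra-node deduplication) can be sketched with a toy scheme. The routing function and node class here are hypothetical, not from any cited system: a stateless hash-based router sends each chunk to a fixed node, so duplicates are always detected locally without cross-node index lookups.

```python
import hashlib

def fp(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()

class DedupNode:
    """One deduplication node with its own local fingerprint index."""
    def __init__(self):
        self.index = set()

    def ingest(self, chunk: bytes) -> bool:
        f = fp(chunk)
        if f in self.index:
            return False        # duplicate: nothing stored
        self.index.add(f)
        return True             # new chunk stored

def route(chunk: bytes, n_nodes: int) -> int:
    # Stateless routing: the same chunk always lands on the same node,
    # so intra-node deduplication alone suffices to catch its duplicates.
    return int(fp(chunk), 16) % n_nodes

nodes = [DedupNode() for _ in range(4)]
stream = [b"x", b"y", b"x", b"z", b"y"]                  # two duplicates
stored = sum(nodes[route(c, 4)].ingest(c) for c in stream)
```

Hash-based routing trades some load-balancing flexibility for the guarantee that no fingerprint ever needs to be looked up on more than one node.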