2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DOI: 10.1109/ipdps.2010.5470468

DEBAR: A scalable high-performance de-duplication storage system for backup and archiving

Abstract: We present DEBAR, a scalable and high-performance de-duplication storage system for backup and archiving, to overcome the throughput and scalability limitations of the state-of-the-art data de-duplication schemes, including the Data Domain De-duplication File System (DDFS). DEBAR uses a two-phase de-duplication scheme (TPDS) that exploits memory cache and disk index properties to judiciously turn the notoriously random and small disk I/Os of fingerprint lookups and updates into large sequential disk I/Os, henc…
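The batching idea the abstract describes can be illustrated with a minimal sketch. This is not the actual TPDS/DEBAR implementation (the function names and the list-based "disk index" are stand-ins): phase 1 buffers fingerprints in memory, phase 2 sorts the batch and merges it against a sorted on-disk index in a single sequential pass, replacing many small random index lookups with one large sequential I/O.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    # Content-based fingerprint of a data chunk.
    return hashlib.sha1(chunk).hexdigest()

def two_phase_dedup(chunks, disk_index):
    """disk_index: sorted list of known fingerprints (stand-in for the on-disk index)."""
    # Phase 1: compute and buffer fingerprints in memory; no index I/O yet.
    batch = sorted({fingerprint(c) for c in chunks})
    # Phase 2: one sequential merge of the sorted batch against the sorted index.
    new, i = [], 0
    for f in batch:
        while i < len(disk_index) and disk_index[i] < f:
            i += 1
        if i >= len(disk_index) or disk_index[i] != f:
            new.append(f)  # fingerprint not in the index: chunk must be stored
    return new

chunks = [b"alpha", b"beta", b"alpha"]          # "alpha" is an intra-batch duplicate
index = sorted([fingerprint(b"beta")])          # "beta" is already stored
new_fps = two_phase_dedup(chunks, index)        # only "alpha" needs storing
```

Because both the batch and the index are sorted, the merge touches the index strictly in order, which is what makes the phase-2 scan sequential.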

Cited by 56 publications (35 citation statements)
References 12 publications (19 reference statements)
“…Tianming Yang et al. developed the DEBAR system with a distributed fingerprint index structure arranged by binary sequence [6]. To overcome the bottleneck of large index lookups, Bhagwat et al. proposed Extreme Binning, which splits the index by file and selects one fingerprint to represent each file in memory [7].…”
Section: Related Work (mentioning)
confidence: 99%
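The Extreme Binning idea cited above can be sketched as follows. This is an illustrative toy with hypothetical names, not the cited implementation: only one representative fingerprint per file is kept in the in-memory primary index, and the full per-file "bin" of chunk fingerprints is consulted only when the representative matches.

```python
import hashlib

def fp(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()

primary_index = {}  # representative fingerprint -> bin (set of chunk fingerprints)

def backup_file(chunks):
    """Deduplicate one file's chunks; returns how many chunks were actually stored."""
    fps = [fp(c) for c in chunks]
    rep = min(fps)                          # deterministic representative fingerprint
    bin_ = primary_index.setdefault(rep, set())
    new = [f for f in fps if f not in bin_]
    bin_.update(new)                        # record newly stored chunks in the bin
    return len(new)

stored_first = backup_file([b"a", b"b", b"c"])   # first backup stores all chunks
stored_again = backup_file([b"a", b"b", b"c"])   # identical file stores nothing
```

Choosing the representative deterministically (here, the minimum fingerprint) means identical files always hit the same bin, so full duplicates are caught without keeping every fingerprint in RAM.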
“…Due to the reduction in storage space after deduplication, more disks are freed and can be put in a power-saving state (standby, spin-down, or off), thus reducing the overall energy consumption of the backup system.

P_reduced = [(P_maxIO − P_standby) · T_iosave · N_active + (P_standby − P_idle) · T_backup · S_dup] / S_energy_unit   (6)

The first half of Eq. (6) represents the energy consumption of disks occupied by duplicate data during the regular time for I/O access. T_iosave represents the time overhead for transferring the data and N_active denotes the number of working disks.…”
Section: B. Energy-Aware Backup Strategy (mentioning)
confidence: 99%
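A small numeric sketch of the cited energy formula, assuming the garbled original reads P_reduced = [(P_maxIO − P_standby)·T_iosave·N_active + (P_standby − P_idle)·T_backup·S_dup] / S_energy_unit. All numbers below are made-up placeholders, not values from the paper.

```python
def p_reduced(p_max_io, p_standby, p_idle,
              t_iosave, n_active, t_backup, s_dup, s_energy_unit):
    # First term: energy of disks serving duplicate-data I/O during regular access.
    io_term = (p_max_io - p_standby) * t_iosave * n_active
    # Second term: energy difference over the backup window, weighted by the
    # deduplicated share of the dataset.
    standby_term = (p_standby - p_idle) * t_backup * s_dup
    return (io_term + standby_term) / s_energy_unit

# Placeholder values for illustration only.
saving = p_reduced(p_max_io=10.0, p_standby=2.0, p_idle=1.0,
                   t_iosave=100.0, n_active=4, t_backup=3600.0,
                   s_dup=0.5, s_energy_unit=1.0)
# io_term = 8 * 100 * 4 = 3200; standby_term = 1 * 3600 * 0.5 = 1800
```

The split into the two terms mirrors the quoted explanation: one part for I/O-time energy of duplicate data, one for the backup-window state difference.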
“…Finally, distinct nodes may handle distinct tasks. For instance, whereas some nodes partition data and compute signatures, other nodes query and update indexes, parallelizing the deduplication process even further [Yang et al. 2010a, 2010b].…”
Section: Scope (mentioning)
confidence: 99%
“…As archival and backup storage have overlapping requirements, some solutions address both [Yang et al. 2010b]. In fact, these two storage environments have common assumptions regarding data immutability and allow trading off latency for throughput.…”
Section: Backup and Archival Storage (mentioning)
confidence: 99%
“…Source inline data deduplication is favored in industry and academia because it can immediately identify and eliminate duplicates in datasets at the source of data generation, significantly reducing physical storage capacity requirements and saving network bandwidth during data transfer. To satisfy the scalable capacity and performance requirements of Big Data protection, cluster deduplication [6,7,8,9,11,12] has been proposed to provide high deduplication throughput for massive backup data. It combines inter-node data assignment from backup clients to multiple deduplication nodes via a data-routing scheme with independent intra-node deduplication at individual nodes.…”
Section: Introduction (mentioning)
confidence: 99%
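The two-step structure described above (inter-node routing, then intra-node deduplication) can be sketched with a toy scheme. The routing function and node class here are hypothetical, not from any cited system: a stateless hash-based router sends each chunk to a fixed node, so duplicates are always detected locally without cross-node index lookups.

```python
import hashlib

def fp(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()

class DedupNode:
    """One deduplication node with its own local fingerprint index."""
    def __init__(self):
        self.index = set()

    def ingest(self, chunk: bytes) -> bool:
        f = fp(chunk)
        if f in self.index:
            return False        # duplicate: nothing stored
        self.index.add(f)
        return True             # new chunk stored

def route(chunk: bytes, n_nodes: int) -> int:
    # Stateless routing: the same chunk always lands on the same node,
    # so intra-node deduplication alone suffices to catch its duplicates.
    return int(fp(chunk), 16) % n_nodes

nodes = [DedupNode() for _ in range(4)]
stream = [b"x", b"y", b"x", b"z", b"y"]                  # two duplicates
stored = sum(nodes[route(c, 4)].ingest(c) for c in stream)
```

Hash-based routing trades some load-balancing flexibility for the guarantee that no fingerprint ever needs to be looked up on more than one node.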