Deduplication is a storage-saving technique that has been highly successful in enterprise backup environments. On a file system, a single data block might be stored multiple times across different files; for example, multiple versions of a file might exist that are mostly identical. Deduplication localizes this replication and removes the redundancy: by storing data just once, all files that contain identical regions refer to the same unique data. The most common approach splits file data into chunks and calculates a cryptographic fingerprint for each chunk. By checking whether a fingerprint has already been stored, a chunk is classified as redundant or unique, and only unique chunks are stored. This paper presents the first study on the potential of data deduplication in HPC centers, which are among the most demanding storage producers. We have quantitatively assessed this potential for capacity reduction for four data centers (BSC, DKRZ, RENCI, RWTH). In contrast to previous deduplication studies, which focused mostly on backup data, we have analyzed over one petabyte (1212 TB) of online file system data. The evaluation shows that typically 20% to 30% of this online data can be removed by applying data deduplication techniques, peaking at up to 70% for some data sets. This reduction can only be achieved by a subfile deduplication approach, while approaches based on whole-file comparisons lead to only small capacity savings.
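The chunk-and-fingerprint scheme described above can be sketched as follows. This is a minimal illustration using fixed-size chunks and SHA-256; the function and variable names are invented for this sketch, and real systems (including the ones studied here) typically also consider content-defined chunking and persistent chunk indexes.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep only unique ones.

    Returns the chunk store (fingerprint -> chunk) and a recipe of
    fingerprints from which the original data can be rebuilt.
    """
    store = {}    # the chunk index: fingerprint -> unique chunk
    recipe = []   # ordered fingerprints describing the original file
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:      # unique chunk: store it once
            store[fp] = chunk
        recipe.append(fp)        # redundant chunk: store a reference only
    return store, recipe

def reconstruct(store, recipe):
    """Rebuild the original byte stream from the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

For a stream of four 4 KB chunks of which three are identical, only two unique chunks are stored, so roughly half the capacity is saved.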
Data deduplication systems discover and remove redundancies between data blocks by splitting the data stream into chunks and comparing a hash of each chunk with all previously stored hashes. Storing the corresponding chunk index on hard disks immediately limits the achievable throughput, as these devices cannot support the high number of random IOs induced by this index. Several approaches to overcome this chunk-lookup disk bottleneck have been proposed. Often they try to capture the locality information of a backup run and use it in the next backup run to predict future chunk requests. However, this locality is frequently captured only by a surrogate, e.g., the order of the chunks in containers [37]. Furthermore, some approaches degrade slowly when the systems operate over months and years, because the locality information becomes outdated. We propose a novel approach, called Block Locality Cache (BLC), that captures the previous backup run significantly better than existing approaches, always uses up-to-date locality information, and is therefore less prone to aging. We evaluate the approach using a trace-based simulation of multiple real-world backup datasets. The simulation compares the Block Locality Cache with the approach of Zhu et al. [37] and provides a detailed analysis of the behavior and IO pattern. Furthermore, a prototype implementation is used to validate the simulation.
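The abstract does not describe the internals of the Block Locality Cache, so the following is only a hypothetical sketch of the general idea it alludes to: when a chunk lookup has to touch the on-disk index, also prefetch the fingerprints that followed that chunk in the previous backup run, betting that the new run requests chunks in a similar order. All class names and parameters here are invented for illustration and are not the paper's design.

```python
from collections import OrderedDict

class LocalityCache:
    """Hypothetical locality-based caching of chunk-index lookups."""

    def __init__(self, disk_index, prev_order, capacity=1024, window=64):
        self.disk_index = disk_index   # fingerprint -> block address (on disk)
        self.prev_order = prev_order   # fingerprints in last run's order
        self.pos = {fp: i for i, fp in enumerate(prev_order)}
        self.cache = OrderedDict()     # in-memory LRU cache of index entries
        self.capacity = capacity
        self.window = window           # prefetch window size
        self.disk_ios = 0              # random IOs against the disk index

    def lookup(self, fp):
        """Return the block address for fp, or None if the chunk is new."""
        if fp in self.cache:           # in-memory hit: no disk IO needed
            self.cache.move_to_end(fp)
            return self.cache[fp]
        if fp not in self.disk_index:
            return None                # unique chunk: caller must store it
        self.disk_ios += 1             # one random IO on the on-disk index
        i = self.pos.get(fp)
        # Prefetch the fingerprints that followed fp in the previous run.
        fps = self.prev_order[i:i + self.window] if i is not None else [fp]
        for f in fps:
            if f in self.disk_index:
                self.cache[f] = self.disk_index[f]
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return self.disk_index[fp]
```

If the new backup run largely repeats the previous run's chunk order, each random disk IO serves an entire window of subsequent lookups from memory, which is the effect that locality-aware designs exploit.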
Phosphonomethyl-substituted phenols are readily obtained from o-hydroxymethylated phenols and trialkyl phosphites. The free acids, incorporated into phenol-formaldehyde resins, act as cation exchangers with remarkable selectivity for different metal ions. © 1992 Hüthig & Wepf Verlag, Basel
System virtualization has become the enabling technology for managing the increasing number of different applications inside data centers. The abstraction from the underlying hardware and the provision of multiple virtual machines (VMs) on a single physical server have led to consolidation and more efficient usage of physical servers. The abstraction from the hardware also eases the provision of applications across different data centers, as applied in several cloud computing environments. In this case, the application need not adapt to the environment of the cloud computing provider, but can travel with its own VM image, including its own operating system and libraries. System virtualization and cloud computing could also be very attractive in the context of high-performance computing (HPC). Today, HPC centers have to cope with both the management of the infrastructure and the applications. Virtualization technology would enable these centers to focus on the infrastructure, while the users, collaborating inside their virtual organizations (VOs), would provide the software. Nevertheless, there seems to be a contradiction between HPC and cloud computing, as there are very few successful approaches to virtualizing HPC centers. This work discusses the underlying reasons, including management and performance, and presents solutions to overcome the contradiction, including a set of new libraries. The viability of the presented approach is shown by evaluating a selected parallel scientific application in a virtualized HPC environment.