Live Deduplication Storage of Virtual Machine Images in an Open-Source Cloud

Ng, Chun-Ho; Ma, Mingcao; Wong, Eddy W.Y.; Lee, Patrick P. C.; Lui, John C. S.

doi:10.1007/978-3-642-25821-3_5

Cited by 51 publications

(39 citation statements)

References 9 publications


“…Finally, we think it is worthwhile to investigate data compression and deduplication techniques that have been developed for VMI storage (e.g. [7,14]) in the context of VMI caches to gain even more storage efficacy.…”

Section: Discussion (mentioning)

confidence: 99%

Scalable virtual machine deployment using VM image caches

Razavi, Kielmann

2013

Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis


In IaaS clouds, VM startup times are frequently perceived as slow, negatively impacting both dynamic scaling of web applications and the startup of high-performance computing applications consisting of many VM nodes. A significant part of the startup time is due to the large transfers of VM image content from a storage node to the actual compute nodes, even when copy-on-write schemes are used. We have observed that only a tiny part of the VM image is needed for the VM to be able to start up. Based on this observation, we propose using small caches for VM images to overcome the VM startup bottlenecks. We have implemented such caches as an extension to KVM/QEMU. Our evaluation with up to 64 VMs shows that using our caches reduces the time needed for simultaneous VM startups to that of a single VM.


“…Another solution for increasing deduplication throughput and reducing I/O latency is to use a Bloom filter and explore spatial locality by preserving the disk layout, then prefetching contiguous chunk signatures to cache as in DDFS. These two improvements were presented for an inline centralized deduplication system along with a novel fault-tolerant journaling mechanism for tracking system transactions, and recovering data and corresponding signatures in failure scenarios [Ng et al 2011].…”

Section: Primary Storage (mentioning)

confidence: 99%

A Survey and Classification of Storage Deduplication Systems

2014


The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid-state drives, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this article is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.


“…It takes significant memory resources for filtering and caching. Ng et al. [12] use a related filtering technique for integrating deduplication in the Linux file system, and the memory consumed is up to 2GB for a single machine. That is still too big in our context discussed below.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Multi-level Selective Deduplication for VM Snapshots in Cloud Storage

Zhang, Tang, Jiang, et al.

2012

2012 IEEE Fifth International Conference on Cloud Computing


In a virtualized cloud computing environment, frequent snapshot backup of virtual disks improves hosting reliability but storage demand of such operations is huge. While dirty-bit-based technique can identify unmodified data between versions, full deduplication with fingerprint comparison can remove more redundant content at the cost of computing resources. This paper presents a multi-level selective deduplication scheme which integrates inner-VM and cross-VM duplicate elimination under a stringent resource requirement. This scheme uses popular common data to facilitate fingerprint comparison while reducing the cost and it strikes a balance between local and global deduplication to increase parallelism and improve reliability. Experimental results show the proposed scheme can achieve high deduplication ratio while using a small amount of cloud resources.
