As the size of cloud systems and the number of hosted VMs rapidly grow, the scalability of shared VM storage systems becomes a serious issue. Client-side flash-based caching has the potential to improve the performance of cloud VM storage by employing flash storage available on the client side of the storage system to exploit the locality inherent in VM I/Os. However, because of the limited capacity and endurance of flash storage, it is important to determine the proper size and configuration of the flash caches used in cloud systems. This paper provides answers to the key design questions of cloud flash caching based on dm-cache, a block-level caching solution customized for cloud environments, and long-term traces collected from real-world public and private clouds. The study first validates that cloud workloads have good cacheability and that dm-cache-based flash caching incurs low overhead relative to raw commodity flash devices. It further reveals that write-back caching substantially outperforms write-through caching in typical cloud environments because it reduces the server I/O load. It also shows that there is a tradeoff in making a flash cache persistent across client restarts: persistence saves hours of cache warm-up time but incurs considerable overhead from committing every metadata update persistently. Finally, to reduce the risk of data loss from write-back caching, the paper proposes a new cache-optimized RAID technique, which minimizes RAID overhead by adding redundancy only for dirty cache data and is shown to be significantly faster than traditional RAID and write-through caching.
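As a rough illustration of the cache-optimized RAID idea, the sketch below mirrors only dirty cache blocks and drops the mirror copies once they are destaged to the server; clean blocks need no redundancy because they can always be re-fetched from shared storage. The class and method names (DirtyOnlyMirror, write, flush) are assumptions made for illustration, not the paper's implementation.

```python
# Minimal sketch of dirty-only redundancy for a write-back flash cache.
# Assumption: names and structure are illustrative, not the paper's code.

class DirtyOnlyMirror:
    """Write-back cache that mirrors only dirty (not-yet-flushed) blocks."""

    def __init__(self, primary_cache, mirror_cache, backend):
        self.primary = primary_cache   # dict: block -> data on local flash
        self.mirror = mirror_cache     # dict: block -> data on peer flash
        self.backend = backend         # dict: block -> data on shared server
        self.dirty = set()

    def write(self, block, data):
        # Dirty data exists only in the cache, so replicate it right away;
        # losing the client would otherwise lose the only copy.
        self.primary[block] = data
        self.mirror[block] = data
        self.dirty.add(block)

    def read(self, block):
        if block in self.primary:
            return self.primary[block]
        data = self.backend[block]     # miss: fetch from server, cache a clean copy
        self.primary[block] = data     # clean blocks are never mirrored
        return data

    def flush(self):
        # Destage dirty blocks to the server, then drop their mirror copies:
        # once the server holds the data, single-copy caching is safe again.
        for block in list(self.dirty):
            self.backend[block] = self.primary[block]
            self.mirror.pop(block, None)
            self.dirty.discard(block)
```

The design choice this sketch captures is that redundancy cost is paid only for the (usually small) dirty fraction of the cache, rather than for every cached block as in traditional RAID.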
Existing parallel file systems are unable to differentiate I/O requests from concurrent applications and meet per-application bandwidth requirements. This limitation prevents applications from meeting their desired Quality of Service (QoS) as high-performance computing (HPC) systems continue to scale up. This paper presents vPFS, a new solution that addresses this challenge through a bandwidth virtualization layer for parallel file systems. vPFS employs user-level parallel file system proxies to interpose requests between native clients and servers and to schedule parallel I/Os from different applications based on configurable bandwidth management policies. vPFS is designed to be generic enough to support various scheduling algorithms and parallel file systems. Its utility and performance are studied with a prototype that virtualizes PVFS2, a widely used parallel file system. Enhanced proportional-sharing schedulers are enabled based on the unique characteristics (parallel striped I/Os) and requirements (high throughput) of parallel storage systems. The enhancements include new threshold- and layout-driven scheduling synchronization schemes, which reduce global communication overhead while delivering total-service fairness. An experimental evaluation using typical HPC benchmarks (IOR, NPB BTIO) shows that the throughput overhead of vPFS is small (< 3% for writes, < 1% for reads). It also shows that vPFS can achieve good proportional bandwidth sharing (> 96% of the target sharing ratio) for competing applications with diverse I/O patterns.
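The proxy-level proportional sharing can be sketched with a start-time-fair-queuing-style scheduler in which each application's I/O advances its virtual time in inverse proportion to its configured share, so that serviced bytes converge to the configured ratio. The names (ProportionalShareQueue, submit, dispatch) and the SFQ-style tagging are illustrative assumptions; vPFS's actual schedulers and synchronization schemes are more elaborate.

```python
import heapq

# Illustrative SFQ-style proportional sharing at a file-system proxy.
# Assumption: this only shows the idea of weighting each application's I/O
# by its configured share; it is not vPFS's scheduler.

class ProportionalShareQueue:
    def __init__(self, shares):
        self.shares = shares                      # app -> relative share, e.g. {"A": 3, "B": 1}
        self.last_finish = {a: 0.0 for a in shares}
        self.virtual_time = 0.0
        self.heap = []                            # (start_tag, seq, app, request)
        self.seq = 0

    def submit(self, app, request, size):
        # Start tag: max(global virtual time, the app's last finish tag).
        start = max(self.virtual_time, self.last_finish[app])
        # Finish tag grows with request size and shrinks with the app's share,
        # so a 3:1 share ratio yields roughly 3:1 serviced bytes over time.
        self.last_finish[app] = start + size / self.shares[app]
        heapq.heappush(self.heap, (start, self.seq, app, request))
        self.seq += 1

    def dispatch(self):
        # Forward the request with the smallest start tag to the PFS servers.
        start, _, app, request = heapq.heappop(self.heap)
        self.virtual_time = start
        return app, request
```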
The ever-increasing scale of modern high-performance computing (HPC) systems presents a variety of challenges to the parallel file system (PFS) based storage in these systems. The scalability of application checkpointing is a particularly important challenge because it is critical to the reliability of computing and often dominates the I/Os in an HPC system. When a large number of parallel processes simultaneously perform checkpointing, the PFS metadata servers can become a serious bottleneck due to the large volume of concurrent metadata operations. This paper addresses this PFS metadata management issue in order to support scalable application checkpointing in large HPC systems. It proposes a new technique named PFS-delegation, which delegates the management of the PFS storage space used for checkpointing to applications, thereby relieving the load of metadata operations on the PFS during checkpointing. The proposed technique is prototyped on PVFS2, a widely used PFS implementation, and evaluated on an HPC cluster using a representative parallel I/O benchmark, IOR. Experiments with up to 128 parallel processes show that PFS-delegation based checkpointing is significantly faster than the traditional shared-file and file-per-process checkpointing methods (7% and 10% speedup when the underlying PVFS2 uses a centralized metadata server; 22% and 31% speedup when using distributed metadata servers). The results also demonstrate that PFS-delegation based checkpointing substantially reduces the total number of metadata operations handled by the metadata servers during checkpointing.
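The metadata savings behind PFS-delegation can be illustrated by contrasting file-per-process checkpointing, where every process issues its own create/open operations against the metadata servers, with a delegated scheme in which the application obtains one pre-allocated region and assigns per-process offsets locally. The function names and operation counts below are simplified assumptions for illustration, not the behavior of the PVFS2 prototype.

```python
# Rough sketch of why delegation reduces metadata-server load during checkpointing.
# Assumption: the accounting is illustrative; the actual PVFS2 prototype differs.

def file_per_process_metadata_ops(num_procs):
    # Each process creates and opens its own checkpoint file on the PFS,
    # so the metadata servers handle O(num_procs) operations per checkpoint.
    return num_procs * 2  # one create + one open per process (simplified)

def delegated_metadata_ops(num_procs, ckpt_bytes_per_proc):
    # The application asks the PFS once for a delegated region large enough for
    # all processes, then hands out offsets locally with no further metadata ops.
    region_size = num_procs * ckpt_bytes_per_proc
    offsets = {rank: rank * ckpt_bytes_per_proc for rank in range(num_procs)}
    return 1, region_size, offsets  # a single metadata operation, regardless of scale

if __name__ == "__main__":
    procs = 128
    print("file-per-process metadata ops:", file_per_process_metadata_ops(procs))
    ops, size, _ = delegated_metadata_ops(procs, 64 * 2**20)
    print("delegated metadata ops:", ops, "for a", size, "byte region")
```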