Abstract-The ever-growing gap between computation and I/O is one of the fundamental challenges for future computing systems. This computation-I/O gap is even larger for modern large-scale high-performance systems due to their state-of-the-art yet decades-old architecture: the compute and storage resources form two cliques interconnected by shared networking infrastructure. This paper presents HyCache+, a distributed storage middleware deployed directly on the compute nodes, which allows I/O to effectively leverage the high bisection bandwidth of the high-speed interconnects of massively parallel high-end computing systems. HyCache+ provides a POSIX interface to end users with memory-class I/O throughput and latency, and transparently swaps cached data with the existing slow-speed but high-capacity network-attached storage. HyCache+ thus has the potential to achieve both high performance and low-cost, large capacity: the best of both worlds. To further improve caching performance from the perspective of the global storage system, we propose a two-phase mechanism for caching hot data for parallel applications, called 2-Layer Scheduling (2LS), which minimizes the amount of data transferred between compute nodes and heuristically replaces files in the cache. We deploy HyCache+ on the IBM BlueGene/P supercomputer and observe two orders of magnitude higher I/O throughput than the default GPFS parallel file system. Furthermore, the proposed heuristic caching approach achieves a 29X speedup over the traditional LRU algorithm.