Enterprise-scale storage systems, which can contain hundreds of host computers and storage devices and up to tens of thousands of disks and logical volumes, are difficult to design. The volume of choices that need to be made is massive, and many choices have unforeseen interactions. Storage system design is tedious and complicated to do by hand, usually leading to solutions that are grossly over-provisioned, substantially under-performing, or, in the worst case, both. To solve the configuration nightmare, we present MINERVA: a suite of tools for designing storage systems automatically. MINERVA uses declarative specifications of application requirements and device capabilities; constraint-based formulations of the various subproblems; and optimization techniques to explore the search space of possible solutions. This paper also explores and evaluates the design decisions that went into MINERVA, using specialized micro- and macro-benchmarks. We show that MINERVA can successfully handle a workload with substantial complexity (a decision-support database benchmark). MINERVA created a 16-disk design in only a few minutes that achieved the same performance as a 30-disk system manually designed by human experts. Of equal importance, MINERVA was able to predict the resulting system's performance before it was built.
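The abstract describes constraint-based formulations of design subproblems. As a purely illustrative sketch (not Minerva's actual formulation; all names and numbers below are hypothetical), one such subproblem is assigning application stores to arrays subject to capacity and bandwidth constraints, which can be approximated greedily:

```python
# Hypothetical sketch of one Minerva-style subproblem: placing application
# "stores" onto arrays under capacity (GB) and bandwidth (MB/s) constraints.
# Greedy first-fit is a simplification; the paper's tools use richer
# optimization, and these names/numbers are illustrative only.

def assign_stores(stores, arrays):
    """Place each store on the first array with enough remaining resources."""
    placement = {}
    for name, (cap_gb, bw_mbs) in stores.items():
        for array in arrays:
            if array["cap_gb"] >= cap_gb and array["bw_mbs"] >= bw_mbs:
                array["cap_gb"] -= cap_gb
                array["bw_mbs"] -= bw_mbs
                placement[name] = array["id"]
                break
        else:
            raise ValueError(f"no array can host {name}")
    return placement

arrays = [{"id": "A1", "cap_gb": 100, "bw_mbs": 80},
          {"id": "A2", "cap_gb": 100, "bw_mbs": 80}]
stores = {"db-index": (40, 60), "db-data": (70, 30)}
print(assign_stores(stores, arrays))  # {'db-index': 'A1', 'db-data': 'A2'}
```

Real formulations must also model device performance and interactions between co-located workloads, which is what makes manual design so error-prone.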
An accurate estimate of host reliability is important for correct analysis of many fault-tolerance and replication mechanisms. In a previous study, we estimated host system reliability by querying a large number of hosts to find how long they had been functioning, estimating the mean time-to-failure (MTTF) and availability from those measures, and in turn deriving an estimate of the mean time-to-repair (MTTR). However, this approach had a bias towards more reliable hosts that could result in overestimating MTTR and underestimating availability. To address this bias we have conducted a second experiment, using a fault-tolerant replicated monitoring tool. This tool directly measures TTF, TTR, and availability by polling many sites frequently from several locations. We find that these more accurate results generally confirm and improve our earlier estimates, particularly for TTR. We also find that failure and repair are unlikely to follow Poisson processes.
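The derivation of MTTR from MTTF and availability uses the standard relation A = MTTF / (MTTF + MTTR). A minimal sketch with illustrative values (not the study's data):

```python
# Standard relations among MTTF, MTTR, and availability, as used when
# deriving MTTR from measured MTTF and availability.
# The hour values below are illustrative, not measurements from the study.

def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the host is up."""
    return mttf_hours / (mttf_hours + mttr_hours)

def mttr_from(mttf_hours, avail):
    """Invert A = MTTF / (MTTF + MTTR) to recover MTTR."""
    return mttf_hours * (1.0 - avail) / avail

mttf = 500.0   # hours between failures (illustrative)
mttr = 4.0     # hours to repair (illustrative)
a = availability(mttf, mttr)
print(round(a, 4))                    # 0.9921
print(round(mttr_from(mttf, a), 4))   # 4.0 (recovers the original MTTR)
```

The sampling bias the abstract describes arises because this inversion is only as good as the MTTF and availability estimates fed into it; hosts that fail often are under-represented in uptime surveys.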
Configuring redundant disk arrays is a black art. To configure an array properly, a system administrator must understand the details of both the array and the workload it will support. Incorrect understanding of either, or changes in the workload over time, can lead to poor performance. We present a solution to this problem: a two-level storage hierarchy implemented inside a single disk-array controller. In the upper level of this hierarchy, two copies of active data are stored to provide full redundancy and excellent performance. In the lower level, RAID 5 parity protection is used to provide excellent storage cost for inactive data, at somewhat lower performance. The technology we describe in this article, known as HP AutoRAID, automatically and transparently manages migration of data blocks between these two levels as access patterns change. The result is a fully redundant storage system that is extremely easy to use, is suitable for a wide variety of workloads, is largely insensitive to dynamic workload changes, and performs much better than disk arrays with comparable numbers of spindles and much larger amounts of front-end RAM cache. Because the implementation of the HP AutoRAID technology is almost entirely in software, the additional hardware cost for these benefits is very small. We describe the HP AutoRAID technology in detail, provide performance data for an embodiment of it in a storage array, and summarize the results of simulation studies used to choose algorithms implemented in the array.
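The migrate-on-activity idea can be illustrated with a toy model: keep recently written blocks in the mirrored tier and demote cold blocks to RAID 5. This is only a sketch of the concept; the actual controller's policies are far more sophisticated, and the class below is hypothetical:

```python
# Toy sketch of AutoRAID-style two-level placement: recently written blocks
# live in the mirrored tier; when that tier fills, the least-recently written
# block is demoted to the RAID 5 tier. Purely illustrative, not HP's design.

class TwoLevelStore:
    def __init__(self, mirrored_capacity):
        self.capacity = mirrored_capacity
        self.mirrored = []   # LRU order: least-recently written first
        self.raid5 = set()

    def write(self, block):
        if block in self.mirrored:
            self.mirrored.remove(block)
        self.raid5.discard(block)          # promote back on new activity
        self.mirrored.append(block)
        while len(self.mirrored) > self.capacity:
            cold = self.mirrored.pop(0)    # demote least-recently written
            self.raid5.add(cold)

store = TwoLevelStore(mirrored_capacity=2)
for b in ["a", "b", "a", "c"]:
    store.write(b)
print(store.mirrored, sorted(store.raid5))  # ['a', 'c'] ['b']
```

The appeal of this structure is that the administrator never chooses RAID levels per volume: activity alone determines where each block lives.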
High-end enterprise storage has traditionally consisted of monolithic systems with customized hardware, multiple redundant components and paths, and no single point of failure. Distributed storage systems realized through networked storage nodes offer several advantages over monolithic systems such as lower cost and increased scalability. In order to achieve reliability goals associated with enterprise-class storage systems, redundancy will have to be distributed across the collection of nodes to tolerate both node and drive failures. In this paper, we present alternatives for distributing this redundancy, and models to determine the reliability of such systems. We specify a reliability target and determine the configurations that meet this target. Further, we perform sensitivity analyses, where selected parameters are varied to observe their effect on reliability.
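Reliability models of the kind the abstract mentions often start from simple closed forms. As one hedged example (a textbook first-order approximation, not the paper's models), the mean time to data loss for a mirrored pair with independent, exponentially distributed failures and repairs is MTTDL ≈ MTTF² / (2·MTTR):

```python
# First-order MTTDL model for a mirrored pair, assuming independent,
# exponentially distributed failures and repairs:
#   MTTDL ≈ MTTF^2 / (2 * MTTR)
# Illustrative numbers only; the paper's models also cover node failures
# and other redundancy layouts.

def mttdl_mirror(mttf_hours, mttr_hours):
    return mttf_hours ** 2 / (2.0 * mttr_hours)

mttf = 1_000_000.0   # drive MTTF in hours (illustrative)
mttr = 24.0          # repair/rebuild time in hours (illustrative)
print(mttdl_mirror(mttf, mttr))
```

The sensitivity analyses the abstract describes probe exactly such dependencies, e.g. how strongly reliability degrades as repair time grows.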