Over the last quarter century, computer systems have made significant progress in technology, performance, capability, and RAS (reliability, availability, and serviceability). In this paper, we review the advances of IBM computer systems in the RAS area. This progress has for the most part been evolutionary; in some cases, however, it has been revolutionary. RAS developments have been driven primarily by technological advances and by increases in functional capability and complexity, but RAS considerations have also played a leading role and have themselves improved technological and functional capability. The paper briefly reviews the progress of computer technology. It points out how IBM has maintained or improved the RAS capabilities of its systems, despite greatly increased component counts and system complexity, through improved system recovery and serviceability as well as basic improvements in intrinsic component failure rates. The paper also covers the CPU, tape, and disk areas and shows that RAS improvements in these areas have been significant. The main objective is to provide a comprehensive view of significant developments in the RAS characteristics of IBM computer systems over the past twenty-five years.
Introduction and general concepts

Reliability is a measure of the consistency with which a system successfully provides its specified services. Serviceability is a measure of the ease with which the system is restored to its specified state. Availability is the percentage of the time during which the system is providing that specified service [1]. The characteristics of and the effect on the system with regard to these three interrelated quantities are referred to as the system RAS.
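The relationship among the three quantities can be illustrated with a small calculation. The following is a minimal sketch rather than anything taken from the paper: the function names are hypothetical, and the use of mean time between failures (MTBF) as a stand-in for reliability and mean time to repair (MTTR) as a stand-in for serviceability is an assumption based on the conventional steady-state availability relation.

```python
def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Percentage of observed time during which the system provided service.

    Follows the definition in the text directly: time in service divided by
    total time, expressed as a percentage.
    """
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_hours / total


def steady_state_availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run availability from MTBF and MTTR (assumed model, not from the paper).

    Over many failure/repair cycles the system is up for MTBF hours and down
    for MTTR hours per cycle on average, so the same ratio applies.
    """
    return availability_percent(mtbf_hours, mttr_hours)


if __name__ == "__main__":
    # Illustrative figures only; they are not measurements from any IBM system.
    print(availability_percent(990.0, 10.0))
    # Halving repair time (better serviceability) raises availability
    # even though the failure rate (reliability) is unchanged.
    print(steady_state_availability_percent(500.0, 4.0))
    print(steady_state_availability_percent(500.0, 2.0))
```

Under this assumed model, availability can be raised either by lengthening the interval between failures or by shortening the time to restore service, which is why the three quantities are treated together.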