R. Swami scite author profile

R. Swami

3 Publications

5 Citation Statements Received

0 Citation Statements Given

Publications

Error detection and handling in a superscalar, speculative out-of-order execution processor system

Saxena, Chen, Swami, et al.

The HaL SPARC64™ Processor, the first 64-bit SPARC-V9 architecture implementation, uses several techniques to ensure a high degree of system reliability, error detection, and error recovery. The CPU of the multi-chip module processor has a superscalar, speculative issue unit and an out-of-order execution datapath. These two processor components complicate the maintenance of precise state in the event of errors. By exploiting the SPARC-V9 architectural features and the micro-architecture for speculative execution, SPARC64™ maintains precise state in the event of exceptions and errors, logs and reports errors, and facilitates error detection during full system bring-up. This paper presents details of error detection and handling in the CPU, the cache system, and the Memory Management Unit (MMU). The HaL R1 system also implements a fault-secure memory system design. The memory system corrects all single-bit errors, detects double-bit errors, detects single address line failures, and detects all single dynamic RAM (DRAM) chip failures. Certain debug features have been added to the system that are useful during system bring-up.

1: Introduction

Design philosophy and design trade-offs are strongly affected by the overall system goals. Historically, high reliability and availability system goals were the domain of military, industrial, aerospace, main-frame, and communications applications [1]. Recently, reliability and availability goals have also assumed importance in microprocessor-based workstation environments. For instance, reliability studies (reported in [1]) in the late 1970's on systems like the B5500, Univac 1108, IBM Dual 370/165, PDP-10, and CRAY-1 indicate a mean-time-to-crash on the order of 10 to 15 hours, which translates (considering the performance during the 70's) to about 2x10" instructions executed between failures. In today's high performance workstations (approximately 2x10 instructions per second performance) this would have translated to a mean-time-to-crash value of almost 17 minutes. The fact that current workstations typically run for far longer than 2.17 minutes before crashing not only points to the use of robust design techniques and technology, but also to an increased emphasis on reliable design methods. The most visible aspect of this is the use of error correcting codes in DRAM-based memory in most of the commercial workstations.

Although the specification of reliability requirements (in terms of mean-time-to-failure) or availability requirements (average system down time, which is a function of mean-time-to-failure and mean-time-to-repair) could be made precise, translating these requirements into precise design decisions and design trade-offs is not straightforward. The problem is compounded by the difficulty in proving that design decisions and trade-offs do meet the specified reliability requirements. This is a recognized problem. However, it is possible to include error-checking mechanisms that help the designers to obtain appropriate error detection and recovery techniques....
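The abstract says the memory system corrects all single-bit errors and detects double-bit errors, but it does not give the code used. As an illustration only, the sketch below shows that behaviour with a toy extended Hamming (8,4) code in Python. The 4-bit data width, the bit layout, and the function names `encode` and `decode` are assumptions made for this example, not details of the HaL design. The address-line and DRAM-chip failure detection mentioned in the abstract is not modelled.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Bit 0 is an overall parity bit; bits 1, 2 and 4 are Hamming parity
    bits; bits 3, 5, 6 and 7 carry the data. (Illustrative layout.)
    """
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1             # makes total parity even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (data, status); data is None if the error is uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    # The syndrome is the XOR of the positions of all set bits (1..7);
    # it is zero for a valid codeword and names the flipped position
    # after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    parity_error = sum(bits) & 1            # 1 if overall parity is odd
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # Odd number of flips: assume one, and repair it. A syndrome of 0
        # here means the overall parity bit itself was hit.
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, status
```

For example, `decode(encode(0b1011) ^ 0b00100000)` recovers `0b1011` with status `"corrected"`, while flipping any two bits of a codeword yields `(None, "uncorrectable")`. Production memory systems typically apply the same idea to much wider words.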

Microarchitecture of HaL's CPU

Patkar, Katsuno, Li, et al.

A 64b 4-issue out-of-order execution RISC processor

Shen, Patkar, Ando, et al.
