M. Peercy scite author profile

1993

Proc. IEEE

This paper examines the wide variety of fault tolerance techniques applied to VLSI technology. We provide a brief synopsis of fault models at the device, gate, and function levels. Then, we introduce the basic methods available to the designer of fault tolerance measures by surveying redundancy techniques. The majority of this paper discusses fault detection, which is the discovery of a fault before it delivers errant data to the rest of the system. We examine techniques of fault detection that use space, time, and information redundancies. Along with these redundancy measures, we also investigate algorithm-based fault tolerance in VLSI components. After discussing large-scale processor-level implementations of fault detection, we describe fault tolerance in automated VLSI production systems. After our treatment of the detection of errors and faults, we examine the other areas of fault tolerance in VLSI: reconfiguration of the system and recovery of system operation. We look into the issues involved in reconfiguration after discovery of a fault in fabrication or in operation. Our time here is spent generally in VLSI arrays. We conclude our discussion of fault tolerance in VLSI with an overview of the on-chip recovery capabilities of a VLSI microprocessor.

Design and evaluation of hardware strategies for reconfiguring hypercubes and meshes under faults

1994

IEEE Trans. Comput.

Design and analysis of software reconfiguration strategies for hypercube multicomputers under multiple faults

Distributed algorithms for shortest-path, deadlock-free routing and broadcasting in arbitrarily faulty hypercubes

Software schemes of reconfiguration and recovery in distributed memory multicomputers using the actor model