Abstract-In this paper, we present a novel fault tolerance design technique, which is applicable at the register transfer level, based on protecting the functionality of logic circuits using a probabilistic fault model. The proposed technique selects the most susceptible workload of combinational circuits to protect against probabilistic faults. The workload susceptibility is ranked as the likelihood of any fault to bypass the inherent logical masking of the circuit and propagate an erroneous response to its outputs, when that workload is executed. The workload protection is achieved through a Triple Modular Redundancy (TMR) scheme by using the patterns that have been evaluated as most susceptible. We apply the proposed technique on LGSynth91 and ISCAS85 benchmarks and evaluate its fault tolerance capabilities against errors induced by permanent faults and soft errors. We show that the proposed technique, when it is applied to protect only the 32 most susceptible patterns, achieves on average of all the examined benchmarks, an error coverage improvement of 98% and 94% against errors induced by single stuck-at faults (permanent faults) and soft errors (transient faults), respectively, compared to a reduced TMR scheme that protects the same number of susceptible patterns without ranking them.
Abstract-Electronic systems with power-constrained embedded devices are used for a variety of IoT applications, such as geomonitoring, parking sensors and surveillance. Such applications may tolerate few errors. However, with the increasing occurrence of faults in-the-field, devices that exhibit systematic erroneous behaviour must be eventually identified and replaced. In this paper, we propose a novel low cost error monitoring technique to assist the maintainability planning of low power IoT applications by ranking devices based on the systematic erroneous behaviour they exhibit. Small on-chip monitors are used to collect the signal probability information at the outputs of each device which is then transmitted to the system software via the communications channel of the system to rank them accordingly. To evaluate the error monitoring capabilities of the proposed technique, we injected multiple bit-flips and stuck-at faults on a set of the EPFL and the ISCAS benchmarks. Results demonstrate an average error coverage of 84.4% and 73.1% of errors induced by bit-flips and stuck-at faults, respectively, with an average area cost of 1.52%. A maintainability planning simulation shows that the proposed technique achieves a reduction of 26x to 263x in area cost and static power, and consumes over 625x less power for communications when compared against duplication and comparison.
Low power fault tolerance design techniques trade reliability to reduce the area cost and the power overhead of integrated circuits by protecting only a subset of their workload or their most vulnerable parts. However, in the presence of faults not all workloads are equally susceptible to errors. In this paper, we present a low power fault tolerance design technique that selects and protects the most susceptible workload. We propose to rank the workload susceptibility as the likelihood of any error to bypass the logic masking of the circuit and propagate to its outputs. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. We evaluate the proposed technique on timing-independent and timing-dependent errors induced by permanent and transient faults. In comparison with unranked selective fault tolerance approach, we demonstrate a) a similar error coverage with a 39.7% average reduction of the area overhead or b) a 86.9% average error coverage improvement for a similar area overhead. For the same area overhead case, we observe an error coverage improvement of 53.1% and 53.5% against permanent stuck-at and transition faults, respectively, and an average error coverage improvement of 151.8% and 89.0% against timing-dependent and timing-independent transient faults, respectively. Compared to TMR, the proposed technique achieves an area and power overhead reduction of 145.8% to 182.0%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.