Radiation-induced soft errors are a significant reliability issue in nanoscale technology nodes. In this paper, a novel approach based on probabilistic model checking is proposed to quantify the soft error vulnerabilities of the registers in the control paths at the Register-Transfer Level (RTL). Efficient abstraction and model simplification techniques are also proposed to significantly improve the scalability of the method. The experimental results show the effectiveness of the proposed techniques in quantifying register vulnerabilities in RTL designs, which can guide cost-effective selective register protection.
I. INTRODUCTION

Radiation-induced soft errors are a significant reliability concern in nanoscale VLSI design, and sequential elements (flip-flops) are the dominant contributors to the overall system soft error rate [1]. Accurate analysis of their vulnerabilities therefore plays a key role in selective protection schemes.

Architecture-level evaluation techniques mostly rely on architecturally correct execution (ACE) analysis and are not applicable to irregular structures such as controllers with sequential elements [2]. Circuit-level analysis techniques [3,4] operate on flattened netlists and hence lose the abstraction efficiency of error analysis and mitigation at the RTL, since useful high-level behavioral semantics are ignored.

In typical RTL designs, error propagation and masking analysis is much more challenging in control paths than in data paths. In addition, control signals may exhibit highly non-uniform probabilities that depend on the workload. For soft error evaluation at the RTL, fault injection [5] requires long simulation times to obtain results with reasonable accuracy. A model-checking-based technique is employed in [6] to identify the registers that must be protected for correct system functionality, but it provides only a binary classification (yes/no) rather than a quantitative metric.

In this paper, we propose a novel method based on formal Probabilistic Model Checking (PMC) to quantitatively evaluate register vulnerabilities in RTL control paths, thereby avoiding time-consuming statistical fault injection. We model the probabilistic behaviors of RTL designs as Discrete Time Markov Chains (DTMCs) and take workload dependencies into account. Furthermore, we leverage RTL behavioral semantics and exploit several abstraction and simplification techniques to exponentially reduce the size of the state space.
Experimental results show that the proposed method is able to handle complex control modules in a typical embedded processor.
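To make the DTMC formulation described above more concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' model or tool flow. The 2-bit controller (`next_state`, `ack`), its state encodings, and the workload model (input `req` is 1 with probability `p_req`) are all hypothetical. The sketch computes the fault-free long-run state distribution as the workload dependency, then builds a product DTMC of a golden copy and a faulty copy after a single bit flip. For each register bit, it evaluates the probability that the flip becomes visible at the output within k cycles. This is the kind of bounded-reachability query, P=? [ F<=k "fail" ], that a probabilistic model checker would evaluate on a chain derived from the RTL.

```python
"""Toy sketch: register-bit vulnerability as a DTMC bounded-reachability query.

Everything here is hypothetical and illustrative: the 2-bit controller,
its encodings, and the workload model (req = 1 with probability p_req).
"""

IDLE, BUSY, DONE = 0, 1, 2  # encoding 3 is unused by the fault-free FSM


def next_state(s: int, req: int) -> int:
    """Hypothetical controller: IDLE -> BUSY -> DONE while req holds."""
    if s == IDLE:
        return BUSY if req else IDLE
    if s == BUSY:
        return DONE if req else IDLE  # dropping req aborts back to IDLE
    return IDLE  # DONE and the unused encoding 3 both return to IDLE


def ack(s: int) -> int:
    """Observable output: the only place an error can be detected."""
    return 1 if s == DONE else 0


def stationary(p_req: float, iters: int = 1000) -> list[float]:
    """Long-run fault-free state distribution (power iteration, 0 < p_req < 1)."""
    dist = [1.0, 0.0, 0.0, 0.0]  # start from the reset state IDLE
    for _ in range(iters):
        new = [0.0] * 4
        for s, pr in enumerate(dist):
            new[next_state(s, 1)] += pr * p_req
            new[next_state(s, 0)] += pr * (1 - p_req)
        dist = new
    return dist


def failure_prob(g0: int, f0: int, p_req: float, k: int) -> float:
    """P(golden and faulty copies show a different ack within k cycles).

    Product DTMC over (golden, faulty) states driven by the same input,
    i.e. the bounded-reachability query P=? [ F<=k "fail" ].
    """
    dist = {(g0, f0): 1.0}
    failed = 0.0
    for _ in range(k + 1):  # observe cycles 0..k
        nxt: dict[tuple[int, int], float] = {}
        for (g, f), pr in dist.items():
            if ack(g) != ack(f):
                failed += pr  # error reached the output: absorbing failure
            elif g != f:  # error still latent: advance both copies
                for req, p in ((1, p_req), (0, 1 - p_req)):
                    key = (next_state(g, req), next_state(f, req))
                    nxt[key] = nxt.get(key, 0.0) + pr * p
            # g == f: copies reconverged, so the flip is masked (dropped)
        dist = nxt
    return failed


def bit_vulnerability(bit: int, p_req: float, k: int) -> float:
    """P(a flip of `bit`, at a cycle drawn from the long-run distribution, fails)."""
    pi = stationary(p_req)
    return sum(pr * failure_prob(s, s ^ (1 << bit), p_req, k)
               for s, pr in enumerate(pi))


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, [round(bit_vulnerability(b, p, k=5), 3) for b in (0, 1)])
```

With `p_req = 0.5`, the long-run distribution over IDLE, BUSY, and DONE is (4/7, 2/7, 1/7). The sketch then rates bit 1 (vulnerability 6/7) as more critical than bit 0 (4/7). Lowering `p_req` makes bit-0 flips much more likely to be masked, which illustrates why workload-dependent probabilities matter when ranking registers for selective protection. Enumerating product states by hand, as done here, does not scale. A real flow would derive the chain from the RTL and pass it to a model checker, which is where abstraction and simplification become necessary.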