Increasingly complex microcontroller designs for safety-relevant automotive systems require the adoption of new methods and tools to enable cost-effective verification of their robustness. In particular, the costs associated with certification against the ISO 26262 safety standard must be kept low for economic reasons. In this context, simulation-based verification using instruction set simulators (ISS) emerges as a promising approach to partially cope with the increasing cost of the verification process, as it allows design decisions to be taken at early design stages, when modifications can be performed quickly and at low cost. However, it remains to be proven that verification at those stages provides information accurate enough to be used in the context of automotive microcontrollers. In this paper we analyze the correlation between fault injection experiments in an RTL microcontroller description and the information available at the ISS level, with the goal of enabling accurate ISS-based fault injection.
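The kind of ISS-level fault injection described above can be illustrated with a minimal sketch. The `ToyISS` class, the register-file layout, and the campaign parameters below are all hypothetical illustrations, not the API of any real simulator; the sketch only shows the basic mechanics of flipping architectural state bits and logging injection points, as an RTL campaign would at a lower level.

```python
import random

class ToyISS:
    """Hypothetical minimal instruction-set-simulator state:
    a register file and a program counter."""
    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs
        self.pc = 0

    def flip_bit(self, reg, bit):
        """Emulate a single-event upset by flipping one bit of one register."""
        self.regs[reg] ^= (1 << bit)

def inject_campaign(iss, n_faults, width=32, seed=0):
    """Inject n random single-bit flips and log the (register, bit) pairs,
    mimicking the random sampling used in fault-injection campaigns.
    A fixed seed keeps the campaign reproducible across runs."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_faults):
        reg = rng.randrange(len(iss.regs))
        bit = rng.randrange(width)
        iss.flip_bit(reg, bit)
        log.append((reg, bit))
    return log
```

In a real correlation study, each logged injection point at the ISS level would be matched against the corresponding flip-flop in the RTL description to compare fault effects.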
Abstract. Current integration scales are increasing the number and types of faults that embedded systems must face. Traditional approaches focus on dealing with the transient and permanent faults that impact the state or output of systems, whereas little research has targeted faults that are logically, electrically, or temporally masked, which we have named fugacious. Fast detection and precise diagnosis of fault occurrences, even when the provided service is unaffected, could be of invaluable help to determine, for instance, that a system is currently under the influence of environmental disturbances such as radiation, suffering from wear-out, or being affected by an intermittent fault. Upon detection, systems may react by adapting the deployed fault tolerance mechanisms to the diagnosed problem. This paper explores these ideas, evaluating the challenges and requirements involved, and outlines potential techniques to be applied.
Abstract-Achieving reduced time-to-market in modern electronic designs targeting safety-critical applications is becoming very challenging, as these designs must go through a certification step that introduces a non-negligible overhead in the verification and validation process. To cope with this challenge, the safety-critical systems industry is demanding new tools and methodologies that provide quick and cost-effective means for robustness verification. Microarchitectural simulators have been widely used to test reliability properties in different domains, but their use in the robustness verification process has yet to be validated against other accepted methods such as RTL or gate-level simulation. In this paper we perform fault injections in an RTL model of a processor to characterize fault propagation. The results and conclusions of this characterization will serve to determine to what extent robustness verification methodologies can rely on fault injection in microarchitectural simulators.
The steady reduction of transistor size has brought embedded solutions into everyday life. However, the same features of deep-submicron technologies that are widening the application spectrum of these solutions are also negatively affecting their dependability. Current practices for the design and deployment of hardware fault tolerance and security strategies remain specific (defined on a case-by-case basis) and mostly manual and error-prone. Aspect orientation, which already promotes a clear separation between functional and non-functional (dependability and security) concerns in software designs, is also an approach with great potential at the hardware level. This chapter addresses the challenging problems of engineering such strategies in a generic way via metaprogramming, and of supporting their subsequent instantiation and deployment on specific hardware designs through open compilation. It shows that promoting a clear separation of concerns in hardware designs and producing a library of generic yet reusable hardware fault and intrusion tolerance mechanisms is feasible today.
Abstract-Technology advances provide a myriad of advantages for VLSI systems, but also increase the sensitivity of combinational logic to different fault profiles. Increasingly short faults, which until now had been filtered out and which we name fugacious faults, require new attention as they are considered a plausible warning sign of potential failures. Despite their increasing impact on modern VLSI systems, such faults are largely overlooked by the safety industry today. Their early detection is, however, critical to enable an early evaluation of potential risks for the system and the subsequent deployment of suitable failure avoidance mechanisms. For instance, the early detection of fugacious faults provides the means to extend the mission time of a system thanks to the timely avoidance of aging effects. Because classical detection mechanisms are not suited to cope with such fugacious faults, this paper proposes a method specifically designed to detect and diagnose them. The reported experiments show the feasibility and interest of the proposal.
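Why short pulses escape classical detection can be sketched with a toy timing model. The function below, including its parameter names and the setup/hold-window abstraction, is a hypothetical illustration (not the paper's detection method): a transient pulse is latched by a flip-flop only if it overlaps the sampling window around a clock edge; otherwise it is temporally masked, i.e., fugacious.

```python
def captured(pulse_start, pulse_width, clock_edge, setup, hold):
    """Toy timing model: a transient pulse is latched only if it overlaps
    the flip-flop's setup/hold window around the sampling clock edge.
    Pulses that miss the window are temporally masked ('fugacious').
    All times share one arbitrary unit (e.g., nanoseconds)."""
    window_start = clock_edge - setup
    window_end = clock_edge + hold
    pulse_end = pulse_start + pulse_width
    # Two intervals overlap iff each starts before the other ends.
    return pulse_start < window_end and pulse_end > window_start
```

Under this model, a 0.2-unit pulse arriving mid-cycle is masked and leaves no trace in the latched state, which is precisely why dedicated detection mechanisms are needed.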
Field-Programmable Gate Arrays (FPGAs) have proven their value over time as final implementation targets. Their singular architecture renders them sensitive to a wide range of faults, especially those causing multiple and non-simultaneous errors, which can result in silent data corruption and also in structural changes to the hardware implementation. This paper presents and tests an approach to enable the confident use of conventional (low-cost) FPGAs in hostile environments. The design combines spatial and temporal redundancy with partial dynamic reconfiguration to increase the resilience of designs. The goal is to tolerate the occurrence of single and multiple faults, even during the reconfiguration process of FPGAs, while minimizing the impact of the recovery process on system availability. Fault injection techniques are used to experimentally evaluate various features of the approach. The results are very promising and lead us to state that, although much research is still required, the old idea of self-repairing hardware designs is closer to reality today.
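The spatial-redundancy side of such designs is commonly realized as triple modular redundancy (TMR). As a minimal sketch, assuming plain bitwise majority voting over three redundant copies (the paper's actual voter may differ), the voter can be expressed in one line:

```python
def tmr_vote(a, b, c):
    """Bitwise majority vote over three redundant module outputs:
    each result bit takes the value on which at least two of the
    three copies agree, masking a single faulty copy per bit."""
    return (a & b) | (a & c) | (b & c)
```

A disagreement between the voter output and one copy can additionally flag that copy for repair, e.g., by triggering partial reconfiguration of the affected FPGA region.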
Abstract-Future high-performance safety-relevant applications require microcontrollers delivering higher performance than the existing certified ones. However, means for assessing their dependability are needed so that they can be certified against safety-critical certification standards (e.g., ISO 26262). Dependability assessment analyses performed at high levels of abstraction inject single faults to investigate their effects on the system. In this work we show that single faults do not give the whole picture, due to fault multiplicities and reactivations. We then prove that injecting complex fault models that consider multiplicities and reactivations at higher levels of abstraction yields substantially different results, indicating that a change in the methodology is needed.
I. INTRODUCTION
Technology scaling has allowed the semiconductor industry to achieve processor performance beyond gigaflops with reduced power budgets by including a vast number of computational nodes in the same chip [1]. In the safety-critical domain, this huge amount of available processing power is expected to fulfil the high performance-guarantee demands of new applications such as autonomous driving systems [10]. However, hardware systems targeting safety-critical applications need to go through a certification process validating their compliance with the standards [9], [5]. The thorough verification and test process that certification standards impose on hardware systems may preclude the use of complex hardware platforms in the context of critical applications. Currently, the verification and test process takes between 50% and 70% of the design effort for a simple microcontroller [8].
Therefore, if high-complexity processors are to be considered for highly critical applications, such as ASIL-D in the automotive domain, new methodologies and tools for the robustness verification step have to be devised [2]. The safety-critical systems industry uses simulation-based verification as a way to reduce the costs associated with the verification and validation of complex designs. Simulation-based verification reduces these costs because design threats and certification mismatches can be detected before manufacturing. Simulation-based robustness verification is typically carried out at the RTL and gate levels by performing extensive fault-injection campaigns that require a huge computation effort to achieve meaningful results. Further raising the level of abstraction of simulation-based fault injection will allow reducing the verification costs of current designs and affording the simulation effort of high-complexity processor designs. In the case of microcontrollers, the instruction set simulator (e.g., TSIM) has been regarded as a potential candidate for carrying out robustness verification at a high level of abstraction. However, for such simulators to be used in the robustness verification process, it must be proven that accurate results can be achieved with them.
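The complex fault models mentioned above, covering multiplicity (several bits upset by one event) and reactivation (the same fault reasserting itself at later cycles), can be sketched as a small data structure. The `Fault` descriptor and the register-file-snapshot representation below are hypothetical illustrations of the idea, not the paper's actual model:

```python
from dataclasses import dataclass

@dataclass
class Fault:
    """Hypothetical complex fault descriptor:
    - bits: several bits upset by one event (multiplicity)
    - cycles: cycles at which the fault (re)asserts (reactivation)"""
    reg: int
    bits: tuple
    cycles: tuple

def apply_fault(state, fault, cycle):
    """Apply a fault to a register-file snapshot (list of ints)
    if the fault is active in the given cycle."""
    if cycle in fault.cycles:
        for b in fault.bits:
            state[fault.reg] ^= (1 << b)
    return state
```

A single-fault campaign corresponds to `bits` and `cycles` each holding one element; the abstract's argument is that restricting injection to that special case misses behaviors that the general model exposes.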