Abstract-Negative Bias Temperature Instability (NBTI) is an important lifetime reliability problem in microprocessors. SRAM-based structures within the processor are especially susceptible to NBTI since one of the PMOS devices in the memory cell always has an input of '0'. Previously proposed recovery techniques for SRAM cells aim to balance the degradation of the two PMOS devices by attempting to keep their inputs at a logic '0' exactly 50% of the time. However, one of the devices is always in the negative bias condition at any given time. In this paper, we propose a technique called Recovery Boosting that allows both PMOS devices in the memory cell to be put into the recovery mode by slightly modifying the design of conventional SRAM cells. We present the circuit-level design of an issue queue that uses such cells and perform SPICElevel simulations to verify its functionality and quantify area and power consumption. We then conduct an architecturelevel evaluation of the performance and reliability of using an area-neutral design of such an issue queue using the M5 simulator and the SPEC CPU2000 benchmark suite. We show that recovery boosting provides a 56% improvement in the static noise margin for the issue queue while having very little impact on power consumption and a negligible loss in performance.
NBTI is one of the most important silicon reliability problems facing processor designers today. The impact of NBTI can be mitigated at both the circuit and microarchitecture levels. In this paper, we propose a multi-level optimization approach, combining techniques at the circuit and microarchitecture levels, for reducing the impact of NBTI on the functional units (FUs) of a highperformance processor core. We perform SPICE simulations to evaluate the impact of circuit-level design optimizations to reduce the NBTI guardband in terms of area, delay, and power. We then propose a set of NBTI-aware dynamic instruction scheduling policies at the microarchitecture level and quantify their impact on application performance and guardband reduction through executiondriven simulation. We show that carefully combining techniques at both these levels provides the most attractive solution to reducing the guardband while imposing the least overhead in terms of area, power, delay, and application performance.
Silicon reliability is a key challenge facing the microprocessor industry. Processors need to be designed such that they are resilient against both soft errors and lifetime reliability phenomena. However, techniques developed to address one class of reliability problems may impact other aspects of silicon reliability. In this paper, we show that Redundant Multi-Threading (RMT), which provides soft error protection, exacerbates lifetime reliability. We then explore two different architectural approaches to tackle this problem, namely, Dynamic Voltage Scaling (DVS) and partial RMT. We show that each approach has certain strengths and weaknesses with respect to performance, soft error coverage, and lifetime reliability. We then propose and evaluate a hybrid approach that combines DVS and partial RMT. We show that this approach provides better improvement in lifetime reliability than DVS or partial RMT alone, buys back a significant amount of performance that is lost due to DVS, and provides nearly complete soft error coverage.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.