“…Then, we inject timing errors based on these TERs to applications using Multi2Sim simulator. During the error injection process, we let the FUs return a random value each time they have timing errors, similar to [12].…”
With the continuous scaling of CMOS technology, microelectronic circuits are increasingly susceptible to microelectronic variations such as variations in operating conditions. Such variations can cause delay uncertainty in microelectronic circuits, leading to timing errors. Circuit designers typically combat these errors using conservative guardbands in the circuit and architectural design, which can, however, cause significant loss of operational efficiency. In this paper, we propose TEVoT, a supervised learning model that can predict the timing errors of functional units (FUs) under different operating conditions, clock speeds, and input workloads. We perform dynamic timing analysis to characterize the delay variations of FUs under different conditions, based on which we collect training data. We then extract useful features from the training data and apply supervised learning methods to establish TEVoT. Across 100 different operating conditions, 4 widely used FUs, 3 clock speeds, and 3 datasets, TEVoT achieves an average prediction accuracy of 98.25% and is 100X faster than gate-level simulation. We further use TEVoT to estimate application output quality under different operating conditions by exposing circuit-level timing errors to the application level. TEVoT achieves an average estimation accuracy of 97% for two image processing applications across 100 operating conditions.
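The error-injection step in the citation statement above (an FU returns a random value whenever it has a timing error) can be illustrated with a minimal sketch. This is not the citing paper's Multi2Sim flow: the wrapper name `inject_timing_errors`, the 32-bit output width, and the single fixed timing error rate (TER) per FU are all assumptions made for illustration.

```python
import random

def inject_timing_errors(op, ter, rng, width=32):
    """Wrap a two-input FU operation so that it suffers a timing error
    with probability `ter` (hypothetical helper, not from the paper).

    On a timing error the FU returns a uniformly random `width`-bit value;
    otherwise it returns the correct result truncated to `width` bits.
    """
    mask = (1 << width) - 1

    def faulty(a, b):
        if rng.random() < ter:
            return rng.getrandbits(width)  # timing error: random output
        return op(a, b) & mask             # no error: correct output

    return faulty

# Usage: an adder with an assumed TER of 10%.
rng = random.Random(0)
faulty_add = inject_timing_errors(lambda a, b: a + b, ter=0.1, rng=rng)

results = [faulty_add(i, i) for i in range(10_000)]
wrong = sum(r != 2 * i for i, r in enumerate(results))
observed_ter = wrong / len(results)
print(f"observed error rate: {observed_ter:.3f}")
```

In a real flow the TER would presumably vary with operating condition, clock speed, and input data rather than being a single constant, which is the dependence the predicted TERs are meant to capture.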
“…Timing Error Propagation (TEP) An alternative approach is to exploit algorithmic noise tolerance and simply allow timing errors to propagate to subsequent stages of computation instead of re-executing inputs that cause errors [18,20,42]. Recent work [20] has demonstrated that TEP causes the classification accuracy of DNNs to drop sharply for timing error rates as low as 0.1%. Our empirical evaluations of TEP in Section 4.2 reach the same conclusion.…”
Section: Timing Speculation for DNN Accelerators | mentioning
confidence: 99%
“…Even assuming an ideal single-cycle recovery penalty, a 50% global timing error rate imposes significant performance (and energy) overheads (see Section 2 for more detailed evaluation). On the other hand, although TEP does not have a performance penalty, prior work [20] and our own empirical evaluations in Section 4.2 show that simply allowing timing-induced errors to propagate results in significant drops in classification accuracy even at timing error rates as low as 0.1%.…”
Section: Introduction | mentioning
confidence: 96%
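The performance cost of detect-and-recover noted in the statement above can be made concrete with a back-of-the-envelope sketch. It assumes a deliberately simple model that is not taken from the cited paper: every operation takes one cycle, each timing error adds a fixed recovery penalty, and the recovery itself never fails. The function name `ted_slowdown` is hypothetical.

```python
def ted_slowdown(error_rate, penalty_cycles=1):
    """Expected cycles per operation, relative to error-free execution.

    Assumed model: 1 cycle per operation, plus `penalty_cycles` extra
    cycles each time a timing error is detected and recovered from.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    return 1.0 + error_rate * penalty_cycles

# Ideal single-cycle recovery at a 50% timing error rate.
print(ted_slowdown(0.5))  # 1.5, i.e. 50% more cycles than error-free execution
```

Under these assumptions a 50% error rate with a one-cycle penalty already costs half again as many cycles, and a longer recovery penalty scales the overhead linearly.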
“…The idea is based on the observation that worst-case timing critical paths in digital logic are rarely exercised, making it possible to run digital circuits at supply voltages lower than the nominal voltage if timing errors can be dealt with. Two broad techniques are proposed in literature to cope with timing errors: (i) timing error detection and recovery (TED) [9,12–14]; and (ii) timing error propagation (TEP) [20,23,28,42,43]. TED detects timing errors (using Razor flip-flops [12], for instance) and recovers by safely re-executing the offending input.…”
Hardware accelerators are being increasingly deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34%-57% energy savings on state-of-the-art speech and image recognition benchmarks with less than 1% loss in classification accuracy and no performance loss. Further, we show that Thundervolt is synergistic with and can further increase the energy efficiency of commonly used run-time DNN pruning techniques like Zero-Skip.
“…popular targets for voltage scaling seeking energy savings [30], [4], [10]. While effective, voltage scaling has the disadvantage of changing circuit delay, which causes timing errors that can lead to degradation of application quality.…”
As Moore's Law comes to an end and transistor scaling increasingly falls short in improving energy efficiency, alternative computing paradigms are direly needed. This need is further highlighted by the overwhelming increase in computing demand posed by emerging applications such as multimedia and data analysis. Fortunately, such driving workloads also present new opportunities since, thanks to their inherent error tolerance, they do not require completely accurate computations. Thus, by trading off accuracy for better performance or improved efficiency, approximate computing promises tremendous growth for future computing. Various approximation methods demonstrate the effectiveness of voltage scaling in functional units (FUs) for exploring this energy-error trade-off. Yet, while an accurate error model is critical for assessing the error behavior of voltage-scaled FUs and its effects on application quality, existing error models of voltage-scaled FUs overlook the effects of input data and error rate disparity among different bits. To tackle this challenge, we propose LEVAX, an input-aware learning-based error model of voltage-scaled FUs that can predict the timing error rate (TER) for each output bit. This model is trained using random forest methods, with input features and output labels extracted from gate-level simulations. To validate its effectiveness and demonstrate its prediction accuracy, we use LEVAX on various FUs. Across all bit positions, voltage levels, and FUs, LEVAX achieves, on average, a relative error of 1.20%. LEVAX also achieves an average per-voltage Root Mean Square Error (RMSE) of 1.03% and per-bit RMSE of 1.17%. Exposing this error rate even up to the application level, LEVAX can estimate the quality of four image processing applications under voltage scaling with an average accuracy of 97.9%. To the best of our knowledge, LEVAX is the first voltage scaling error model of FUs that can incorporate the effects of input data.