Abstract-Energy-efficiency is a critical concern for many systems, ranging from IoT objects and mobile devices to highperformance computers. Moreover, after 40 years of prosperity, Moore's law is starting to show its economic and technical limits. Noticing that many circuits are over-engineered and that many applications are error-resilient or require less precision than offered by the existing hardware, approximate computing has emerged as a potential solution to pursue improvements of digital circuits. In this regard, a technique to systematically trade off accuracy in exchange for area, power and delay savings in digital circuits is proposed: Gate-Level Pruning. A CAD tool is build and integrated into a standard digital flow to offer a wide range of costs-accuracy tradeoffs for any conventional design. The methodology is first demonstrated on adders, achieving up to 78 % energy-delay-area reduction for 10 % mean relative error. It is then detailed how this methodology can be applied on a more complex system composed of a multitude of arithmetic blocks and memory: the Discrete Cosine Transform (DCT), which is a key building block for image and video processing applications. Even though arithmetic circuits represent less than 4 % of the entire DCT area, it is shown that the Gate-Level Pruning technique can lead to 21 % energy-delay-area savings over the entire system for a reasonable image quality loss of 24 dB. This significant saving is achieved thanks to the pruned arithmetic circuits which sets some nodes at constant values, enabling the synthesis tool to further simplify the circuit and memory.
The current trend for deep learning has come with an enormous computational need for billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. To this end, various precision-scalable MAC architectures optimized for neural networks have recently been proposed. Yet, it has been hard to comprehend their differences and make a fair judgment of their relative benefits as they have been implemented with different technologies and performance targets. To overcome this, this work exhaustively reviews the state-of-the-art precision-scalable MAC architectures and unifies them in a new taxonomy. Subsequently, these different topologies are thoroughly benchmarked in a 28 nm commercial CMOS process, across a wide range of performance targets, and with precision ranging from 2 to 8 bits. Circuits are analyzed for each precision as well as jointly in practical use cases, highlighting the impact of architectures and scalability in terms of energy, throughput, area and bandwidth, aiming to understand the key trends to reduce computation costs in neural-network processing.Index Terms-ASIC, deep neural networks, precision-scalable circuits, configurable circuits, MAC, multiply-accumulate units. I. INTRODUCTIONE MBEDDED deep learning has gained a lot of attention nowadays due to its broad application prospects and vast potential market. However, the main challenge to embrace this era of edge intelligence comes from the supply-anddemand gap between the limited energy budget of embedded devices, often battery powered, and the computationallyintensive deep-learning algorithms, requiring billions of Multiply-Accumulate (MAC) operations and data movements.To alleviate this unbalanced relationship, many approaches have been investigated at different levels of abstraction. At algorithmic level, researchers have introduced hardware-
Abstract-Inexact and approximate circuit design is a promising approach to improve performance and energy efficiency in technology-scaled and low-power digital systems. Such strategy is suitable for error-tolerant applications involving perceptive or statistical outputs. This paper presents a novel architecture of an Inexact Speculative Adder with optimized hardware efficiency and advanced compensation technique with either error correction or error reduction. This general topology of speculative adders improves performance and enables precise accuracy control. A brief design methodology and comparative study of this speculative adder are also presented herein, demonstrating power savings up to 26 % and energy-delay-area reductions up to 60 % at equivalent accuracy compared to the state-of-the-art. I. INTRODUCTIONAs mobile devices become ubiquitous, the power efficiency of digital systems has become a primary concern. Unfortunately, achieving low-power and Process-Voltage-Temperature (PVT) robustness requires complex and conflicting design constraints and safety margins. Typically, integrated circuits are designed to always ensure accurate operation. In order to avoid faulty results, they finish most computations earlier than the worstcase permitted delay or with more accuracy than needed for normal operation. This results in an inefficient use of resources and leads to "over-engineered" circuits.Inexact and approximate circuit design [1] is a radical approach to trade this counterproductive quest for perfection for substantial gains in power, speed, area and yield. The primary challenge, however, is to determine where and how to let an error or an approximation occur in the circuits without compromising the functionality or the user experience. With ever-increasing amount of data being processed, a wide variety of applications could tolerate inaccuracies. For example, in multimedia processing, a small proportion of errors is not perceptible to humans, and in highly computational algorithms such as data mining, search or recognition, the outcome is not required to be a single result but an adequate match. A promising approach to design inexact circuits is to use speculation to trade circuit accuracy for better power and speed. Taking advantage of such circuits would help to realize extremely energy-efficient and high-performance DSPs and hardware accelerators at lower integration cost and with higher speed, data rate or duty-cycling.The main contribution of this paper is to introduce a novel speculative adder: the Inexact Speculative Adder (ISA). The ISA improves performance, energy efficiency and error management through an optimized speculative path and with a versatile dual-direction error compensation technique. A brief design methodology is presented along with results and a comparative analysis of adder architectures.
Abstract-The floating-point unit is one of the most common building block in any computing system and is used for a huge number of applications. By combining two state-of-the-art techniques of imprecise hardware, namely Gate-Level Pruning and Inexact Speculative Adder, and by introducing a novel Inexact Speculative Multiplier architecture, three different approximate FPUs and one reference IEEE-754 compliant FPU have been integrated in a 65 nm CMOS process within a low-power multi-core processor. Silicon measurements show up to 27 % power, 36 % area and 53 % power-area product savings compared to the IEEE-754 single-precision FPU. Accuracy loss has been evaluated with a high-dynamic-range image tone-mapping algorithm, resulting in small but non-visible errors with image PSNR value of 90 dB. I. INTRODUCTIONWith the forecasted end of Moore's law and the increasing complexity to design and fabricate integrated circuits, power and reliability have become the main challenges to technology scaling. Power has definitely emerged as a critical issue due to the poor scaling of V DD and V th , while transistor miniaturization reaching the nanoscopic scale has led to extreme Process-Voltage-Temperature (PVT) variations. Unfortunately, achieving low power and robustness against PVT variations requires complicated and conflicting design constraints. As a consequence, designers are being pushed to seek for new energy-efficient circuit design and computing techniques to meet the exploding demand of data processing from mobile devices and cloud services.Approximate computing [1, 2] has emerged as a promising solution to sustain computing advancement and overcome the limitations in technology scaling. This approach explores a new trade-off between energy or circuit costs versus application accuracy. A myriad of applications could tolerate trading off a little bit of accuracy without compromising their functionality or user experience. In multimedia applications for instance, a small proportion of errors remains imperceptible to humans.To design approximate systems, several approaches have been investigated at different hardware levels, such as voltagefrequency over-scaling [3] at physical level or significancebased memory protection [4] at algorithmic level. At circuit level, an interesting approach is to perform computations using approximate arithmetic operators, such as adders and multipliers, allowing a controlled and limited amount of errors against significant power saving or performance increase. This paper focuses on two of these techniques: Gate-level Pruning [5] and Inexact Speculative Adder [6], which have both demonstrated significant savings simultaneously in energy, delay and area at the cost of reasonable errors.
Abstract-Inexact or approximate circuits show great ability to reduce power consumption at the cost of occasional errors in comparison to their conventional counterparts. Even though the benefits of such circuits have been proven for many applications, they are not wide spread owing to the absence of a clear design methodology and the required CAD tools. In this regard, this paper presents a methodology to automatically generate inexact circuits starting from a conventional design by adding only one small step in the digital design flow. Further, this paper also demonstrates that achieving pruning at gate-level can lead to substantial savings in terms of power consumption, critical path delay and silicon area. An order of magnitude area and power savings is demonstrated for a 64-bit gate-level pruned high-speed adder for a 10 % relative error magnitude.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.