A High-Accuracy Hardware-Efficient Multiply–Accumulate (MAC) Unit Based on Dual-Mode Truncation Error Compensation for CNNs

Tang, Song-Nien; Han, Yu-Shin

doi:10.1109/access.2020.3040366

Cited by 6 publications

(2 citation statements)

References 41 publications

“…This approach reduces the area and power consumption of the used PEs. The authors of [50] proposed a hardware-efficient implementation of MAC unit based on the Booth multiplier with dual-mode truncation error compensation for convolutional neural networks. Work [51] presents a low-power MAC unit with integration of additions into the partial products reduction.…”

Section: B Processing Elements Modification (mentioning)

confidence: 99%

Modern Trends in Improving the Technical Characteristics of Devices and Systems for Digital Image Processing

Nagornov,

Lyakhov,

Bergerman

et al. 2024

IEEE Access

The technology development greatly increases the amount of digital visual information. Existing devices cannot efficiently process such huge amounts of data. The technical characteristics of digital image processing (DIP) devices and systems are being actively improved to resolve this contradiction in science and technology. The state-of-the-art methodology includes a huge number of very diverse approaches at the mathematical, software, and hardware implementation levels. We have analyzed all modern trends to improve the technical characteristics of DIP devices and systems. The main distinguishing feature of this review is that we are not limited to considering various aspects of neural network image processing, to which the vast majority of both review and research papers on the designated topic are devoted. Review papers on the subject under consideration are analyzed. Various mathematical and arithmetic-logical methods for improving the characteristics of image processing devices are described in detail. Original and significant architectural and structural solutions are analyzed. Promising neural network models of visual data processing are characterized. Hardware platforms for the design and operation of DIP systems that are efficient in terms of resource costs are considered. The most significant improvements achieved through the hardware implementation of models and methods on field-programmable gate arrays and application-specific integrated circuits are noted.

Index Terms: High-performance computing, low-area design, low-power device, energy-efficient architecture, neural network, hardware accelerator, FPGA, ASIC.
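For readers unfamiliar with the Booth-multiplier-based MAC unit mentioned in the citation statement above, the following is a minimal software sketch of radix-4 Booth recoding followed by a multiply-accumulate step. It is an illustration of the textbook technique only: the function names `booth_radix4_digits` and `booth_mac` are hypothetical, and the sketch does not model the cited paper's dual-mode truncation error compensation or any hardware detail.

```python
def booth_radix4_digits(b: int, bits: int) -> list[int]:
    """Recode a signed `bits`-bit multiplier into radix-4 Booth digits in {-2..2}.

    Digit i is -2*b[2i+1] + b[2i] + b[2i-1], with b[-1] taken as 0.
    Hypothetical helper for illustration; `bits` must be even.
    """
    assert bits % 2 == 0, "radix-4 recoding needs an even bit width"
    pattern = b & ((1 << bits) - 1)  # two's-complement bit pattern of b
    digits = []
    prev = 0  # the implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_mac(acc: int, a: int, b: int, bits: int = 8) -> int:
    """Return acc + a*b, forming a*b from one partial product per Booth digit."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(b, bits)):
        product += (digit * a) << (2 * i)  # digit i carries weight 4**i
    return acc + product
```

Each Booth digit selects one of 0, ±a, or ±2a, so an 8-bit multiplier produces four partial products instead of eight. This is the usual reason Booth recoding is attractive for MAC hardware.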

“…Multipliers are widely used in many digital operation systems. To limit bit-width increases in data paths, fixed-width multipliers are accordingly employed as arithmetic modules for digital signal processing, communication baseband operations, and neural network acceleration [1][2][3][4]. L-bit fixed-width multipliers generate the same L-bit output width as the L-bit operand, of which the Baugh-Wooley (array) multiplier and Booth multiplier are two of the most popular types.…”

Section: Introduction (mentioning)

confidence: 99%

An Accuracy-Improved Fixed-Width Booth Multiplier Enabling Bit-Width Adaptive Truncation Error Compensation

et al. 2021

Self Cite

Fixed-width Booth multipliers (FWBMs) generate a product with the same bit width as the operand and have been extensively employed in many digital systems. Various truncation error compensation (TEC) schemes have been presented for FWBM designs, aiming to reduce hardware costs while preserving operation accuracy. In general, the existing TEC methods function adequately for an exact bit width of the operand but fail to consider the TEC effect for FWBM inputs with various bit-width levels. To address this issue, we propose a bit-width adaptive TEC (BWATEC) scheme for providing high-accuracy TEC functions that are adaptive to the multiple Lʹ-bit numerical ranges of input data for an L-bit FWBM (Lʹ ≤ L). We also present an adjustable architecture for a 16-bit FWBM to enable the proposed BWATEC scheme and evaluate the hardware performance using the TSMC 40 nm standard cell library. Relative to the 16-bit FWBM designs used for comparison, which use state-of-the-art TEC methods, the proposed BWATEC-enabled FWBM design can achieve reductions in the area-delay-error product of 7.9%–50.9%, 17.1%–69.5%, 29.9%–82.2%, and 100% for the 14-bit, 12-bit, 10-bit, and 8-bit inputs, respectively. Moreover, the resultant 16-bit FWBM with BWATEC was verified on a field-programmable gate array for convolutional neural network acceleration.
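The idea of a fixed-width multiplier with truncation error compensation can be illustrated numerically. The sketch below is an assumption-laden model, not the BWATEC scheme: it computes the exact 2L-bit product and then keeps only the upper L bits, optionally adding a constant bias before truncation. A real FWBM omits the low-order partial products and estimates their contribution, which is where the hardware saving comes from; the names `fixed_width_multiply` and `mean_error` are hypothetical.

```python
def fixed_width_multiply(a: int, b: int, bits: int, bias: int = 0) -> int:
    """Multiply two signed `bits`-bit integers and keep the upper `bits` bits.

    `bias` is a constant added to the full product before the low `bits`
    bits are discarded. It stands in for a compensation term; this is an
    assumption for illustration, not a specific published TEC scheme.
    """
    full = a * b                   # exact product, up to 2*bits wide
    return (full + bias) >> bits   # arithmetic shift floors toward -infinity


def mean_error(bits: int, bias: int = 0) -> float:
    """Average of (fixed-width result - exact scaled product) over all inputs."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    total = 0.0
    count = 0
    for a in range(lo, hi):
        for b in range(lo, hi):
            exact = (a * b) / (1 << bits)
            total += fixed_width_multiply(a, b, bits, bias) - exact
            count += 1
    return total / count


# 100 * 50 = 5000; 5000 / 256 is about 19.53
print(fixed_width_multiply(100, 50, 8))            # 19 (plain truncation)
print(fixed_width_multiply(100, 50, 8, bias=128))  # 20 (half-LSB bias)
```

Comparing `mean_error(8)` with `mean_error(8, bias=128)` shows that plain truncation is biased downward while the half-LSB constant brings the average error close to zero. Published TEC schemes aim for a similar effect without ever forming the discarded bits.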

Low‐latency and power‐efficient row‐based binary‐weighted compensator for fixed‐width Booth multiplier

Li,

Hu,

Chen

et al. 2024

Circuit Theory & Apps

The fixed‐width Booth multiplier (FWBM) plays a significant role in the rise of the approximate computing (AC) field. In this paper, a row‐based binary‐weighted compensator (RBC) for fixed‐width Booth multiplication is proposed. The derived binary‐weighted closed form minimizes the conversion loss and hardware cost. With the proposed closed form, the partial product array can be reduced dramatically. Consequently, the compact FWBM with the proposed RBC not only shortens the critical path to at least 24% but also minimizes the power dissipation to at least 44%. Moreover, the proposed RBC outperforms the state of the art with a maximum merit improvement of 39%. By implementing the proposed RBC‐FWBM in an FIR filter, we demonstrate the practicality of the proposed design with a significant reduction in power dissipation and delay while maintaining high accuracy.
