Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing

Camus, Vincent; Mei, Linyan; Enz, Christian; Verhelst, Marian

doi:10.1109/jetcas.2019.2950386

Cited by 59 publications (35 citation statements)

References 31 publications (55 reference statements)


“…(12). For a MAC operation that uses TEC-enabled FWBMs, the formulation in (12) can generate a high-accuracy bias term for each FWBM stage.…”

Section: Derivation for Mode 1 TEC Scheme (mentioning)

confidence: 99%

“…(12). For a MAC operation that uses TEC-enabled FWBMs, the formulation in (12) can generate a high-accuracy bias term for each FWBM stage. In terms of MP, TPmajor, and TPminor, the MAC operation in (2) can be rewritten as (13).…”

Section: Derivation for Mode 1 TEC Scheme (mentioning)

confidence: 99%

“…In terms of MP, TPmajor, and TPminor, the MAC operation in (2) can be rewritten as (13). The proposed TEC scheme (Mode 1) truncates the N TPminor terms in (13) and uses their estimated value (expressed in (12)) as a substitute. In (13), the original (Eacc+L)-bit MAC result in (8) can be revised as (14), where a global bias (i.e., BM1) is introduced in the TEC operation using (12).…”

Section: Derivation for Mode 1 TEC Scheme (mentioning)

confidence: 99%

“…The proposed TEC scheme (Mode 1) truncates the N TPminor terms in (13) and uses their estimated value (expressed in (12)) as a substitute. In (13), the original (Eacc+L)-bit MAC result in (8) can be revised as (14), where a global bias (i.e., BM1) is introduced in the TEC operation using (12). In practice, the BM1 value in (14) requires only fractional precision for the w digit using (15), in which a Fw{.}…”

Section: Derivation for Mode 1 TEC Scheme (mentioning)

confidence: 99%

“…Accordingly, Mode 2 TEC scheme is proposed to provide high-accuracy biasing for positive patterns, while disable biasing for zero patterns in each FWBM. Such a function can be directly offered using (12) Table II can be approximated, as shown in (16), based on the probability distribution for nonzero dj in Table I. Consequently, the original E[TPminor] value in (12) can be approximated using (17), where δj indicates whether dj is a nonzero term (i.e., δj = 1 for nonzero dj), which can be mapped to the exclusive-OR operation of (b1, b0) bits in Table I.…”

Section: Derivation for Mode 2 TEC Scheme (mentioning)

confidence: 99%


A High-Accuracy Hardware-Efficient Multiply–Accumulate (MAC) Unit Based on Dual-Mode Truncation Error Compensation for CNNs

Tang; Han. 2020. IEEE Access


This paper presents a multiply-accumulate (MAC) unit that enables a dual-mode truncation error compensation (TEC) scheme based on a fixed-width Booth multiplier (FWBM) for convolutional neural network (CNN) inference operations. The proposed tailored TEC schemes of Modes 1 and 2 can achieve high MAC accuracy for a general or rectified linear unit-based CNN model with general (Mode 1) or positive/zero (Mode 2) input patterns. By pre-calculating the pre-known CNN model coefficients, the proposed dual-mode TEC scheme can be realized using minimal partial product operations with high hardware efficiency using a software-hardware codesign approach. Further, a reconfigurable architecture of the resultant MAC unit is presented to realize the proposed dual-mode TEC scheme. By evaluating the accuracy for 9-N and 25-N MAC operations (N denotes the number of times MAC is performed), a MAC operation using the proposed TEC scheme can achieve the highest accuracy for Modes 1 and 2, relative to contrast samples that directly employ the FWBM with a conventional TEC function. The hardware performances of 9-N and 25-N MAC units are also evaluated using the TSMC 40-nm standard cell library. Compared with the contrast TEC-enabled designs, the proposed MAC unit exhibits higher hardware efficiency in terms of area, delay, and power consumption and achieves a minimum reduction of more than 40% in both area-delay-error and power-delay-error products. Moreover, the resultant 9-N and 25-N MAC units are verified using a system-on-chip field-programmable gate array platform to test a CNN model for handwritten digit classification.
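The idea behind the two TEC modes can be illustrated with a much simpler model than the one in the paper. The sketch below is not the authors' fixed-width Booth multiplier formulation, and it does not reproduce equations (12) to (17), which are not shown in the excerpts above. It assumes plain shift-based truncation of each product, a hypothetical truncation width `L`, and a per-term bias of half the truncation range as a stand-in for the expected value of the discarded bits. All function and parameter names are illustrative.

```python
L = 8  # assumed number of low-order product bits dropped per multiplication


def exact_mac(xs, ws):
    """Reference multiply-accumulate with full-precision products."""
    return sum(x * w for x, w in zip(xs, ws))


def truncated_mac(xs, ws, bias_per_term=0, skip_zero_inputs=False):
    """Accumulate truncated products, then add one global bias.

    bias_per_term: assumed estimate of the value lost per truncated product.
    skip_zero_inputs: if True, zero inputs contribute no bias, loosely
    mirroring the Mode 2 idea of disabling biasing for zero patterns.
    """
    acc = 0
    n_biased = 0
    for x, w in zip(xs, ws):
        acc += (x * w) >> L  # floor truncation: drops the low L bits
        if not (skip_zero_inputs and x == 0):
            n_biased += 1
    # Rescale to the original weight and apply the accumulated bias once.
    return (acc << L) + n_biased * bias_per_term


if __name__ == "__main__":
    xs = [3, 0, -5, 7]      # inputs; a zero models a ReLU-suppressed activation
    ws = [100, 50, 33, -9]  # pre-known coefficients
    half_range = 1 << (L - 1)
    print("exact          :", exact_mac(xs, ws))
    print("no compensation:", truncated_mac(xs, ws))
    print("bias every term:", truncated_mac(xs, ws, half_range))
    print("skip zero terms:", truncated_mac(xs, ws, half_range, True))
```

In this toy model a zero input produces a zero product with no truncation error, so adding a bias for it overcompensates. That is the intuition for treating positive/zero input patterns separately, although the paper derives its bias from Booth-encoded digits rather than from a fixed half-range constant.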

