2018
DOI: 10.1109/tcad.2018.2857019

XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference

Abstract: [abstract text not recovered] The paper reproduces the loop nest of a convolutional layer:

    # Loop nest of a convolutional layer (Python-style pseudocode).
    # N_out, N_in: output/input channel counts; h_out, w_out: output height/width;
    # fs: filter size; W: weights; x: input activations; y: output activations.
    for k_out in range(0, N_out):
        for k_in in range(0, N_in):
            for i in range(0, h_out):
                for j in range(0, w_out):
                    if k_in == 0:
                        y[k_out, i, j] = 0  # zero each accumulator once, not per input channel
                    for u_i in range(0, fs):
                        for u_j in range(0, fs):
                            y[k_out, i, j] += W[k_out, k_in, u_i, u_j] * x[k_in, i + u_i, j + u_j]
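Since the accelerator targets binary neural networks, the interesting point is how this loop nest collapses under binarization: with weights and activations constrained to {-1, +1} and bit-packed into machine words, the inner multiply-accumulate reduces to an XNOR followed by a popcount. A minimal illustrative sketch, not the paper's implementation (binary_dot, w_bits, x_bits, and n are made-up names):

    # Dot product of two {-1, +1} vectors packed as n-bit integers,
    # with bit = 1 encoding +1 and bit = 0 encoding -1.
    def binary_dot(w_bits: int, x_bits: int, n: int) -> int:
        mask = (1 << n) - 1
        xnor = ~(w_bits ^ x_bits) & mask   # bit i is 1 where the signs agree
        matches = bin(xnor).count("1")     # popcount of the agreement mask
        return 2 * matches - n             # agreements minus disagreements

    # w = [+1, -1, +1, +1], x = [+1, +1, +1, -1] (LSB first) -> dot product 0
    assert binary_dot(0b1101, 0b0111, 4) == 0

In hardware this replaces the multiplier array with XNOR gates feeding a popcount tree, which is the usual source of the fJ/op-scale efficiency of binary designs like the XNE.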

Cited by 108 publications (91 citation statements)
References 45 publications
“…By considering technology scaling, we see that the energy efficiency (in terms of TOP/s/W) of PPAC is comparable to that of the two fully-digital designs in [23], [24] but 7.9× and 2.3× lower than that of the mixed-signal designs in [6] and [19], respectively, where the latter is implemented in a comparable technology node as PPAC. As noted in Section III-D, mixed-signal designs are particularly useful for tasks that are resilient to noise or process variation, such as neural network inference.…”
Section: B. Comparison With Existing Accelerators
confidence: 97%
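For scale, the paper's headline figure and the TOP/s/W metric quoted above are reciprocals of each other, since 1 W sustained is 1 J/s. A quick illustrative conversion:

    # Convert energy per operation (fJ/op) into TOP/s/W.
    energy_per_op_fj = 21.6                            # headline figure of the XNE paper
    ops_per_joule = 1.0 / (energy_per_op_fj * 1e-15)   # op/J, i.e. op/s per W
    tops_per_watt = ops_per_joule / 1e12               # 1 TOP/s/W = 1e12 op/s per W
    print(f"{tops_per_watt:.1f}")                      # 46.3

So 21.6 fJ/op corresponds to roughly 46.3 TOP/s/W peak efficiency, before any technology-scaling normalization of the kind applied in the quoted comparison.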
“…Each lane consists of a First-In First-Out (FIFO) queue to buffer read and write data. An address generator based on the one presented by Schuiki et al. [8] and Conti et al. [9] assigns memory addresses to the stream-based accesses performed by the core. The lane can be put into read mode, in which case the address generator is used to fetch data from memory and store it in the FIFO.…”
Section: Data Mover
confidence: 99%
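The cited address generators map a nested-loop access pattern onto a flat stream of memory addresses. A minimal sketch of that idea, with a hypothetical interface (gen_addresses and its base/lengths/strides parameters are illustrative, not the actual interface of [8] or [9]):

    # Yield byte addresses for a nested strided access pattern:
    # addr = base + sum(idx[d] * strides[d]), with idx iterated like nested loops.
    def gen_addresses(base, lengths, strides):
        assert len(lengths) == len(strides)
        idx = [0] * len(lengths)
        while True:
            yield base + sum(i * s for i, s in zip(idx, strides))
            for d in reversed(range(len(idx))):   # increment innermost index first
                idx[d] += 1
                if idx[d] < lengths[d]:
                    break
                idx[d] = 0                        # carry into the next-outer loop
            else:
                return                            # all indices wrapped: pattern done

    # Example: a 4x8 tile of 32-bit words at 0x1000 with a 64-byte row pitch.
    addrs = list(gen_addresses(0x1000, lengths=[4, 8], strides=[64, 4]))

In read mode, each address from such a generator would trigger a memory fetch whose data is pushed into the lane's FIFO; write mode reverses the direction.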
“…Umuroglu et al. [67] have created FINN, a framework for binarized Field Programmable Gate Array (FPGA) accelerators, which was further expanded to larger models by Fraser et al. [23]. Other binarized accelerators have been proposed, targeting FPGAs [49,51,72,76], Application-Specific Integrated Circuits (ASICs) [2,10,18,63], and in-memory compute [11,36]. Yang et al. [71] have developed BMXNet, an extension of MXNet [13] based on the binarized GEMM kernel.…”
Section: Binarized Neural Network
confidence: 99%