2018
DOI: 10.1016/j.neucom.2017.09.046

FP-BNN: Binarized neural network on FPGA

Abstract: Deep neural networks (DNNs) have attracted significant attention for their excellent accuracy, especially in areas such as computer vision and artificial intelligence. To enhance their performance, technologies for their hardware acceleration are being studied. FPGA technology is a promising choice for hardware acceleration, given its low power consumption and high flexibility, which makes it particularly suitable for embedded systems. However, complex DNN models may need more computing and memory resources than…

Cited by 233 publications (149 citation statements)
References 14 publications
“…By comparing Wang et al [144] and Zhao et al's [158] CIFAR-10-targeting CNN implementations with the Going Deeper [111], fpgaConvNet [142] and FP-BNN [83] ImageNet CNNs, all of which used FPGAs of similar scales, we can observe that, as precision is reduced, linear or even superlinear throughput increases can be achieved. Superlinear increases can be explained using the roofline modelling in Section 3.…”
Section: Throughput
confidence: 99%
“…In [19], the authors proposed architectural changes to the Intel ALM carry-chains such that large compressors like (6:2) and (7:2) can be efficiently mapped to single ALMs. Although their proposed compressor is very efficient, for modern applications such as BNN popcounting [13], these compressors would be significantly underutilized. Similarly, Kim et.…”
Section: Related Work
confidence: 99%
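The excerpt above refers to (6:2) and (7:2) compressors, hardware blocks that reduce several same-weight bits to a compact binary count, as used in BNN popcounting. In software terms, a generalized parallel counter (GPC) can be sketched as follows; the function name `gpc` and its list-of-bits interface are illustrative assumptions, not part of the cited design:

```python
def gpc(bits):
    """Sketch of a generalized parallel counter (GPC).

    Counts the ones among the input bits and returns the count
    in binary, least-significant bit first. A hardware (6:3)
    counter does exactly this for six inputs; (6:2) and (7:2)
    compressors additionally route carries to neighbouring columns.
    """
    count = sum(bits)
    out = []
    while count:
        out.append(count & 1)  # peel off one output bit per weight
        count >>= 1
    return out or [0]

# Six ones compress to the 3-bit count 6 = 0b110 (LSB first).
print(gpc([1, 1, 1, 1, 1, 1]))
```

The underutilization concern in the excerpt follows directly: feeding such a wide compressor fewer than six or seven active bits wastes most of its input capacity.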
“…One example of interest is that compressor trees and GPCs can be used to accelerate the XnorPopcount operations within binarized neural networks (BNNs) [1], which forms the critical path of the model's execution. BNNs enable neural networks to be utilized in resource constrained applications and can be deployed efficiently on FPGAs [13,34]; our optimizations would improve their performance further.…”
Section: Introduction
confidence: 99%
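The XnorPopcount operation mentioned above replaces the multiply-accumulate in a binarized layer: with bipolar values in {-1, +1} encoded as bits (+1 → 1, -1 → 0), XNOR yields 1 exactly where signs agree, so the dot product equals 2·popcount − n. A minimal sketch (function name and list interface are assumptions for illustration):

```python
def xnor_popcount(activations, weights):
    """Binary dot product via XNOR + popcount.

    activations, weights: equal-length sequences over {-1, +1}.
    Signs agree (XNOR = 1) at `popcount` positions, disagree at
    n - popcount, so dot(a, w) = popcount - (n - popcount)
                               = 2 * popcount - n.
    """
    assert len(activations) == len(weights)
    n = len(activations)
    popcount = sum(1 for a, w in zip(activations, weights) if a == w)
    return 2 * popcount - n

# Matches the ordinary dot product: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0.
print(xnor_popcount([1, -1, 1, 1], [1, 1, -1, 1]))
```

In hardware the popcount is the expensive part, which is why compressor-tree optimizations target it.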
“…Since M has to be an odd number, by choosing M equal to the kernel size, which is also an odd number, and applying majority logic on the pairs placed in the same channel in a row or a column, folding is possible. Also, three is the most common kernel size for Conv layers in modern BNNs [6].…”
Section: B. Xnormaj Technique
confidence: 99%
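The excerpt's requirement that M be odd exists because majority voting over an odd number of bits can never tie, so choosing M equal to an odd kernel size (commonly 3) makes the fold well defined. A minimal sketch of that majority step, assuming a plain list-of-bits interface (the function name `majority` is hypothetical):

```python
def majority(bits):
    """Majority vote over an odd number M of bits.

    With M odd, sum(bits) can never equal M/2, so the vote is
    always decisive: output 1 iff more than half the bits are 1.
    """
    assert len(bits) % 2 == 1, "M must be odd to avoid ties"
    return 1 if sum(bits) > len(bits) // 2 else 0

# M = 3, matching the most common Conv kernel size in modern BNNs.
print(majority([1, 1, 0]))
```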