2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
DOI: 10.1109/fccm.2019.00014
LUTNet: Rethinking Inference in FPGA Soft Logic

Abstract: Research has shown that deep neural networks contain significant redundancy, and that high classification accuracies can be achieved even when weights and activations are quantised down to binary values. Network binarisation on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, is capable of implementing far more than an XNOR: it can perform any K-input Boolean operation. Inspired by this observ…
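To make the contrast in the abstract concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of a binary dot product computed XNOR-and-popcount style, next to an arbitrary K-input Boolean function of the kind a single K-LUT can be programmed to realise; all function names are invented.

```python
# Illustrative sketch: binary inner product via XNOR + popcount, versus an
# arbitrary K-input Boolean function such as a single K-LUT could implement.
import itertools
import random

def xnor_popcount_dot(weights, activations):
    """Binary dot product over {-1,+1}: count XNOR matches, then rescale.
    Equivalent to sum(w * x), which is what a popcount-based datapath computes."""
    matches = sum(1 for w, x in zip(weights, activations) if w == x)  # XNOR = equality in {-1,+1}
    return 2 * matches - len(weights)

def random_k_lut(k, seed=0):
    """An arbitrary K-input Boolean function, stored as a full truth table,
    i.e. one of the 2**(2**k) functions a K-LUT can be programmed to realise."""
    rng = random.Random(seed)
    table = {bits: rng.choice([-1, 1]) for bits in itertools.product([-1, 1], repeat=k)}
    return lambda *inputs: table[tuple(inputs)]

w = [1, -1, 1, -1]
x = [1, 1, -1, -1]
print(xnor_popcount_dot(w, x))  # BNN-style multiply-accumulate: here 0
f = random_k_lut(4)
print(f(*x))                    # an arbitrary programmed 4-input function
```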

Cited by 46 publications (52 citation statements)
References 12 publications
“…We suspect that this is due to our current restriction on the form of the function g_m in (3), i.e. {−1, 1}^(K−P) × {−1, 1}^P → {−1, 1} rather than {−1, 1}^(K−P) × {−1, 1}^P → ℕ. This makes (9) insoluble when ĉ_d and p^(m,t) are restricted to binary values. We can overcome this, and potentially make even more efficient use of the underlying FPGA fabric, by learning the popcount circuitry along with our XNOR substitutes, replacing the summation as well as w_n x_n in (1).…”
Section: Limitations
confidence: 99%
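As a rough illustration of the restriction discussed in this excerpt, the sketch below (invented names, not the paper's actual definitions of (1), (3) or (9)) contrasts a node whose output is confined to {−1, 1}, and must therefore hand its results to fixed popcount circuitry, with a node whose integer codomain could absorb part of that summation itself.

```python
# Illustrative only: g_binary / g_integer and the operand shapes are invented; they
# show the difference between a {-1,+1} codomain (fixed popcount aggregation) and
# an integer codomain (summation folded into the node, so it too could be learned).

def g_binary(c, p):
    """Node restricted to a {-1,+1} output, e.g. an XNOR-like product of two bits."""
    return c[0] * p[0]

def layer_fixed_popcount(cs, ps):
    """Binary node outputs aggregated by fixed, unlearned popcount-style summation."""
    return sum(g_binary(c, p) for c, p in zip(cs, ps))

def g_integer(c, p):
    """Relaxed node with an integer codomain: part of the accumulation happens
    inside the node, so the external summation circuitry could also be learned."""
    return sum(ci * pi for ci, pi in zip(c, p))

cs = [(1, -1), (-1, -1), (1, 1)]
ps = [(1, 1), (-1, 1), (-1, -1)]
print(layer_fixed_popcount(cs, ps))                   # 1 + 1 - 1 = 1
print(sum(g_integer(c, p) for c, p in zip(cs, ps)))   # integer-valued nodes
```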
“…(6) getTanh(double) is similar but uses an array of doubles. (7) BNNKernel is a small binarised neural network [27].…”
Section: Benchmarks
confidence: 99%
“…This is a natural choice because the underlying architecture is actually built of small physical Boolean lookup tables, each programmable to implement any one of the functions in B_K, together with programmable interconnect able to connect these lookup tables in an effectively arbitrary topology (K = 6 is common). Wang et al. [33] have recently begun to explore the potential for making use of the additional flexibility provided by these lookup tables. In this initial work, which we call LUTNet, we begin by taking a reasonably traditional approach, following [34]: some standard DNN benchmarks from the literature are quantised to use single-bit weights from {−1, +1}, and retrained to improve classification accuracy.…”
Section: The Discrete-continuous Divide: Preliminary Work
confidence: 99%
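The quantise-and-retrain step described in this excerpt can be sketched as follows, in the style of standard binary-neural-network training with a straight-through estimator; this is an assumed recipe rather than the exact procedure of [33] or [34], PyTorch is used purely for illustration, and the layer sizes and hyperparameters are arbitrary.

```python
# Hedged sketch, not the authors' code: constrain weights to {-1,+1} with a sign
# function and retrain via a straight-through estimator (STE).
import torch

class BinariseSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients where |w| <= 1, clip elsewhere.
        return grad_output * (w.abs() <= 1).float()

class BinaryLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x):
        w_bin = BinariseSTE.apply(self.weight)       # weights seen as {-1,+1}
        return torch.nn.functional.linear(x, w_bin)

# Retraining loop sketch: real-valued shadow weights are updated by SGD,
# while the forward pass always uses the binarised weights.
layer = BinaryLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(8, 16), torch.randn(8, 4)
for _ in range(10):
    loss = torch.nn.functional.mse_loss(layer(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```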
“…5). Using SGD in this discrete setting requires a lifting to a continuous interpolation, as described in detail in [33]. This is one direction in which to cross the discrete-continuous divide; some possible approaches to crossing in the opposite direction are explored in Section 6.…”
Section: The Discrete-continuous Divide: Preliminary Work
confidence: 99%
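One common way to lift a discrete K-input lookup table to something SGD can train is multilinear interpolation over the corners of {−1, 1}^K: each of the 2^K truth-table entries becomes a real-valued, trainable coefficient, and the function becomes differentiable between the corners. The sketch below illustrates that idea with invented names; it is not claimed to be the exact interpolation used in [33].

```python
# Hedged sketch of a continuous lifting of a K-input Boolean function via
# multilinear (Lagrange-style) interpolation over the corners of {-1,+1}^K.
import itertools

def lifted_lut(coefficients, x):
    """coefficients: dict mapping each corner c in {-1,+1}^K to a real value.
    x: a point in [-1, 1]^K. At the corners this reproduces the truth table;
    in between it is differentiable, so SGD can update the coefficients."""
    value = 0.0
    for corner in itertools.product([-1, 1], repeat=len(x)):
        # Basis function: 1 at this corner, 0 at every other corner.
        basis = 1.0
        for c_i, x_i in zip(corner, x):
            basis *= (1 + c_i * x_i) / 2
        value += coefficients[corner] * basis
    return value

# Example: a 2-input XNOR truth table, evaluated at a corner and in between.
xnor_table = {(-1, -1): 1.0, (-1, 1): -1.0, (1, -1): -1.0, (1, 1): 1.0}
print(lifted_lut(xnor_table, (1, 1)))       # exactly 1.0 at a corner
print(lifted_lut(xnor_table, (0.5, -0.2)))  # smooth value between corners
```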