2018 28th International Conference on Field Programmable Logic and Applications (FPL)
DOI: 10.1109/fpl.2018.00035
Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA

Abstract: Neural network accelerators with low latency and low energy consumption are desirable for edge computing. To create such accelerators, we propose a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes. This flow covers both network training and FPGA-based network deployment, which facilitates the design space exploration and simplifies the tradeoff between network accuracy and computation efficiency. Using this flow helps hardware d…
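The abstract's central idea is a hybrid quantization scheme: different layers of the network are assigned different, extremely low bit-widths. A minimal sketch of what such a per-layer scheme might look like is given below; the layer names, bit-widths, and the uniform/binary quantizers are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Hypothetical per-layer weight bit-widths: wider precision at the network
# boundary, extremely low bit-widths in the middle (an assumed example).
HYBRID_SCHEME = {
    "conv1": 8,   # first layer kept at higher precision
    "conv2": 2,
    "conv3": 1,   # binary weights
    "fc":    8,   # last layer kept at higher precision
}

def quantize_weights(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to `bits` bits; 1-bit falls back to
    sign * mean-magnitude binarization (a common BNN-style choice)."""
    if bits == 1:
        scale = np.abs(w).mean()
        return np.where(w >= 0, scale, -scale)
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(w).max()), 1e-12) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

quantized = {name: quantize_weights(np.random.randn(64, 64), bits)
             for name, bits in HYBRID_SCHEME.items()}
```

During design space exploration, each entry of such a map can be swept independently to trade accuracy against the resources and latency of the corresponding FPGA compute unit.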

Cited by 90 publications (62 citation statements)
References: 16 publications
“…Our design achieves a per-image latency of 2.28 ms, which is among the lowest across all the designs. In addition, compared with some of the most recent works [40,47], our design outperforms them by 5.64× and 3.26×, respectively, in terms of energy efficiency. Additionally, compared to an implementation that achieves comparably low latency [29], our implementation has 9.29× higher energy efficiency.…”
Section: Comparing To Prior FPGA Accelerators (mentioning)
confidence: 73%
“…Our model in Table 2 is significantly smaller and all weights (including weights in batch normalization layers) are quantized to power-of-two numbers. Our accuracy is 50.84% (about 2% worse than the nearest competitive designs [40] in terms of energy efficiency). However, our implementation has at least 3× higher energy efficiency.…”
Section: Comparing To Prior FPGA Accelerators (mentioning)
confidence: 76%
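The power-of-two weight quantization mentioned above replaces multiplications with shifts in hardware, which is where the energy-efficiency advantage comes from. A rough sketch of such a quantizer is shown below; the exponent range and the log-domain rounding are assumptions, not the cited design's exact scheme.

```python
import numpy as np

def quantize_power_of_two(w: np.ndarray, min_exp: int = -7, max_exp: int = 0) -> np.ndarray:
    """Map each weight to the nearest signed power of two (nearest in the
    log domain), clamping the exponent to [min_exp, max_exp]; zeros stay zero."""
    sign = np.sign(w)
    mag = np.where(np.abs(w) > 0, np.abs(w), 2.0 ** min_exp)  # avoid log2(0)
    exp = np.clip(np.round(np.log2(mag)), min_exp, max_exp)
    return sign * 2.0 ** exp
```

Because a multiply by 2^k is a bit shift, quantizing the batch-normalization scales the same way keeps the datapath multiplier-free.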
“…In [49], the optimized solution of a network is chosen layer by layer to avoid an exponential design space exploration. Wang et al. [64] try to use a large bit-width for only the first and last layers and quantize the middle layers to ternary or binary. The method needs to increase the network size to maintain high accuracy but still brings hardware performance improvements.…”
Section: Data Quantization (mentioning)
confidence: 99%
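For the middle layers that the quote above describes as ternary or binary, a standard ternarization (in the style of ternary weight networks) looks roughly like the following; the 0.7·mean|w| threshold is a common heuristic and an assumption here, not necessarily what Wang et al. [64] use.

```python
import numpy as np

def ternarize(w: np.ndarray, thresh_ratio: float = 0.7) -> np.ndarray:
    """Ternary weight quantization sketch: map weights to {-s, 0, +s}."""
    delta = thresh_ratio * np.abs(w).mean()   # magnitude threshold
    mask = np.abs(w) > delta                  # weights that stay nonzero
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return np.sign(w) * mask * scale
```

Binarization is the special case with no zero level, which is consistent with the quote's observation that the network size must grow to recover the accuracy lost in the middle layers.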
“…The whole task requires designers to have a deep understanding of both DNN algorithms and hardware design. In response to the intense demands and challenges of designing DNN accelerators, we have seen rapid development of high-level synthesis (HLS) design flow [22][23][24][25] and DNN design automation frameworks [16,[26][27][28][29][30] that improve the hardware design efficiency by allowing DNN accelerator design from high-level algorithmic descriptions and using pre-defined high-quality hardware IPs. Still, they either rely on hardware experts to trim down the large design space (e.g., use pre-defined/fixed architecture templates and explore other factors [16,29]) or conduct merely limited design exploration and optimization, hindering the development of optimal DNN accelerators that can be deployed into various platforms.…”
Section: Introduction (mentioning)
confidence: 99%