TinBiNN: Tiny Binarized Neural Network Overlay in about 5,000 4-LUTs and 5mW
2019 · Preprint · DOI: 10.48550/arxiv.1903.06630

Cited by 3 publications (4 citation statements) · References: 0 publications
“…Some acceleration approaches include using mobile GPUs [1], custom-designed application-specific integrated circuits (ASICs) [25][26][27][28], as well as FPGAs [2]. FPGA acceleration of embedded neural networks is of special interest since it brings together the power-performance benefits of dedicated circuits and the capability to deploy microarchitectures optimized for the target neural network model [29][30][31][32][33][34]. FPGA neural network accelerators have demonstrated orders of magnitude higher power efficiency compared to general-purpose processing units when applied to complex networks, such as image or video recognition [35,36].…”
Section: Embedded Neural Network Acceleration (mentioning)
Confidence: 99%
“…For 8-bit data retrieval, a custom extension of the RV32IMC ISA, named LVE, is defined in [14] and applied in [15]. It contains instructions for allocating 16- and 8-bit data as vector arrays in the scratchpad (SP) memory.…”
Section: A. Optimize DRAM Usage (mentioning)
Confidence: 99%
“…Almost all the implementations apply parallel processing in one way or another. The RISC-V factor is evident in the custom SIMD extensions used to manage parallel instructions, e.g. [15], [34], [29]. [35] introduces parallel processing with two RISC-V processors: one in a simplified configuration for general-purpose activity, and a second, more capable processor for the CNN execution.…”
Section: E. CPU-Accelerator Configuration (mentioning)
Confidence: 99%
“…Finally, low-cost FPGAs enable professional practice. The iCE40UP5K powering the UPduino has 5280 logic elements, enough to implement interesting designs such as ARM or RISC-V processors [11], neural networks [12], audio synthesizers, or arcade games. Students develop their designs in Verilog or VHDL (or a high-level HDL of their choice, such as Chisel, SpinalHDL, myHDL, etc.).…”
Section: Low-Cost FPGAs (mentioning)
Confidence: 99%