Neural network accelerators with low latency and low energy consumption are desirable for edge computing. To build such accelerators, we propose a design flow for accelerating extremely low bit-width neural networks (ELB-NNs) on embedded FPGAs with hybrid quantization schemes. The flow covers both network training and FPGA-based network deployment, which facilitates design space exploration and simplifies the tradeoff between network accuracy and computation efficiency. It helps hardware designers deliver network accelerators for edge devices under strict resource and power constraints. We demonstrate the proposed flow by supporting hybrid ELB settings within a single neural network. Results show that our design delivers very high performance, peaking at 10.3 TOPS, and classifies up to 325.3 images/s/W while running large-scale neural networks at under 5 W on an embedded FPGA. To the best of our knowledge, this is the most energy-efficient solution compared with GPU and other FPGA implementations reported in the literature so far.
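The hybrid quantization idea above assigns different (extremely low) bit-widths to different layers. The sketch below illustrates this with a plain uniform symmetric quantizer in NumPy; the per-layer bit-width assignment and layer names are hypothetical examples, not the paper's actual ELB-NN configuration.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Uniform symmetric quantization of a weight tensor to `bits` bits.
    For 1 bit, fall back to binarization: sign(w) scaled by mean magnitude."""
    if bits == 1:
        return np.sign(w) * np.mean(np.abs(w))
    levels = 2 ** (bits - 1) - 1            # symmetric signed integer range
    scale = np.max(np.abs(w)) / levels      # map max magnitude to top level
    return np.round(w / scale) * scale

# Hypothetical hybrid scheme: wider first/last layers, 2-bit middle layers
# (illustrative only; the paper explores such tradeoffs systematically).
layer_bits = {"conv1": 8, "conv2": 2, "conv3": 2, "fc": 4}

rng = np.random.default_rng(0)
weights = {name: rng.standard_normal((16, 16)) for name in layer_bits}
quantized = {name: quantize_uniform(w, layer_bits[name])
             for name, w in weights.items()}
```

Lower bit-widths shrink multipliers and on-chip weight storage on the FPGA, which is what enables the accuracy/efficiency tradeoff the flow explores.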
Deep neural network (DNN) accelerators with improved energy and delay are desirable for meeting the requirements of hardware targeted at IoT and edge computing systems. Convolutional neural networks (CoNNs) are among the most popular DNN architectures. This paper presents the design and evaluation of an accelerator for CoNNs. The system-level architecture is based on mixed-signal, cellular neural networks (CeNNs). Specifically, we present (i) the implementation of different layers, including convolution, ReLU, and pooling, in a CoNN using CeNNs, (ii) modified CoNN structures with CeNN-friendly layers to reduce the computational overheads typically associated with a CoNN, (iii) a mixed-signal CeNN architecture that performs CoNN computations in the analog and mixed-signal domain, and (iv) a design space exploration that identifies which CeNN-based algorithm and architectural features fare best compared to existing algorithms and architectures when evaluated over common datasets, MNIST and CIFAR-10. Notably, the proposed approach can lead to 8.7× improvements in energy-delay product (EDP) per digit classification for the MNIST dataset at iso-accuracy when compared with the state-of-the-art DNN engine, while our approach can offer 4.3× improvements in EDP when compared to other network implementations for the CIFAR-10 dataset. When considering application-specific hardware to support neural networks, it is important that said hardware can implement networks extensible to a large class of networks and solve a large collection of application-level problems. Deep neural networks (DNNs) represent such a class and have demonstrated their strength in applications such as playing the game of Go [54], image and video analysis [32], and target tracking [31]. In this paper, we use the convolutional neural network (CoNN) as a case study for DNNs due to its general prevalence.
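For reference, the three layer types the abstract says are mapped onto CeNNs (convolution, ReLU, and pooling) can be stated compactly in plain NumPy. This is an illustrative digital reference of the operations themselves, not the paper's analog CeNN implementation of them.

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2-D correlation of a single-channel image with kernel k."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Rectified linear unit: max(x, 0) elementwise."""
    return np.maximum(x, 0.0)

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    h, w = x.shape
    return x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3)) / 9.0                  # 3x3 averaging kernel
feat = max_pool(relu(conv2d(img, k)))      # 6x6 -> 4x4 -> 2x2
```

In the paper's approach, the convolution corresponds naturally to a CeNN feedforward template, while ReLU and pooling motivate the "CeNN-friendly" layer modifications mentioned in item (ii).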
CoNNs are computationally intensive, which can lead to high latency and energy for inference and even higher latency/energy for training. The focus of this paper is on developing a low-energy/low-delay mixed-signal system based on cellular neural networks (CeNNs) for realizing CoNNs. A Cellular Nonlinear/Neural Network (CeNN) is an analog computing architecture [11] well suited to many information processing tasks. In a CeNN, identical processing units (called cells) process analog information concurrently. Interconnection between cells is typically local (i.e., nearest neighbor) and space-invariant. For spatio-temporal applications, CeNNs can offer vastly superior performance and power efficiency when compared to conventional von Neumann architectures [47, 61]. Using "CeNNs for CoNNs" allows the bulk of the computation associated with a CoNN to be performed in the analog domain. Sensed information can be processed immediately with no analog-to-digital conversion (ADC). Also, inference-based processing tasks can tolerate lower precision (e.g., Google's TPU employs 8-bit integer matrix multiplies [24]) typically associated ...
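The CeNN behavior described above (identical cells with local, space-invariant coupling) follows the standard Chua-Yang state equation, dx/dt = -x + A*y(x) + B*u + z, with the saturating output y = 0.5(|x+1| - |x-1|). The NumPy sketch below simulates those analog dynamics digitally for illustration; the A/B templates and bias z are hypothetical example values, not parameters from the paper.

```python
import numpy as np

def cenn_output(x):
    """Standard piecewise-linear CeNN output: y = 0.5 * (|x+1| - |x-1|)."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def template_op(img, t):
    """Apply a 3x3 template over each cell's nearest-neighbor window
    (zero padding at the array boundary)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * t)
    return out

def cenn_step(x, u, A, B, z, dt=0.05):
    """One forward-Euler step of dx/dt = -x + A*y(x) + B*u + z."""
    dx = -x + template_op(cenn_output(x), A) + template_op(u, B) + z
    return x + dt * dx

# Hypothetical edge-detection-style templates (illustrative values only).
A = np.array([[0, 0, 0], [0, 2.0, 0], [0, 0, 0]])
B = np.array([[-1, -1, -1], [-1, 8.0, -1], [-1, -1, -1]], dtype=float)
z = -0.5

u = -np.ones((8, 8))
u[2:6, 2:6] = 1.0          # input: a bright square on a dark background
x = u.copy()               # initialize the state from the input
for _ in range(200):
    x = cenn_step(x, u, A, B, z)
y = cenn_output(x)
```

In the mixed-signal architecture, each cell integrates this equation in continuous time with analog circuitry, which is why the bulk of the CoNN computation can stay in the analog domain.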
A Cellular Neural Network (CNN) is a powerful processor that can significantly improve the performance of spatiotemporal applications such as pattern recognition, image processing, and motion detection when compared to the more traditional von Neumann architecture. In this paper, we show how tunneling field effect transistors (TFETs) can be utilized to enhance the performance of CNNs. Specifically, power consumption of TFET-based CNNs can be significantly lower than that of MOSFET-based CNNs due to improved voltage-controlled current sources (VCCSs), an important component in CNN systems. We demonstrate that CNNs can benefit from low-power conventional linear VCCSs implemented via TFETs. We also show that TFETs can be useful for realizing nonlinear VCCSs, which are either not possible or exhibit degraded performance when implemented via CMOS. Such nonlinear VCCSs help to improve the performance of certain CNN operations (e.g., global maximum/minimum). We provide two case studies, image contrast enhancement and maximum row selection, that illustrate the benefits of nonlinear VCCSs (e.g., reduced computation time, energy dissipation, etc.) when compared to CMOS-based approaches.