2020
DOI: 10.1109/access.2020.3000009
A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan X GPU

Abstract: Convolutional neural network (CNN) based deep learning algorithms require high data flow and computational intensity. For real-time industrial applications, they must overcome challenges such as high data bandwidth requirements and power consumption on hardware platforms. In this work, we have analyzed in detail the data dependency in the CNN accelerator and propose specific pipelined operations and a data organization scheme to design a high-throughput CNN accelerator on FPGA. In addition, we have optimized the ke…

Cited by 41 publications (14 citation statements)
References 30 publications
“…The flexibility of the FPGA hardware is used to create a dynamic model generation system based on the dataset using different softcore processors. Further evidence of FPGA-based deep learning acceleration is reported in [11][12][13][14][15]. As stated in [11], the authors used an FPGA to increase the speed of stochastic gradient descent in matrix factorization operations.…”
Section: Introduction (mentioning)
confidence: 92%
“…The FPGA-based solution offered a 15.3× speed-up over the GPU implementation along with a 60× reduction in data dependency. The work reported in [12][13][14][15] achieved greater performance in object detection inference [15] and a reduction in overall MAC operations per layer [14]. Alternatively, the work performed in [12] uses an FPGA to handle, at high speed, the parallel context of echo data from received laser signals by using deep learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…Implementing a CNN accelerator efficiently on FPGA is challenging, as most of the workload is concentrated in the heavy and repetitive convolution layers of the CNN. Most recent works ([28], [29], [30]) revolve around optimizing the CNN loop (refer to Equation 1). Such techniques study the effect of unrolling at different loop levels, loop tiling, and loop interchanging.…”
Section: E. Related Work (mentioning)
confidence: 99%
“…LeNet, AlexNet, and VGGNet are the most popular CNNs used in FPGA implementations. However, power consumption is generally compared against processor, GPU, or PC implementations, which is not a fair comparison [16], [18], [19]. Since FPGAs are inherently energy-efficient devices, a fair comparison should be made between FPGA implementations.…”
Section: Introduction (mentioning)
confidence: 99%