Deep convolutional neural networks (CNNs) have achieved great success in various computer vision applications. State-of-the-art CNN models for large-scale applications are computationally intensive and memory expensive and, hence, are mainly processed on high-performance processors such as server CPUs and GPUs. However, there is an increasing demand for high-accuracy or real-time object detection in large-scale clusters and embedded systems, which calls for energy-efficient accelerators because of green-computing requirements or limited battery capacity. Owing to their energy efficiency and reconfigurability, Field-Programmable Gate Arrays (FPGAs) have been widely explored as CNN accelerators. In this article, we present an in-depth analysis of the computational complexity and memory footprint of each CNN layer type. We then propose a scalable parallel framework that exploits four levels of parallelism in hardware acceleration. We further put forward a systematic design space exploration methodology to search for the optimal solution that maximizes accelerator throughput under FPGA constraints such as on-chip memory, computational resources, external memory bandwidth, and clock frequency. Finally, we demonstrate the methodology by optimizing three representative CNNs (LeNet, AlexNet, and VGG-S) on a Xilinx VC709 board. The average performance of the three accelerators is 424.7, 445.6, and 473.4 GOP/s at a 100 MHz working frequency, which significantly outperforms the CPU and previous work.
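The sketch below illustrates the flavor of such a design space exploration under a simple roofline-style model. All budgets, layer dimensions, and cost estimates are illustrative assumptions, not the paper's actual formulation or VC709 figures; it enumerates two unroll factors and keeps the fastest configuration that fits the assumed DSP, BRAM, and bandwidth limits.

```cpp
// Minimal design-space-exploration sketch (hypothetical model, illustrative numbers only).
#include <cstdio>

int main() {
    // Assumed FPGA budgets (not the paper's VC709 figures).
    const int dsp_budget = 3600, bram_kb_budget = 6000;
    const double bw_gb_s = 12.0, freq_mhz = 100.0;   // external bandwidth (GB/s), clock (MHz)

    // Example layer: M output channels, N input channels, R x C output map, K x K kernel.
    const int M = 256, N = 192, R = 13, C = 13, K = 3;
    const double ops = 2.0 * M * N * R * C * K * K;   // each MAC counted as 2 ops

    double best_gops = 0; int best_tm = 0, best_tn = 0;
    for (int tm = 1; tm <= 64; ++tm) {                // output-channel parallelism
        for (int tn = 1; tn <= 64; ++tn) {            // input-channel parallelism
            int dsps = tm * tn;                       // one MAC unit per DSP (assumption)
            int bram_kb = (tm + tn) * 32;             // crude on-chip buffer estimate
            if (dsps > dsp_budget || bram_kb > bram_kb_budget) continue;
            double compute_s = ops / (2.0 * dsps * freq_mhz * 1e6);
            double bytes = 2.0 * (M * N * K * K + N * R * C);  // 16-bit weights + inputs (crude)
            double memory_s = bytes / (bw_gb_s * 1e9);
            double gops = ops / (compute_s > memory_s ? compute_s : memory_s) / 1e9;
            if (gops > best_gops) { best_gops = gops; best_tm = tm; best_tn = tn; }
        }
    }
    std::printf("best <Tm=%d, Tn=%d> -> %.1f GOP/s (modeled)\n", best_tm, best_tn, best_gops);
    return 0;
}
```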
Compared with baseline ultrasound, CEUS improves diagnostic performance in differentiating HCCs from non-neoplastic nodules in cirrhotic patients. Diagnosis of HCCs ≤2.0 cm in diameter by CEUS remains a clinical concern and thus needs further investigation.
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized FPGA-based accelerators have been proposed for 2D CNNs, while very few target 3D CNNs. 3D CNNs are far more computationally intensive, and the design space for 3D CNN acceleration is further expanded by the additional dimension, making it a significant challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module generates the feature-matrix tiles without storing the entire enlarged feature matrix on-chip or off-chip, a splitting strategy reconstructs a convolutional layer to fit the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array computes the matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
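The following is a minimal software sketch of the convolution-as-matrix-multiplication idea (im2col) for the 2D case, with hypothetical tensor layouts; the paper's mapping module produces such tiles on the fly in hardware rather than materializing the full feature matrix as done here.

```cpp
// im2col + GEMM sketch: convolution of an N-channel H x W input with M filters of size K x K
// becomes an M x (N*K*K) by (N*K*K) x (Ho*Wo) matrix multiplication. Layouts are assumptions.
#include <vector>
#include <cstddef>

// Expand the input into the (N*K*K) x (Ho*Wo) feature matrix (stride 1, no padding).
std::vector<float> im2col(const std::vector<float>& in, int N, int H, int W, int K) {
    int Ho = H - K + 1, Wo = W - K + 1;
    std::vector<float> col((size_t)N * K * K * Ho * Wo, 0.0f);
    for (int n = 0; n < N; ++n)
      for (int kh = 0; kh < K; ++kh)
        for (int kw = 0; kw < K; ++kw)
          for (int y = 0; y < Ho; ++y)
            for (int x = 0; x < Wo; ++x) {
              size_t row = ((size_t)n * K + kh) * K + kw;          // filter-element index
              size_t colIdx = (size_t)y * Wo + x;                  // output-pixel index
              col[row * (size_t)Ho * Wo + colIdx] =
                  in[((size_t)n * H + y + kh) * W + (x + kw)];
            }
    return col;
}

// Plain GEMM standing in for the 2D MAC array: out[M x P] = w[M x L] * col[L x P].
void gemm(const std::vector<float>& w, const std::vector<float>& col,
          std::vector<float>& out, int M, int L, int P) {
    for (int m = 0; m < M; ++m)
      for (int p = 0; p < P; ++p) {
        float acc = 0.0f;
        for (int l = 0; l < L; ++l) acc += w[(size_t)m * L + l] * col[(size_t)l * P + p];
        out[(size_t)m * P + p] = acc;
      }
}
```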
A theoretical analysis has been carried out to compare the tunneling processes in a double-quantum-well three-barrier (DQW-TB) system and a single-quantum-well double-barrier (SQW-DB) system. Based on a general WKB formula, it is shown that a symmetric DQW-TB system with transparency-matched barriers is far superior to the SQW-DB system in a number of aspects, including the peak current, the peak-to-valley ratio, and the speed limit.
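For reference, the textbook WKB transmission coefficient for a single barrier is shown below; the "general WKB formula" used in the paper for the full multi-barrier structure generalizes this expression and is not reproduced here.

```latex
% Standard single-barrier WKB transmission at energy E through a potential V(x),
% with classical turning points x_1, x_2 and effective mass m^*.
T \;\approx\; \exp\!\left(-2\int_{x_1}^{x_2}\sqrt{\frac{2m^{*}\bigl(V(x)-E\bigr)}{\hbar^{2}}}\;dx\right),
\qquad V(x_1)=V(x_2)=E .
```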
Speaker recognition is a crucial bio-identification technology that is extensively used in daily life. With the development of deep learning, convolutional neural networks (CNNs) have been applied to speaker recognition tasks thanks to their excellent performance. However, in practice, speaker recognition systems are frequently deployed on end devices; therefore, besides high recognition accuracy, the speaker recognition model is expected to be as simple as possible. Inspired by the 1-max pooling CNN and the Gaussian mixture model-universal background model (GMM-UBM), this study proposes a one-dimensional convolutional neural network (1D CNN) derived from the original 2D CNN. The proposed model reduces the computational complexity of ResNet20 by 64% and the number of parameters by 53%. Compared with the original ResNet20 model, recognition accuracy drops by about one percentage point on the 15 s dataset. Then, on the basis of the 1D CNN, we propose a pyramid layer-folding pipeline structure and implement it on the Xilinx VC709 platform. By partitioning along the time dimension, the proposed pyramid pipeline structure can process speech data of various lengths. Moreover, our accelerator is 5.1× faster on the 3 s dataset and 6.8× faster on the 15 s dataset than the CPU platform.
INDEX TERMS Speaker recognition, 1D convolutional neural networks, pyramid pipeline, folding pipeline, FPGA.
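A minimal sketch of the 1D-convolution-plus-1-max-pooling idea over the time axis of a feature sequence is given below; the channel counts, kernel size, and layout are hypothetical and not taken from the paper's ResNet20-derived model.

```cpp
// 1D convolution over time followed by 1-max pooling per output channel.
// in:  C_in channels x T frames (row-major); w: C_out x C_in x K kernels.
// Because the pooling collapses the time axis, the output size is independent of the
// utterance length T, which is what lets a 1-max pooling CNN handle variable-length speech.
#include <vector>
#include <algorithm>
#include <cstddef>

std::vector<float> conv1d_max(const std::vector<float>& in, const std::vector<float>& w,
                              int C_in, int T, int C_out, int K) {
    int To = T - K + 1;                                 // valid convolution length
    std::vector<float> pooled(C_out, -1e30f);
    for (int co = 0; co < C_out; ++co)
        for (int t = 0; t < To; ++t) {
            float acc = 0.0f;
            for (int ci = 0; ci < C_in; ++ci)
                for (int k = 0; k < K; ++k)
                    acc += w[((size_t)co * C_in + ci) * K + k] * in[(size_t)ci * T + t + k];
            pooled[co] = std::max(pooled[co], acc);     // 1-max pooling over time
        }
    return pooled;                                      // one value per output channel
}
```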