Yukui Luo scite author profile

Deep learning algorithms based on convolutional neural networks (CNNs) require high data flow and computational intensity. For real-time industrial applications, they must overcome challenges such as high data bandwidth requirements and power consumption on hardware platforms. In this work, we analyze in detail the data dependencies in the CNN accelerator and propose specific pipelined operations and a data organization scheme to design a high-throughput CNN accelerator on FPGA. We also optimize the kernel operations to obtain high power efficiency. The proposed CNN accelerator supports image classification and real-time object detection with high accuracy. The evaluation results show that our CNN-based FPGA accelerator can achieve 740 giga operations per second (GOPS) at 200 MHz with a kernel power of 12.2 W on an Intel Arria 10 FPGA. For object detection tasks, our system can achieve 105 fps with 56.5 mAP or 25 fps with 73.6 mAP on the VOC dataset. Because we use a mixed fixed-point data representation, the detection accuracy is comparable with the GPU-based YOLO V2 framework. The power efficiency of our system is ∼3.3× better than a Titan X GPU and ∼418× better than an Intel E5-2620 V4 CPU.

A High-Performance and Secure TRNG Based on Chaotic Cellular Automata Topology

Luo, Wang, Best et al. 2020. IEEE Trans. Circuits Syst. I

HILL: A Hardware Isolation Framework Against Information Leakage on Multi-Tenant FPGA Long-Wires

Luo 2019

Novel CNN-Based AP2D-Net Accelerator: An Area and Power Efficient Solution for Real-Time Applications on Mobile FPGA

Sun, Luo et al. 2020. Electronics

Standard convolutional neural networks (CNNs) have large amounts of data redundancy, and the same accuracy can be obtained with lower-bit weights instead of a floating-point representation. Most CNNs have to be developed and executed on high-end GPU-based workstations, and it is hard to port the existing implementations onto portable edge FPGAs because of the limited on-chip block memory size and battery capacity. In this paper, we present the adaptive pointwise convolution and 2D convolution joint network (AP2D-Net), an ultra-low-power and relatively high-throughput system that combines dynamic-precision weights and activations. Our system has high performance, and we make a trade-off between accuracy and power efficiency by adopting unmanned aerial vehicle (UAV) object detection scenarios. We evaluate our system on the Zynq UltraScale+ MPSoC Ultra96 mobile FPGA platform. The target board reaches a real-time speed of 30 fps under 5.6 W, and the FPGA on-chip power is only 0.6 W. The power efficiency of our system is 2.8× better than the best system design on a Jetson TX2 GPU and 1.9× better than the design on a PYNQ-Z1 SoC FPGA.

SGX-FPGA: Trusted Execution Environment for CPU-FPGA Heterogeneous Architecture

Xia, Luo et al. 2021

NNReArch: A Tensor Program Scheduling Framework Against Neural Network Architecture Reverse Engineering

Luo, Duan, Gongye et al. 2022

Yukui Luo

Mechanistic insights into organic carbon-driven water blackening and odorization of urban rivers

Heterogeneous system implementation of deep learning neural network for object detection in OpenCL framework

A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan X GPU
