Jason Cong scite author profile

Convolutional neural networks (CNNs) have been widely employed for image recognition because they can achieve high accuracy by emulating the behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further advanced research and implementation. In particular, various accelerators for deep CNNs have been proposed on FPGA platforms, which offer high performance, reconfigurability, and fast development cycles. Although current FPGA accelerators have demonstrated better performance than generic processors, the accelerator design space has not been well exploited. One critical problem is that the computation throughput may not match the memory bandwidth provided by an FPGA platform. Consequently, existing approaches cannot achieve the best performance because either the logic resources or the memory bandwidth are underutilized. At the same time, the increasing complexity and scalability of deep learning applications aggravate this problem. To overcome it, we propose an analytical design scheme using the roofline model. For any solution of a CNN design, we quantitatively analyze its computing throughput and required memory bandwidth using various optimization techniques, such as loop tiling and transformation. Then, with the help of the roofline model, we can identify the solution with the best performance and the lowest FPGA resource requirement. As a case study, we implement a CNN accelerator on a VC707 FPGA board and compare it to previous approaches. Our implementation achieves a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
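The roofline reasoning in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard roofline bound, in which attainable throughput is the smaller of the compute roof and the computation-to-communication (CTC) ratio multiplied by the memory bandwidth. The `Design` class, its field names, the candidate values, and the tie-breaking rule (prefer the design that needs less bandwidth) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One candidate accelerator configuration (all values are illustrative)."""
    name: str
    compute_roof_gflops: float  # peak throughput the logic could sustain
    ctc_flop_per_byte: float    # computation-to-communication ratio


def attainable_gflops(design: Design, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(design.compute_roof_gflops,
               design.ctc_flop_per_byte * bandwidth_gbs)


def required_bandwidth_gbs(design: Design, bandwidth_gbs: float) -> float:
    """Memory bandwidth actually consumed at the attainable throughput."""
    return attainable_gflops(design, bandwidth_gbs) / design.ctc_flop_per_byte


def best_design(designs, bandwidth_gbs: float) -> Design:
    """Pick the highest attainable throughput; on ties, the lowest bandwidth need.

    The tie-breaking rule is an assumption of this sketch.
    """
    return max(
        designs,
        key=lambda d: (attainable_gflops(d, bandwidth_gbs),
                       -required_bandwidth_gbs(d, bandwidth_gbs)),
    )


if __name__ == "__main__":
    # Hypothetical candidates, e.g. produced by different loop-tiling choices.
    candidates = [
        Design("A", compute_roof_gflops=60.0, ctc_flop_per_byte=10.0),
        Design("B", compute_roof_gflops=60.0, ctc_flop_per_byte=30.0),
        Design("C", compute_roof_gflops=80.0, ctc_flop_per_byte=5.0),
        Design("D", compute_roof_gflops=60.0, ctc_flop_per_byte=20.0),
    ]
    bandwidth = 4.0  # GB/s, an assumed platform bandwidth
    for d in candidates:
        print(d.name, attainable_gflops(d, bandwidth))
    print("selected:", best_design(candidates, bandwidth).name)
```

In this sketch, design C has the highest compute roof but is bandwidth-bound, so it loses to designs whose tiling gives more data reuse. That is the kind of mismatch between computation throughput and memory bandwidth the abstract describes.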


Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Zhang

Sun

Fang

et al. 2019

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

271

276


High-Level Synthesis for FPGAs: From Prototyping to Deployment

Cong

Liu

Neuendorffer

et al. 2011

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

626

266


Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoption of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including a comparison of HLS solutions with optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to a hand-coded design.

Index Terms: Domain-specific design, field-programmable gate array (FPGA), high-level synthesis (HLS), quality of results (QoR).


Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs

Wei

Zhang

et al. 2017

320

163


Minimizing Computation in Convolutional Neural Networks

2014


CMP network-on-chip overlaid with multi-band RF-interconnect

Chang

et al. 2008


SACNN: Self-Attention Convolutional Neural Network for Low-Dose CT Denoising With Self-Supervised Perceptual Loss Network

Hsu

Xie

et al. 2020

IEEE Trans. Med. Imaging

197

129


FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates

et al. 2017




Jason Cong

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
