2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
DOI: 10.1109/fccm48280.2020.00064
Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

Abstract: This paper presents Systolic-CNN, an OpenCL-defined scalable, runtime-flexible FPGA accelerator architecture, optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing. The existing OpenCL-defined FPGA accelerators for CNN inference are insufficient due to limited flexibility for supporting multiple CNN models at run time and poor scalability resulting in underutilized FPGA resources and limited computational parallelism. Systolic-CNN adopts a…

Cited by 10 publications (1 citation statement)
References 21 publications
“…Related work is shown by Han et al. (2016), which reduces power consumption by reducing the number of weights. A higher level of optimization is proposed by Dua et al. (2020), which uses the OpenCL compiler for DNNs such as VGG and AlexNet. Hah et al. (2019) suggested a framework for the automatic conversion of deep neural network models into an intermediate format (HLS) and subsequent FPGA implementation.…”
Section: Related Work
confidence: 99%