2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID) 2019
DOI: 10.1109/vlsid.2019.00055
UniWiG: Unified Winograd-GEMM Architecture for Accelerating CNN on FPGAs

Cited by 20 publications (8 citation statements)
References 11 publications
“…FPGA implementations of Winograd convolution are presented in [81], which incorporate feature-map caching using a line-buffer structure, data reuse, effective pipelining of the PEs, and parallel processing of convolution operations. A unified architecture combining Winograd and general matrix multiplication (GEMM), named UniWiG, is presented in [82]. Instead of using separate PEs for convolutional and dense layers, UniWiG uses the same set of PEs together with blocked Winograd filtering to ensure high resource utilization.…”
Section: Optimized Convolution in CNN
confidence: 99%
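To make the Winograd filtering mentioned above concrete, here is a minimal sketch of the 1-D F(2,3) minimal-filtering algorithm, the basic tile computation that such architectures map onto a shared PE array. This is an illustrative pure-Python model, not code from the cited paper; all names are ours.

```python
# Hedged sketch: 1-D Winograd F(2,3) minimal filtering. It produces 2
# outputs of a 3-tap filter from a 4-sample input tile using only 4
# multiplications (vs. 6 for the direct method).

def winograd_f23(d, g):
    """Winograd F(2,3): y = A^T [(G g) * (B^T d)] (elementwise product)."""
    # Input transform V = B^T d (additions only)
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = -d[1] + d[2]
    v3 = d[1] - d[3]
    # Filter transform U = G g (precomputable once per filter)
    u0 = g[0]
    u1 = 0.5 * (g[0] + g[1] + g[2])
    u2 = 0.5 * (g[0] - g[1] + g[2])
    u3 = g[2]
    # Element-wise product: the only multiplies, mapped onto the PEs
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform y = A^T m (additions only)
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct_conv(d, g):
    """Reference: direct 3-tap correlation producing 2 outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
assert winograd_f23(d, g) == direct_conv(d, g)  # both give [14.0, 20.0]
```

Because the input and output transforms use only additions, the multiplier count (and hence DSP usage on an FPGA) is set by the element-wise product stage, which is why it is natural to share those multipliers with a GEMM datapath.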
“…As shown in Figure 6, FPGA-based neural-network optimization techniques can be roughly divided, according to design concepts and requirements, into optimization for data and operations, optimization for bandwidth, and optimization for memory and access, among others, which are introduced in detail below: data and operation optimization [71]–[78], reduced computation [79]–[81], improved calculation speed [82]–[85], the Winograd fast convolution algorithm [86]–[91], the im2col convolution optimization algorithm [92]–[97], pipelined design [98]–[102], the Roofline model [103]–[105], ping-pong caching [106]–[109], input feature map reuse [110,111], filter reuse [111,112], and convolutional reuse [110].…”
Section: Neural Network Optimization Technology Based on FPGA
confidence: 99%
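One of the techniques listed above, the im2col lowering, reduces a 2-D convolution to a single matrix multiplication so that a GEMM engine can execute it. The following is a minimal pure-Python sketch of the idea; the function names are ours, not taken from any cited implementation.

```python
# Hedged sketch of im2col: unroll every k-by-k input patch into a row,
# flatten the kernel into a vector, and the convolution becomes one
# matrix-vector (in general, matrix-matrix) multiplication.

def im2col(x, k):
    """Unroll every k-by-k patch of 2-D input x into a row of a matrix."""
    h, w = len(x), len(x[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([x[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows

def conv2d_via_gemm(x, kern):
    """2-D cross-correlation computed as im2col(x) @ flatten(kern)."""
    k = len(kern)
    flat = [kern[di][dj] for di in range(k) for dj in range(k)]
    patches = im2col(x, k)
    out_w = len(x[0]) - k + 1
    flatout = [sum(p * f for p, f in zip(row, flat)) for row in patches]
    # Re-fold the flat output vector into a 2-D output map
    return [flatout[i:i + out_w] for i in range(0, len(flatout), out_w)]

x = [[1, 2, 3, 0],
     [4, 5, 6, 1],
     [7, 8, 9, 2],
     [3, 2, 1, 0]]
kern = [[1, 0, -1],
        [1, 0, -1],
        [1, 0, -1]]
print(conv2d_via_gemm(x, kern))  # 2x2 output map
```

The cost of this lowering is memory: overlapping patches are duplicated in the unrolled matrix, which is why bandwidth- and memory-oriented optimizations usually accompany it on FPGAs.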
“…In this paper, we advocated a design in which the same set of processing elements can be used to accelerate both Winograd-based convolution and GEMM [7]. Each PE contains a multiplier, an adder, registers, and FIFOs.
Section: Hardware Implementation
confidence: 99%
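The PE structure described above (multiplier, adder, registers, FIFOs) can be sketched as a simple behavioural model: operands stream in through FIFOs and a multiply-accumulate fires each cycle. This is an illustrative Python model under our own naming, not the cited RTL.

```python
# Hedged behavioural sketch of a single MAC-style processing element:
# two operand FIFOs feed a multiplier, whose result an adder folds into
# an accumulator register each cycle.

from collections import deque

class ProcessingElement:
    def __init__(self):
        self.a_fifo = deque()   # operand FIFO (e.g. transformed activations)
        self.b_fifo = deque()   # operand FIFO (e.g. transformed weights)
        self.acc = 0.0          # accumulator register

    def push(self, a, b):
        """Enqueue one operand pair (models the input FIFOs filling)."""
        self.a_fifo.append(a)
        self.b_fifo.append(b)

    def cycle(self):
        """One clock: pop an operand pair, multiply, accumulate."""
        if self.a_fifo and self.b_fifo:
            self.acc += self.a_fifo.popleft() * self.b_fifo.popleft()
        return self.acc

pe = ProcessingElement()
for a, b in zip([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    pe.push(a, b)
for _ in range(3):
    pe.cycle()
assert pe.acc == 32.0  # dot product 1*4 + 2*5 + 3*6
```

Because both a GEMM inner product and the Winograd element-wise stage reduce to exactly this multiply-accumulate pattern, one PE array can serve both modes, which is the sharing the excerpt describes.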
“…Convolutions are reduced to general matrix multiplications (GEMMs) in the usual method [7]. Convolution based on the fast Fourier transform (FFT) is less computationally complex than the direct technique.
Section: Introduction and Literature Survey
confidence: 99%
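The FFT-based convolution mentioned above rests on the convolution theorem: pointwise multiplication in the frequency domain equals circular convolution in the time domain. The sketch below demonstrates this with a naive O(N²) DFT for clarity; a real implementation would use an O(N log N) FFT, which is where the complexity advantage comes from. All code and names here are our own illustration.

```python
# Hedged illustration of the convolution theorem: DFT both signals,
# multiply pointwise, inverse-DFT, and (with enough zero-padding) the
# result equals the linear convolution computed directly.

import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def conv_via_dft(x, h, n):
    """Zero-pad both signals to length n >= len(x)+len(h)-1 so the
    circular convolution equals the linear one, then apply the theorem."""
    xp = x + [0.0] * (n - len(x))
    hp = h + [0.0] * (n - len(h))
    prod = [a * b for a, b in zip(dft(xp), dft(hp))]
    y = dft(prod, inverse=True)
    # Round away tiny numerical error from the complex arithmetic
    return [round(v.real, 9) for v in y][:len(x) + len(h) - 1]

def conv_direct(x, h):
    """Reference: direct linear convolution."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

x, h = [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]
assert conv_via_dft(x, h, 8) == conv_direct(x, h)
```

For the small 3x3 kernels common in CNNs, the transform overhead often outweighs the multiplication savings, which is one reason Winograd filtering is usually preferred over FFT convolution in that regime.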