2020
DOI: 10.1109/tcad.2019.2930577
DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators

Abstract: The convolutional neural network (CNN) has become a state-of-the-art method for several artificial intelligence domains in recent years. The increasingly complex CNN models are both computation-bound and I/O-bound. FPGA-based accelerators driven by custom instruction set architectures (ISAs) achieve a balance between generality and efficiency, but much about them remains to be optimized. We propose the full-stack compiler DNNVM, which is an integration of optimizers for graphs, loops and data layouts, and an a…


Cited by 60 publications (18 citation statements) | References 33 publications
“…To make the lightweight processing suitable for on-device DCNN processing, the proposed DCNN architecture is further optimized by applying the quantization method [48] to represent each network parameter with an 8-bit fixed-point number. In addition, the layer-fusing method from [49] is used to merge two adjacent processing layers, one convolution layer and the following pooling layer, into a single processing layer with fewer parameters. Table 5 compares the proposed DCNN architecture with the previous method from [43], which provides the smallest model size among existing works, as summarized in Table 3.…”
Section: Proposed Power Optimization Methods
Mentioning confidence: 99%
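The 8-bit fixed-point quantization mentioned above can be sketched as follows. This is an illustrative per-tensor sketch, not the cited method [48], which selects fractional lengths per layer dynamically during training; the function name and example weights are hypothetical.

```python
import numpy as np

def quantize_fixed_point(weights, total_bits=8):
    """Quantize float weights to signed fixed-point, choosing a fractional
    length that covers the tensor's dynamic range (illustrative sketch)."""
    # Integer bits needed for the largest magnitude (one bit reserved for sign).
    max_abs = np.max(np.abs(weights))
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = total_bits - 1 - int_bits   # remaining bits hold the fraction
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(weights * scale),
                -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1)
    return q.astype(np.int8), frac_bits

w = np.array([0.75, -0.5, 0.1234, -0.9])
q, frac = quantize_fixed_point(w)
print(q, frac)           # int8 codes and the fractional length (7 here)
print(q / 2.0 ** frac)   # dequantized approximation of w
```

Each parameter is stored as one byte; the fractional length is the only extra metadata, which is what makes the scheme attractive for on-device inference.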
“…However, depthwise separable convolution spends 95% of its computation time in Conv 1×1, which causes a large MAdds gap between two consecutive layers (Conv 1×1 and Conv DW 3×3) [12]. This gap is unfriendly to embedded systems that load all weights of the network to perform convolution [24]: embedded systems need extra buffers for Conv 1×1.…”
Section: Variable Group Convolution
Mentioning confidence: 99%
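The MAdds gap described above can be reproduced with a quick count. The shapes below are hypothetical, chosen only to show that the pointwise 1×1 stage dominates whenever the channel count is much larger than the kernel area:

```python
def madds_depthwise_block(h, w, c_in, c_out, k=3):
    """Multiply-adds for the two stages of a depthwise-separable block."""
    dw = h * w * c_in * k * k    # depthwise k x k: one filter per channel
    pw = h * w * c_in * c_out    # pointwise 1x1: full channel mixing
    return dw, pw

dw, pw = madds_depthwise_block(h=28, w=28, c_in=256, c_out=256)
print(pw / (dw + pw))   # fraction of MAdds spent in Conv 1x1: ~0.97
```

With 256 channels the 1×1 stage accounts for over 95% of the block's multiply-adds, consistent with the figure quoted in the citation statement.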
“…Communication between off-chip memory and on-chip memory happens only at the start and the end of block computing when a block is grouped and computed together on embedded systems [24]. To limit the communication cost, VarGNet sets the number of output channels to be the same as the number of input channels in the normal block.…”
Section: Blocks Of Variable Group Network
Mentioning confidence: 99%
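A rough model of that communication pattern: when a whole block is computed on-chip, off-chip traffic consists only of loading inputs and weights at the start and storing outputs at the end. The shapes and 8-bit data width below are assumptions for illustration:

```python
def offchip_traffic_bytes(h, w, c_in, c_out, k=3, bytes_per=1):
    """Off-chip traffic for one fused block when all intermediate feature
    maps stay on-chip: inputs + weights in at the start, outputs out at
    the end (hypothetical shapes, 8-bit data)."""
    load = (h * w * c_in + k * k * c_in * c_out) * bytes_per  # start of block
    store = h * w * c_out * bytes_per                          # end of block
    return load + store

# With c_out == c_in (as in VarGNet's normal block), feature-map traffic
# in and out of the block is balanced.
print(offchip_traffic_bytes(28, 28, 64, 64))
```

Keeping `c_out == c_in` keeps the stored feature map the same size as the loaded one, so no block becomes a bandwidth outlier.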
“…The authors also employ a data quantization strategy that is applied dynamically across layers and takes place during the training phase. An extension of this work is presented in [24], where the authors propose an end-to-end compiler that integrates optimizers for graphs, loops and data layouts. The main optimization fuses graph parts, operations, and layers, including operations across different kernels, and explores effective fusion strategies.…”
Section: Related Work
Mentioning confidence: 99%
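The fusion idea referenced above can be illustrated on the simplest case: a 1×1 convolution immediately followed by max pooling, computed in one pass so the full convolution output is never materialized. This is a minimal sketch of the concept, not DNNVM's actual fusion machinery; the function and shapes are hypothetical.

```python
import numpy as np

def conv1x1_then_pool_fused(x, w, pool=2):
    """Fused 1x1 convolution + max pooling: each pooled output element is
    accumulated directly, so the intermediate conv feature map is never
    stored in full (illustrative sketch of layer fusion)."""
    h, w_dim, _ = x.shape
    c_out = w.shape[1]
    out = np.full((h // pool, w_dim // pool, c_out), -np.inf)
    for i in range(h):
        for j in range(w_dim):
            y = x[i, j] @ w  # 1x1 conv at this pixel: mix input channels
            out[i // pool, j // pool] = np.maximum(out[i // pool, j // pool], y)
    return out

x = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
w = np.eye(3)  # identity weights, so pooling acts on the raw pixels
print(conv1x1_then_pool_fused(x, w))
```

Because pooling consumes each conv output as soon as it is produced, the fused kernel needs only the small running-maximum buffer instead of the whole intermediate layer, which is the memory saving fusion targets.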
“…This framework allows the user to customize the design of an equivalent CNN, and generates both synthesizable C++ code and ready-to-use scripts for Xilinx Vivado. In [24], the authors propose an end-to-end compiler that integrates optimizers for graphs, loops and data layouts and aims at generating smarter instructions. The authors in [29] propose a unified mathematical representation for efficient FPGA acceleration of all layers in CNN/DNN models and a framework that finds the optimal mapping of this representation to a specialized accelerator based on the roofline model.…”
Section: Related Work
Mentioning confidence: 99%
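The roofline model mentioned in the last statement caps attainable performance at the lower of the compute peak and memory bandwidth times operational intensity. A one-line sketch, with hypothetical accelerator figures:

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, arithmetic_intensity):
    """Roofline model: performance is bounded by min(compute peak,
    memory bandwidth * operational intensity in FLOPs per byte)."""
    return min(peak_gflops, bandwidth_gb_s * arithmetic_intensity)

# Hypothetical FPGA accelerator: 200 GFLOP/s peak, 10 GB/s DDR bandwidth.
for ai in (1, 10, 50):
    print(ai, attainable_gflops(200, 10, ai))
```

Kernels left of the ridge point (here, intensity below 20 FLOPs/byte) are memory-bound, which is why mapping frameworks like [29] search for layer tilings that raise operational intensity before adding compute units.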