Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
DOI: 10.1145/3373087.3375321

End-to-End Optimization of Deep Learning Applications

Abstract: The irregularity of recent Convolutional Neural Network (CNN) models, such as the reduced data reuse and parallelism caused by extensive network pruning and simplification, creates new challenges for FPGA acceleration. Furthermore, without proper optimization, there can be significant overheads when integrating FPGAs into existing machine learning frameworks like TensorFlow. This problem has been mostly overlooked by previous studies. However, our study shows that a naive FPGA integration into TensorFlow could lead to u…

Cited by 37 publications (13 citation statements)
References 16 publications
“…However, they assume that the performance/area changes monotonically by modifying an individual design parameter, which is not a valid assumption, as we explained in Challenge 2 of Section 1. To increase the accuracy of the estimation model, a number of other studies restrict the target application to those that have a well-defined accelerator micro-architecture template [9,14,15,40,45,58], a specific application [55,61], or a particular computation pattern [10,28,37]; hence, they lose generality.…”
Section: Model-based Techniques
confidence: 99%
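
To make the non-monotonicity point concrete, here is a toy analytical model (our illustration, not taken from the cited work; all budgets and coefficients are assumptions): latency improves with the loop-unroll factor only until the unrolled datapath exhausts the assumed DSP budget, after which a frequency penalty outweighs the added parallelism.

/* Hedged toy model (not from the cited work): latency is not monotonic in a
 * single design parameter such as the unroll factor. Once the datapath
 * exceeds the DSP budget, the assumed achievable clock frequency degrades,
 * so latency worsens again. All numbers are made-up assumptions. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const int trip_count = 1024;   /* iterations of the loop being unrolled */
    const int dsp_budget = 512;    /* available DSP slices (assumption) */
    for (int unroll = 1; unroll <= 256; unroll *= 2) {
        int dsps = 4 * unroll;     /* 4 DSPs per parallel MAC (assumption) */
        double fmax = (dsps <= dsp_budget)
                      ? 300.0      /* timing closes comfortably */
                      : 300.0 * pow((double)dsp_budget / dsps, 2.0); /* congestion penalty */
        double cycles = (double)trip_count / unroll;
        double latency_us = cycles / fmax;   /* cycles / MHz = microseconds */
        printf("unroll=%3d  Fmax=%6.1f MHz  latency=%7.4f us\n",
               unroll, fmax, latency_us);
    }
    return 0;
}

Under these assumptions, latency falls from unroll=1 through unroll=128 but rises again at unroll=256, so a model that assumes monotonic improvement would mispredict the optimum.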
“…The main enabler of this feature is the ability to iteratively re-optimize the micro-architecture quickly just by inserting synthesis directives in the form of pragmas, instead of rewriting the low-level behavioral description of the design. Because of the reduced code development cycle and the shorter turn-around times, HLS has been rapidly adopted by both academia and industry [3,20,30,45,49,65]. In fact, Code 1 shows an intuitive HLS C implementation of one forward path of a Convolutional Neural Network (CNN) on Xilinx FPGAs.…”
Section: Introduction
confidence: 99%
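
Code 1 from the citing paper is not reproduced in this report. As a rough sketch of the kind of pragma-annotated HLS C it describes, the naive convolution loop nest below uses the standard Vivado HLS PIPELINE, UNROLL, and ARRAY_PARTITION directives; the layer dimensions and all names are our assumptions, not the paper's code.

/* Hedged sketch -- NOT the cited paper's Code 1. A naive HLS C convolution
 * layer whose micro-architecture is tuned purely through pragmas, the
 * directive-only flow the statement refers to. Sizes are assumptions. */
#define IN_CH  8
#define OUT_CH 8
#define IMG_H  16
#define IMG_W  16
#define K      3

void conv_layer(const float in[IN_CH][IMG_H + K - 1][IMG_W + K - 1],
                const float w[OUT_CH][IN_CH][K][K],
                float out[OUT_CH][IMG_H][IMG_W]) {
#pragma HLS ARRAY_PARTITION variable=w complete dim=4
    for (int oc = 0; oc < OUT_CH; oc++)
        for (int h = 0; h < IMG_H; h++)
            for (int x = 0; x < IMG_W; x++) {
                float acc = 0.0f;
                for (int ic = 0; ic < IN_CH; ic++) {
/* Re-optimizing means editing directives like the ones below, not the loops. */
#pragma HLS PIPELINE II=1
                    for (int kh = 0; kh < K; kh++)
#pragma HLS UNROLL
                        for (int kw = 0; kw < K; kw++)
#pragma HLS UNROLL
                            acc += in[ic][h + kh][x + kw] * w[oc][ic][kh][kw];
                }
                out[oc][h][x] = acc;
            }
}

The floating-point accumulation across pipelined iterations would keep the achieved II above 1; that is exactly the kind of inefficiency an "intuitive" first version exhibits before directive-level re-optimization.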
“…Caffeine [39] combined both on-chip and off-chip data reorganizations for the convolutional matrix-multiplication representation to maximize the underlying memory bandwidth utilization. FlexCNN [26] further optimized the data layout for the concatenation layers. However, all these works [6]–[26] are based on the computation and memory access patterns of the inference phase, which only has FP.…”
Section: Related Work
confidence: 99%
“…FlexCNN [26] further optimized the data layout for the concatenation layers. However, all these works [6]–[26] are based on the computation and memory access patterns of the inference phase, which only has FP. The training phase involves FP, BP, and WU, whose data access patterns for output features, input features, and weights are different.…”
Section: Related Work
confidence: 99%
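
To illustrate why one data layout cannot serve all three training phases, the sketch below (ours, not from the cited work) writes out the three kernels for a small fully connected layer: forward propagation (FP) reads the weights row-wise, backpropagation (BP) reads the same weights column-wise, and the weight update (WU) accumulates into them.

/* Hedged sketch (not from the cited work): the same weight array is accessed
 * three different ways across FP, BP, and WU, so a layout tuned for
 * inference (FP only) is not enough for training. */
#include <stdio.h>

#define NIN  4   /* input features  */
#define NOUT 3   /* output features */

/* FP: y = W * x; W is streamed row-major. */
void fp(const float W[NOUT][NIN], const float x[NIN], float y[NOUT]) {
    for (int o = 0; o < NOUT; o++) {
        y[o] = 0.0f;
        for (int i = 0; i < NIN; i++)
            y[o] += W[o][i] * x[i];
    }
}

/* BP: dx = W^T * dy; the same W is now read column-major (transposed). */
void bp(const float W[NOUT][NIN], const float dy[NOUT], float dx[NIN]) {
    for (int i = 0; i < NIN; i++) {
        dx[i] = 0.0f;
        for (int o = 0; o < NOUT; o++)
            dx[i] += W[o][i] * dy[o];
    }
}

/* WU: dW += dy * x^T; W-shaped data is written/accumulated, not read. */
void wu(float dW[NOUT][NIN], const float dy[NOUT], const float x[NIN]) {
    for (int o = 0; o < NOUT; o++)
        for (int i = 0; i < NIN; i++)
            dW[o][i] += dy[o] * x[i];
}

int main(void) {
    float W[NOUT][NIN] = {{0.5f}};   /* W[0][0] = 0.5, rest zero */
    float dW[NOUT][NIN] = {{0}};
    float x[NIN] = {1, 2, 3, 4}, dy[NOUT] = {0.1f, 0.2f, 0.3f};
    float y[NOUT], dx[NIN];
    fp(W, x, y);
    bp(W, dy, dx);
    wu(dW, dy, x);
    printf("y[0]=%.2f  dx[0]=%.2f  dW[0][0]=%.2f\n", y[0], dx[0], dW[0][0]);
    return 0;
}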
“…In contrast, FINN uses a High-Level Synthesis (HLS) library [20] of hardware layers and components that are used to generate streaming architectures customized for each network. Other tools for automatic hardware generation are FlexCNN [21], which integrates an FPGA implementation framework into TensorFlow, and DNNBuilder [22], which uses software-hardware co-design to perform an end-to-end optimization of deep learning applications.…”
Section: B Automatic Hardware Generation and Hardware Architectures F...
confidence: 99%