Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
DOI: 10.1145/2847263.2847276
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Cited by 459 publications (275 citation statements)
References 10 publications
“…[8] and [23] explore in-memory processing to accelerate CNNs. [28] develops an OpenCL-based HLS tool to implement CNN accelerators that use different modules for different kinds of layers, but all convolutional layers are computed with a single CLP.…”
Section: Related Work
confidence: 99%
“…The intersection of the roofline curve with a vertical line at a particular arithmetic intensity gives the theoretical peak performance point, which is either compute-bound or memory-bound. In particular, we consider the binarized [31,21] and 8-bit fixed-point [25] implementations of the popular AlexNet [14], both of which require 1.4 billion operations (1.4 GOP) to classify one image.…”
Section: Estimating Performance Using Rooflines
confidence: 99%
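The roofline estimate quoted above reduces to a one-line formula: attainable performance is the minimum of peak compute throughput and memory bandwidth times arithmetic intensity. A minimal sketch, with hypothetical device numbers (not taken from the paper):

```python
def roofline(peak_gops, bandwidth_gbs, intensity_ops_per_byte):
    """Attainable performance in GOP/s for a kernel with the given
    arithmetic intensity (operations per byte moved from memory)."""
    return min(peak_gops, bandwidth_gbs * intensity_ops_per_byte)

# Assumed device: 1000 GOP/s peak compute, 25 GB/s DRAM bandwidth.
# At 10 op/byte the kernel is memory-bound (25 * 10 = 250 GOP/s);
# at 50 op/byte it hits the compute roof (1000 GOP/s).
print(roofline(1000, 25, 10))   # 250
print(roofline(1000, 25, 50))   # 1000
```

Reading off where a kernel's intensity falls relative to the ridge point (peak compute divided by bandwidth) tells you whether to optimise data movement or compute.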
“…In these cases, the majority of reported results include GOp/s when a favourable batch size is used. In [16], an OpenCL-based high-throughput accelerator is proposed which employs batch processing in order to sustain high resource utilisation and hide the host-accelerator communication overhead. In [17], Chen et al. used batch processing to maximise weight reuse in ConvNet layers across multiple inputs.…”
Section: Performance Comparison
confidence: 99%
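The batching argument in the excerpt above can be made concrete with a simple latency model: each batch pays one fixed host-accelerator transfer cost, amortised over the images in the batch. The numbers below are illustrative assumptions, not measurements from the cited works:

```python
def throughput(batch, transfer_s, compute_per_image_s):
    """Images/s when each batch pays one fixed transfer cost plus
    per-image compute time (no transfer/compute overlap assumed)."""
    return batch / (transfer_s + batch * compute_per_image_s)

# Assumed costs: 5 ms transfer per batch, 1 ms compute per image.
print(round(throughput(1, 0.005, 0.001), 1))    # 166.7 images/s
print(round(throughput(64, 0.005, 0.001), 1))   # 927.5 images/s
```

This is why reported GOp/s figures depend strongly on batch size: the fixed overhead dominates at batch 1 but nearly vanishes at large batches.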