2017
DOI: 10.1145/3079758

Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks

Abstract: Deep convolutional neural networks (CNNs) have gained great success in various computer vision applications. State-of-the-art CNN models for large-scale applications are computation-intensive and memory-expensive and, hence, are mainly processed on high-performance processors like server CPUs and GPUs. However, there is an increasing demand for high-accuracy or real-time object detection tasks in large-scale clusters or embedded systems, which requires energy-efficient accelerators because of the green computat…

Cited by 87 publications (48 citation statements)

References 15 publications
“…Liu et al. [170] proposed a parallel framework for FPGA-based CNN accelerators that exploits four levels of parallelism: task level, layer level, loop level, and operator level. Task-level parallelism involves executing multiple image prediction tasks simultaneously.…”
Section: FPGA-Based Accelerators
confidence: 99%
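As a concrete illustration of the loop-level and operator-level parallelism this citation describes, the following is a minimal HLS-style C sketch of a tiled convolution kernel. The tile sizes Tm and Tn, the function name conv_tile, and the array names are illustrative assumptions, not taken from Liu et al. [170].

/* Minimal sketch (assumed, not the accelerator's actual code): a tiled
 * CNN convolution loop nest in HLS-style C. Loop-level parallelism comes
 * from unrolling the tile loops (too, tii); operator-level parallelism
 * comes from the resulting Tm*Tn parallel multiply-accumulate operators. */
#define Tm 4   /* output-feature-map tile size (placeholder) */
#define Tn 4   /* input-feature-map tile size (placeholder)  */

void conv_tile(float out[Tm], const float in[Tn], const float w[Tm][Tn])
{
    /* In an HLS flow these loops would carry UNROLL pragmas so that all
     * Tm*Tn multiply-accumulates can execute in the same cycle. */
    for (int too = 0; too < Tm; too++) {        /* unrolled in hardware */
        for (int tii = 0; tii < Tn; tii++) {    /* unrolled in hardware */
            out[too] += w[too][tii] * in[tii];
        }
    }
}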
“…We also like to acknowledge Dr. Blair P. Bremberg and Ms. Sumaiya Hussain Sadiq for their help in professional English editing of this manuscript.…” [The remainder of this citation statement is spilled table text listing the FPGA-based accelerators surveyed by the citing review, among them: NeuFlow [143], Memory-Centric Accelerator [146], nn-X [148], Roofline-based FPGA Accelerator [55], Embedded FPGA Accelerator [98], DeepBurning [155], OpenCL-based FPGA Accelerator [80], Caffeine [153], [162], fpgaConvNet [165], Loop Unrolling [78], [168], Throughput-Optimized FPGA Accelerator [170], FP-DNN [171], FINN [181], Customized CONV Loop Accelerator [83], Latency-Driven Design for FPGA-based CNNs [183], DLA [188], Winograd-based CNN Accelerator [189], OpenCL-based Architecture for Accelerating CNNs [190], Multi-CLP Accelerator for CNNs [192], Automated Systolic Array Architecture for CNN [195], End-to-End Scalable FPGA Accelerator [196], DLAU [197], An Automatic RTL Compiler for High-Throughput Deep CNNs [199], Intel's DLA [200], Angel-Eye [60], and Optimizing the CONV Operation to Accelerate DNNs on FPGA [204].]
Section: Acknowledgment
confidence: 99%
“…DNN Accelerator Performance Prediction. For designing FPGA-based DNN accelerators, current practice usually relies on roofline models [10] or customized analytical tools [13,16] to estimate the achievable performance. For ASIC-based accelerators, recently published designs [21,34,35] introduce various performance prediction methods.…”
Section: Background and Related Work
confidence: 99%
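The roofline estimate mentioned in this citation reduces to a one-line bound: attainable throughput is the minimum of peak compute and memory bandwidth times operational intensity. The C sketch below evaluates that bound for a hypothetical accelerator; all numbers are placeholders, not figures from the cited designs [10, 13, 16].

#include <stdio.h>

/* Minimal roofline sketch: attainable throughput is bounded by the
 * smaller of peak compute and (memory bandwidth * operational intensity,
 * i.e. FLOPs per byte moved). All numbers below are placeholders. */
static double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                              double ops_per_byte)
{
    double memory_bound = bandwidth_gbs * ops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

int main(void)
{
    /* Hypothetical accelerator: 500 GFLOP/s peak, 10 GB/s DRAM bandwidth. */
    for (double oi = 5.0; oi <= 80.0; oi *= 2.0)
        printf("OI = %5.1f FLOP/B -> %6.1f GFLOP/s attainable\n",
               oi, roofline_gflops(500.0, 10.0, oi));
    return 0;
}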
“…Recently, field-programmable gate arrays (FPGAs) have become a particularly attractive option for accelerating large-scale matrix multiplication due to their reconfigurability and abundant logic resources. Previous studies [3][4][5][6][7][8][9] have primarily focused on accelerating matrix multiplication on FPGA by using an efficient architecture, i.e., the one-dimensional systolic array.…”
Section: Introduction
confidence: 99%
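For readers unfamiliar with the one-dimensional systolic array referenced here, the following is a small functional C model of the dataflow: each processing element holds one column of B and accumulates one output per streamed row of A. It is a behavioral sketch under assumed naming, not the RTL of any cited design [3]-[9], and it ignores the cycle-level skewing a real array requires.

#include <stdio.h>

#define N 3  /* toy size; real designs use much larger arrays */

/* Software model of a 1-D systolic array computing C = A * B.
 * PE j locally holds column j of B; rows of A are streamed through the
 * chain of PEs, and each PE accumulates one output element per row.
 * This models only the dataflow, not timing. */
void systolic_1d_matmul(const int A[N][N], const int B[N][N], int C[N][N])
{
    for (int i = 0; i < N; i++) {              /* stream one row of A per pass */
        int acc[N] = {0};                      /* one accumulator per PE       */
        for (int k = 0; k < N; k++)            /* element A[i][k] enters array */
            for (int pe = 0; pe < N; pe++)     /* each PE multiplies against   */
                acc[pe] += A[i][k] * B[k][pe]; /* its stored column of B       */
        for (int pe = 0; pe < N; pe++)
            C[i][pe] = acc[pe];
    }
}

int main(void)
{
    int A[N][N] = {{1,2,3},{4,5,6},{7,8,9}};
    int B[N][N] = {{1,0,0},{0,1,0},{0,0,1}};
    int C[N][N];
    systolic_1d_matmul(A, B, C);
    for (int i = 0; i < N; i++)
        printf("%d %d %d\n", C[i][0], C[i][1], C[i][2]);
    return 0;
}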