2020
DOI: 10.1109/access.2020.2988311
An Efficient Task Assignment Framework to Accelerate DPU-Based Convolutional Neural Network Inference on FPGAs

Abstract: The Field Programmable Gate Array (FPGA) has become an efficient accelerator for convolutional neural network (CNN) inference due to its high performance and flexibility. To further improve the performance of CNN inference on FPGAs, Xilinx released an Intellectual Property core (IP core) called the Deep Learning Processor Unit (DPU). Unlike previous FPGA-based hardware designs focusing on specific functions or CNNs, the DPU IP supports a broad set of basic deep-learning functions, and developers can take advantage…
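To make the DPU workflow concrete, below is a minimal sketch of how a compiled CNN could be dispatched to a DPU through the Vitis AI runtime (VART) Python API. The model file name, buffer dtype, and shapes are assumptions for illustration, not taken from the paper; a real deployment would first quantize and compile the network with the Vitis AI toolchain to produce the .xmodel consumed here.

```python
# Minimal sketch: run one CNN inference on a DPU via the VART Python API.
# "resnet50.xmodel" is a hypothetical compiled model; int8 buffers assume
# a quantized network.
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("resnet50.xmodel")
# DPU-executable work is exposed as subgraphs whose "device" attribute is "DPU".
dpu_subgraphs = [
    s for s in graph.get_root_subgraph().toposort_child_subgraph()
    if s.has_attr("device") and s.get_attr("device").upper() == "DPU"
]
runner = vart.Runner.create_runner(dpu_subgraphs[0], "run")

in_tensor = runner.get_input_tensors()[0]
out_tensor = runner.get_output_tensors()[0]
input_data = np.zeros(tuple(in_tensor.dims), dtype=np.int8)    # placeholder batch
output_data = np.zeros(tuple(out_tensor.dims), dtype=np.int8)

# Inference is asynchronous: submit the job, then wait for completion.
job_id = runner.execute_async([input_data], [output_data])
runner.wait(job_id)
```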

Cited by 39 publications (30 citation statements)
References 31 publications
“…The previously proposed approaches targeting heterogeneous hardware utilization for CNN inference are similar to our work in that they use multiple available resources (CPU and FPGAs in [2]; CPU, GPU, and FPGAs in [23]; and CPU and multiple DPUs in [24]); however, they can only be employed when executing multiple CNN inferences simultaneously, which limits their applicability. In contrast, our proposed technique can be applied to single CONV layer acceleration, which has wider applicability than [23], [2], and [24]. In addition, such coarse-grained task partitioning may not work well on resource-constrained edge devices, because it is very rare to execute a large batch of images together on such devices.…”
Section: Related Work
confidence: 99%
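To illustrate the coarse-grained partitioning this citation describes, here is a minimal sketch (an assumption, not the paper's actual scheduler) in which the CPU enqueues whole-network inference requests and one worker thread per DPU instance drains the queue; run_on_dpu and NUM_DPUS are hypothetical placeholders.

```python
# Sketch of coarse-grained task assignment: the CPU initializes and queues
# inference requests; each DPU instance is served by its own worker thread.
import queue
import threading

NUM_DPUS = 2  # assumption: two DPU cores instantiated on the FPGA
tasks = queue.Queue()

def run_on_dpu(dpu_id: int, request: int) -> None:
    """Hypothetical stand-in for submitting one whole CNN inference to a DPU."""
    print(f"DPU {dpu_id} served request {request}")

def dpu_worker(dpu_id: int) -> None:
    while True:
        request = tasks.get()
        if request is None:          # poison pill: shut this worker down
            break
        run_on_dpu(dpu_id, request)
        tasks.task_done()

workers = [threading.Thread(target=dpu_worker, args=(i,)) for i in range(NUM_DPUS)]
for w in workers:
    w.start()

for request in range(8):             # CPU enqueues the inference tasks
    tasks.put(request)
tasks.join()
for _ in workers:                     # one poison pill per worker
    tasks.put(None)
```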
“…In [24], a task assignment technique is proposed for multi-CNN acceleration, which utilizes multiple deep learning processing units (DPUs) for CNN inference while the CPU is responsible for task initialization.…”
Section: Related Work
confidence: 99%
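The contrast drawn above is with partitioning a single CONV layer. As a toy illustration (an assumption, not taken from either paper), a 1x1 convolution reduces to a matrix multiply, so splitting its output-channel dimension yields independent sub-tasks that different resources could compute and then concatenate:

```python
# Toy sketch of single-CONV-layer partitioning along the output-channel axis.
import numpy as np

C_in, C_out, HW = 64, 128, 56 * 56
x = np.random.rand(C_in, HW).astype(np.float32)     # feature map, flattened
w = np.random.rand(C_out, C_in).astype(np.float32)  # 1x1 conv filters

full = w @ x                                        # whole layer on one device
half_a = w[: C_out // 2] @ x                        # e.g. the DPU's share
half_b = w[C_out // 2 :] @ x                        # e.g. the CPU's share

# Concatenating the partial results reproduces the full layer output.
assert np.allclose(full, np.concatenate([half_a, half_b], axis=0))
```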
“…Recently, Xilinx has released the Deep Learning Processing Unit (DPU), a configurable computation engine for CNNs [27]. The parallelism that can be achieved in the DPU depends on the target device and application.…”
Section: Related Work
confidence: 99%
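As a back-of-the-envelope illustration of that device dependence, the sketch below computes peak throughput from the DPU's three parallelism dimensions. The B4096 figures follow the Xilinx DPU product guide, and the clock frequency is an assumed example value, not a measurement from the paper.

```python
# Peak ops/cycle = pixel parallelism * input-channel parallelism *
# output-channel parallelism * 2 (multiply + accumulate per MAC).
PIXEL_PARALLEL = 8            # B4096 configuration values per the DPU
INPUT_CHANNEL_PARALLEL = 16   # product guide; smaller devices use smaller
OUTPUT_CHANNEL_PARALLEL = 16  # configurations (B512 ... B4096)

ops_per_cycle = PIXEL_PARALLEL * INPUT_CHANNEL_PARALLEL * OUTPUT_CHANNEL_PARALLEL * 2
assert ops_per_cycle == 4096  # hence the name "B4096"

clock_mhz = 300  # hypothetical DPU clock for a ZU9EG-class device
peak_gops = ops_per_cycle * clock_mhz / 1000
print(f"peak throughput ~ {peak_gops:.0f} GOPS")  # ~1229 GOPS at 300 MHz
```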
“…It can accelerate convolution computation and achieve efficient object recognition, detection, and classification. The DPU computing core is built on a fully pipelined structure and integrates a large number of convolution operators, adders, and non-linear pooling/ReLU operators, and it supports quantization methods with different dynamic precisions [30].…”
Section: Simulation Environment
confidence: 99%
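To illustrate what quantization with different dynamic precisions means, here is a minimal sketch of dynamic fixed-point quantization, where every layer stores int8 values but chooses its own fraction length. This is an illustrative scheme under that assumption, not the DPU's exact implementation.

```python
# Dynamic fixed-point sketch: int8 storage, per-layer fraction length,
# so the radix point (and thus the precision/range trade-off) moves per layer.
import numpy as np

def quantize_dynamic_fixed(x: np.ndarray, frac_bits: int) -> np.ndarray:
    """Quantize to int8 with `frac_bits` fractional bits (scale = 2**-frac_bits)."""
    scaled = np.round(x * (1 << frac_bits))
    return np.clip(scaled, -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, frac_bits: int) -> np.ndarray:
    return q.astype(np.float32) / (1 << frac_bits)

weights = np.array([0.31, -0.07, 0.99, -1.20], dtype=np.float32)
for frac_bits in (4, 6):  # per-layer choice of precision
    q = quantize_dynamic_fixed(weights, frac_bits)
    err = np.abs(dequantize(q, frac_bits) - weights).max()
    print(f"frac_bits={frac_bits}: q={q.tolist()} max_err={err:.4f}")
```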