2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)
DOI: 10.1109/icpads51040.2020.00036
An Effective Design to Improve the Efficiency of DPUs on FPGA

Cited by 4 publications (6 citation statements)
References 25 publications
“…Nevertheless, running more complex CNN models, such as VGG-16, on Zynq-7000 platforms using DPU IP remains a challenge. Even with the quantization of the VGG-16 model with Xilinx DNNDK, the model size is 132 MB [20]. To complete the large amount of calculation needed for VGG-16, a DPU core with the maximum size realized on the ZCU102 board was used, which is the B4096 core [20].…”
Section: Results
confidence: 99%
“…Even with the quantization of the VGG-16 model with Xilinx DNNDK, the model size is 132 MB [20]. To complete the large amount of calculation needed for VGG-16, a DPU core with the maximum size realized on the ZCU102 board was used, which is the B4096 core [20]. Compared to the available resources on Zynq-7000 platforms, this DPU core uses over 3.2× of DSP, 1.12× of LUT, and 1.9× of BRAM.…”
Section: Results
confidence: 99%
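The resource multiples quoted above (3.2× DSP, 1.12× LUT, 1.9× BRAM) can be reproduced as a simple over-subscription check. In this sketch the Zynq-7000 figures correspond to an XC7Z020-class part, and the B4096 core requirements are illustrative assumptions chosen to be consistent with the quoted ratios, not vendor-published numbers:

```python
# Assumed available resources on a Zynq-7000-class device (XC7Z020-like).
zynq7000 = {"DSP": 220, "LUT": 53_200, "BRAM": 140}

# Assumed B4096 DPU core footprint (illustrative figures, not vendor data).
b4096 = {"DSP": 710, "LUT": 59_600, "BRAM": 266}

# Ratio of required to available resources; anything above 1.0 means the
# core cannot fit on the device.
ratios = {res: b4096[res] / zynq7000[res] for res in zynq7000}

for res, r in ratios.items():
    print(f"{res}: {r:.2f}x of available resources")
```

Every ratio exceeding 1.0 confirms the quoted conclusion: the B4096 core is infeasible on Zynq-7000-class parts and needs the larger ZCU102 fabric.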
“…The use of cloud computing is inconvenient when operational critical apparatus must be monitored, due to the low reliability and high latency of remote connections which requires enough bandwidth to guarantee real-time operations; general purpose platforms, using CPUs and GPUs have got silicon sizes, prices and energy costs which are incompatible with the integration into the apparatus to be monitored [5]. Similar limitations affect devoted processors, such as the Xilinx Deep Learning Processor Unit (DPU) core [20], introduced to accelerate CNN inference on FPGAs. Although it is a configurable soft core engine supporting various basic DL features (convolution, max and average pooling, etc.…”
Section: Related Work
confidence: 99%
“…It is worthwhile to note that if we lower the operating frequency to set the ODR to 1 kHz, like the alternatives in Table IV, the power consumption of our proposal, 107 mW, remains significantly lower. With reference to Application Processing Units (APU) built with FPGAs, that from some years have become very attractive to setup highly customizable platforms [39], an interesting solution is the Xilinx DPU [20] to implement high performance NNs, including GoogLeNet, ResNet and MobileNet, on Xilinx Zynq SoC devices. The DPU IP provides some possible configurations regarding the DSP slice, LUT, block RAM, UltraRAM, the number of DPU cores, the convolution architecture, etc., to meet various types of constraints.…”
Section: A. FPGA
confidence: 99%
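The configurability described in the last quote, choosing a DPU convolution architecture that meets a device's resource constraints, can be sketched as a small selection routine. The architecture names (B512 through B4096) follow the Xilinx DPU naming convention, but the per-architecture DSP costs below are hypothetical placeholders for illustration:

```python
# Assumed DSP requirements per DPU convolution architecture
# (illustrative placeholder figures, not vendor-published data).
DPU_DSP_COST = {
    "B512": 110, "B800": 160, "B1152": 220,
    "B1600": 290, "B2304": 400, "B3136": 550, "B4096": 710,
}

def largest_fitting_dpu(dsp_budget: int):
    """Return the largest architecture whose assumed DSP cost fits the
    budget, or None if even the smallest core does not fit."""
    fitting = [(cost, name) for name, cost in DPU_DSP_COST.items()
               if cost <= dsp_budget]
    return max(fitting)[0:2][1] if fitting else None

print(largest_fitting_dpu(220))   # Zynq-7020-class DSP budget
print(largest_fitting_dpu(2520))  # ZCU102-class DSP budget
```

Under these assumptions, a Zynq-7020-class budget caps out at a mid-sized core, while a ZCU102-class budget admits the B4096, which is consistent with the board choices reported in the citing papers.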