2021
DOI: 10.1109/access.2021.3119516

System-Level Communication Performance Estimation for DMA-Controlled Accelerators

Abstract: The performance of a hardware accelerator is often limited by the communication bandwidth between local on-chip memories and DRAM across the on-chip bus. This paper proposes a system-level performance estimation algorithm for evaluating the communication performance of direct memory access (DMA) controlled accelerators. The proposed algorithm can estimate the communication performance accurately for both DRAM-limited and bus-limited cases. In detail, the communication performance…

Citations: cited by 4 publications (3 citation statements)
References: 35 publications (64 reference statements)

“…Another popular method of reducing memory access row conflicts in convolutional operation is to adjust the number of columns during loop tiling [33] [34] [35] [36]. However, several state-of-the-art solutions empirically determine the number of columns in the loop tiling to reduce memory row conflicts, or directly set the number of columns in the loop tiling to the number of columns of the output feature map when sufficient on-chip buffer is available.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, several state-of-the-art solutions empirically determine the number of columns in the loop tiling to reduce memory row conflicts, or directly set the number of columns in the loop tiling to the number of columns of the output feature map when sufficient on-chip buffer is available. For example, many related works [33] [34] [35] [36] can only provide empirical parameters such as T_m, T_n, T_r, T_c, etc. The sizes of convolutional layers for many DNN models such as YOLOv2 significantly differ from each other.…”
Section: Related Work (mentioning)
confidence: 99%
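
For context, the tiling parameters named in the statement above can be illustrated with a minimal Python sketch. It assumes the common convention that the four parameters tile the output-channel, input-channel, output-row, and output-column loops of a convolution (written Tm, Tn, Tr, Tc in the code). That mapping, the function names conv_naive and conv_tiled, and the toy layer sizes are assumptions made for illustration, not the method of any cited paper. Stride 1 and no padding are also assumed.

```python
def conv_naive(inp, w, M, N, R, C, K):
    """Reference convolution: M output channels, N input channels,
    R x C output map, K x K kernel, stride 1, no padding."""
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for r in range(R):
                for c in range(C):
                    for i in range(K):
                        for j in range(K):
                            out[m][r][c] += w[m][n][i][j] * inp[n][r + i][c + j]
    return out


def conv_tiled(inp, w, M, N, R, C, K, Tm, Tn, Tr, Tc):
    """Same convolution with the four outer loops tiled.

    Each iteration of the four outer loops is one tile: the unit of work
    (and, on an accelerator, of off-chip data transfer) whose size is set
    by the assumed parameters Tm, Tn, Tr, Tc. Returns (output, tile_count).
    """
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    tiles = 0
    for m0 in range(0, M, Tm):
        for n0 in range(0, N, Tn):
            for r0 in range(0, R, Tr):
                for c0 in range(0, C, Tc):
                    tiles += 1
                    # Inner loops cover one tile; min() clips edge tiles.
                    for m in range(m0, min(m0 + Tm, M)):
                        for n in range(n0, min(n0 + Tn, N)):
                            for r in range(r0, min(r0 + Tr, R)):
                                for c in range(c0, min(c0 + Tc, C)):
                                    for i in range(K):
                                        for j in range(K):
                                            out[m][r][c] += (
                                                w[m][n][i][j] * inp[n][r + i][c + j]
                                            )
    return out, tiles


def make_inputs(M, N, R, C, K):
    """Deterministic integer test data (hypothetical values)."""
    inp = [[[(n + 2 * y + 3 * x) % 5 for x in range(C + K - 1)]
            for y in range(R + K - 1)] for n in range(N)]
    w = [[[[(m + n + i + j) % 3 - 1 for j in range(K)]
           for i in range(K)] for n in range(N)] for m in range(M)]
    return inp, w
```

In this sketch, setting Tc equal to C makes each tile span a full output row, which corresponds to the option the statement describes as setting the number of tiled columns to the number of columns of the output feature map.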