2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
DOI: 10.1109/ispass.2018.00034
Evaluating Performance Tradeoffs on the Radeon Open Compute Platform

Cited by 26 publications (8 citation statements)
References 15 publications
“…Cabezas et al [55] showed a software solution, including programming interfaces, compiler support and runtime, to partition GPU kernels for multi-GPU execution in a single node. Finally, Sun et al [56] evaluated the potential performance benefit and tradeoffs of AMD's Radeon Open Compute (ROC) platform for Heterogeneous System Architecture (HSA).…”
Section: Related Work
confidence: 99%
“…On the other hand, HIP is claimed to achieve excellent performance while still being compatible with Nvidia GPUs. Sun and coauthors (Sun et al, 2018) evaluated performance options of the ROCm platform using general CPU-GPU benchmarks and machine learning benchmarks. They found HIP to be the best-performing high-level framework for AMD devices and confirmed that HIP has close to zero overhead over CUDA on Nvidia GPUs, and thus provides both performance and portability.…”
Section: Related Work
confidence: 99%
“…Convolutional layers dominate the overall DNN training time. In particular, the convolutional layers alone can contribute to approximately 90% of the training time [21], [30]. Therefore, in this paper we are focused on improving GEMM-based convolutional layer performance, given its dominance on training performance.…”
Section: Characterization Of Sparse Matrix Operations
confidence: 99%
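The statement above refers to GEMM-based convolutional layers, where a convolution is lowered to a single matrix multiply via the standard im2col transformation. As a minimal illustrative sketch (not code from the cited papers; stride 1, no padding assumed), this mapping can be written in NumPy:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix
    so that convolution becomes a single GEMM (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each (ci, i, j) offset contributes one row: the value that
                # offset sees at every output position, flattened row-major.
                cols[row] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def conv2d_gemm(x, weights):
    """Convolve (C, H, W) input with (K, C, kh, kw) filters via one GEMM;
    returns a (K, out_h, out_w) output."""
    k, c, kh, kw = weights.shape
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    w_mat = weights.reshape(k, c * kh * kw)   # (K, C*kh*kw)
    cols = im2col(x, kh, kw)                  # (C*kh*kw, out_h*out_w)
    return (w_mat @ cols).reshape(k, out_h, out_w)  # the GEMM
```

Because the heavy lifting is a single dense matrix multiply, improving GEMM performance (as the cited work does) directly improves the convolutional layers that dominate training time.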