2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
DOI: 10.1109/ccgrid.2013.12
CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application

Cited by 82 publications (58 citation statements). References 6 publications.
“…Moreover, the programmer must use thread-safe functions, eliminate inter-thread data dependencies, avoid pointer aliasing, and manage access to shared variables. In addition, the high-level abstraction of directive-based programming can come with a performance penalty in comparison with low-level programming models such as OpenCL and CUDA [18], [19], [20], [21]. Domain-specific libraries, such as MAGMA, PARALUTION and ViennaCL, provide both abstraction and high performance for a set of computation kernels and algorithms in a specific domain.…”
Section: Results. Citation type: mentioning.
Confidence: 99%
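To make the abstraction-versus-control trade-off in this passage concrete, here is a minimal sketch (a hypothetical saxpy loop, not an example from the paper) of the same operation written once with an OpenACC directive and once as a hand-written CUDA kernel. The OpenACC routine would be built with an OpenACC compiler such as PGI and the CUDA half with nvcc; the names saxpy_acc and saxpy_cuda are illustrative.

#include <cuda_runtime.h>

/* OpenACC version: data movement and the gang/vector mapping are
   left to the compiler, which is where the performance penalty
   quoted above can appear. */
void saxpy_acc(int n, float a, const float *x, float *y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* CUDA version: the thread-to-element mapping is fixed by hand,
   the low-level control the citing authors contrast with directives. */
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

void saxpy_cuda(int n, float a, const float *d_x, float *d_y) {
    int threads = 256;                        /* hand-tunable */
    int blocks  = (n + threads - 1) / threads;
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
}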
“…the possible performance is cut in half. Additionally, the performance available via directive-based languages is known to be lower than that of CUDA [16,23]. Restructuring the code into subroutines using kernel calls to CUDA, e.g.…”
Section: Parallel Performance in Homogeneous Setups. Citation type: mentioning.
Confidence: 99%
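The restructuring mentioned here is typically a mechanical step: lift the hot loop out of the host code into a __global__ kernel and hide the launch behind an ordinary subroutine. A minimal sketch, assuming a hypothetical one-dimensional three-point stencil as a stand-in for the memory-bound CFD updates the paper benchmarks (stencil_step and the coefficients are illustrative):

#include <cuda_runtime.h>

/* The former loop body, moved into a CUDA kernel. */
__global__ void stencil_kernel(int n, const float *in, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

/* Host-side wrapper: the rest of the application keeps calling a
   plain subroutine and never sees the launch syntax. */
void stencil_step(int n, const float *d_in, float *d_out) {
    int threads = 128;
    int blocks  = (n + threads - 1) / threads;
    stencil_kernel<<<blocks, threads>>>(n, d_in, d_out);
    cudaDeviceSynchronize();   /* simplest correctness guarantee */
}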
“…Tetsuya et al. [4] used two micro-benchmarks and one real-world application to compare CUDA and OpenACC. The performance was compared across four different compilers: PGI, Cray, HMPP, and CUDA, with different optimization techniques, i.e.…”
Section: Related Work. Citation type: mentioning.
Confidence: 99%
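Cross-compiler comparisons like the one described here hinge on a consistent timing harness. A sketch of the usual CUDA-event approach (the launch callback and its use are assumptions on my part, not the paper's actual harness):

#include <cuda_runtime.h>

/* Times one launch of the kernel under test with CUDA events,
   returning milliseconds of elapsed GPU time. */
float time_kernel_ms(void (*launch)(void)) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    launch();                   /* kernel launch under test */
    cudaEventRecord(stop);
    cudaEventSynchronize(stop); /* wait for the kernel to finish */

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}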