2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
DOI: 10.1109/ccgrid.2013.12
CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application

Cited by 82 publications (58 citation statements). References 6 publications.
“…Moreover, the programmer must use thread-safe functions, eliminate inter-thread data dependencies, avoid pointer aliasing, and manage access to shared variables. In addition, the high-level abstraction of directive-based programming can come with a performance penalty in comparison with low-level programming models such as OpenCL and CUDA [18], [19], [20], [21]. Domain-specific libraries, such as MAGMA, PARALUTION and ViennaCL, provide both abstraction and high performance for a set of computation kernels and algorithms in a specific domain.…”
Section: Results. Citation type: mentioning.
Confidence: 99%
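To make the abstraction-versus-control trade-off in this passage concrete, here is a minimal sketch (a hypothetical saxpy loop, not an example from the paper) of the same operation written once with an OpenACC directive and once as a hand-written CUDA kernel. The OpenACC routine would be built with an OpenACC compiler such as PGI and the CUDA half with nvcc; the names saxpy_acc and saxpy_cuda are illustrative.

#include <cuda_runtime.h>

/* OpenACC version: data movement and the gang/vector mapping are
   left to the compiler, which is where the performance penalty
   quoted above can appear. */
void saxpy_acc(int n, float a, const float *x, float *y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* CUDA version: the thread-to-element mapping is fixed by hand,
   the low-level control the citing authors contrast with directives. */
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

void saxpy_cuda(int n, float a, const float *d_x, float *d_y) {
    int threads = 256;                        /* hand-tunable */
    int blocks  = (n + threads - 1) / threads;
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
}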
“…the possible performance is cut in half. Additionally, the performance available via directive-based languages is known to be lower than that of CUDA [16,23]. Restructuring the code into subroutines using kernel calls to CUDA, e.g.…”
Section: Parallel Performance in Homogeneous Setups. Citation type: mentioning.
Confidence: 99%
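The restructuring mentioned here is typically a mechanical step: lift the hot loop out of the host code into a __global__ kernel and hide the launch behind an ordinary subroutine. A minimal sketch, assuming a hypothetical one-dimensional three-point stencil as a stand-in for the memory-bound CFD updates the paper benchmarks (stencil_step and the coefficients are illustrative):

#include <cuda_runtime.h>

/* The former loop body, moved into a CUDA kernel. */
__global__ void stencil_kernel(int n, const float *in, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

/* Host-side wrapper: the rest of the application keeps calling a
   plain subroutine and never sees the launch syntax. */
void stencil_step(int n, const float *d_in, float *d_out) {
    int threads = 128;
    int blocks  = (n + threads - 1) / threads;
    stencil_kernel<<<blocks, threads>>>(n, d_in, d_out);
    cudaDeviceSynchronize();   /* simplest correctness guarantee */
}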
“…Tetsuya et al. [4] used two micro-benchmarks and one real-world application to compare CUDA and OpenACC. The performance was compared across four different compilers: PGI, Cray, HMPP, and CUDA, with different optimization techniques, i.e.…”
Section: Related Work. Citation type: mentioning.
Confidence: 99%
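Cross-compiler comparisons like the one described here hinge on a consistent timing harness. A sketch of the usual CUDA-event approach (the launch callback and its use are assumptions on my part, not the paper's actual harness):

#include <cuda_runtime.h>

/* Times one launch of the kernel under test with CUDA events,
   returning milliseconds of elapsed GPU time. */
float time_kernel_ms(void (*launch)(void)) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    launch();                   /* kernel launch under test */
    cudaEventRecord(stop);
    cudaEventSynchronize(stop); /* wait for the kernel to finish */

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}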