Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Davis, Joshua Hoke; Daley, Chris; Pophale, Swaroop; Huber, Thomas; Chandrasekaran, Sunita; Wright, Nicholas J.

doi:10.1007/978-3-030-74224-9_2

Cited by 14 publications

(6 citation statements)

References 16 publications

“…A significant challenge is to port existing applications to platforms with accelerators, including GPUs. The ultimate goal [39] is taking advantage of these powerful platforms without having to learn the hardware details or significantly change the application codes. Numerous programming models and environments have been developed, including CUDA [11], HIP [12], OpenCL [24], and Kokkos [40].…”

Section: Related Work (mentioning)

confidence: 99%

“…An alternative option is directive‐based models such as OpenACC [41] and OpenMP [42]. They offer [39] an abstraction layer over different hardware types with a unified interface that allows reducing the work needed to accelerate applications, requiring only some "hints" or annotations to be added for the compiler. The OpenMP model incorporated support for offloading code to accelerators from version 4.0 (released in 2013) and has upgraded various features in subsequent versions (till version 5.2).…”

Section: Related Work (mentioning)

confidence: 99%

“…The major advantage of this approach is applying the same programming interface for the whole application. Currently, offloading code to NVIDIA GPUs is supported by using GCC, Clang, IBM XL, HPE, and NVIDIA HPC compilers [39, 51]. In this work, the Clang compiler is used.…”

Section: Adaptation Of Solidification Modeling Application To Nvidia ... (mentioning)

confidence: 99%

Single‐ and multi‐GPU computing on NVIDIA‐ and AMD‐based server platforms for solidification modeling application

Halbiniak, Meyer, Rojek (2023), Concurrency and Computation

Summary: This work explores the performance of single‐ and multi‐GPU computing on state‐of‐the‐art NVIDIA‐ and AMD‐based server‐class hardware using various programming interfaces to accelerate a real‐world scientific application for solidification modeling based on the phase‐field method. The main computations of this memory‐bound application correspond to 20 stencils computed across grid nodes. We investigate the application's scalability for two basic schemes of organizing computation: without and with hiding data transfers behind computation, combined with using either peer‐to‐peer inter‐GPU data transfers through NVIDIA NVLink and AMD Infinity interconnects or communication over the PCIe and main memory. Among the studied programming interfaces are CUDA, HIP, and the OpenMP Accelerator Model. While the first two are designed to write the codes for a specific hardware platform, OpenMP enables code portability between NVIDIA and AMD GPUs. The resulting performance is experimentally assessed on computing platforms containing NVIDIA V100 (up to 8 GPUs) and A100 (one GPU), as well as AMD MI210 (one device) and MI250 (up to 8 logical GPUs).

“…For many programming models, there are studies that evaluate these for CPUs or GPUs. In [7] the authors study OpenMP offload on NVIDIA V100 with a few mini-apps and various compilers, observe performance variations, and provide some OpenMP optimization techniques. In [8], the authors present the compute-bound mini-app miniBUDE and evaluate various programming models, including offload to GPUs.…”

Section: Related Work (mentioning)

confidence: 99%

Evaluating GPU Programming Models for the LUMI Supercomputer

Markomanolis, Alpay, Young, et al. (2022), Lecture Notes in Computer Science

It is common in the HPC community that the achieved performance with just CPUs is limited for many computational cases. The EuroHPC pre-exascale and the coming exascale systems are mainly focused on accelerators, and some of the largest upcoming supercomputers such as LUMI and Frontier will be powered by AMD Instinct™ accelerators. However, these new systems create many challenges for developers who are not familiar with the new ecosystem or with the required programming models that can be used to program for heterogeneous architectures. In this paper, we present some of the more well-known programming models to program for current and future GPU systems. We then measure the performance of each approach using a benchmark and a mini-app, test with various compilers, and tune the codes where necessary. Finally, we compare the performance, where possible, between the NVIDIA Volta (V100), Ampere (A100) GPUs, and the AMD MI100 GPU.

“…The offloading model is beginning to mature as we speak based on the validation and verification findings [27], [28]. We also note that the model is being used on mini-applications [29] and other applications including Pseudo-Spectral Direct Numerical Simulation-Combined Compact Difference (PSDNS-CCD3D) [30] and Quicksilver [31] among others [32], [33], [34], [30], [35]. We chose to go with OpenACC instead of the OpenMP offloading model.…”

Section: Directive-based Programming With OpenACC (mentioning)

confidence: 99%