2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2016
DOI: 10.1109/ipdpsw.2016.70

Evaluating OpenMP 4.0's Effectiveness as a Heterogeneous Parallel Programming Model

Abstract: Although the OpenMP 4.0 standard has been available since 2013, support for GPUs has been absent up until very recently, with only a handful of experimental compilers available. In this work we evaluate the performance of Cray's new NVIDIA GPU targeting implementation of OpenMP 4.0, with the mini-apps TeaLeaf, CloverLeaf and BUDE. We successfully port each of the applications, using a simple and consistent design throughout, and achieve performance on an NVIDIA K20X that is comparable to Cray's OpenAC…

Citations: cited by 34 publications (30 citation statements)
References: 13 publications
“…Therefore, future work will focus on understanding this difference and investigating solutions to improving task performance on this platform. This work builds upon the success of previous mini-app work within the HPC group at the University of Bristol [16], demonstrating that mini-apps are powerful tools to both compare and test different programming models, as well investigate different language features that can lead to increased performance for a more general set of applications.…”
Section: Results (mentioning)
confidence: 93%
“…The concept of attempting to represent a cross section of applications or application characteristics is becoming well recognised (e.g. Martineau et al, 2016), and others are investigating applicability to environmental models (e.g Stone et al, 2012), and noting the importance of representativeness in the selection of the mini-apps.…”
Section: Related Work (mentioning)
confidence: 99%
“…Maintaining data resident on a device is generally one of the most important considerations for offloading to accelerators. We previously discussed the difficulties that are encountered when attempting to copy data to and from the device using the structured target enter data directive [2]. With OpenMP 4.0, the initial copying of resident data into the device data environment would be approached as shown in Listing 1.1.…”
Section: Structured and Unstructured Data Regions (mentioning)
confidence: 99%
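
The citing paper's Listing 1.1 is not reproduced on this page. As a rough illustration only, the sketch below shows how data is commonly kept resident with the structured `target data` region that OpenMP 4.0 provides (the unstructured `target enter data` and `target exit data` directives arrived in OpenMP 4.5). The function name `scale_resident` and its parameters are hypothetical and do not come from either paper.

```c
#include <stddef.h>

/* Hypothetical example, not the citing paper's Listing 1.1.
 * Scales u by alpha for `iters` iterations while u stays on the device. */
void scale_resident(double *u, size_t n, double alpha, int iters)
{
    /* Structured data region: u is copied to the device on entry
     * and copied back to the host when the region ends. */
    #pragma omp target data map(tofrom: u[0:n])
    {
        for (int it = 0; it < iters; ++it) {
            /* u is already present on the device, so this map clause
             * triggers no additional transfer. */
            #pragma omp target teams distribute parallel for map(tofrom: u[0:n])
            for (size_t i = 0; i < n; ++i) {
                u[i] *= alpha;
            }
        }
    }
}
```

Compiled without OpenMP offloading support, the pragmas are ignored and the loop runs on the host with the same result. The limitation of the structured form is that residency is tied to a lexical scope, which is the difficulty the unstructured directives were later introduced to address.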
“…We have previously shown that it is not yet possible to write a single homogeneous line of directives to achieve performance portability with OpenMP [6] [2]. Standardisation of the compiler implementations is important for future performance portability, for instance, the newest Clang implementation automatically chooses optimal team and thread counts, so that the developer does not have to list architecture-specific values.…”
Section: Homogeneous Directives (mentioning)
confidence: 99%
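
To make the point about architecture-specific values concrete, the following sketch contrasts a directive that pins team and thread counts with one that leaves the choice to the implementation. The function names and the values 128 and 256 are assumptions chosen for illustration; they are not taken from the cited work and are not tuned for any device.

```c
#include <stddef.h>

/* Hypothetical dot product with explicit, architecture-specific counts.
 * The values 128 and 256 are illustrative assumptions only. */
double dot_tuned(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        num_teams(128) thread_limit(256) \
        map(to: a[0:n], b[0:n]) map(tofrom: sum) reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Same computation with no counts listed, so the compiler and runtime
 * choose the number of teams and threads. */
double dot_default(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(tofrom: sum) reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Both functions compute the same result; only the first carries values that would need retuning when the code moves to a different device.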