2012
DOI: 10.1007/978-3-642-32820-6_85

OpenACC — First Experiences with Real-World Applications

Cited by 216 publications (122 citation statements)
References 8 publications
“…Previous studies, like [8,24,13,17,32,14,22,10], also evaluate directive-based compilers that generate code for accelerators. The main difference is that this work covers more programs and includes a study of transformations.…”
Section: Related Work (mentioning)
Confidence: 99%
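
As a point of reference, here is a minimal sketch of the kind of code such directive-based compilers consume: a generic OpenACC SAXPY kernel in C, not taken from any of the cited studies. The compiler generates the accelerator code and the host/device transfers from the directives alone.

/* SAXPY: y = a*x + y. The pragma asks an OpenACC compiler to offload
 * the loop; the data clauses describe the host/device copies it must
 * manage (copyin: host to device only; copy: both directions). */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}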
“…5. Compute WrongA in parallel (lines 14-16), which indicates the incorrect elements of array A. An element is incorrect only when it is written by at least one misspeculated iteration.…”
Section: Irregular Memory Accesses (mentioning)
Confidence: 99%
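
For illustration, a hedged sketch of such a check as an OpenACC C kernel. The names compute_wrong_a, writer, misspec, and wrong_a are assumptions, as is the single-recorded-writer bookkeeping; the cited paper's scheme may differ.

/* Flag the elements of A written by a misspeculated iteration.
 * writer[i]  : iteration that wrote A[i], or -1 if unwritten (assumption:
 *              one recorded writer per element, kept for brevity)
 * misspec[j] : nonzero if iteration j misspeculated (m iterations total)
 * wrong_a[i] : output flag, computed fully in parallel */
void compute_wrong_a(int n, int m, const int *writer,
                     const int *misspec, int *wrong_a)
{
    #pragma acc parallel loop copyin(writer[0:n], misspec[0:m]) \
                              copyout(wrong_a[0:n])
    for (int i = 0; i < n; ++i)
        wrong_a[i] = (writer[i] >= 0) && misspec[writer[i]];
}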
“…Several programming models have been proposed for GPU computing, including OpenCL [15], CUDA [12], OpenACC [16], PGI Accelerator [17], OmpSs [2], which is based upon the OpenMP standard [4], and Par4All [1]. None of these programming models support speculative parallelization for GPU computing.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Also, recent tools such as Kernelgen [34], Polly [24], hiCUDA [25], and the GPSME toolkit [57] are gaining popularity and successful use cases. Successful examples of semi-automatically parallelized programs range from medicine [56] to physics simulations [33]. Some of the tools only exploit the raw computational power of accelerators, while other tools offer implicit or explicit support for optimizing the usage of the GPU's complex memory hierarchy and hence minimizing communication [6].…”
Section: Automatic Parallelization (mentioning)
Confidence: 99%
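
As a sketch of what such explicit communication minimization looks like in OpenACC C (an assumed example, not drawn from any of the cited tools), an enclosing data region keeps the arrays resident on the device across consecutive kernels, so no host/device transfers occur between them.

/* Two dependent smoothing passes. The data region hoists the transfers:
 * a and b are copied once at region entry and exit, and stay resident on
 * the device between the two parallel loops, so the only communication is
 * at the region boundaries. */
void smooth_twice(int n, float *restrict a, float *restrict b)
{
    #pragma acc data copy(a[0:n], b[0:n])
    {
        #pragma acc parallel loop
        for (int i = 1; i < n - 1; ++i)
            b[i] = 0.5f * (a[i - 1] + a[i + 1]);

        #pragma acc parallel loop
        for (int i = 1; i < n - 1; ++i)
            a[i] = 0.5f * (b[i - 1] + b[i + 1]);
    }
}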