Proceedings of the 4th ACM SIGPLAN Workshop on Functional High-Performance Computing 2015
DOI: 10.1145/2808091.2808092

Meta-programming and auto-tuning in the search for high performance GPU code

Abstract: Writing high performance GPGPU code is often difficult and time-consuming, potentially requiring laborious manual tuning of low-level details. Despite these challenges, the cost of ignoring GPUs in high performance computing is increasingly large. Auto-tuning is a potential solution to the problem of tedious manual tuning. We present a framework for auto-tuning GPU kernels which are expressed in an embedded DSL, and which expose compile-time parameters for tuning. Our framework allows for kernels to be polymorphi…
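The abstract describes auto-tuning over compile-time parameters exposed by a kernel. A minimal sketch of that idea, with a purely hypothetical cost model standing in for actually compiling and timing GPU code (the names `simulated_kernel_time` and `autotune` are illustrative, not the paper's API):

```python
import math

# Hypothetical stand-in for compiling and timing a GPU kernel at a given
# compile-time configuration; a real framework would generate and launch
# device code here. The cost model below is purely illustrative.
def simulated_kernel_time(threads_per_block, unroll):
    occupancy = min(threads_per_block / 1024, 1.0)
    overhead = 0.002 * unroll + 0.5 / threads_per_block
    return (1.0 / occupancy) * 0.001 + overhead

def autotune(param_space, time_fn):
    """Exhaustively evaluate every configuration and keep the fastest."""
    best_cfg, best_t = None, math.inf
    for cfg in param_space:
        t = time_fn(*cfg)
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t

# Compile-time parameters exposed for tuning: threads per block
# (warp-size multiples) and an unroll factor.
space = [(tpb, u) for tpb in (32, 64, 128, 256, 512, 1024) for u in (1, 2, 4)]
cfg, t = autotune(space, simulated_kernel_time)
```

Real frameworks replace the exhaustive loop with smarter search when the parameter space is large, but the exhaustive form makes the structure of the problem clear.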

Cited by 8 publications (6 citation statements) · References 20 publications
“…Therefore, it is important to propose and evaluate different strategies for helping to choose a better number of threads per block configuration. The available literature does not agree on the optimal strategy for choosing the number of threads per block 20,21 . The most common strategies involve using the maximum number of threads per block supported by the GPU 22 or a fixed number of threads per block chosen by the programmer 21,23 .…”
Section: Methods (mentioning)
confidence: 99%
“…The available literature does not agree on the optimal strategy for choosing the number of threads per block. 20,21 The most common strategies involve using the maximum number of threads per block supported by the GPU 22 or a fixed number of threads per block chosen by the programmer. 21,23 While using fixed numbers like the warp size or the maximum number of threads supported by the GPU are useful to identify bottlenecks in a GPU kernel, these numbers may not be the best possible configuration for the thread block size.…”
Section: Strategies For Choosing the Number Of Threads Per Block (mentioning)
confidence: 99%
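The citation above contrasts fixed choices (the warp size, the device maximum) with searching the configuration space. A short sketch of the search alternative, where `cost` is a hypothetical stand-in for launching and timing the kernel at each candidate block size:

```python
def candidate_block_sizes(warp_size=32, max_threads=1024):
    """Warp-size multiples up to the device maximum: the usual search
    space when no single fixed choice is known to be optimal."""
    return [warp_size * k for k in range(1, max_threads // warp_size + 1)]

def pick_block_size(time_fn, warp_size=32, max_threads=1024):
    # Benchmark each candidate and keep the fastest; time_fn stands in
    # for an actual kernel launch plus timing.
    return min(candidate_block_sizes(warp_size, max_threads), key=time_fn)

# Illustrative cost model that happens to favor mid-sized blocks
# (hypothetical; real timings depend on the kernel and the device).
cost = lambda tpb: abs(tpb - 256) + 1
best = pick_block_size(cost)
```

Fixed choices like `max_threads` or the warp size fall out as degenerate one-element search spaces, which is why they are useful baselines but not necessarily optimal configurations.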
“…The approaches in both [12] and [24] generate OpenCL GPU code from data-parallel software inputs, profiling to map computations onto CPU or GPU targets. In [22], Haskell meta-programs tune GPU kernel launch parameters for designs expressed in an embedded DSL. In [23] automatic source-to-source transformations optimise CUDA stencil computations.…”
Section: Related Work (mentioning)
confidence: 99%
“…One of the most prominent examples can be found in Haskell, where the Accelerate, Obsidian and Nikola libraries (just to name a few) provide GPU utilization primitives. These usually encode array operations in an EDSL way giving variable number of primitives and usually code generation to low level constructs or GPU intermediate language (IL) [2][3][4][5]. Dedicated FP languages were proposed like NOVA, from NVIDIA [6].…”
Section: Related Work (mentioning)
confidence: 99%