2012 Innovative Parallel Computing (InPar)
DOI: 10.1109/inpar.2012.6339595

Auto-tuning a high-level language targeted to GPU codes

Abstract: Determining the best set of optimizations to apply to a kernel to be executed on the graphics processing unit (GPU) is a challenging problem. There are large sets of possible optimization configurations that can be applied, and many applications have multiple kernels. Each kernel may require a specific configuration to achieve the best performance, and moving an application to new hardware often requires a new optimization configuration for each kernel. In this work, we apply optimizations to GPU code using HMP…
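The abstract describes searching a per-kernel space of optimization configurations and keeping the one that performs best. A minimal sketch of such an exhaustive auto-tuner is shown below; the parameter names (`BLOCK_SIZES`, `UNROLL_FACTORS`) and the timed stand-in kernel are illustrative assumptions, not the paper's actual HMPP search space.

```python
import itertools
import time

# Hypothetical tuning parameters -- a real tuner would enumerate the
# directive settings exposed by the compiler for each kernel.
BLOCK_SIZES = [64, 128, 256]
UNROLL_FACTORS = [1, 2, 4]

def run_kernel(block_size, unroll, n=50_000):
    """Stand-in for launching a GPU kernel under one configuration.
    A CPU loop emulates work whose cost varies with the config."""
    step = max(1, (block_size * unroll) // 64)
    total = 0
    for i in range(0, n, step):
        total += i
    return total

def autotune():
    """Time every configuration in the cross product; keep the fastest."""
    best_cfg, best_time = None, float("inf")
    for block, unroll in itertools.product(BLOCK_SIZES, UNROLL_FACTORS):
        start = time.perf_counter()
        run_kernel(block, unroll)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_cfg, best_time = (block, unroll), elapsed
    return best_cfg, best_time

best_cfg, best_time = autotune()
print("best configuration:", best_cfg)
```

Because the best configuration is chosen empirically per kernel, rerunning the same search on new hardware yields a new configuration automatically, which is the portability argument the abstract makes.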


Cited by 355 publications (160 citation statements)
References 28 publications (16 reference statements)
“…We use the GPU version of the popular Polybench benchmark suite [10]. This suite contains data-parallel applications written in OpenCL.…”
Section: Benchmark Applications (mentioning)
confidence: 99%
“…Previous studies, like [8,24,13,17,32,14,22,10] also evaluate directive-based compilers that generate code for accelerators. The main difference is that this work covers more programs and includes a study of transformations.…”
Section: Related Work (mentioning)
confidence: 99%
“…While we perform our evaluation using the Rodinia benchmark suite, which contains applications from different domains, most previous works experiment with only one or two applications, the exceptions being the project discussed by Grauer et al. [10] and the work of Lee and Vetter [22]. The work by Grauer et al. uses the PolyBench collection, which contains regular kernels mostly from the linear algebra domain.…”
Section: Related Work (mentioning)
confidence: 99%
“…Each case calls the portability of accelerator performance into question. The PolyBench/GPU project [46] attempts to address these issues through auto-tuning, establishing the lack of native performance portability in the process. Our work also attempts to address this issue by dynamically assigning appropriate amounts of work regardless of device performance.…”
Section: Early Hardware Asymmetry (mentioning)
confidence: 99%