2013 42nd International Conference on Parallel Processing
DOI: 10.1109/icpp.2013.16
Adaptive Runtime Selection for GPU

Abstract: It is often hard to predict the performance of statically generated code. Hardware availability, hardware specifications, and problem size may change from one execution context to another. The main contribution of this work is an entirely automatic method for predicting the execution times of semantically equivalent versions of affine loop nests on GPUs, and then running the best-performing one on the GPU or CPU. To make accurate predictions, our framework relies on three consecutive stages: static code generation, an…
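The abstract's idea, selecting among semantically equivalent code versions at runtime via per-version execution-time predictions, can be illustrated with a minimal sketch. All names here (`predict_time`, `VERSIONS`, the linear cost coefficients) are illustrative assumptions, not the paper's actual framework or API.

```python
# Hypothetical sketch of adaptive runtime selection: several
# semantically equivalent versions of a loop nest, a per-version
# execution-time predictor, and a runtime step that picks the
# predicted-fastest version for the current problem size.

def version_cpu(data):
    # naive sequential variant: sum of squares
    return sum(x * x for x in data)

def version_blocked(data, block=4):
    # blocked variant (same semantics, different schedule)
    total = 0
    for i in range(0, len(data), block):
        for x in data[i:i + block]:
            total += x * x
    return total

VERSIONS = {"cpu": version_cpu, "blocked": version_blocked}

def predict_time(name, n):
    # Stand-in for the paper's prediction stages: here, a simple
    # per-version linear cost model assumed to be calibrated offline.
    cost_per_element = {"cpu": 1.0, "blocked": 0.8}
    return cost_per_element[name] * n

def run_best(data):
    # pick the version with the lowest predicted time, then run it
    n = len(data)
    best = min(VERSIONS, key=lambda name: predict_time(name, n))
    return best, VERSIONS[best](data)

name, result = run_best(list(range(10)))
print(name, result)  # → blocked 285
```

Because every version computes the same result, the selection step only affects performance, never correctness, which is what makes this kind of transparent runtime choice safe.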

Cited by 11 publications (6 citation statements) · References 10 publications
“…Ref. [37] discusses the principles and methods of hybrid programming, where MATLAB integrates with other languages such as Visual C++. The results show that mixed programming with different tasks can be achieved by compiling different MATLAB programs, making the necessary settings and replacing the corresponding C++ code.…”
Section: Related Work
confidence: 99%
“…Other techniques improve application performance on GPUs through addressing the problems of data transfer [58,59], thread divergence [60], data placement [61], synchronization overhead [62] and configuration tuning [63,64]. GPU resource sharing has been studied at both system [65,66] and architecture levels [67,68] to address the resource contention and performance interference.…”
Section: Scheduling on Accelerators
confidence: 99%
“…Although StarPU has the capability to schedule tasks to run on multi-core CPUs and GPUs simultaneously, when a task is submitted to SkePU, the best performing device for the given input size is selected, but only one device will execute the job. The work presented in [12] also considers different devices without the programmer's intervention, selecting the best device to run the computation but never resorting to CPU/GPU wide computations. StreamIt [6] and Lime [7] provide linguistics constructions to express task and data-parallel computations.…”
Section: Related Work
confidence: 99%
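The statement above describes best-device selection for a given input size, where a submitted task runs on exactly one device rather than being split across CPU and GPU. A minimal sketch of that policy, assuming hypothetical per-device cost models (not SkePU's or StarPU's real API):

```python
# Input-size-based device selection: each backend has an assumed
# calibrated cost model (fixed launch overhead + per-element cost),
# and a task runs on exactly one device, the predicted-fastest one.

COST_MODELS = {
    "cpu": lambda n: 0.0 + 1.0 * n,     # no launch overhead, slow per element
    "gpu": lambda n: 500.0 + 0.1 * n,   # high launch overhead, fast per element
}

def select_device(n):
    # choose the single device with the lowest predicted cost
    return min(COST_MODELS, key=lambda d: COST_MODELS[d](n))

print(select_device(100))    # → cpu (small input: overhead dominates)
print(select_device(10000))  # → gpu (large input amortizes the overhead)
```

The crossover behavior shown here is exactly why such frameworks select per input size: below some problem size the GPU's launch and transfer overhead outweighs its throughput advantage.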
“…The work distinguishes itself from the current state of the art by supporting the execution of arbitrary multi-kernel compound computations, having in mind data locality requirements. The current state of the art either exposes the heterogeneity to the programmer [11,5] or selectively directs the computations exclusively to one of the available CPU or GPU back-ends [1,2,3,4,12,13]. In turn, the proposals that tackle the transparent conjoint use of both CPUs and GPUs either restrict their scope to the execution of single kernels [14,15,16] or require previous knowledge on the computation to run [17].…”
Section: Introduction
confidence: 99%