2013
DOI: 10.1145/2400682.2400713
Polyhedral parallel code generation for CUDA

Abstract: This article addresses the compilation of a sequential program for parallel execution on a modern GPU. To this end, we present a novel source-to-source compiler called PPCG. PPCG is distinguished by its ability to accelerate computations from any static control loop nest, generating multiple CUDA kernels when necessary. We introduce a multilevel tiling strategy and a code generation scheme for the parallelization and locality optimization of imperfectly nested loops, managing memory and exposing concurrency accord…

Cited by 269 publications (153 citation statements); references 37 publications.
“…An annotation approach is described in [6], based on the Platform-Neutral Compute Intermediate Language [4]. This used the code generator in [35] to generate CUDA and OpenCL code for multiple compute platforms.…”
Section: Related Work
confidence: 99%
“…This model has been well studied and numerous source-to-source compilation tools have evolved, such as PluTo [5], PPCG [27], Par4ALL [25], or the ROSE compiler infrastructure [23] with its PolyOpt/C optimizer. These frameworks traditionally aim for an automatic OpenMP and SIMD parallelization of sequential CPU codes; some (e.g., PPCG) are also capable of generating CUDA or OpenCL code for GPUs.…”
Section: B. Parallelization Tools
confidence: 99%
“…Existing tools, such as Par4all, PIPS, and PluTo, are able to parallelize sequential program parts under certain conditions [3], [6], [25], [27]. For instance, PluTo is capable of transforming a nested loop if it is polyhedral, i.e., all array accesses within the loop are affine functions of the loop iterators (for details see Section III-0c).…”
Section: Introduction
confidence: 99%
“…The most important thing is to organize the available resources of the GPU properly. When the GPU resources are well organized, the CPU can launch a kernel function on the GPU to start computing [32].…”
Section: Some Optimization Principles for GPU Programming
confidence: 99%
“…For threads in different blocks, communication must go through global memory, which slows down the calculation. The general optimization principles of GPU programming can be summarized as [23,32]:

- More threads are better, so as to hide the memory access latency;
- Avoid accesses to global memory;
- Try to organize threads within one block; note that the number of threads within one block should be an integer multiple of the number within one warp;
- Try to reduce communication between device and host to avoid long delays.…”
Section: Some Optimization Principles for GPU Programming
confidence: 99%