Proceedings of the 29th ACM on International Conference on Supercomputing 2015
DOI: 10.1145/2751205.2751213

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations

Abstract: A GPU's computing power lies in its abundant memory bandwidth and massive parallelism. However, its hardware thread schedulers, despite being able to quickly distribute computation to processors, often fail to capitalize on program characteristics effectively, achieving only a fraction of the GPU's full potential. Moreover, current GPUs do not allow programmers or compilers to control this thread scheduling, forfeiting important optimization opportunities at the program level. This paper presents a transformat…

Citations: cited by 80 publications (24 citation statements)
References: 51 publications
“…Our work explicitly finds occupancy bounds across a range of GPUs in Section 5.1. Many instances of previous work (which use the persistent thread model) also acknowledge this bound [5,6,11,13,19,21,25,30-32], adding to the empirical evidence for this execution model.…”
Section: Occupancy-bound Execution Model
Citation type: mentioning
confidence: 98%
“…In the literature, a large collection of program transformations to improve performance of GPU applications is available, see e.g. [2,13,17,20]. We argue that such transformations should be specified formally, and whenever the transformation is applied to the code, the corresponding specifications also should be transformed, in such a way that the resulting program can be verified again (provided the original program could be verified).…”
Section: Correctness and Compiler Optimisations
Citation type: mentioning
confidence: 99%
“…This allows GPU resources to be simultaneously shared among kernels. However, if this feature is not effectively employed, some resources might remain underutilized while a kernel is running [28]. Therefore, it is beneficial to allocate these unused resources to other kernels.…”
Section: GPU Architecture and Programming Models
Citation type: mentioning
confidence: 99%