Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2010
DOI: 10.1145/1693453.1693470
An adaptive performance modeling tool for GPU architectures

Cited by 197 publications (105 citation statements)
References 15 publications
“…The value, ITILP, models the possibility of inter-thread instruction-level parallelism in GPGPUs. The concept of ITILP was introduced in Baghsorkhi et al [11]. In particular, instructions may issue from multiple warps on a GPGPU; thus, we consider global ILP (i.e., ILP among warps) rather than warp-local ILP (i.e., the ILP of one warp).…”
Section: Execution Time Modeling
confidence: 99%
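As a hedged illustration of the distinction the statement draws, the CUDA sketch below (our own; the kernel name, grid-stride layout, and parameters are hypothetical and not taken from [11] or the citing paper) gives each thread four independent multiply operations, i.e. warp-local ILP. The hardware scheduler may additionally interleave ready instructions from other resident warps, which is the global, inter-thread ILP (ITILP) being modeled.

```cuda
#include <cuda_runtime.h>

// Illustrative only: each thread issues four independent multiplies
// (warp-local ILP). On real hardware, the warp scheduler can also
// interleave instructions from other resident warps (global ILP/ITILP).
__global__ void ilp_demo(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    if (i + 3 * stride < n) {
        // Four independent operations: no result feeds another, so the
        // scheduler can overlap their latencies within one warp.
        float r0 = a[i]              * b[i];
        float r1 = a[i + stride]     * b[i + stride];
        float r2 = a[i + 2 * stride] * b[i + 2 * stride];
        float r3 = a[i + 3 * stride] * b[i + 3 * stride];
        out[i]              = r0;
        out[i + stride]     = r1;
        out[i + 2 * stride] = r2;
        out[i + 3 * stride] = r3;
    }
}
```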
“…To provide a formal framework to study this problem, Baghsorkhi et al introduced the concept of balanced GPGPU computation [11]. This model represents a GPGPU computation using the computation carried by an average warp.…”
Section: Other Performance Modeling Techniques and Tools
confidence: 99%
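As a loose, first-order illustration only (the notation below is ours and is not the formulation of Baghsorkhi et al. [11], whose model rests on a more detailed program analysis), an "average warp" estimate might take a form like:

```latex
% Illustrative sketch only; NOT the model of Baghsorkhi et al. [11].
% W              : total warps launched by the kernel
% N_{SM}         : number of streaming multiprocessors
% C              : warps one SM can execute concurrently
% \bar{t}_{warp} : average execution time of a single warp
\[
  T_{\text{kernel}} \;\approx\;
  \left\lceil \frac{W}{N_{\text{SM}} \cdot C} \right\rceil
  \cdot \bar{t}_{\text{warp}}
\]
```

The point of the balanced-computation abstraction is that a single representative warp, rather than a per-thread simulation, carries the timing information for the whole kernel.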
“…GPU performance modeling has been tackled in some valuable research works [1,5,20], but none of them deals with data transfers between CPU and GPU and the use of streams. To the best of our knowledge, there is only one research work focused on CUDA streams performance [8].…”
Section: Introduction
confidence: 99%
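To illustrate the mechanism the statement says most models ignore, here is a minimal CUDA streams sketch (our own, not drawn from [8] or the other cited works): the input is split into chunks, and each chunk's host-to-device copy, kernel, and device-to-host copy are queued on a separate stream so transfers can overlap computation on other chunks.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float* d, int n, float f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main()
{
    const int n = 1 << 22, chunks = 4, chunk = n / chunks;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));  // pinned memory: required for async copies
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t s[chunks];
    for (int c = 0; c < chunks; ++c) cudaStreamCreate(&s[c]);

    // Copy-in, kernel, and copy-out of each chunk go on that chunk's own
    // stream, so one chunk's transfers can overlap another's compute.
    for (int c = 0; c < chunks; ++c) {
        int off = c * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        scale<<<(chunk + 255) / 256, 256, 0, s[c]>>>(d + off, chunk, 2.0f);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();

    for (int c = 0; c < chunks; ++c) cudaStreamDestroy(s[c]);
    cudaFreeHost(h); cudaFree(d);
    return 0;
}
```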
“…Acceleration of applications via GPU shared memory has been developed on single GPUs [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22], multi-GPU systems [23], and GPU clusters [24][25][26]. Since the capacity of GPU shared memory is limited, access by multiple threads often leads to bank conflicts, one of the key factors that degrade the performance of CUDA kernels.…”
confidence: 99%
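For readers unfamiliar with the bank-conflict problem the statement describes, the following sketch (ours, not from the cited works) shows the classic case: column-wise reads of a 32x32 shared-memory tile all land in a single bank, and padding each row by one element restores conflict-free access.

```cuda
#include <cuda_runtime.h>

#define TILE 32

// Reading the tile column-wise (tile[x][y] with x varying across the warp)
// strides shared memory by 32 floats, so all 32 threads of a warp hit the
// same bank: a 32-way conflict on 32-bank hardware.
__global__ void transpose_conflict(const float* in, float* out)
{
    __shared__ float tile[TILE][TILE];
    int x = threadIdx.x, y = threadIdx.y;
    tile[y][x] = in[y * TILE + x];
    __syncthreads();
    out[y * TILE + x] = tile[x][y];  // conflicting column read
}

// Padding each row by one element shifts successive rows into different
// banks, removing the conflict without changing the algorithm.
__global__ void transpose_padded(const float* in, float* out)
{
    __shared__ float tile[TILE][TILE + 1];
    int x = threadIdx.x, y = threadIdx.y;
    tile[y][x] = in[y * TILE + x];
    __syncthreads();
    out[y * TILE + x] = tile[x][y];  // conflict-free after padding
}
```

A launch such as transpose_padded<<<1, dim3(TILE, TILE)>>>(in, out) covers one 32x32 tile; a full transpose tiles a larger matrix the same way.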