Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2008
DOI: 10.1145/1345206.1345220

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Cited by 663 publications (407 citation statements)
References 13 publications
“…Because many programs contain loops, we perform a value analysis to determine loop bounds (if possible). The value analysis is also used to analyze memory access patterns, which have a significant impact on performance on GPUs [26].…”
Section: Static Code Feature Extraction
confidence: 99%
“…CUDA-enabled GPU architecture is memory-bound architecture, so reasonable data layout on CUDA and memory optimization is critical for performance improvement [12,11].…”
Section: Optimization
confidence: 99%
“…GPUs are tuned for data parallelism, implementing the SIMD (Single Instruction -Multiple Data) processing model, allowing the execution of thousands of threads in parallel. GPUs have proven to be extremely efficient with matrix-style computations [9], providing a convincing speed-up of 2-3 orders of magnitude.…”
Section: Performance Considerations
confidence: 99%