2011 IEEE 17th International Symposium on High Performance Computer Architecture 2011
DOI: 10.1109/hpca.2011.5749745

A quantitative performance analysis model for GPU architectures

Abstract: We develop a microbenchmark-based performance model for NVIDIA GeForce 200-series GPUs. Our model identifies GPU program bottlenecks and quantitatively analyzes performance, and thus allows programmers and architects to predict the benefits of potential program optimizations and architectural improvements. In particular, we use a microbenchmark-based approach to develop a throughput model for three major components of GPU execution time: the instruction pipeline, shared memory access, and global memory access.…
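
As a rough illustration of how a three-component throughput model might flag a bottleneck, the sketch below estimates a time for each component as work divided by measured throughput and reports the largest. This is a minimal sketch under stated assumptions, not the paper's actual model: the `Component` class, the `find_bottleneck` helper, the rule that the slowest component is the bottleneck, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One of the three execution-time components named in the abstract."""
    name: str
    work: float        # amount of work (hypothetical units, e.g. instructions or bytes)
    throughput: float  # work per second, as a microbenchmark might measure it (made-up values)

    def time(self) -> float:
        # Estimated time spent in this component: work / throughput.
        return self.work / self.throughput


def find_bottleneck(components):
    """Return the component with the largest estimated time (assumed bottleneck rule)."""
    return max(components, key=lambda c: c.time())


# Hypothetical numbers for illustration only; they are not taken from the paper.
components = [
    Component("instruction pipeline", work=4.0e9, throughput=2.0e11),
    Component("shared memory", work=1.0e9, throughput=1.0e11),
    Component("global memory", work=6.0e9, throughput=1.0e11),
]

bottleneck = find_bottleneck(components)
for c in components:
    print(f"{c.name}: {c.time():.3f} s")
print("bottleneck:", bottleneck.name)
```

With these made-up inputs the global memory component has the largest estimated time, so an optimization that reduces global memory traffic would be the first candidate to evaluate.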

Citations: cited by 202 publications (105 citation statements)
References: 16 publications
“…Zhang et al [38] developed a microbenchmark-based performance model that allows programmers and architects to identify GPU program bottlenecks and predict the benefits of potential program optimizations and architectural improvements. Our work focuses on real GPU applications instead of microbenchmarks.…”
Section: G Discussion (mentioning)
confidence: 99%
“…Zhang et al [23] have presented a quantitative performance analysis model, based on micro-benchmarks for NVIDIA GeForce 200-series GPUs. They have developed a throughput model for three components of GPU execution time: the instruction pipeline, shared memory access, and global memory access.…”
Section: It Is Common To Present the Parameters Of The Bsp Model As A (mentioning)
confidence: 99%
“…Recently, graphics cards or graphics processing units (GPU), introduced primarily for high-end gaming requiring high resolution, are now intensively being used, as a co-processor to the CPU, for general purpose computing [2,3]. The GPU itself is a multi-core processor having support for thousands of threads [4] running concurrently. GPUs are result of dozens of streaming processors with hundreds of core aligned in a particular way forming a single hardware unit.…”
Section: Introduction (mentioning)
confidence: 99%
“…The high powerful Quadro-6000 and GTX-260 is well suited for desktops with power requirement of 204W and 182W respectively. 4 . One difference between CUDA and OpenCL is that CUDA is specific for GPU devices whereas OpenCL is heterogeneous and targets all devices conforming its specification [5], [6].…”
Section: Introduction (mentioning)
confidence: 99%