2012
DOI: 10.1145/2370036.2145819

A performance analysis framework for identifying potential benefits in GPGPU applications

Abstract: Tuning code for GPGPU and other emerging many-core platforms is a challenge because few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this paper, we present a performance analysis framework that can help shed light on such bottlenecks for GPGPU applications. Although a handful of GPGPU profiling tools exist, most of the traditional tools, unfortunately, simply provide programmers with a variety of measurements and metrics obtained by running applications, and it is often …

Cited by 48 publications (21 citation statements)
References 19 publications (16 reference statements)
“…In addition to the memory requirements of each stage, information about the computational characteristics of each stage is required. The estimated runtime could be inferred from hardware performance models [178,145,28,9]. Approaches such as those described in Section 4.1.2 can be applied to estimate the suitability of different agent update stages for execution on a certain accelerator.…”
Section: Computational Profiling (mentioning)
confidence: 99%
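As an illustration of what inferring runtime from a hardware performance model can look like, the sketch below estimates a kernel stage's runtime as the larger of its compute time and its memory-transfer time, a simple roofline-style bound. The function name, its parameters, and the example numbers are all hypothetical; this is a stand-in for the far more detailed models cited above, not a reproduction of any of them.

```python
# Minimal roofline-style runtime estimate (illustrative sketch only; the
# cited hardware performance models are considerably more detailed).

def estimate_runtime_s(flops, bytes_moved, peak_flops, peak_bw):
    """Lower-bound runtime of one kernel stage in seconds.

    flops       -- floating-point operations executed by the stage
    bytes_moved -- bytes read from and written to device memory
    peak_flops  -- peak arithmetic throughput of the accelerator (FLOP/s)
    peak_bw     -- peak device-memory bandwidth (bytes/s)
    """
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bw
    # The stage is limited by whichever resource it saturates first.
    return max(compute_time, memory_time)


if __name__ == "__main__":
    # Hypothetical stage: 2e9 FLOPs and 8e8 bytes on a 10 TFLOP/s, 900 GB/s GPU.
    t = estimate_runtime_s(2e9, 8e8, 10e12, 900e9)
    print(f"estimated runtime: {t * 1e3:.3f} ms")
```

Under this simple bound the hypothetical stage is memory-bound (about 0.89 ms of traffic versus 0.2 ms of arithmetic), which is the kind of signal that could feed the suitability estimate described in the statement above.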
“…Since very little information about the underlying GPU architecture is disclosed, building accurate simulators for each new GPU generation is very difficult. Fortunately, the results in [6,10,17,20] show that GPU performance can be approximated very well using analytical approaches. However, existing GPU performance models all rely on a certain level of an application's implementation (C++ code, PTX code, assembly code.…”
Section: Introduction (mentioning)
confidence: 95%
“…Bakhoda et al. [5] developed a detailed GPU simulator, which also uses the PTX code as input. Recently, Sim et al. [17] extended the MWP-CWP model and use the assembly code of the CUDA kernel to predict performance. The quantitative GPU performance model proposed by Zhang and Owens [20] is also based on the native assembly code.…”
Section: Introduction (mentioning)
confidence: 99%
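Since the MWP-CWP model recurs in these citations, the sketch below gives a heavily simplified, MWP/CWP-style cycle estimate for one streaming multiprocessor, loosely in the spirit of the analytical model that Sim et al. [17] extend. All parameter names are assumptions, and several cases and terms of the published models (e.g., synchronization, SFU, and cache effects) are deliberately omitted.

```python
# Heavily simplified MWP/CWP-style cycle estimate for one SM (illustrative
# sketch only; the published models contain additional cases and terms).

def estimate_cycles(n_warps, comp_cycles, n_mem_insts, mem_latency, departure_delay):
    """Rough execution-cycle estimate for one streaming multiprocessor.

    n_warps         -- warps concurrently active on the SM
    comp_cycles     -- computation cycles executed by one warp
    n_mem_insts     -- memory instructions issued by one warp
    mem_latency     -- average memory access latency in cycles
    departure_delay -- cycles between consecutive memory requests
    """
    mem_cycles = mem_latency * n_mem_insts  # memory waiting cycles of one warp

    # MWP: warps whose memory requests can overlap within one latency period.
    mwp = min(mem_latency / departure_delay, n_warps)
    # CWP: warps whose computation fits into one memory waiting period.
    cwp = min((mem_cycles + comp_cycles) / comp_cycles, n_warps)

    if mwp >= cwp:
        # Compute-bound: memory latency is mostly hidden, leaving roughly one
        # exposed memory period plus the computation of all warps.
        return mem_latency + comp_cycles * n_warps
    else:
        # Memory-bound: requests are serviced in groups of about MWP warps.
        comp_per_period = comp_cycles / n_mem_insts
        return mem_cycles * (n_warps / mwp) + comp_per_period * (mwp - 1)
```

Feeding such an estimate with instruction counts taken from PTX or native assembly is, in essence, what the statement above means when it says existing models rely on a certain level of an application's implementation.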
“…Here, we explain a model based on the approach of Sim et al [85]. Much performance modeling has been done for both CPUs and GPUs.…”
Section: Instruction-level Analysis and Tuning (mentioning)
confidence: 99%