Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2012
DOI: 10.1145/2145816.2145819

A performance analysis framework for identifying potential benefits in GPGPU applications

Abstract: Tuning code for GPGPU and other emerging many-core platforms is a challenge because few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this paper, we present a performance analysis framework that can help shed light on such bottlenecks for GPGPU applications. Although a handful of GPGPU profiling tools exist, most of the traditional tools, unfortunately, simply provide programmers with a variety of measurements and metrics obtained by running applications, and it is often …

Citations: Cited by 154 publications (72 citation statements)
References: 19 publications
“…In some recent papers [34,35], the influence of implementation factors such as processor occupancy, thread synchronization, and the organization of memory accesses on execution time is analyzed.…”
Section: Introduction (confidence: 99%)
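The implementation factors named in this statement (occupancy, synchronization, memory-access organization) are the usual inputs to analytical GPU execution-time models. As a rough illustration of the occupancy factor alone, the following is a minimal host-side sketch of standard theoretical-occupancy arithmetic; the hardware limits and kernel parameters are assumed example values, not figures from the cited papers.

```cuda
// Minimal sketch of theoretical occupancy arithmetic.
// All limits and kernel parameters below are assumed example values.
#include <algorithm>
#include <cstdio>

int main()
{
    // Assumed per-SM hardware limits (illustrative only).
    const int max_warps_per_sm = 64;
    const int max_regs_per_sm  = 65536;
    const int max_smem_per_sm  = 98304;   // bytes

    // Assumed kernel resource usage.
    const int block_size      = 256;      // threads per block
    const int regs_per_thread = 40;
    const int smem_per_block  = 16384;    // bytes

    const int warps_per_block = block_size / 32;

    // Blocks per SM permitted by each resource.
    const int by_warps = max_warps_per_sm / warps_per_block;
    const int by_regs  = max_regs_per_sm  / (regs_per_thread * block_size);
    const int by_smem  = max_smem_per_sm  / smem_per_block;

    const int blocks_per_sm = std::min({by_warps, by_regs, by_smem});
    const double occupancy  = double(blocks_per_sm * warps_per_block) / max_warps_per_sm;

    printf("blocks/SM = %d, theoretical occupancy = %.2f\n", blocks_per_sm, occupancy);
    return 0;
}
```

Whichever resource yields the smallest block count is the occupancy limiter; reducing that kernel's register or shared-memory usage is the corresponding tuning lever.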
“…MARTE focuses only on real-time embedded systems. [7] helps shed light on bottlenecks of GPGPU applications and supports programmers with run-time measurements and metrics; it assumes that a memory instruction is always followed by consecutive dependent instructions, hence MLP is always one.…”
Section: Results (confidence: 99%)
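The MLP remark is easiest to see in code. Below is a hedged sketch of my own (kernel names and data layout are illustrative, not taken from [7] or the citing paper): when each load depends on the previous one, a thread can expose only one outstanding memory request (MLP = 1), whereas independent loads allow their latencies to overlap.

```cuda
// Illustrative kernels contrasting MLP = 1 with MLP = 2.
// Kernel names and data layout are assumptions for this sketch.

// MLP = 1: the second load's address depends on the first load's result,
// so a thread's two memory requests cannot overlap.
__global__ void mlp_one(float* out, const float* in, const int* idx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j  = idx[i];   // load 1
        out[i] = in[j];    // load 2 depends on load 1
    }
}

// MLP = 2: the two loads are independent, so the hardware can issue both
// and overlap their latencies.
__global__ void mlp_two(float* out, const float* a, const float* b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = a[i];    // load 1
        float y = b[i];    // load 2, independent of load 1
        out[i]  = x + y;
    }
}
```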
“…This approach makes it possible to obtain performance measures such as throughput and response time throughout the software life-cycle. Moreover, Sim, Jaewoong, et al. [7] proposed a framework for performance analysis that helps shed light on bottlenecks of GPGPU applications. In addition, this framework complements GPGPU profiling tools and supports programmers with run-time measurements and metrics.…”
Section: Primary Studies (confidence: 99%)
“…Seventeen candidate features were assembled from a previous study of performance counters [34] and computed theoretical values [35]. For each candidate feature they compute its coarsening delta, reflecting the change in the feature value caused by coarsening, f_Δ = (f_after − f_before) / f_before, and add it to the feature set.…”
Section: Case Study B: OpenCL Thread Coarsening Factor (confidence: 99%)
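As a concrete reading of the coarsening-delta formula, here is a small host-side sketch that computes f_Δ for a handful of hypothetical feature values; the feature names and numbers are invented for illustration and are not taken from [34,35].

```cuda
// Sketch of the coarsening-delta computation f_delta = (f_after - f_before) / f_before.
// Feature names and values are hypothetical.
#include <cstdio>

int main()
{
    const char* names[] = { "branches", "divergent_loads", "instructions" };
    double before[]     = { 120.0, 30.0, 4000.0 };   // feature values before coarsening
    double after[]      = { 110.0, 15.0, 2600.0 };   // feature values after coarsening

    for (int k = 0; k < 3; ++k) {
        double f_delta = (after[k] - before[k]) / before[k];
        printf("%-16s f_delta = %+.3f\n", names[k], f_delta);
    }
    return 0;
}
```

A negative delta indicates that coarsening reduced the feature value (for example, fewer divergent loads), which is the kind of change these delta features are meant to capture.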