2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors 2012
DOI: 10.1109/asap.2012.19

A Performance Model for Memory Bandwidth Constrained Applications on Graphics Engines

Abstract: Graphics engines are excellent execution platforms for high-throughput computations that exploit a large degree of available parallelism. The achieved performance is, however, highly dependent on the access patterns that the application imposes on the memory subsystem. Here, we propose an analytic model that helps improve the understanding of the performance of memory-limited kernels that employ random memory access schemes, especially as impacted by cache and various configuration parameters that can…

citations
Cited by 21 publications
(14 citation statements)
references
References 16 publications
“…The model presented in [19], which we will extend in the subsection that follows, characterizes algorithm performance in terms of the following factors: algorithmic complexity, $f_{app}$, caching, $f_{cache}$, and scheduling, $f_{sched}$. The algorithmic complexity factor is expressed via a function…”
Section: B Calibrated Modeling Of Runtime (mentioning)
confidence: 99%
“…If $B_r > B_a \times P/Q$, multiple passes are needed to consume all the requested blocks of work. From [19], the number of active blocks $B_a$ is described in equation (4) in terms of the shared memory required by the application $S_B$, the quantity of shared memory on each multiprocessor $Z$, the number of threads requested per thread block $T_r$, the processor registers required by the application $R_T \times T_r$, the quantity of registers available per multiprocessor $R$, the maximally allowed thread blocks $B_{max}$, and the maximally allowed threads $T_{maxMP}$.…”
Section: B Calibrated Modeling Of Runtime (mentioning)
confidence: 99%
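
The quantities named in the statement above can be combined into a rough occupancy estimate. Equation (4) itself is not reproduced in this excerpt, so the sketch below is an assumption: it takes the active block count as the minimum of the per-resource limits, which is the common way occupancy is bounded on CUDA-style hardware, and it may differ from the cited formula. The excerpt does not define P and Q, so they are treated here as given positive integers. The function names `active_blocks` and `passes_needed` are hypothetical.

```python
def active_blocks(S_B, Z, T_r, R_T, R, B_max, T_maxMP):
    """Assumed form of B_a: the tightest of the per-multiprocessor limits.

    S_B: shared memory per block, Z: shared memory per multiprocessor,
    T_r: threads requested per block, R_T: registers per thread,
    R: registers per multiprocessor, B_max: block limit,
    T_maxMP: thread limit per multiprocessor.
    """
    limits = [
        B_max,               # hardware cap on resident blocks
        T_maxMP // T_r,      # limit from the thread cap
        R // (R_T * T_r),    # limit from the register file
    ]
    if S_B > 0:              # a kernel may use no shared memory
        limits.append(Z // S_B)
    return min(limits)


def passes_needed(B_r, B_a, P, Q):
    """Passes to consume B_r requested blocks when one pass covers B_a * P / Q.

    Assumes B_a, P and Q are positive integers; uses integer ceiling division.
    """
    return -(-B_r * Q // (B_a * P))


# Illustrative numbers only, not taken from the cited work.
B_a = active_blocks(S_B=4096, Z=49152, T_r=256, R_T=32, R=32768,
                    B_max=8, T_maxMP=1536)
print(B_a, passes_needed(B_r=100, B_a=B_a, P=14, Q=1))
```

With these illustrative inputs the register file is the binding limit, and since the requested block count exceeds $B_a \times P/Q$, more than one pass results.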
“…A number of high-performance GPU algorithms have been developed, such as sorting [1], hashing [2], dynamic programming [3], graph algorithms [4], and other classic algorithms [5]. Many performance studies have also been conducted [6], [7] to understand the performance of GPU applications.…”
Section: Introduction (mentioning)
confidence: 99%