2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors
DOI: 10.1109/asap.2014.6868641
Performance modeling for highly-threaded many-core GPUs

Abstract: Highly-threaded many-core GPUs can provide high throughput for a wide range of algorithms and applications. Such machines hide memory latencies via the use of a large number of threads and large memory bandwidth. The achieved performance, therefore, depends on the parallelism exploited by the algorithm, the effectiveness of latency hiding, and the utilization of multiprocessors (occupancy). In this paper, we extend previously proposed analytical models, jointly addressing parallelism, latency-hiding, and occupancy…
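The abstract names occupancy (multiprocessor utilization) as one of three factors the model addresses. A minimal sketch of how occupancy is typically computed from per-block resource usage, using illustrative hardware limits that are assumptions for this example and not taken from the paper:

```python
# Sketch of GPU occupancy estimation. The resource limits below are
# hypothetical round numbers (2048 resident threads, 65536 registers,
# 48 KiB shared memory, 16 blocks per multiprocessor), not figures
# from the paper or any specific device.

def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads=2048, max_regs=65536, max_smem=48 * 1024,
              max_blocks=16):
    """Fraction of a multiprocessor's thread slots actually occupied."""
    # Each resource independently caps how many blocks can be resident.
    by_threads = max_threads // threads_per_block
    by_regs = max_regs // (regs_per_thread * threads_per_block)
    by_smem = max_smem // smem_per_block if smem_per_block else max_blocks
    blocks = min(by_threads, by_regs, by_smem, max_blocks)
    return blocks * threads_per_block / max_threads

print(occupancy(256, 32, 4096))  # → 1.0 (no resource is limiting)
print(occupancy(256, 64, 4096))  # → 0.5 (register-limited)
```

Low occupancy leaves fewer threads available to hide memory latency, which is why the model treats occupancy and latency hiding jointly.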

Cited by 18 publications (7 citation statements)
References 26 publications
“…The large memory bandwidth can also be used to hide memory latency. The achieved performance, therefore, depends on the parallelism exploited by the algorithm, the effectiveness of latency hiding, and the utilization of multiprocessors (occupancy) [63]. Based on the results in Table 6, observe that our CUDA implementation exhibits better performance and a shorter runtime on the Titan than on the GTX 960.…”
Section: Evaluation 2: Performance on Different GPUs
confidence: 92%
“…A new model by Ma, Chamberlain & Agrawal (2014b) has recently been suggested for analyzing the complexities of parallel algorithms on graphics processors. This model is obtained from the combination of asymptotic and calibrated models.…”
Section: Complexity Analysis
confidence: 99%
“…These four metrics help to identify performance bottlenecks. Ma et al [29] design an analysis framework for many-core architectures, bridging the gap between asymptotic models and calibrated models that quantitatively predict runtime. The framework jointly addresses parallelism, latency-hiding, and occupancy; helps to reduce the configuration space for tuning kernels; and reflects performance trends as the problem size and other parameters scale.…”
Section: Algorithms
confidence: 99%
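The citation statements describe the framework as bridging asymptotic models and calibrated models that quantitatively predict runtime. One common shape for such a calibrated, latency-hiding model is to take the maximum of the compute and memory pipeline times; the sketch below illustrates that idea with made-up peak rates, and is an assumption for illustration rather than the paper's actual model:

```python
# Hypothetical calibrated runtime model: with enough threads in flight,
# compute and memory traffic overlap, so the slower pipeline dominates.
# The peak rates (1 Tflop/s compute, 300 GB/s bandwidth) are invented
# calibration constants, not measurements from the paper.

def predict_runtime(n_ops, n_bytes, compute_rate=1e12, bandwidth=3e11):
    """Predicted seconds for a kernel doing n_ops operations
    and moving n_bytes to/from memory."""
    return max(n_ops / compute_rate, n_bytes / bandwidth)

print(predict_runtime(1e12, 3e11))  # → 1.0 (balanced kernel)
print(predict_runtime(1e12, 6e11))  # → 2.0 (memory-bound kernel)
```

Because `n_ops` and `n_bytes` scale asymptotically with problem size while the rates are calibrated per machine, a model of this shape can reflect performance trends as the problem size and other parameters scale, as the citing paper notes.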