2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications
DOI: 10.1109/ispa.2012.92

Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs

Abstract: The NVIDIA graphics processing units (GPUs) are playing an important role as general-purpose programming devices. The implementation of parallel codes to exploit the GPU hardware architecture is a task for experienced programmers. The thread-block size and shape choice is one of the most important user decisions when a parallel problem is coded. The thread-block configuration has a significant impact on the global performance of the program. While in the CUDA parallel programming model it is always necessar…
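To make the thread-block configuration decision concrete, the following minimal CUDA sketch launches the same kernel with two block shapes of equal thread count; the kernel scale2D and all dimensions are illustrative assumptions, not taken from the paper. On Fermi-class GPUs the two shapes can perform differently because of memory coalescing and occupancy effects.

#include <cuda_runtime.h>

// Illustrative kernel: each thread scales one element of a 2D array.
__global__ void scale2D(float *data, int width, int height, float factor) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        data[y * width + x] *= factor;
}

int main() {
    const int width = 1024, height = 1024;
    float *d_data;
    cudaMalloc(&d_data, width * height * sizeof(float));

    // Two block shapes with the same thread count (256). On Fermi,
    // 32 threads along x fill whole warps with consecutive addresses,
    // which favours coalesced global-memory access.
    dim3 blockA(16, 16);
    dim3 blockB(32, 8);
    dim3 gridA((width + blockA.x - 1) / blockA.x, (height + blockA.y - 1) / blockA.y);
    dim3 gridB((width + blockB.x - 1) / blockB.x, (height + blockB.y - 1) / blockB.y);

    scale2D<<<gridA, blockA>>>(d_data, width, height, 2.0f);
    scale2D<<<gridB, blockB>>>(d_data, width, height, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}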

Cited by 10 publications (9 citation statements); references 11 publications (12 reference statements).
“…OpenCL programs are compiled just-in-time for execution and can be used together with Mi-AccLib or other run-time libraries. These works [16][17][18] experienced a performance penalty on the NVIDIA GPU due to the OpenCL abstraction layer. Thus, we have disabled OpenCL support, as it is not currently optimized for GPUs, and real gains on GPUs can only be achieved through optimized code, since there are additional overheads from data movement.…”
Section: Related Work
confidence: 99%
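To show where that just-in-time compilation step sits, here is a minimal host-side sketch using the standard OpenCL C API; the scale kernel is an illustrative assumption, not code from the cited works, and error handling is elided for brevity.

#include <CL/cl.h>

// OpenCL kernels ship as source strings and are built at run time.
const char *src =
    "__kernel void scale(__global float *d, float f) {"
    "    int i = get_global_id(0);"
    "    d[i] *= f;"
    "}";

int main() {
    cl_platform_id plat;
    clGetPlatformIDs(1, &plat, NULL);
    cl_device_id dev;
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);

    // Compilation happens here, at program run time: this is the
    // just-in-time step (and part of the overhead) discussed above.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    clReleaseKernel(k);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}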
“…Programming an application on the GPU is a non-trivial task, and many key factors must be exploited for the application to achieve optimal performance [20]. One of these factors is having every work-item within a warp execute the same set of instructions along the same execution path.…”
Section: GPU Architecture
confidence: 99%
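The statement above describes avoiding warp divergence. As a hedged CUDA sketch (both kernels are hypothetical, not from the cited work), the first kernel branches on an index pattern that splits lanes within a warp, while the second branches on a warp-aligned quantity so all 32 lanes of a warp take the same path.

#include <cuda_runtime.h>

// Divergent: even/odd lanes within the same warp take different
// branches, so the warp executes both paths serially.
__global__ void divergentKernel(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)
        out[i] = in[i] * 2.0f;
    else
        out[i] = in[i] + 1.0f;
}

// Uniform: branching on the warp index keeps all 32 lanes of each
// warp on the same execution path, avoiding serialization.
__global__ void uniformKernel(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((i / warpSize) % 2 == 0)
        out[i] = in[i] * 2.0f;
    else
        out[i] = in[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    int block = 256, grid = (n + block - 1) / block;
    divergentKernel<<<grid, block>>>(out, in, n);
    uniformKernel<<<grid, block>>>(out, in, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}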
“…Caches affect application performance in a significant manner, as confirmed by several researchers. [2][3][4][5][6][7][8][9][10][11] This makes management of GPU caches extremely important. While CPU cache management has been studied for years, GPU cache management is a relatively new research field.…”
Section: Introduction
confidence: 99%
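One concrete example of a GPU cache-management control on Fermi, the architecture the paper targets, is the configurable split of per-SM on-chip memory between L1 cache and shared memory, exposed through the CUDA runtime. A minimal sketch, assuming a placeholder kernel named myKernel:

#include <cuda_runtime.h>

// Placeholder kernel; its name and body are illustrative only.
__global__ void myKernel(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;
}

int main() {
    // Fermi splits 64 KB of per-SM on-chip memory between L1 cache and
    // shared memory. Preferring L1 (48 KB L1 / 16 KB shared) can help
    // cache-sensitive kernels that use little shared memory.
    cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);

    float *d;
    cudaMalloc(&d, 1024 * sizeof(float));
    myKernel<<<4, 256>>>(d);   // 4 blocks x 256 threads = 1024 elements
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}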