2023
DOI: 10.1145/3570638
Optimization Techniques for GPU Programming

Abstract: In the past decade, Graphics Processing Units have played an important role in the field of high-performance computing and they still advance new fields such as IoT, autonomous vehicles, and exascale computing. It is therefore important to understand how to extract performance from these processors, something that is not trivial. This survey discusses various optimization techniques found in 450 papers published in the last 14 years. We analyze the optimizations from different perspectives which shows that the…

Cited by 29 publications (19 citation statements)
References 313 publications
“…Each call to a CUDA kernel creates a new Grid, which is composed of multiple Blocks. Each Block is composed of up to 1024 separate Threads (Hijma et al. 2023). As shown in Figure 6, the Grid controls the number of Blocks through three dimensions: gridDim.x…”
Section: Folding and Integrating
confidence: 99%
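
A minimal CUDA sketch of the grid/block/thread hierarchy this excerpt describes; the kernel name, sizes, and buffer names are illustrative assumptions, not taken from the cited papers.

#include <cstdio>

__global__ void fillKernel(float *out, int n) {
    // Global index derived from the block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    // Each block holds up to 1024 threads; the grid supplies enough
    // blocks (gridDim.x of them) to cover all n elements.
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    fillKernel<<<grid, block>>>(d_out, n);   // each launch creates a new grid
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
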
“…Shared memory gives us much room for software optimization because of its programmability. For example, the access latency of global memory is roughly 100 times higher than that of shared memory on some GPU architectures [9]. Reasonable use of programmable shared memory can significantly improve the performance of a computation, and related techniques are used in this paper.…”
Section: GPU Memory Hierarchy
confidence: 99%
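
A short sketch of the shared-memory optimization the excerpt points to: each block stages its slice of the input in fast on-chip shared memory and performs a tree reduction there, so slow global memory is read only once per element. The kernel and buffer names are illustrative assumptions.

__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[256];   // programmable on-chip memory
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // One read from global memory per element; all further
    // traffic stays in shared memory, whose latency is far lower.
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction performed entirely in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];   // one partial sum per block
}

The kernel assumes a launch with 256 threads per block to match the tile size; a second pass (or an atomic add) combines the per-block partial sums.
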
“…However, support and data for GPU hardware are lacking. In addition, the identification of residues in this algorithm does not take into account that SIMD instructions [9] could further improve performance. The uneven allocation of computational resources in the Integration step wastes many resources early in the algorithm and leaves computational resources scarce in its later stages.…”
Section: Introduction
confidence: 99%
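
As a hedged illustration of the SIMD-style instructions mentioned above, the sketch below uses CUDA's built-in float4 vector type so that each thread issues one 128-bit load and store instead of four scalar ones; the kernel name and parameters are assumptions for illustration.

__global__ void scaleVec4(const float4 *in, float4 *out, float a, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = in[i];                       // one vectorized 128-bit load
        v.x *= a; v.y *= a; v.z *= a; v.w *= a; // four elements per thread
        out[i] = v;                             // one vectorized 128-bit store
    }
}

Here n4 is the element count divided by four; the buffers must be 16-byte aligned, which cudaMalloc guarantees.
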
“…One of the main barriers to entry for other researchers may be the perceived difficulty of GPU programming. Writing efficient code in CUDA or OpenCL requires careful attention to memory access patterns, balancing resource usage when mapping parallel processes to the hardware, and managing the interaction between the CPU and GPU (Hijma et al. 2023). There have been efforts to make GPU-accelerated value iteration more accessible.…”
Section: Introduction
confidence: 99%
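
A minimal sketch of the CPU-GPU interaction the excerpt refers to: the host allocates device buffers, copies inputs over, launches a kernel, and copies the result back. The SAXPY kernel and sizes are illustrative assumptions, not the cited paper's code.

#include <cstdio>
#include <vector>

__global__ void saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 16;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));

    // Host-to-device transfers: a classic bottleneck to keep in mind.
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(2.0f, dx, dy, n);

    // The device-to-host copy synchronizes with the kernel for us.
    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);   // expect 4.0

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
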