Proceedings of the 36th ACM International Conference on Supercomputing 2022
DOI: 10.1145/3524059.3532392

Toward accelerated stencil computation by adapting tensor core unit on GPU

Abstract: The Tensor Core Unit (TCU) has been increasingly adopted on modern high-performance processors, specialized in boosting the performance of general matrix multiplication (GEMM). Due to its highly optimized hardware design, the TCU can significantly accelerate GEMM-based operations widely used in scientific as well as deep learning applications. However, little work has exploited the TCU to accelerate non-GEMM operations such as stencil computation, which is also important in the field of high performance computing. To…
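The abstract's core idea, recasting a stencil so it can run on GEMM hardware, can be sketched in a few lines. This is a hedged illustration of the general im2col-style trick, not the paper's actual scheme; the function name, tile shapes, and the 3-point stencil are all illustrative assumptions.

```python
import numpy as np

def stencil_as_gemm(u, weights):
    """Apply a 1D 3-point stencil to `u` by gathering sliding windows
    into a matrix and issuing a single matrix-vector product, so the
    whole sweep maps onto GEMM-style hardware. Illustrative only."""
    n = len(u)
    # Each row holds one 3-point window of the input (im2col layout).
    windows = np.stack([u[i:i + 3] for i in range(n - 2)])  # shape (n-2, 3)
    return windows @ weights  # one GEMM/GEMV call performs every update

# Check against a direct per-point evaluation of the same stencil.
u = np.arange(8, dtype=np.float64)
w = np.array([1.0, -2.0, 1.0])  # second-difference stencil
direct = np.array([u[i] - 2 * u[i + 1] + u[i + 2] for i in range(6)])
assert np.allclose(stencil_as_gemm(u, w), direct)
```

The trade-off this sketch hides is the redundant memory traffic of materializing the window matrix, which is one of the problems a TCU-adapted stencil scheme has to address.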

Cited by 12 publications (2 citation statements)
References 40 publications
“…Nearly a decade later, with the surge of Artificial Intelligence (AI), the community realized that the performance of GPUs was not high enough to properly handle the new deep learning models being developed. For this reason, around 2017, NVIDIA introduced tensor cores [3][4][5][6][7][8][9][10][11][12] inside the chip to further accelerate AI applications. GPU tensor cores are Application Specific Integrated Circuits (ASICs), or simply special-purpose cores, that perform fast matrix multiply-accumulate (MMA) operations.…”
Section: From General Purpose To Specific Purposementioning
confidence: 99%
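The MMA primitive this citation statement refers to computes D = A × B + C on small tiles in hardware. The following is a minimal emulation of its semantics, assuming the common NVIDIA convention of half-precision inputs accumulated in single precision on 16×16 fragments; the tile size and dtypes are illustrative, not a hardware specification.

```python
import numpy as np

def mma(a, b, c):
    """Emulate one tensor-core MMA step on a tile: multiply the
    low-precision inputs, accumulate into the higher-precision C.
    Illustrative sketch, not the actual hardware data path."""
    return a.astype(np.float32) @ b.astype(np.float32) + c

# One 16x16 fragment, mirroring the common fp16-in / fp32-accumulate mode.
rng = np.random.default_rng(0)
a = rng.standard_normal((16, 16)).astype(np.float16)
b = rng.standard_normal((16, 16)).astype(np.float16)
c = np.zeros((16, 16), dtype=np.float32)
d = mma(a, b, c)
assert d.shape == (16, 16) and d.dtype == np.float32
```

Accumulating in a wider type than the inputs is what lets the unit stay fast without losing too much precision over long reduction chains, which is why non-GEMM workloads such as stencils try to map onto this primitive.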
“…Successful research has been done in recent years. In the case of tensor cores, new ways have been proposed to further accelerate arithmetic reductions [16, 13, 5-12, 17-21], prefix sums [4-12, 17-21, 22-29], the Fast Fourier Transform [22], [10], [23], [5], stencil computations for PDE simulations [11], and even fractals [14,…]. In general, all of these works achieve significantly higher performance compared to traditional GPU implementations.…”
Section: New Research Opportunitiesmentioning
confidence: 99%