2023
DOI: 10.1137/21m1465032
Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores

Abstract: In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower-precision matrices, and a matrix product is formed by multiplying the constituents in low precision. We investigate the use of multiword arithmetic for improving the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply-add (FMA) hardware, focusing especially on the tensor cores available on NVIDIA GPUs. Building on a general block FMA framework, we develop a comprehensive err…
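As a concrete illustration, here is a minimal NumPy sketch of the two-word case, not the paper's implementation: a float64 matrix is split into a float32 leading word plus a float32 correction word, and the product is approximated from three low-precision partial products, dropping the O(u²) cross term. The function names and the float64 accumulation are assumptions of this sketch; float64 accumulation is used because the exact product of two float32 values is representable in float64, which mimics a block FMA unit that multiplies in low precision and accumulates in higher precision.

```python
import numpy as np

def split_two_words(A):
    """Represent a float64 matrix as an unevaluated sum A ~ A_hi + A_lo
    of two float32 matrices (illustrative two-word split)."""
    A_hi = A.astype(np.float32)                               # leading word
    A_lo = (A - A_hi.astype(np.float64)).astype(np.float32)   # correction word
    return A_hi, A_lo

def multiword_matmul(A, B):
    """Approximate C = A @ B from products of the low-precision words.
    Words are upcast to float64, so each partial product is exact and
    accumulation happens in higher precision (an assumption mimicking
    mixed-precision block FMA hardware)."""
    A_hi, A_lo = split_two_words(A)
    B_hi, B_lo = split_two_words(B)
    # A_lo @ B_lo is dropped: it contributes only at order u^2.
    terms = [(A_hi, B_hi), (A_hi, B_lo), (A_lo, B_hi)]
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.float64)
    for X, Y in terms:
        C += X.astype(np.float64) @ Y.astype(np.float64)
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
C_ref = A @ B
C_mw = multiword_matmul(A, B)
C_sgl = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float64)
print("two-word error:", np.max(np.abs(C_mw - C_ref)) / np.max(np.abs(C_ref)))
print("float32 error: ", np.max(np.abs(C_sgl - C_ref)) / np.max(np.abs(C_ref)))
```

On random matrices, the two-word product is typically several orders of magnitude more accurate than a plain float32 product, which is the performance-accuracy tradeoff the paper analyzes.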

Cited by 7 publications (9 citation statements)
References 27 publications
“…It is possible that new operations recently introduced in the IEEE 754 Standard for FP arithmetic [27], briefly presented in Section 1.3.2, will replace these building blocks in the near future. Double-word arithmetic (and, more generally, pair and multiple-word arithmetics) is slowly yet steadily gaining importance among numerical methods [32,43,35,17]: this makes a careful study of its error useful.…”
Section: Double-word and Pair Arithmetics
Mentioning confidence: 99%
“…The proof is given in the supplementary materials. Let us now compare the bounds of Algorithm 10 and Algorithm 11, i.e., the bounds (15) and (17), with the bound of Graillat et al.'s algorithm [18] (derived from the relative error bound 3u² of Algorithm 5), namely…”
Section: Blockwise Computation of the Sum of Squares
Mentioning confidence: 99%
“…For instance, [23] explores the use of low-precision tensor cores on NVIDIA GPUs to cascade matrix multiplications in whatever low precision the tensor core supports. Similarly, [11] also shows how to perform multi-precision GEMM. Both extend the ideas in [13].…”
Section: A B C
Mentioning confidence: 99%
“…Early work on breaking up a higher-precision GEMM in terms of lower-precision GEMMs was given in [13] and extended in [11]. We view the present paper as encompassing much of that work, although explained using the example of cascading FP64x2 in terms of FP64 matrix multiplication.…”
Section: A B C
Mentioning confidence: 99%
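The FP64x2 cascading mentioned in this statement can be sketched in the same NumPy setting; the following is an illustrative decomposition under stated assumptions, not the cited implementation. An FP64x2 (double-double) matrix is the unevaluated pair (A_hi, A_lo); its product expands into FP64 GEMMs, and the lowest-order term A_lo @ B_lo is dropped. A full implementation would also recover the rounding error committed inside the leading GEMM with error-free transformations, which this sketch omits.

```python
import numpy as np

def two_sum(a, b):
    """Knuth's error-free addition: returns (s, e) with s = fl(a + b)
    and a + b = s + e exactly. Works elementwise on NumPy arrays."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e

def fp64x2_matmul(A_hi, A_lo, B_hi, B_lo):
    """Sketch of an FP64x2 GEMM expressed in terms of FP64 GEMMs.
    (A_hi + A_lo)(B_hi + B_lo) is expanded; the O(u^2) term
    A_lo @ B_lo is dropped.  NOTE (sketch assumption): the rounding
    error of the leading product A_hi @ B_hi is not recovered here."""
    C1 = A_hi @ B_hi                  # leading FP64 product
    C2 = A_hi @ B_lo + A_lo @ B_hi    # first-order corrections
    return two_sum(C1, C2)            # renormalized (hi, lo) pair

# Illustrative input: a two-word value whose low word carries digits
# that do not fit in a single float64.
rng = np.random.default_rng(1)
A_hi = rng.standard_normal((32, 32)); A_lo = 1e-20 * rng.standard_normal((32, 32))
B_hi = rng.standard_normal((32, 32)); B_lo = 1e-20 * rng.standard_normal((32, 32))
C_hi, C_lo = fp64x2_matmul(A_hi, A_lo, B_hi, B_lo)
print(np.max(np.abs(C_lo)) / np.max(np.abs(C_hi)))  # low word stays a small correction
```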