2023
DOI: 10.1137/21m1465032
Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores

Abstract: In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower-precision matrices, and a matrix product is formed by multiplying the constituents in low precision. We investigate the use of multiword arithmetic for improving the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply-add (FMA) hardware, focusing especially on the tensor cores available on NVIDIA GPUs. Building on a general block FMA framework, we develop a comprehensive err…
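As a concrete illustration, here is a minimal NumPy sketch of the two-word case, not the paper's implementation: a float64 matrix is split into a float32 leading word plus a float32 correction word, and the product is approximated from three low-precision partial products, dropping the O(u²) cross term. The function names and the float64 accumulation are assumptions of this sketch; float64 accumulation is used because the exact product of two float32 values is representable in float64, which mimics a block FMA unit that multiplies in low precision and accumulates in higher precision.

```python
import numpy as np

def split_two_words(A):
    """Represent a float64 matrix as an unevaluated sum A ~ A_hi + A_lo
    of two float32 matrices (illustrative two-word split)."""
    A_hi = A.astype(np.float32)                               # leading word
    A_lo = (A - A_hi.astype(np.float64)).astype(np.float32)   # correction word
    return A_hi, A_lo

def multiword_matmul(A, B):
    """Approximate C = A @ B from products of the low-precision words.
    Words are upcast to float64, so each partial product is exact and
    accumulation happens in higher precision (an assumption mimicking
    mixed-precision block FMA hardware)."""
    A_hi, A_lo = split_two_words(A)
    B_hi, B_lo = split_two_words(B)
    # A_lo @ B_lo is dropped: it contributes only at order u^2.
    terms = [(A_hi, B_hi), (A_hi, B_lo), (A_lo, B_hi)]
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.float64)
    for X, Y in terms:
        C += X.astype(np.float64) @ Y.astype(np.float64)
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
C_ref = A @ B
C_mw = multiword_matmul(A, B)
C_sgl = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float64)
print("two-word error:", np.max(np.abs(C_mw - C_ref)) / np.max(np.abs(C_ref)))
print("float32 error: ", np.max(np.abs(C_sgl - C_ref)) / np.max(np.abs(C_ref)))
```

On random matrices, the two-word product is typically several orders of magnitude more accurate than a plain float32 product, which is the performance-accuracy tradeoff the paper analyzes.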

Cited by 7 publications (9 citation statements)
References 27 publications
“…It is possible that new operations recently introduced in the IEEE 754 Standard for FP arithmetic [27], briefly presented in Section 1.3.2, will replace these building blocks in the near future. Double-word arithmetic (and, more generally, pair and multiple-word arithmetics) is slowly yet steadily gaining importance among numerical methods [32,43,35,17]: this makes a careful study of its error useful.…”
Section: Double-word and Pair Arithmetics
Mentioning confidence: 99%
“…The proof is given in the supplementary materials. Let us now compare the bounds of Algorithm 10 and Algorithm 11, i.e., the bounds (15) and (17), with the bound of Graillat et al.'s algorithm [18] (derived from the relative error bound 3u² of Algorithm 5), namely…”
Section: Blockwise Computation of the Sum of Squares
Mentioning confidence: 99%
“…For instance, [23] explores the use of low-precision tensor cores on NVIDIA GPUs to cascade matrix multiplications in whatever low precision the tensor core supports. Similarly, [11] also shows how to perform multi-precision GEMM. Both extend the ideas in [13].…”
Section: A B C
Mentioning confidence: 99%
“…Early work on breaking up a higher-precision GEMM in terms of lower-precision GEMMs was given in [13] and extended in [11]. We view the present paper as encompassing much of that work, although explained using the example of cascading FP64x2 in terms of FP64 matrix multiplication.…”
Section: A B C
Mentioning confidence: 99%
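The FP64x2 cascading mentioned in this statement can be sketched in the same NumPy setting; the following is an illustrative decomposition under stated assumptions, not the cited implementation. An FP64x2 (double-double) matrix is the unevaluated pair (A_hi, A_lo); its product expands into FP64 GEMMs, and the lowest-order term A_lo @ B_lo is dropped. A full implementation would also recover the rounding error committed inside the leading GEMM with error-free transformations, which this sketch omits.

```python
import numpy as np

def two_sum(a, b):
    """Knuth's error-free addition: returns (s, e) with s = fl(a + b)
    and a + b = s + e exactly. Works elementwise on NumPy arrays."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e

def fp64x2_matmul(A_hi, A_lo, B_hi, B_lo):
    """Sketch of an FP64x2 GEMM expressed in terms of FP64 GEMMs.
    (A_hi + A_lo)(B_hi + B_lo) is expanded; the O(u^2) term
    A_lo @ B_lo is dropped.  NOTE (sketch assumption): the rounding
    error of the leading product A_hi @ B_hi is not recovered here."""
    C1 = A_hi @ B_hi                  # leading FP64 product
    C2 = A_hi @ B_lo + A_lo @ B_hi    # first-order corrections
    return two_sum(C1, C2)            # renormalized (hi, lo) pair

# Illustrative input: a two-word value whose low word carries digits
# that do not fit in a single float64.
rng = np.random.default_rng(1)
A_hi = rng.standard_normal((32, 32)); A_lo = 1e-20 * rng.standard_normal((32, 32))
B_hi = rng.standard_normal((32, 32)); B_lo = 1e-20 * rng.standard_normal((32, 32))
C_hi, C_lo = fp64x2_matmul(A_hi, A_lo, B_hi, B_lo)
print(np.max(np.abs(C_lo)) / np.max(np.abs(C_hi)))  # low word stays a small correction
```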