We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: (1) accurately simulate NVIDIA tensor cores on conventional hardware; (2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and (3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensor cores, as well as similar accelerators from other vendors, as they become available. Moreover, we identify a non-monotonicity issue affecting floating-point multi-operand adders if the intermediate results are not normalized after each step.
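To make the kind of probe concrete, here is a minimal NumPy sketch of the idea behind such experiments (the input vectors and the two accumulator models are our illustrative assumptions, not NVIDIA's actual designs or the paper's exact test vectors): the binary16 inputs are chosen so that the exact dot product, 1 + 2^-24 + 2^-26, lies just above the midpoint of two adjacent binary32 numbers, so the value returned reveals whether the partial sums retain extra precision.

```python
import numpy as np

# binary16 inputs whose exact dot product is 1 + 2^-24 + 2^-26, just above
# the midpoint between the adjacent binary32 numbers 1 and 1 + 2^-23.
a = np.array([1.0, 2.0**-11, 2.0**-13, 0.0], dtype=np.float16)
b = np.array([1.0, 2.0**-13, 2.0**-13, 0.0], dtype=np.float16)

# Model 1: accumulate exactly (binary64 here) and round to binary32 once,
# as a unit with a wide internal accumulator would.
wide = np.float32(np.sum(a.astype(np.float64) * b.astype(np.float64)))

# Model 2: round to binary32 after every addition; the low-order bits of
# the partial sums are lost before they can influence the result.
seq = np.float32(0.0)
for x, y in zip(a, b):
    seq = seq + np.float32(x) * np.float32(y)

print(wide == np.float32(1 + 2.0**-23))  # True: extra precision retained
print(seq == np.float32(1.0))            # True: updates lost to early rounding
```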
Stochastic rounding (SR) randomly maps a real number x to one of the two nearest values in a finite precision number system. The probability of choosing either of these two numbers is 1 minus their relative distance to x. This rounding mode was first proposed for use in computer arithmetic in the 1950s and is currently experiencing a resurgence of interest. If used to compute the inner product of two vectors of length n in floating-point arithmetic, it yields an error bound with constant √n u with high probability, where u is the unit roundoff. This is not necessarily the case for round to nearest (RN), for which the worst-case error bound has constant nu. A particular attraction of SR is that, unlike RN, it is immune to the phenomenon of stagnation, whereby a sequence of tiny updates to a relatively large quantity is lost. We survey SR by discussing its mathematical properties and probabilistic error analysis, its implementation, and its use in applications, with a focus on machine learning and the numerical solution of differential equations.
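A minimal Python sketch of SR, assuming a binary64 carrier and a binary32 target (the helper sr_to_f32 and the demo values are ours, for illustration only), together with a stagnation demo contrasting SR with RN:

```python
import numpy as np
rng = np.random.default_rng(0)

def sr_to_f32(x):
    """Stochastically round the binary64 scalar x to binary32."""
    r = np.float32(x)                       # round-to-nearest image of x
    if np.float64(r) == x:
        return r                            # x is exactly representable
    up = np.nextafter(r, np.float32(np.inf)) if r < x else r
    down = np.nextafter(up, np.float32(-np.inf))
    p = (x - np.float64(down)) / (np.float64(up) - np.float64(down))
    return up if rng.random() < p else down

# Stagnation demo: 10^6 increments of 2^-25 added to 1.0 in binary32.
tiny, n = 2.0**-25, 10**6
rn = np.float32(1.0); sr = np.float32(1.0)
for _ in range(n):
    rn = rn + np.float32(tiny)              # RN: 1 + 2^-25 rounds back to 1
    sr = sr_to_f32(np.float64(sr) + tiny)   # SR: rounds up with probability 1/4
print(rn, sr, 1.0 + n * tiny)               # 1.0, ~1.0298, 1.0298...
```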
We investigate different approaches for computing the action of the weighted geometric mean of two large-scale positive definite matrices on a vector. We derive several algorithms, based on numerical quadrature and on Krylov subspace methods, and compare them in terms of convergence speed and execution time. By exploiting an algebraic relation between the weighted geometric mean and its inverse, we show how these methods can be used to solve large linear systems whose coefficient matrix is a weighted geometric mean. We derive two novel algorithms based on Gauss-Jacobi quadrature and tailor an existing technique based on contour integration; in addition, we adapt several existing Krylov subspace techniques to the computation of the weighted geometric mean. In our experiments both classes of algorithms perform well on some problems, but neither is a clear winner, and we provide problem-dependent recommendations.
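As a small illustration, here is a dense reference sketch (ours, not one of the paper's large-scale quadrature or Krylov algorithms) of the action of the weighted geometric mean A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2} on a vector; the identity (A #_t B)^{-1} = A^{-1} #_t B^{-1} is the kind of algebraic relation alluded to above.

```python
import numpy as np

def spd_power(M, t):
    """M^t for a symmetric positive definite M, via an eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    return (Q * w**t) @ Q.T

def geometric_mean_action(A, B, t, v):
    """(A #_t B) v = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2} v, dense."""
    R = spd_power(A, 0.5)
    Rinv = spd_power(A, -0.5)
    C = Rinv @ B @ Rinv
    C = (C + C.T) / 2                  # re-symmetrize against roundoff
    return R @ (spd_power(C, t) @ (R @ v))

# Sanity checks: A #_0 B = A and A #_1 B = B.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5)); A = X @ X.T + 5 * np.eye(5)
Y = rng.standard_normal((5, 5)); B = Y @ Y.T + 5 * np.eye(5)
v = rng.standard_normal(5)
print(np.allclose(geometric_mean_action(A, B, 0.0, v), A @ v))
print(np.allclose(geometric_mean_action(A, B, 1.0, v), B @ v))
```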
The most popular algorithms for computing the matrix exponential are those based on the scaling and squaring technique. For optimal efficiency these are usually tuned to a particular precision of floating-point arithmetic. We design a new scaling and squaring algorithm that takes the unit roundoff of the arithmetic as input and chooses the algorithmic parameters so as to keep the forward error in the underlying Padé approximation below the unit roundoff. To do so, we derive an explicit expression for all the coefficients in an error expansion for Padé approximants to the exponential and use it to obtain a new bound for the truncation error. We also derive a new technique for selecting the internal parameters of the algorithm, which at each step decides whether to scale the matrix or to increase the degree of the approximant. The algorithm can employ diagonal Padé approximants or Taylor approximants and can be used with a Schur decomposition or in transformation-free form. Our numerical experiments show that the new algorithm behaves in a forward stable way over a wide range of precisions and that the most accurate of our implementations, the Taylor-based transformation-free variant, is superior to existing alternatives.
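For orientation, here is a fixed-precision sketch of the classical scaling and squaring scheme in Python (it follows the standard double-precision degree-13 Padé recipe, with threshold 5.37, from the scaling and squaring literature; it is not the paper's precision-adaptive parameter selection):

```python
import numpy as np

# Coefficients of the degree-13 diagonal Pade approximant to e^x.
PADE13 = [64764752532480000.0, 32382376266240000.0, 7771770303897600.0,
          1187353796428800.0, 129060195264000.0, 10559470521600.0,
          670442572800.0, 33522128640.0, 1323241920.0, 40840800.0,
          960960.0, 16380.0, 182.0, 1.0]

def expm_ss(A):
    """Scaling and squaring with a fixed [13/13] Pade approximant."""
    normA = max(np.linalg.norm(A, 1), 1e-300)
    s = max(0, int(np.ceil(np.log2(normA / 5.37))))  # 5.37 ~ theta_13
    A = A / 2.0**s                     # scale so the approximant is accurate
    A2 = A @ A; A4 = A2 @ A2; A6 = A4 @ A2
    b = PADE13; I = np.eye(A.shape[0])
    U = A @ (A6 @ (b[13]*A6 + b[11]*A4 + b[9]*A2)
             + b[7]*A6 + b[5]*A4 + b[3]*A2 + b[1]*I)
    V = (A6 @ (b[12]*A6 + b[10]*A4 + b[8]*A2)
         + b[6]*A6 + b[4]*A4 + b[2]*A2 + b[0]*I)
    X = np.linalg.solve(V - U, V + U)  # the Pade approximant r_13(A/2^s)
    for _ in range(s):                 # undo the scaling by repeated squaring
        X = X @ X
    return X

print(np.allclose(expm_ss(np.diag([1.0, 2.0])), np.diag(np.exp([1.0, 2.0]))))
```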
Two algorithms are developed for computing the matrix logarithm in floating-point arithmetic of any specified precision. The backward error-based approach used in the state-of-the-art inverse scaling and squaring algorithms does not conveniently extend to a multiprecision environment, so instead we choose algorithmic parameters based on a forward error bound. We derive a new forward error bound for Padé approximants that for highly nonnormal matrices can be much smaller than the classical bound of Kenney and Laub. One of our algorithms exploits a Schur decomposition, while the other is transformation-free and uses only the computational kernels of matrix multiplication and the solution of multiple right-hand side linear systems. For double precision computations the algorithms are competitive with the state-of-the-art algorithm of Al-Mohy, Higham, and Relton implemented in logm in MATLAB. They are intended for computing environments providing multiprecision floating-point arithmetic, such as Julia, MATLAB via the Symbolic Math Toolbox or the Multiprecision Computing Toolbox, or Python with the mpmath or SymPy packages. We show experimentally that the algorithms behave in a forward stable manner over a wide range of precisions, unlike existing alternatives.
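A minimal inverse scaling and squaring sketch in Python (our simplification with a fixed Padé degree and a crude convergence tolerance, not either of the paper's algorithms), assuming A has no eigenvalues on the closed negative real axis; it uses the fact that the m-point Gauss-Legendre rule applied to log(I+X) = int_0^1 X (I+tX)^(-1) dt reproduces the [m/m] Padé approximant of log(1+x):

```python
import numpy as np
from scipy.linalg import sqrtm

def logm_isq(A, m=8, tol=0.25):
    """log(A) for A with no eigenvalues on the closed negative real axis."""
    I = np.eye(A.shape[0])
    s = 0
    while np.linalg.norm(A - I, 1) > tol:   # inverse scaling: repeated sqrt
        A = sqrtm(A)
        s += 1
    X = A - I
    # m-point Gauss-Legendre rule for log(I+X) = int_0^1 X (I+tX)^(-1) dt;
    # on [0,1] this rule coincides with the [m/m] Pade approximant.
    nodes, weights = np.polynomial.legendre.leggauss(m)
    nodes, weights = (nodes + 1) / 2, weights / 2   # map [-1,1] to [0,1]
    L = sum(w * np.linalg.solve(I + t * X, X) for t, w in zip(nodes, weights))
    return 2.0**s * L                        # log A = 2^s log A^(1/2^s)

print(np.allclose(logm_isq(np.diag([0.5, 3.0])), np.diag(np.log([0.5, 3.0]))))
```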