Advances in Mixed Precision Algorithms: 2021 Edition
2021
DOI: 10.2172/1814677
Cited by 3 publications (4 citation statements)
References: 0 publications
“…In 2020, the ECP community created a new multiprecision effort to design and develop new numerical algorithms that can exploit the speed provided by lower-precision hardware while maintaining the level of accuracy required by numerical modeling and simulation. Examples include: mixed precision iterative refinement for a dense LU factorization in SLATE and a sparse LU factorization in SuperLU achieved 1.8× and 1.5× speedups, respectively; mixed precision GMRES with iterative refinement in Trilinos achieved a 1.4× speedup; compressed basis (CB) GMRES in Ginkgo achieved a 1.4× speedup; and mixed precision sparse approximate inverse preconditioners achieved an average speedup of 1.2× [3]. These speedups from mixed precision algorithms are "here to stay" as they will carry over to future hardware architectures.…”
Section: Algorithms: Then and Now
mentioning
confidence: 99%
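
To make the technique behind these numbers concrete, the following is a minimal sketch of mixed precision iterative refinement: the matrix is factorized once in fp32, and the residual and correction are accumulated in fp64. It is an illustration in plain Eigen under assumed problem sizes and tolerances, not the SLATE, SuperLU, or Trilinos implementation.

    // Sketch of mixed precision iterative refinement (illustrative only):
    // factor A in fp32, then refine the solution in fp64.
    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        const int n = 200;
        // Diagonally dominated random system so the fp32 factorization is stable.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n) + n * Eigen::MatrixXd::Identity(n, n);
        Eigen::VectorXd b = Eigen::VectorXd::Random(n);

        // Low precision (fp32) LU factorization -- the expensive O(n^3) step.
        Eigen::MatrixXf Af = A.cast<float>();
        Eigen::PartialPivLU<Eigen::MatrixXf> lu(Af);

        // Initial fp32 solve, promoted to fp64.
        Eigen::VectorXf bf = b.cast<float>();
        Eigen::VectorXf xf = lu.solve(bf);
        Eigen::VectorXd x = xf.cast<double>();

        // Refinement loop: fp64 residual, correction solve reuses the fp32 factors.
        for (int it = 0; it < 20; ++it) {
            Eigen::VectorXd r = b - A * x;
            if (r.norm() <= 1e-12 * b.norm()) break;
            Eigen::VectorXf rf = r.cast<float>();
            Eigen::VectorXf df = lu.solve(rf);
            x += df.cast<double>();
        }
        std::cout << "relative residual: " << (b - A * x).norm() / b.norm() << "\n";
        return 0;
    }

In this pattern the O(n^3) factorization runs at fp32 speed while the cheap O(n^2) residual updates restore fp64-level accuracy, which is the effect the reported speedups exploit.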
“…For this purpose, we describe several software projects that are rooted in mathematical libraries and application space, and investigate their performance improvements and sustainability: the xSDK (Extreme-scale Scientific Software Development Kit) and its constituent libraries such as Ginkgo, SLATE, SuperLU, and the laser-plasma modeling application WarpX. We will describe critical facets of how software development methodologies and interdisciplinary teams have been transformed, leading to improvements in the software itself, and why these advances are essential for next-generation science.…”
mentioning
confidence: 99%
“…For comparison, we also compute the matrix product in fp32 and fp64 arithmetics in hardware by using the Eigen C++ library. For fp64 arithmetic, we use the default Eigen matrix multiplication implementation; for all other arithmetics we use the blocked FMA algorithm [6, Alg. 3.1] with a block FMA of dimension 1.…”
mentioning
confidence: 99%
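
As a point of reference for this kind of comparison, the following sketch computes the same product in fp64 (Eigen's default implementation) and in fp32, and measures the gap between them. It only illustrates the hardware fp32/fp64 baselines; the blocked FMA algorithm of [6, Alg. 3.1] is not reproduced here, and the matrix size is an assumption.

    // Compare an fp32 matrix product against an fp64 reference (illustrative only).
    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        const int n = 512;
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
        Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, n);

        // fp64 reference: Eigen's default matrix multiplication.
        Eigen::MatrixXd C64 = A * B;

        // fp32 product: inputs rounded to float, accumulation in float.
        Eigen::MatrixXf Af = A.cast<float>();
        Eigen::MatrixXf Bf = B.cast<float>();
        Eigen::MatrixXf C32 = Af * Bf;

        // Normwise relative error of the low precision product.
        double err = (C32.cast<double>() - C64).norm() / C64.norm();
        std::cout << "fp32 vs fp64 relative error: " << err << "\n";
        return 0;
    }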
“…The upcoming NVIDIA Hopper microarchitecture [32] adds yet more formats to the tensor cores (quarter precision): fp8-E5M2 (5 exponent and 2 significand bits) and fp8-E4M3 (4 exponent and 3 significand bits). Tensor cores provide a significant performance boost compared with standard floating-point units, and have been used with great success to accelerate numerical linear algebra algorithms [1], [5], [13], [14], [25]; see [20] for a survey of these algorithms. Other vendors also incorporate matrix arithmetic in their devices: for example, the accelerators in the AMD MI200 series contain units that can perform vector and matrix operations faster than their scalar counterparts [2], [3], [4].…”
mentioning
confidence: 99%
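
The range and precision implied by those exponent/significand splits can be derived directly from the bit counts. The sketch below does this under IEEE-style assumptions; note that hardware fp8-E4M3 implementations typically reclaim some NaN encodings to extend the maximum finite value to 448 rather than the 240 computed here.

    // Derive unit roundoff and largest finite value of a binary floating-point
    // format from its exponent and stored-significand bit counts (IEEE-style).
    #include <cmath>
    #include <cstdio>

    void describe(const char* name, int exp_bits, int sig_bits) {
        int bias = (1 << (exp_bits - 1)) - 1;
        int emax = (1 << exp_bits) - 2 - bias;                    // largest exponent
        double u = std::ldexp(1.0, -(sig_bits + 1));              // unit roundoff
        double xmax = (2.0 - std::ldexp(1.0, -sig_bits)) * std::ldexp(1.0, emax);
        std::printf("%-9s unit roundoff %.4g, max finite %.6g\n", name, u, xmax);
    }

    int main() {
        describe("fp8-E5M2", 5, 2);   // u = 0.125,  max finite 57344
        describe("fp8-E4M3", 4, 3);   // u = 0.0625, max finite 240 (448 in hardware E4M3)
        describe("fp16",     5, 10);  // half precision, for comparison
        return 0;
    }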