2021
DOI: 10.1002/spe.3041
Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS

Abstract: The roofline model not only provides a powerful tool to relate an application's performance with the specific constraints imposed by the target hardware but also offers a graphic representation of the balance between memory access cost and compute throughput. In this work, we present a strategy to break up the tight coupling between the precision format used for arithmetic operations and the storage format employed for memory operations. (At a high level, this idea is equivalent to compressing/decompressing th…
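The decoupling strategy described in the abstract can be illustrated with a minimal, hypothetical sketch (this is not Ginkgo's actual accessor API; the class and function names below are illustrative): values are stored in 32-bit float to reduce memory traffic, but every read promotes to 64-bit double, so all arithmetic runs in high precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal "accessor": the storage format is 32-bit float
// (halving the memory volume versus double), while every read returns a
// 64-bit double so that all arithmetic happens in high precision.
struct ReducedStorageAccessor {
    std::vector<float> data;  // storage format: float

    explicit ReducedStorageAccessor(std::size_t n) : data(n, 0.0f) {}

    double read(std::size_t i) const { return static_cast<double>(data[i]); }
    void write(std::size_t i, double v) { data[i] = static_cast<float>(v); }
};

// Memory-bound BLAS-1 kernel (dot product): memory traffic is in float,
// but the accumulation is carried out entirely in double.
double dot(const ReducedStorageAccessor& x, const ReducedStorageAccessor& y) {
    double acc = 0.0;  // arithmetic format: double
    for (std::size_t i = 0; i < x.data.size(); ++i) {
        acc += x.read(i) * y.read(i);
    }
    return acc;
}
```

For a memory-bound kernel such as this dot product, the runtime is dominated by the float loads, so the double-precision accumulation comes essentially for free while reducing rounding error versus a float accumulator.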

Cited by 7 publications (7 citation statements)
References 31 publications
“…Anzt, Flegar, Grützmacher and Quintana-Ortí (2019b) propose this approach of decoupling the data storage format from the processing format, and they focus on storing the data at a lower precision than that at which the computations are performed. This approach is used in the papers mentioned at the end of Section 8.2 and for level 1 and level 2 BLAS by Grützmacher, Anzt and Quintana-Ortí (2021). Agullo et al (2020) propose a similar approach for flexible GMRES, using as compression either reduced precision or the lossy floating-point SZ compressor (Di and Cappello 2016).…”
Section: Decoupling Formats for Data Storage and Processing
confidence: 99%
“…Orthogonally to all previous communication optimization efforts, our optimized variant of the GMRES algorithm reduces communication in the access to the Krylov basis during the iteration loop body. In more detail, our GMRES algorithm leverages Ginkgo's memory accessor, introduced in Anzt et al (2021) and Grützmacher et al (2021), to decouple the memory storage format from the arithmetic precision so as to maintain the Krylov basis vectors in a compact "reduced precision" format. This radically diminishes the memory access volume during the orthogonalization, while not affecting the convergence rate of the solver, yielding notable performance improvements.…”
Section: Introduction
confidence: 99%
“…This radically diminishes the memory access volume during the orthogonalization, while not affecting the convergence rate of the solver, yielding notable performance improvements. Concretely, we make the following contributions in our article:
• We follow the ideas in Anzt et al (2021) and Grützmacher et al (2021) and use the “memory accessor” presented therein to decouple the memory storage format from the arithmetic precision, specifically applying this strategy to maintain the Krylov basis in reduced precision in memory while performing all arithmetic operations using full, hardware-supported IEEE 64-bit double-precision (DP).
• We analyze the benefits that result from casting the Krylov basis into different compact storage formats, including the natural IEEE 32-bit single-precision (SP) and 16-bit half-precision (HP), as well as some other non-IEEE fixed point-based alternatives enhanced with vector-wise normalization.
• We integrate the mixed-precision GMRES algorithm into the Ginkgo sparse linear algebra library (https://ginkgo-project.github.io).
• We provide strong practical evidence of the advantage of our approach by developing a high-performance realization of the solver for NVIDIA's modern V100 GPUs and testing it on a considerable number of large-scale problems from the SuiteSparse Matrix Collection (Davis and Hu, 2011) (https://sparse.tamu.edu/).…”
Section: Introduction
confidence: 99%
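The "fixed point-based alternatives enhanced with vector-wise normalization" mentioned in the contributions above can be sketched as follows. This is a hypothetical illustration, not the paper's or Ginkgo's actual format: each vector stores one double scale (its maximum magnitude) plus 16-bit integers, and reads reconstruct doubles on the fly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a fixed-point storage format with vector-wise
// normalization: the payload is 16-bit integers in [-32767, 32767],
// normalized by the vector's maximum magnitude so the full integer
// range is always exercised; reads reconstruct 64-bit doubles.
struct NormalizedFixedPointVector {
    double scale = 1.0;           // per-vector normalization factor
    std::vector<std::int16_t> q;  // fixed-point payload

    static NormalizedFixedPointVector compress(const std::vector<double>& v) {
        NormalizedFixedPointVector out;
        double amax = 0.0;
        for (double x : v) amax = std::max(amax, std::fabs(x));
        out.scale = (amax > 0.0) ? amax : 1.0;  // avoid division by zero
        out.q.reserve(v.size());
        for (double x : v) {
            out.q.push_back(static_cast<std::int16_t>(
                std::lround(x / out.scale * 32767.0)));
        }
        return out;
    }

    // Storage is 16-bit; arithmetic consumers see a 64-bit double.
    double read(std::size_t i) const {
        return static_cast<double>(q[i]) / 32767.0 * scale;
    }
};
```

The per-vector scale is what makes the fixed-point payload competitive with IEEE half-precision: every vector, regardless of its magnitude, maps onto the full 16-bit range, so the relative quantization error is bounded by the payload width rather than by a fixed exponent range.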
“…The fourth paper, titled “Using Ginkgo's Memory Accessor for Improving the Accuracy of Memory‐Bound Low Precision Basic Linear Algebra Subprograms (BLAS)” by Quintana‐Ortí et al. [4], demonstrates that memory‐bound applications operating on low precision data can increase their accuracy by relying on the memory accessor to perform all arithmetic operations in high precision. In particular, the authors demonstrate that memory‐bound BLAS operations (including the sparse matrix‐vector product) can be re‐engineered with the memory accessor and that the resulting accessor‐enabled BLAS routines achieve lower rounding errors while delivering the same performance as the fast low‐precision BLAS.…”
confidence: 99%