2017 IEEE 24th Symposium on Computer Arithmetic (ARITH)
DOI: 10.1109/arith.2017.20
High-Precision Anchored Accumulators for Reproducible Floating-Point Summation


Cited by 9 publications (9 citation statements). References 4 publications.
“…For instance, the Kulisch long accumulator, which is the cornerstone algorithm of ExBLAS, is designed to handle severe (ill-conditioned) cases with very broad dynamic ranges, while in practice "100 bits suffice for many HPC applications", as noted by David Bailey at ARITH-21 [14]. This idea inspired the ARM team (Lutz, Burgess, et al.) to design a mini long accumulator with a limited range [15,16]. We therefore plan to explore this motivated-by-practice idea of moderately conditioned problems with moderate dynamic ranges in order to derive a lightweight algorithmic solution from ExBLAS.…”
Section: Introduction
confidence: 99%
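To illustrate why a Kulisch-style long accumulator makes summation reproducible, here is a minimal Python sketch (ours, not the paper's or ExBLAS's implementation): every binary64 value is mapped exactly to a wide fixed-point integer, so the sum is exact integer addition and is independent of summation order.

```python
from fractions import Fraction

# Sketch of a Kulisch-style long accumulator. Each finite double is
# converted exactly to a fixed-point integer scaled by 2**1074 (the
# weight of the smallest binary64 subnormal bit), so accumulation is
# exact and order-independent; rounding happens once, at the end.
SCALE = 1074  # smallest binary64 ulp is 2**-1074

def to_fixed(x: float) -> int:
    """Exact fixed-point image of a finite double."""
    f = Fraction(x) * (1 << SCALE)   # Fraction(float) is exact
    return f.numerator               # denominator is 1 by construction

def long_acc_sum(values) -> float:
    acc = 0                          # Python ints are arbitrary precision
    for v in values:
        acc += to_fixed(v)           # exact integer addition
    return float(Fraction(acc, 1 << SCALE))  # single final rounding

# The ill-conditioned sum [1e308, 1.0, -1e308] comes out exactly 1.0,
# while naive left-to-right float addition loses the 1.0 and returns 0.0.
```

The width (here 1074 + 1024 + a few carry bits, hidden inside Python's big integers) is exactly what the "100 bits suffice" observation trims down: a limited-range mini accumulator covers moderate dynamic ranges with far less state.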
“…It is also possible to improve the numerical properties of summation when it suffices to use floating-point numbers from a smaller range [8]. The GNU MPFR library [7] provides multiple-precision floating-point computations with correct rounding. Several papers show how to improve the performance of accurate floating-point summation using parallel processing.…”
Section: Related Work
confidence: 99%
“…The ExBLAS-based approach with its cornerstone Kulisch long accumulator Kulisch (2013) is robust but expensive, since it is designed to cover severe (ill-conditioned) cases with very broad dynamic ranges. Motivated by “100 bits suffice for many HPC applications”, as noted by David Bailey at ARITH-21 Bailey (2013), and by the mini accumulator from the ARM team Lutz and Hinds (2017); Burgess et al (2019), we derive a faster but less generic version using FPEs, the other core algorithmic component of ExBLAS, aiming to adjust the algorithm to the problem at hand. As a consequence, we also address the common issue of sparse iterative solvers—the accuracy while computing the residual—and propose to use solutions that offer reproducibility (and potentially correct rounding) only while computing the corresponding dot products. Hence, we derive two hybrid (MPI + OpenMP tasks), reproducible, and accurate dot products using ExBLAS and FPEs. Finally, we demonstrate the applicability and feasibility of this idea with the ExBLAS- and FPE-based approaches in a hybrid MPI + OpenMP implementation of PCG on a 3D Poisson’s equation with a 27-point stencil, as well as several test matrices from the SuiteSparse matrix collection. This extends our previous results with the pure MPI implementation of PCG Iakymchuk et al (2019a) to the more complex double-level dot products and reductions with dynamic scheduling of tasks.…”
Section: Introduction
confidence: 99%
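The floating-point expansion (FPE) mentioned above can be pictured as a small fixed-size array of non-overlapping doubles, updated by cascading each new addend through TwoSum. The sketch below is a minimal illustration under that description, not the ExBLAS implementation; the function names are ours.

```python
def two_sum(a: float, b: float):
    """Knuth's TwoSum: s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bp = s - a
    return s, (a - (s - bp)) + (b - bp)

def fpe_add(expansion: list, x: float) -> None:
    """Cascade x through a fixed-size expansion in place.

    Each slot absorbs what it can; the rounding error flows to the
    next slot. In a full implementation, a nonzero residual left after
    the last slot would spill into a (long) accumulator fallback.
    """
    for i in range(len(expansion)):
        expansion[i], x = two_sum(expansion[i], x)
        if x == 0.0:
            break  # fully absorbed, no residual to propagate

# A 4-term FPE absorbing values whose exact sum is 3.0:
fpe = [0.0] * 4
for v in [1e16, 1.0, -1e16, 2.0]:
    fpe_add(fpe, v)
# The expansion terms sum exactly to 3.0; naive addition would give 2.0.
```

Because the expansion has a fixed, small size, this is the "faster but less generic" trade-off: it handles moderate dynamic ranges cheaply, and falls back to the long accumulator only for pathological inputs.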