2017
DOI: 10.1007/978-3-319-58943-5_49

Reproducible, Accurately Rounded and Efficient BLAS

Abstract: Numerical reproducibility failures arise in parallel computation because floating-point summation is not associative. Massively parallel and optimized executions dynamically change the order of floating-point operations, so numerical results may differ from one run to another. We propose to ensure reproducibility by extending, as far as possible, the IEEE-754 correct-rounding property to larger sequences of operations. We introduce RARE-BLAS (Reproducible, Accurately Rounded and Efficient BLAS), which ben…
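The reproducibility failure the abstract refers to comes from the fact that floating-point addition does not associate: reordering a reduction can change the computed result. A minimal sketch (not from the paper) illustrating this in C:

```c
#include <stdio.h>

/* Minimal illustration (not from the paper): floating-point addition is
 * non-associative, so changing the reduction order changes the result. */
int main(void) {
    double a = 1e16, b = -1e16, c = 1.0;

    double left_to_right = (a + b) + c;   /* (1e16 - 1e16) + 1 = 1.0 */
    double right_to_left = a + (b + c);   /* 1e16 + (-1e16 + 1) = 0.0, since 1 is absorbed */

    printf("left-to-right : %.1f\n", left_to_right);
    printf("right-to-left : %.1f\n", right_to_left);
    return 0;
}
```

A dynamic parallel schedule effectively picks one of these orderings at run time, which is why two runs of the same program can print different sums.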

Cited by 7 publications (9 citation statements) | References 14 publications
“…2) Dot Product: Our implementation of parallel reproducible dot and nrm2 is presented in [9]. For parallel dot product, we distinguish three steps.…”
Section: Parallel RARE BLAS (mentioning)
confidence: 99%
“…Note that the local result is not rounded; the dot product is transformed into a sum of non-overlapping floating-point numbers. The process is done in different ways depending on the vector size (more details in [9]). (2) Afterwards, the local results of the transformation are gathered by the master thread.…”
Section: Parallel RARE BLAS (mentioning)
confidence: 99%
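The citing text describes turning the dot product, without rounding the local result, into a sum of non-overlapping floating-point numbers. A minimal sketch of the kind of error-free transformations commonly used for this (an FMA-based TwoProd and Knuth's TwoSum) is shown below; the function names and the splitting strategy are assumptions for illustration, not RARE-BLAS's actual algorithm.

```c
#include <math.h>

/* Hedged sketch (not the paper's exact algorithm): split one product a*b
 * into hi + lo with no rounding error, using the fused multiply-add. */
static void two_prod(double a, double b, double *hi, double *lo) {
    *hi = a * b;
    *lo = fma(a, b, -*hi);   /* exact remainder of the product */
}

/* Error-free transformation of a sum: s + e equals a + b exactly. */
static void two_sum(double a, double b, double *s, double *e) {
    *s = a + b;
    double t = *s - a;
    *e = (a - (*s - t)) + (b - t);
}
```

Applying two_prod to each pair of vector elements and accumulating the resulting hi/lo terms with two_sum yields, per thread, a short list of floating-point numbers whose exact sum equals the exact local dot product; those unrounded local results can then be gathered by the master thread and rounded once at the end, as the citation statement outlines.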