2013 IEEE 21st Symposium on Computer Arithmetic
DOI: 10.1109/arith.2013.19
Accurate Parallel Floating-Point Accumulation

Abstract: Using parallel associative reduction, iterative refinement, and conservative termination detection, we show how to use tree-reduce parallelism to compute correctly rounded floating-point sums in O(log N) depth at arbitrary throughput. Our parallel solution shows how we can continue to exploit Moore's Law scaling in transistor count to accelerate floating-point performance even when clock rates remain flat. Empirical evidence suggests our iterative algorithm only requires two tree reduce passes to conv…

Cited by 8 publications (3 citation statements) | References 16 publications
“…Indeed, their methods can involve as many as O(n) passes over the data. Kadric et al. [22] provide a parallel pipelined method that takes a similar approach to the algorithm of Leuprecht and Oberaigner, while improving its convergence in practice, but their method nevertheless depends on inherently sequential pipelining and iterative refinement operations. Recently, Demmel and Nguyen [11] present a parallel floating-point summation method based on using a superaccumulator, but, like the previous sequential superaccumulator methods cited above, their method does not utilize a carry-free intermediate representation; hence, it has an inherently sequential carry-propagation step as part of its "inner loop" computation.…”
Section: Previous Related Results
confidence: 99%
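For contrast with the carry-free representation discussed in this excerpt, a toy software-level superaccumulator (my sketch, not Demmel and Nguyen's algorithm) is only a few lines: every double is rescaled to an integer so that accumulation is exact, and rounding happens exactly once at the end. Python's arbitrary-precision integers stand in for the wide fixed-point register; real hardware splits that register into limbs, which is where the sequential carry propagation comes from.

```python
import math

# Offset chosen so every shift below is non-negative: the smallest
# subnormal double is 2**-1074 and frexp's mantissa carries 53 bits.
OFFSET = 1126

def superacc_sum(values):
    # One wide fixed-point register (a Python big int) absorbs every
    # addend exactly; there is no rounding until the final division.
    acc = 0
    for x in values:
        m, e = math.frexp(x)                  # x == m * 2**e exactly
        acc += int(m * 2**53) << (e - 53 + OFFSET)
    # Python's int / int true division rounds correctly to a double.
    return acc / (1 << OFFSET)
```

Because each addend enters the accumulator exactly, the single final rounding makes the result correctly rounded (overflowing sums aside), matching `math.fsum` on well-scaled inputs.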
“…Many questions and doubts have been raised repeatedly about the implementations and outcomes of AI because of untrustworthy results and unreliability in multiple segments such as vision models, visual recognition, Natural Language Processing (NLP), etc. To protect future society, these hazards should be addressed quickly [57]. Although recent deep learning research has advanced noticeably, many areas still stand in need of improvement.…”
Section: Artificial Intelligence
confidence: 99%
“…In addition to these solutions, there are a number of adaptive methods for exactly summing 𝑛 floating point numbers using various other data structures for representing intermediate results, which do not consider the security or privacy of the data. Further, these methods, which include ExBLAS [17] and algorithms by Zhu and Hayes [52,53], Demmel and Hida [21,22], Rump et al [46], Priest [43], Malcolm [39], Leuprecht and Oberaigner [38], Kadric et al [35], and Demmel and Nguyen [23], are not amenable to conversion to secure protocols with few rounds.…”
Section: Introduction
confidence: 99%