2016
DOI: 10.1109/tc.2016.2584067

Area Efficient and Fast Combined Binary/Decimal Floating Point Fused Multiply Add Unit

Abstract: In this work we present a new 64-bit floating point Fused Multiply Add (FMA) unit that can perform both binary and decimal addition, multiplication, and fused-multiply-add operations. The presented FMA has 6% less delay than the fastest stand-alone decimal unit and 23% less area than both binary and decimal units together. These results were achieved by the use of: 1) column by column reduction to reduce the partial products in the multiplier tree, 2) a new leading zeros detector that produces its output in ba…
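As background for the abstract, the defining property of a fused multiply-add is that a*b + c is rounded once, after the exact product and sum are formed, instead of once after the multiply and again after the add. The following is a minimal software sketch of that behaviour for binary doubles only. It is not the hardware design described in the paper, and the function names `fused_multiply_add` and `unfused_multiply_add` are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a binary FMA: compute a*b + c exactly, then round once.

    Fraction(float) is exact, and float(Fraction) rounds to the nearest
    double, so only a single rounding step occurs.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


# Inputs chosen so that the exact product is 1 - 2**-60, which rounds to 1.0
# when stored as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)      # keeps the -2**-60 residue
unfused = unfused_multiply_add(a, b, c)  # residue is lost in the first rounding

print(fused, unfused)
```

With these inputs the unfused version returns zero, while the single-rounding version preserves the small residue, which is the accuracy benefit an FMA unit provides.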

Citations: cited by 15 publications (9 citation statements)
References: 18 publications (35 reference statements)
“…LZA is pre-corrected operand to calculate the number of leading zeros. LZA is composed of two vectors computation followed by leading zero detector (LZD) [13]. In order to make this fused FDP much faster pipeline concepts are implemented, by replacing traditional ripple Double based number system (DBNS) needs O(k log k) addition operations to perform k-bit multiplication operation [14].…”
Section: Proposed Architectures (mentioning)
confidence: 99%
“…This effect is due to the fact that a set of finite radix-10 numbers becomes periodic when represented in radix-2 notation. Wahba et al [11] present a solution reducing by 6% percent the latency of an FP decimal unit compared to SoA solutions, and saving 23% of the total area compared to solutions that include two FP units (for binary and decimal support, respectively). Decimal FPUs are characterized by a longer critical path and a larger area than binary units since representing a decimal digit requires four bits.…”
Section: Alternative Formats (mentioning)
confidence: 99%
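The two points made in the statement above, that finite decimal fractions can become periodic in radix 2 and that a decimal digit takes four bits, can be checked with a short sketch using Python's standard `decimal` and `fractions` modules. The helper `to_bcd` is a hypothetical name introduced here for illustration; it shows plain binary-coded decimal, which is an assumption and not necessarily the digit encoding used by any of the cited designs.

```python
from decimal import Decimal
from fractions import Fraction

# 0.1 has no finite radix-2 expansion, so the stored double is only close to it.
stored = Fraction(0.1)               # exact value of the double nearest to 0.1
error = stored - Fraction(1, 10)     # nonzero representation error

# The rounding errors surface in ordinary binary arithmetic...
binary_sum_matches = (0.1 + 0.2 == 0.3)
# ...but not when the same values are held in a decimal format.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))


def to_bcd(n: int) -> str:
    """Encode a non-negative integer as binary-coded decimal, 4 bits per digit."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


print(error)
print(binary_sum_matches, decimal_sum_matches)
print(to_bcd(2016))
```

The four-bits-per-digit encoding uses only 10 of the 16 available codes, which is one reason decimal datapaths tend to be larger than binary ones of similar precision.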
“…However, in this state-of-the-art approach [24], the multipliers and the adder tree are still two separate computation components. On the other hand, some previous multiply-accumulate (MAC) designs [25, 26, 27, 28] have tried to reduce the overheads caused by final additions of multiplications. However, since these MAC designs [25, 26, 27, 28] assume that only one multiplier is used, their approaches cannot be directly applied to the design of 2-D convolver hardware circuit.…”
Section: Introduction (mentioning)
confidence: 99%