2010
DOI: 10.1109/tcsi.2010.2091191

A High-Speed, Energy-Efficient Two-Cycle Multiply-Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit

Cited by 47 publications (35 citation statements)
References 20 publications
“…The area of a PCM cell with an access transistor is ∼25 F² (where F corresponds to the minimum lithographic pitch in a technology node), which could be reduced to ∼6 F² with a suitable diode based access device [28]. On the other hand, one bit SRAM area is ≥120 F² and the area of a 16-bit multiply-accumulate (MAC) required for neural network architectures is at least three orders of magnitude higher [28,29]. This results in trade-offs between the number of parallel computing units and on-chip memory for hardware implementations of neural networks using conventional CMOS technology.…”
Section: Discussion (mentioning)
confidence: 99%
“…The twin precision [Sjalander and Larsson-Edefors 2009] technique is used to optimize an n-bit multiplier, where the n-bit multiplier is used to compute two n/2-bit multiplications in parallel. A MAC architecture using twin precision multiplication is proposed in Hoang et al. [2010]. A new MAC architecture using a radix-4 modified Booth algorithm for fixed point is proposed in Seo and Kim [2010].…”
Section: MAC Design From the Literature (mentioning)
confidence: 99%
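
The twin-precision idea quoted above can be illustrated with a small bit-level model. The sketch below is not taken from the cited papers: it assumes unsigned operands, and the function names `twin_precision_multiply` and `split_products` are hypothetical. It only shows that suppressing the cross-half partial products of an n-bit array multiplier leaves two independent n/2-bit products in the low and high halves of the result.

```python
def twin_precision_multiply(a, b, n=16, twin=False):
    """Bit-level model of an n-bit unsigned array multiplier.

    With twin=True, partial products that mix the low half of one operand
    with the high half of the other are forced to zero (assumption: this is
    a simplified, unsigned-only view of twin-precision operation).
    """
    assert n % 2 == 0 and 0 <= a < (1 << n) and 0 <= b < (1 << n)
    half = n // 2
    total = 0
    for i in range(n):          # bit index into a
        for j in range(n):      # bit index into b
            same_half = (i < half) == (j < half)
            if twin and not same_half:
                continue        # cross-half partial product is gated off
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            total += bit << (i + j)
    return total


def split_products(result, n=16):
    """Split a twin-mode result into (low-half product, high-half product)."""
    mask = (1 << n) - 1
    return result & mask, result >> n


# Pack two 8-bit operands into each 16-bit input: high byte and low byte.
a = (3 << 8) | 200
b = (7 << 8) | 100
low, high = split_products(twin_precision_multiply(a, b, n=16, twin=True))
print(low, high)
```

The low product cannot overflow into the high one because an n/2-bit by n/2-bit unsigned product always fits in n bits. Signed operands would need additional handling that this sketch leaves out.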
“…The multiplier in the MAC unit uses the Baugh-Wooley multiplier algorithm to generate the partial products, and reduces and reorganizes the partial products based on the high-performance multiplier tree scheme [3]. After the partial products are generated, they are clocked into the first stage of the pipeline and then made available to the carry-save adder (CSA). The CSA sums the partial products with the value in one of five selected accumulation registers.…”
Section: Datapath Description (mentioning)
confidence: 99%
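
The datapath described in this statement (Baugh-Wooley partial products, reduced together with an accumulator value by carry-save addition) can be sketched in a few lines. This is a behavioural illustration only, not the cited design: the function names are hypothetical, the accumulator is assumed to be exactly 2n bits wide with wrap-around arithmetic, and the pipeline registers and the selection among five accumulation registers are omitted.

```python
def baugh_wooley_rows(a, b, n):
    """Partial-product rows for n-bit two's-complement a*b (valid modulo 2**(2n))."""
    mask_n = (1 << n) - 1
    a &= mask_n
    b &= mask_n
    rows = []
    for j in range(n):
        row = 0
        for i in range(n):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            # Complement terms that involve exactly one sign bit.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            row |= bit << (i + j)
        rows.append(row)
    # Baugh-Wooley correction constants at bit positions n and 2n-1.
    rows.append((1 << n) | (1 << (2 * n - 1)))
    return rows


def carry_save_add(x, y, z, width):
    """3:2 compressor: three operands in, sum vector and carry vector out."""
    mask = (1 << width) - 1
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & mask, c & mask


def mac(a, b, acc, n=16):
    """Return acc + a*b as a signed 2n-bit value (assumed accumulator width)."""
    width = 2 * n
    mask = (1 << width) - 1
    # The accumulator enters the reduction as one more row.
    rows = baugh_wooley_rows(a, b, n) + [acc & mask]
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(x, y, z, width))
    total = sum(rows) & mask            # final carry-propagate addition
    return total - (1 << width) if total >> (width - 1) else total


print(mac(-3, 7, 100, n=8))
```

Feeding the accumulator into the carry-save reduction as an extra row means only one carry-propagate addition is needed at the end, which is the usual motivation for merging accumulation into the multiplier tree.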