2012
DOI: 10.1049/iet-cdt.2011.0132

FPGA accelerator for floating-point matrix multiplication

Abstract: This study treats the architecture and implementation of an FPGA accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm, which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering, and simplifies placement and routing on the chip. The authors show that such architecture is especially wel…
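The block scheme described in the abstract can be illustrated in software. The following is a minimal host-side sketch, not the authors' hardware design: it assumes plain Python lists as matrices, and the names `block_matmul_stream` and `assemble` are hypothetical. Each result block is accumulated over the full inner dimension and handed back as soon as it is final, so the producer never holds more than one output block.

```python
from typing import Iterable, Iterator, List, Tuple

Matrix = List[List[float]]


def block_matmul_stream(
    a: Matrix, b: Matrix, block: int
) -> Iterator[Tuple[int, int, Matrix]]:
    """Yield (row_offset, col_offset, result_block) for C = A x B.

    Hypothetical illustration: each block of C is yielded as soon as it is
    complete, so no full output matrix is buffered on the producer side.
    """
    n, k, m = len(a), len(b), len(b[0])
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            rows = range(i0, min(i0 + block, n))
            cols = range(j0, min(j0 + block, m))
            # Sum over the whole inner dimension so the block is final here.
            c_block = [
                [sum(a[i][p] * b[p][j] for p in range(k)) for j in cols]
                for i in rows
            ]
            yield i0, j0, c_block


def assemble(
    n: int, m: int, blocks: Iterable[Tuple[int, int, Matrix]]
) -> Matrix:
    """Consumer (the 'host' role): place each streamed block into C."""
    c = [[0.0] * m for _ in range(n)]
    for i0, j0, blk in blocks:
        for di, row in enumerate(blk):
            c[i0 + di][j0:j0 + len(row)] = row
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    for i0, j0, blk in block_matmul_stream(a, b, block=1):
        print(i0, j0, blk)
```

Using a generator keeps the producer stateless with respect to the output, which loosely mirrors the point about avoiding output buffering; the real accelerator's block sizes and data movement are not specified here.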

Cited by 63 publications (33 citation statements)
References 19 publications (28 reference statements)
“…Theoretical analysis of an 800 × 800 matrix multiplication shows an execution time of 10^7 cycles. Jovanović and Milutinović [3] present an architecture of = 252 processing elements with local memories to store the input matrices. Large matrices are multiplied by sending blocks to the accelerator.…”
Section: Related Work (mentioning)
confidence: 99%
“…The efforts to optimize the I/O result in a significant performance increase to 6,295 MFLOPS for = 124 and = 3 , i.e. 93% of the theoretical performance 0 (124,3) using equation (3).…”
Section: Overlapping Computation and Communication (mentioning)
confidence: 99%
“…This is due to the fact that having FPGAs with limited resources it is hardly possible to instantiate that many PEs. A recent work [17] describes an architecture of linear array PEs, similar to those in [16], but achieving an optimal latency of order O(n^2) by exploiting full duplex communication with the host processor and at the cost of having it involved during addition of intermediary values.…”
Section: Matrix Multiplication Tradeoffs on FPGAs (mentioning)
confidence: 99%
“…Recently, field programmable gate arrays (FPGAs) have become widely used as accelerators of software operations [1] [2]. However, since an FPGA is always used along with a single configuration context, its benefits are limited to its programmability.…”
mentioning
confidence: 99%