2012
DOI: 10.1109/tcsi.2011.2161389
An I/O Bandwidth-Sensitive Sparse Matrix-Vector Multiplication Engine on FPGAs

Cited by 19 publications (15 citation statements)
References 21 publications
“…A similar design is proposed in [5], which employs a multiple-input multiple-output multiply-accumulate unit and a reduction unit to process multiple rows per clock cycle; however, the serial reduction limits the performance. Song Sun et al. make use of an input pattern vector (IPV) and a map table to implement SpMV without pipeline stalls or excessive zero-padding [11]; however, the storage of the IPV and map table limits the dimension of the sparse matrix. K. Nagar et al. [12] implemented SpMV for large-scale sparse matrices on the Convey HC-1 with a novel streaming multiply-accumulator and a local vector cache.…”
Section: Related Work
confidence: 99%
“…Depending on the implementation, the meta-data for CSR is either pre-loaded into the bitstream or accessed dynamically from external memory. While earlier designs were restricted to on-die memory capacities (e.g., [18]), more recent designs incorporate memory hierarchies that can handle large data sets exceeding the available on-chip memories [24,25,26,11,10,27,9,28,29,30,14,23].…”
Section: Related Work
confidence: 99%
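The designs quoted above all consume matrices in CSR (Compressed Sparse Row) form, whose three arrays are the "meta-data" that FPGA engines either bake into the bitstream or stream from external memory. As a point of reference, a minimal software sketch of CSR SpMV (not any specific cited design) looks like this:

```python
# Minimal CSR SpMV sketch: y = A @ x.
# values  holds the nonzeros row by row,
# col_idx the column index of each nonzero,
# row_ptr the offset in values where each row starts.

def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR-encoded sparse matrix by a dense vector x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):
        # Row r's nonzeros occupy values[row_ptr[r]:row_ptr[r+1]];
        # this inner dot product is what a hardware
        # multiply-accumulate pipeline streams through.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

# Example: A = [[1, 0, 2],
#               [0, 3, 0]]
values  = [1.0, 2.0, 3.0]
col_idx = [0, 2, 1]
row_ptr = [0, 2, 3]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

The irregular, x-dependent memory accesses in the inner loop (`x[col_idx[k]]`) are precisely what makes SpMV I/O-bandwidth-sensitive and motivates the vector caches and memory hierarchies discussed in the citing works.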
“…In this way, energy-reduction techniques must be applied at all design levels of the system. Moreover, since the most effective design decisions are made at the architecture and system levels, careful design at these levels can reduce power consumption considerably [8].…”
Section: Introduction
confidence: 99%