Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays 2005
DOI: 10.1145/1046192.1046202
Sparse Matrix-Vector multiplication on FPGAs

Cited by 180 publications (125 citation statements); references 10 publications.
“…Zhuo et al. employ a tree-based multiply-accumulate unit and a reduction unit to perform multiple operations in parallel [10]. However, the structure of the reduction unit depends on the sparsity pattern, and a large amount of zero padding is required to meet the alignment requirement of the adder tree.…”
Section: Related Work
confidence: 99%
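The zero-padding cost mentioned in the statement above can be sketched in a few lines: a fixed-width binary adder tree sums the per-row products, so any row with fewer nonzeros than the tree's input width must be padded with zeros. This is an illustrative sketch, not the design from [10]; `TREE_WIDTH` is an assumed parameter.

```python
# Sketch of the adder-tree alignment issue: rows with fewer products
# than the tree's input width are zero-padded, wasting tree inputs.
# TREE_WIDTH is an assumed illustrative value, not taken from the paper.

TREE_WIDTH = 8  # assumed adder-tree input width

def adder_tree_sum(values):
    """Sum values pairwise, level by level, as a binary adder tree would."""
    assert len(values) == TREE_WIDTH
    level = values
    while len(level) > 1:
        # each tree level halves the number of partial sums
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

def padded_row_sum(products):
    # a row with only 3 nonzeros occupies 3 of 8 tree inputs;
    # the remaining 5 inputs carry zeros
    padded = products + [0.0] * (TREE_WIDTH - len(products))
    return adder_tree_sum(padded)

print(padded_row_sum([1.0, 2.0, 3.0]))  # prints 6.0
```

The padding overhead grows as rows become sparser relative to the tree width, which is the inefficiency the citing authors point out.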
“…In this paper, the sparse matrix is stored in the Compressed Row Storage (CRS) format, in which only the nonzero matrix elements are stored in contiguous memory locations. The CRS format uses three vectors: val for the nonzero matrix elements; col for the column indices of the nonzero elements; and ptr for the positions in val that start a new row (Zhuo and Prasanna, 2005). As an example, consider a simple SMVM operation with a 5×5 sparse matrix A as follows: …”
Section: Sparse Matrix-Vector Multiplication
confidence: 99%
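The CRS scheme described in the statement above can be made concrete with a small example. The 5×5 matrix values below are made up for illustration (the original example matrix is not reproduced on this page):

```python
# CRS (Compressed Row Storage): val holds the nonzeros, col their
# column indices, and ptr[i]..ptr[i+1] delimits the nonzeros of row i.

def crs_spmv(val, col, ptr, x):
    """Multiply a CRS-stored sparse matrix by a dense vector x."""
    n = len(ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(ptr[i], ptr[i + 1]):
            y[i] += val[k] * x[col[k]]
    return y

# Illustrative 5x5 matrix with 7 nonzeros:
# [[1 0 2 0 0],
#  [0 3 0 0 0],
#  [0 0 4 0 5],
#  [6 0 0 0 0],
#  [0 0 0 7 0]]
val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
col = [0, 2, 1, 2, 4, 0, 3]
ptr = [0, 2, 3, 5, 6, 7]

print(crs_spmv(val, col, ptr, [1.0] * 5))  # prints [3.0, 3.0, 9.0, 6.0, 7.0]
```

Note that the irregular row lengths encoded in ptr are exactly what makes the hardware reduction step nontrivial, as the Related Work statement above points out.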
“…This algorithm is dominated almost entirely by SMVM operations, where the target matrix is extremely sparse, unsymmetric and unstructured. Acceleration of this problem with FPGA-based solutions has also been investigated in (McGettrick et al., 2008; Zhuo and Prasanna, 2005).…”
Section: Introduction
confidence: 99%
“…In [8], a block matrix multiplication algorithm is discussed for large n, and a floating-point MAC (Multiplier and ACcumulator) is implemented. In [6,22], FPGA-based designs for floating-point sparse matrix-vector multiplication are proposed and achieve high speedups over general-purpose processors. In [18], FPGA-based implementations of BLAS (Basic Linear Algebra Subprograms) operations are discussed.…”
Section: Linear Algebra on FPGAs
confidence: 99%
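The block matrix multiplication pattern referenced in the statement above (tiling a large matrix so each block fits in on-chip memory, with a MAC accumulating partial products) can be sketched as follows. This is a generic software illustration, not the design from [8]; the block size `B` is an assumed parameter.

```python
# Blocked matrix multiplication: the n x n matrices are processed in
# B x B tiles; the innermost statement is the multiply-accumulate (MAC)
# operation a hardware unit would pipeline. B is an assumed block size.

B = 2

def blocked_matmul(A, M, n):
    """Compute C = A * M for n x n matrices given as lists of lists."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, B):
        for jj in range(0, n, B):
            for kk in range(0, n, B):
                # multiply one pair of B x B blocks (min() handles edges)
                for i in range(ii, min(ii + B, n)):
                    for j in range(jj, min(jj + B, n)):
                        for k in range(kk, min(kk + B, n)):
                            C[i][j] += A[i][k] * M[k][j]  # MAC step
    return C

I2 = [[1.0, 0.0], [0.0, 1.0]]
print(blocked_matmul(I2, [[2.0, 3.0], [4.0, 5.0]], 2))
# prints [[2.0, 3.0], [4.0, 5.0]]
```

Each block-pair product reuses on-chip data B times per element, which is the reason blocking is attractive when n exceeds on-chip memory capacity.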