2005
DOI: 10.1007/11557654_91

Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure

Abstract: We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix A into a sum A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, the unaligned block compressed sparse row (UBCSR) format. The classical alternative approach of storing A in …
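To make the splitting idea concrete, here is a minimal C sketch of the driver such a decomposition implies: the product y = A*x is accumulated term by term, with each term free to use its own storage format and kernel. The struct and function names (spmv_term, spmv_split) are illustrative assumptions, not the paper's actual UBCSR code.

```c
#include <stddef.h>

/* One term A_k of the split A = A_1 + ... + A_s.  The storage behind
 * `data` and the kernel behind `spmv` are left open on purpose. */
typedef struct {
    const void *data;                                        /* term-specific storage      */
    void (*spmv)(const void *data, const double *x, double *y); /* computes y += A_k * x   */
} spmv_term;

/* y = (A_1 + A_2 + ... + A_s) * x, accumulated one term at a time */
void spmv_split(const spmv_term *terms, size_t s,
                const double *x, double *y, size_t n_rows)
{
    for (size_t i = 0; i < n_rows; ++i)
        y[i] = 0.0;                          /* start from zero, then accumulate */
    for (size_t k = 0; k < s; ++k)
        terms[k].spmv(terms[k].data, x, y);  /* y += A_k * x */
}
```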

Cited by 88 publications (49 citation statements) | References 14 publications

“…They experiment on a K6, a Power3, and an Itanium II processor for a suite of 20 sparse matrices and validate the accuracy of the proposed performance model. Vuduc et al. [26] extend the notion of blocking to exploit variable block shapes by decomposing the original matrix into a sum of submatrices, storing each submatrix in a variation of the BCSR format. Their approach is tested on the Ultra2i, Pentium III-M, Power4, and Itanium II processors for a suite of 10 FEM matrices that contain dense sub-blocks.…”
Section: Related Work (mentioning)
confidence: 99%
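As a rough, hedged illustration of what such a BCSR variation might record, the sketch below gives one decomposed term its own block size and an explicit starting row per block row, so block rows need not be aligned; the field names and layout are assumptions for illustration, not the authors' exact UBCSR definition.

```c
#include <stddef.h>

/* One decomposed term with its own r x c block size, stored BCSR-style.
 * The brow_start array is what allows "unaligned" block rows: each block
 * row records the matrix row where it begins.  Illustrative sketch only. */
typedef struct {
    int r, c;              /* block dimensions chosen for this term          */
    int n_block_rows;      /* number of block rows                           */
    const int *brow_start; /* first matrix row of each block row             */
    const int *brow_ptr;   /* length n_block_rows + 1: blocks per block row  */
    const int *bcol_idx;   /* starting column of each r x c block            */
    const double *blocks;  /* dense r*c values per block, row-major          */
} blocked_term;

/* y += A_k * x for one term; generic in r and c, no unrolling */
void spmv_blocked_term(const blocked_term *A, const double *x, double *y)
{
    for (int I = 0; I < A->n_block_rows; ++I) {
        int row0 = A->brow_start[I];
        for (int b = A->brow_ptr[I]; b < A->brow_ptr[I + 1]; ++b) {
            const double *blk = A->blocks + (size_t)b * A->r * A->c;
            int col0 = A->bcol_idx[b];
            for (int i = 0; i < A->r; ++i)
                for (int j = 0; j < A->c; ++j)
                    y[row0 + i] += blk[i * A->c + j] * x[col0 + j];
        }
    }
}
```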
“…For example, blocking implemented with the Block Compressed Sparse Row (BCSR) format was proposed by Im and Yelick [11] as a transformation to tame irregular accesses to the input vector and exploit its inherent reuse, as in dense matrix optimizations. One-dimensional blocking is also proposed by Pinar and Heath [20] to reduce indirect memory references, while more recently Buttari et al. [5] and Vuduc and Moon [26] emphasize the merit of blocking (the latter with variable-sized blocks) as a transformation to reduce indirect references and enable register-level blocking and unrolling. However, it is not clear whether the benefits of blocking should be attributed to better cache utilization, memory access reduction, or ILP improvement.…”
Section: Introduction (mentioning)
confidence: 97%
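For reference, the register-level blocking and unrolling mentioned above can be sketched as a fixed 2x2 BCSR kernel: one column index is stored per block rather than per nonzero, and the 2x2 multiply is fully unrolled so the partial sums stay in registers. The array names follow common BCSR convention and are not tied to any of the cited implementations.

```c
/* Sketch of a register-blocked SpMV for a fixed 2x2 BCSR layout:
 * one column index per block (fewer indirect references than CSR) and a
 * fully unrolled 2x2 multiply so partial sums stay in registers. */
void spmv_bcsr_2x2(int n_block_rows,
                   const int *brow_ptr,    /* length n_block_rows + 1       */
                   const int *bcol_idx,    /* starting column of each block */
                   const double *blocks,   /* 4 values per block, row-major */
                   const double *x, double *y)
{
    for (int I = 0; I < n_block_rows; ++I) {
        double y0 = 0.0, y1 = 0.0;                 /* register accumulators */
        for (int b = brow_ptr[I]; b < brow_ptr[I + 1]; ++b) {
            const double *blk = &blocks[4 * b];
            double x0 = x[bcol_idx[b]];
            double x1 = x[bcol_idx[b] + 1];
            y0 += blk[0] * x0 + blk[1] * x1;       /* unrolled 2x2 product  */
            y1 += blk[2] * x0 + blk[3] * x1;
        }
        y[2 * I]     = y0;
        y[2 * I + 1] = y1;
    }
}
```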
“…However, as one of the "seven dwarfs" [1], SpMV is notorious for sustaining a low fraction (less than 10% [2]) of peak performance on general-purpose processors, mostly due to inefficient use of memory bandwidth. This stems from mismatches between the memory access pattern and the compression scheme of the sparse matrix.…”
Section: Introduction (mentioning)
confidence: 99%
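For contrast, a plain CSR kernel makes this mismatch visible: every nonzero costs an indirect load of x through its column index, so accesses to the input vector are irregular and the kernel is usually bounded by memory bandwidth rather than arithmetic. The sketch below uses the standard CSR arrays and is not taken from any of the cited papers.

```c
/* Plain CSR SpMV: y = A * x.  Each nonzero triggers an indirect load
 * x[col_idx[j]], so accesses to x are irregular and the kernel is
 * typically limited by memory bandwidth rather than arithmetic. */
void spmv_csr(int n_rows,
              const int *row_ptr,    /* length n_rows + 1            */
              const int *col_idx,    /* column index of each nonzero */
              const double *val,     /* nonzero values               */
              const double *x, double *y)
{
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += val[j] * x[col_idx[j]];    /* indirect access to x */
        y[i] = sum;
    }
}
```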