2013
DOI: 10.1109/tpds.2012.290

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

Abstract: Sparse matrix-vector multiplication (SpM×V) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the SpM×V kernel, that inhibits it from achieving high performance, is its very low flop:byte ratio. In this paper, we present a compressed storage format, called Compressed Sparse eXtended (CSX), that is able to detect and encode simultaneously multiple commonly encountered substructures inside a sparse matrix. Relying on aggressive compr…
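
To make the flop:byte claim concrete, here is a minimal sketch of SpM×V over the standard CSR format, which is not the CSX format the paper proposes. The function names and the default sizes (8-byte values, 4-byte column indices) are assumptions chosen for illustration, and the estimate ignores traffic for the row pointers and the input and output vectors, so it is an upper bound.

```python
def csr_spmv(row_ptr, col_ind, values, x):
    """Compute y = A * x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_ind[k]]
    return y


def flop_byte_ratio(nnz, value_bytes=8, index_bytes=4):
    """Upper bound on flops per byte of matrix data read.

    Each nonzero costs one multiply and one add (2 flops) and requires
    reading one value and one column index. Sizes are assumed defaults.
    """
    return (2 * nnz) / (nnz * (value_bytes + index_bytes))


# A = [[1, 0, 2],
#      [0, 3, 0]]
row_ptr = [0, 2, 3]
col_ind = [0, 2, 1]
values = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_ind, values, [1.0, 1.0, 1.0])
ratio = flop_byte_ratio(len(values))
```

Under these assumptions the ratio is 2 flops per 12 bytes regardless of matrix size, which is why shrinking the index data through compression can matter for a memory-bound kernel.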

Citations: cited by 42 publications (36 citation statements)
References: 27 publications (54 reference statements)
“…A delta unit in CSX is a sequence of column indices that can be represented by a specified number of bits (namely, 8, 16 or 32 bits). A detailed description and performance evaluation of CSX can be found in [16].…”
Section: Extending CSX to Symmetric Matrices, A. Overview of the C… (mentioning)
confidence: 99%
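
The delta unit described in the statement above can be illustrated with a minimal sketch. It assumes that a unit stores the first column index followed by the differences between consecutive, strictly increasing indices, and that the unit's width is the smallest of 8, 16 or 32 bits that holds every difference. The function names are hypothetical, and the actual CSX encoding is the one described in [16].

```python
def delta_width(col_indices):
    """Return the smallest width (8, 16 or 32 bits) that holds every delta.

    Assumes col_indices is non-empty and strictly increasing.
    """
    if not col_indices:
        raise ValueError("a delta unit needs at least one column index")
    deltas = [b - a for a, b in zip(col_indices, col_indices[1:])]
    largest = max(deltas, default=0)
    for bits in (8, 16, 32):
        if largest < (1 << bits):
            return bits
    raise ValueError("delta does not fit in 32 bits")


def encode_delta_unit(col_indices):
    """Return (first index, bit width, deltas) for one row segment."""
    bits = delta_width(col_indices)
    deltas = [b - a for a, b in zip(col_indices, col_indices[1:])]
    return col_indices[0], bits, deltas


unit = encode_delta_unit([5, 6, 9])
```

In this sketch a single large gap forces the whole sequence into a wider unit, so a practical encoder would likely split the row into several units instead.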
“…For the parallelization of the SpM×V routines and the preprocessing phase of CSX, we used explicit, native threading with the Pthreads library (NPTL 2.7) and bound the threads to specific logical processors using the Linux kernel's system call interface. Finally, for the NUMA-aware implementations, we used the numactl library, version 2.0.7, in conjunction with our low-level interleaved allocator [16].…”
Section: Experimental Evaluation, A. Experimental Setup (mentioning)
confidence: 99%
“…The increases in the density and speed of field-programmable gate arrays (FPGAs) [1] make them attractive as flexible and high-speed alternatives to DSPs [3] and ASICs. It is a highly procedure oriented computation [6], there is only one way to multiply two matrices and it involves lots of multiplications and additions. But the simple part of matrix multiplication is that the evaluation of elements of the resultant elements can be done independent of the other, this point to distributed memory approach.…”
Section: Introduction (mentioning)
confidence: 99%