2014 IEEE 28th International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2014.47

An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data

Abstract: General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method, breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM algorithm has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the result sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the result sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in a fine-grained manner.
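To make irregularities (1) and (2) concrete, here is a minimal host-side C++ sketch of Gustavson's row-wise formulation of C = AB for CSR inputs. This is an illustrative sketch only, not the paper's GPU algorithm, and all identifiers are hypothetical: the size of each result row's accumulator is only known after all merges (irregularity 1), and each product term is inserted at an effectively random column position (irregularity 2).

```cpp
#include <map>
#include <vector>

// One row of C = A*B via Gustavson's row-wise formulation (CSR inputs).
// The map accumulates c_i:, so the nnz of the result row is only known
// after all merges have finished.
std::map<int, double> spgemmRow(int i,
                                const std::vector<int>& rowPtrA,
                                const std::vector<int>& colIdxA,
                                const std::vector<double>& valA,
                                const std::vector<int>& rowPtrB,
                                const std::vector<int>& colIdxB,
                                const std::vector<double>& valB) {
    std::map<int, double> row;  // sorted accumulator for row c_i:
    for (int k = rowPtrA[i]; k < rowPtrA[i + 1]; ++k) {
        int j = colIdxA[k];  // nonzero column j of row a_i:
        for (int l = rowPtrB[j]; l < rowPtrB[j + 1]; ++l)
            row[colIdxB[l]] += valA[k] * valB[l];  // insert at a random position
    }
    return row;
}
```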

Cited by 96 publications (73 citation statements) · References 36 publications
“…The performance rate is defined as the ratio of the arithmetic workload to the measured processing time. The arithmetic workload $\mathrm{flops}(A,B)$ is defined as twice the number of nontrivial scalar multiplications (to account for the additions), which can be computed as $\sum_{j \in \hat{a}_{i:}} \mathrm{nnz}(b_{j:})$ for each result row $c_{i:}$ [12,31], where $\hat{a}_{i:}$ denotes the nonzero column indices of row $a_{i:}$. All performance rate measurements were repeated 11 times and the median was used because of its robustness with respect to outliers.…”
Section: Performance Measurements
Citation type: mentioning
confidence: 99%
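Assuming A and B are stored in CSR, the quoted workload formula can be evaluated on the host in a few lines. The sketch below uses hypothetical names and is not taken from any cited code; it sums $\mathrm{nnz}(b_{j:})$ over the nonzero columns of every row of A and doubles the total.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// flops(A,B): total multiplications = sum over rows i and nonzero columns j
// of a_i: of nnz(b_j:); doubled to account for the additions.
std::int64_t flopsAB(const std::vector<int>& rowPtrA,
                     const std::vector<int>& colIdxA,
                     const std::vector<int>& rowPtrB) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i + 1 < rowPtrA.size(); ++i) {
        for (int k = rowPtrA[i]; k < rowPtrA[i + 1]; ++k) {
            int j = colIdxA[k];                    // nonzero column j in row a_i:
            total += rowPtrB[j + 1] - rowPtrB[j];  // nnz(b_j:)
        }
    }
    return 2 * total;
}
```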
“…Table 1, were taken from the University of Florida Sparse Matrix Collection [13]. Inspired by [31], we split the matrices into regular (the upper 10) and irregular (the lower 11) groups and sorted each group alphabetically. Regular matrices result from problems involving mesh approximations, e.g., from finite element methods, while irregular matrices mostly result from network structures.…”
Section: Performance Measurements
Citation type: mentioning
confidence: 99%
“…Parallelization of the computation on the device is applied mainly to the matrix product (Liu & Vinter, 2014) and to the evaluation of the sigmoid function. We define a fixed number of threads per block, and the number of blocks is computed as the size of the layer divided by the number of threads.…”
Section: Implementation Of Neural Network
Citation type: mentioning
confidence: 99%
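A minimal sketch of the launch configuration this passage describes, assuming CUDA-style <<<blocks, threads>>> indexing. The concrete sizes are hypothetical, and the upward rounding is our assumption (not stated in the quote) to cover layer sizes that are not a multiple of the block size.

```cpp
#include <cstdio>

int main() {
    const int layerSize = 1000;       // width of the network layer (assumed)
    const int threadsPerBlock = 256;  // fixed thread count per block (assumed)
    // Block count = layer size / thread count, rounded up so that every
    // neuron is covered by a thread.
    const int numBlocks = (layerSize + threadsPerBlock - 1) / threadsPerBlock;
    std::printf("kernel<<<%d, %d>>> launches %d threads for %d neurons\n",
                numBlocks, threadsPerBlock,
                numBlocks * threadsPerBlock, layerSize);
    return 0;
}
```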
“…Thus more evaluation criteria, such as format conversion cost and memory footprint, must be taken into consideration. Secondly, when the SpMV operation is used with other sparse building blocks (e.g., sparse matrix-matrix multiplication [11]) that require basic storage formats, using the all-new formats is less feasible.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
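For reference, a "basic storage format" kernel of the kind this passage refers to: a sequential CSR SpMV sketch in C++. This is illustrative only; the identifiers are not from any cited library.

```cpp
#include <cstddef>
#include <vector>

// y = A*x with A in CSR, the common baseline format that composes directly
// with other sparse building blocks such as SpGEMM.
std::vector<double> spmvCsr(const std::vector<int>& rowPtr,
                            const std::vector<int>& colIdx,
                            const std::vector<double>& val,
                            const std::vector<double>& x) {
    std::vector<double> y(rowPtr.size() - 1, 0.0);
    for (std::size_t i = 0; i + 1 < rowPtr.size(); ++i)
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
            y[i] += val[k] * x[colIdx[k]];  // gather from x, accumulate row i
    return y;
}
```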