spECK

Parger, Mathias; Winter, Martin; Mlakar, Daniel; Steinberger, Markus

doi:10.1145/3332466.3374521

Cited by 33 publications

(5 citation statements)

References 15 publications


“…When an element $A_{ik} \times B_{kj}$ is computed, one needs to know whether it results in a new non-zero element in column $j$ of $C$ or it needs to be accumulated with already computed values $A_{il} \times B_{lj}$ for any $l \neq k$. This operation can be carried out by a hash table [6,20], by sorting and merging keys [11,15], or by the use of a dense vector, that is, a dense data structure that stores all intermediate accumulated values [21,39]. Section 3 proposes two new algorithms based on the dense vector and the hash table accumulators, respectively, to efficiently run SpGEMM on vector processors.…”

Section: Gustavson Methods (mentioning)

confidence: 99%
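
The dense-vector accumulator described in the statement above can be sketched as follows. This is a minimal illustration only, not the implementation from spECK or from the citing paper: it assumes both inputs are stored as CSC triples `(indptr, indices, data)`, and the function name `spgemm_dense_accumulator` is hypothetical.

```python
def spgemm_dense_accumulator(n_rows, a, b):
    """Column-wise Gustavson SpGEMM, C = A x B, using a dense accumulator.

    a and b are CSC triples (indptr, indices, data); this layout is an
    assumption made for the sketch. n_rows is the number of rows of A.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    n_cols = len(b_ptr) - 1
    acc = [0.0] * n_rows         # dense accumulator for one column of C
    occupied = [False] * n_rows  # True once a row has a value in this column
    c_ptr, c_idx, c_val = [0], [], []
    for j in range(n_cols):
        touched = []  # rows that became non-zero in column j
        for p in range(b_ptr[j], b_ptr[j + 1]):
            k, b_kj = b_idx[p], b_val[p]
            for q in range(a_ptr[k], a_ptr[k + 1]):
                i = a_idx[q]
                if not occupied[i]:
                    # First product landing on row i: new non-zero in C[:, j].
                    occupied[i] = True
                    touched.append(i)
                # Otherwise the product is accumulated onto the stored value.
                acc[i] += a_val[q] * b_kj
        for i in sorted(touched):
            c_idx.append(i)
            c_val.append(acc[i])
            acc[i] = 0.0          # reset only the touched entries
            occupied[i] = False
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

A hash-table accumulator would replace `acc` and `occupied` with a table keyed by row index, trading the dense memory footprint for hashing cost.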

“…Load imbalance. Load imbalance is a frequently mentioned problem of the Gustavson method, especially on GPU. Parger et al [39] use a low complexity pre-processing analysis of the matrices, linear in the number of non-zeros. Depending on the result, a binning method can be used to reduce load imbalance of the algorithm on a GPU.…”

Section: Related Work (mentioning)

confidence: 99%
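
As a rough illustration of the idea in the statement above, the sketch below counts the intermediate products per output row in a single pass over the non-zeros of A and then groups rows into bins of similar work. It is not spECK's actual analysis: the CSR layout, the function names, and the power-of-two bin boundaries are all assumptions made for the example.

```python
def count_products_per_row(a_ptr, a_idx, b_ptr):
    """Intermediate products per output row, in one pass over nnz(A).

    a_ptr and a_idx are the CSR row pointers and column indices of A;
    b_ptr holds the CSR row pointers of B (assumed layout).
    """
    counts = []
    for i in range(len(a_ptr) - 1):
        total = 0
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            total += b_ptr[k + 1] - b_ptr[k]  # nnz of row k of B
        counts.append(total)
    return counts


def bin_rows(counts, n_bins=4):
    """Group rows by work; bin index is the bit length of the product
    count, capped at n_bins - 1 (hypothetical boundaries)."""
    bins = [[] for _ in range(n_bins)]
    for row, work in enumerate(counts):
        bins[min(work.bit_length(), n_bins - 1)].append(row)
    return bins
```

Each bin could then be processed by a kernel configuration sized for that amount of work, so that one long row does not stall threads that were assigned short rows.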

“…Since both input matrices are sparse, SpGEMM displays much more irregular memory access patterns than the Sparse Matrix Multi-vector multiplication (SpMM) or the Sparse Matrix Dense Matrix multiplication (SpMDM), where 𝐵 is a dense matrix. The efficient execution of SpGEMM on many-core architectures [4,5], GPUs [6,38,39], or both [20], has been extensively studied.…”

Section: Introduction (mentioning)

confidence: 99%


Optimization of SpGEMM with Risc-V vector instructions

Fèvre, Casas. 2023

Preprint


The Sparse GEneral Matrix-Matrix multiplication (SpGEMM) 𝐶 = 𝐴 × 𝐵 is a fundamental routine extensively used in domains like machine learning or graph analytics. Despite its relevance, the efficient execution of SpGEMM on vector architectures is a relatively unexplored topic. The most recent algorithm to run SpGEMM on these architectures is based on the SParse Accumulator (SPA) approach, and it is relatively efficient for sparse matrices featuring several tens of nonzero coefficients per column as it computes 𝐶 columns one by one. However, when dealing with matrices containing just a few non-zero coefficients per column, the state-of-the-art algorithm is not able to fully exploit long vector architectures when computing the SpGEMM kernel. To overcome this issue we propose the SPA paRallel with Sorting (SPARS) algorithm, which computes in parallel several 𝐶 columns among other optimizations, and the HASH algorithm, which uses dynamically sized hash tables to store intermediate output values. To combine the efficiency of SPA for relatively dense matrix blocks with the high performance that SPARS and HASH deliver for very sparse matrix blocks we propose H-SPA(𝑡) and H-HASH(𝑡), which dynamically switch between different algorithms. H-SPA(𝑡) and H-HASH(𝑡) obtain 1.24× and 1.57× average speed-ups with respect to SPA respectively, over a set of 40 sparse matrices obtained from the SuiteSparse Matrix Collection [19]. For the 22 most sparse matrices, H-SPA(𝑡) and H-HASH(𝑡) deliver 1.42× and 1.99× average speed-ups respectively.


“…Nagasaka et al (2017) proposed NSparse for NVIDIA Pascal GPU, and Deveci et al (2017) proposed Kokkos for many-core GPU architectures. Parger et al (2020) designed spECK for SpGEMM. Niu et al (2022) proposed a tiled algorithm for SpGEMM on GPUs called TileSpGEMM.…”

Section: SpGEMM Algorithms (mentioning)

confidence: 99%

“…In addition to utilizing PLUB and PGO to evaluate our method's performance, we also compared the performance with six existing implementations, including cuSPARSE (Demouth, 2012), NSparse (Nagasaka et al, 2017), spECK (Parger et al, 2020), bhSPARSE (Liu and Vinter, 2015), Kokkos (Deveci et al, 2017), and TileSpGEMM (Niu et al, 2022). The evaluation is based on the GFLOPS performance, which is twice the number of the intermediate products divided by the execution time.…”

Section: Comparison With Existing Algorithms (mentioning)

confidence: 99%
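
The GFLOPS metric quoted above (twice the number of intermediate products divided by the execution time) can be written out as a small helper. This is a sketch of the stated formula only; the function name `spgemm_gflops` and its parameters are hypothetical and do not come from the citing paper.

```python
def spgemm_gflops(n_products, seconds):
    """GFLOPS for one SpGEMM run.

    Each intermediate product is counted as two floating-point
    operations (one multiply, one add), following the quoted definition.
    """
    if seconds <= 0:
        raise ValueError("execution time must be positive")
    return 2.0 * n_products / seconds / 1e9
```

Because the operation count depends only on the input matrices and not on the algorithm, the metric allows different implementations to be compared on the same matrix.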

Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs

Wei, Wang, Chang et al. 2024

The International Journal of High Performance Computing Applica


Sparse General Matrix-Matrix Multiplication (SpGEMM) has played an important role in a number of applications. So far, many efficient algorithms have been proposed to improve the performance of SpGEMM on GPUs. However, the performance of each algorithm for matrices of different structures varies a lot. There is no algorithm that can achieve the optimal performance of SpGEMM computation on all matrices. In this article, we design a machine learning based approach for predicting the optimal SpGEMM algorithm on input matrices. By extracting features from input matrices, we utilize LightGBM and XGBoost to train different lightweight models. The models are capable of predicting the best performing algorithm with low inference overhead and high accuracy for the given input matrices. We also investigate the impact of tree depth on model accuracy and inference overhead. Our evaluation shows that an increase in tree depth leads to a corresponding increase in prediction accuracy, reaching a maximum of approximately 85%, while resulting in increased inference overhead of approximately 10 µs. Compared with the state-of-the-art algorithms on three GPU platforms, our method achieves better overall performance.


A dynamic parameter tuning method for SpMM parallel execution

Komatsu, Sato et al. 2021

Concurrency and Computation


Sparse matrix-matrix multiplication (SpMM) is a basic kernel that is used by many algorithms. Several researches focus on various optimizations for SpMM parallel execution. However, a division of a task for parallelization is not well considered yet. Generally, a matrix is equally divided into blocks for processes even though the sparsities of input matrices are different. The parameter that divides a task into multiple processes for parallelization is fixed. As a result, load imbalance among the processes occurs. To balance the loads among the processes, this article proposes a dynamic parameter tuning method by analyzing the sparsities of input matrices. The experimental results show that the proposed method improves the performance of SpMM for examined matrices by up to 39.5% on a single vector engine and 3.49× on a single CPU.
