2019
DOI: 10.1016/j.parco.2019.102545

Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors

Abstract: Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and th…

Citation Types: 0 supporting, 63 mentioning, 0 contrasting

Cited by 36 publications (63 citation statements)
References 30 publications (59 reference statements)
“…HipMCL is an iterative algorithm that relies on SpGEMM as its workhorse at each iteration. The ExaGraph project ported the hash-based SpGEMM algorithm, which was originally developed for GPUs by collaborators (Nagasaka et al., 2017), into multicore CPUs and Intel KNLs (Nagasaka et al., 2019). For GPU-equipped clusters, we developed a model to choose the fastest GPU-based SpGEMM depending on the sparsity of the current MCL iteration and utilized a pipelined communication scheme that hides the cost of CPU-to-GPU data transfers.…”
Section: Algebraic Approaches For Graph Algorithms and Combinatorial Problems
mentioning (confidence: 99%)
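
The hash-based accumulation referenced in this excerpt merges duplicate column indices from different partial products on the fly, one hash table per output row. A minimal single-threaded sketch on CSR inputs is given below; the struct and function names are illustrative, and the implementation of Nagasaka et al. (2019) uses custom open-addressing tables with OpenMP threading rather than std::unordered_map.

```cpp
// Minimal sketch of row-wise hash SpGEMM (C = A * B) on CSR matrices.
// Illustrative only: production implementations use preallocated,
// open-addressing hash tables and thread-level parallelism.
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Csr {
    std::vector<std::size_t> rowptr;  // size nrows + 1
    std::vector<std::size_t> colidx;  // size nnz
    std::vector<double>      values;  // size nnz
};

Csr spgemm_row_hash(const Csr& A, const Csr& B) {
    Csr C;
    C.rowptr.push_back(0);
    const std::size_t nrows = A.rowptr.size() - 1;
    for (std::size_t i = 0; i < nrows; ++i) {
        // Accumulate row i of C: every nonzero a_ik scales row k of B.
        std::unordered_map<std::size_t, double> acc;
        for (std::size_t p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p) {
            const std::size_t k = A.colidx[p];
            const double a_ik = A.values[p];
            for (std::size_t q = B.rowptr[k]; q < B.rowptr[k + 1]; ++q)
                acc[B.colidx[q]] += a_ik * B.values[q];  // merge duplicates
        }
        // Flush the accumulator into row i of C (column order is arbitrary).
        for (const auto& [j, v] : acc) {
            C.colidx.push_back(j);
            C.values.push_back(v);
        }
        C.rowptr.push_back(C.colidx.size());
    }
    return C;
}
```

Because each row of C is built independently, parallelizing over rows is straightforward, which is part of what makes the port from GPUs to multicore CPUs and KNLs natural.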
“…The high-performance distributed re-implementation of the Markov Cluster algorithm, known as HipMCL [40], uses some of the most general and scalable sparse matrix algorithms implemented within the Combinatorial BLAS [41]. These algorithms include a two-dimensional SpGEMM algorithm known as Sparse SUMMA [24], several different shared memory SpGEMM algorithms [42] that are optimized for different iterations of HipMCL, a fast memory estimator based on sparse matrix dense matrix multiplication for memory-efficient SpGEMM [43], as well as a very fast distributed memory connected components algorithm [44] that is used for extracting the final clusters from the result of the HipMCL iterations. The integration of GPU support as well as faster communication-avoiding SpGEMM algorithms [45] is ongoing work.…”
Section: (E) Sparse Matrix Operations For Protein Clustering
mentioning (confidence: 99%)
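
The memory estimator mentioned in this excerpt predicts the size of the output before any numeric work, so that C can be allocated in a single pass. The probabilistic SpMM-based estimator cited as [43] is more involved; the sketch below shows only the classic deterministic upper bound such estimators refine, counting one partial product per (a_ik, b_kj) pair. All names are illustrative.

```cpp
// Hedged sketch: deterministic upper bound on nnz(C) for C = A * B.
// This is the standard flop-count bound, not the probabilistic
// estimator cited as [43]; it illustrates why a symbolic estimate
// lets the numeric phase allocate the output once.
#include <cstddef>
#include <vector>

std::size_t upper_bound_nnz(const std::vector<std::size_t>& A_rowptr,
                            const std::vector<std::size_t>& A_colidx,
                            const std::vector<std::size_t>& B_rowptr) {
    std::size_t bound = 0;
    const std::size_t nrows = A_rowptr.size() - 1;
    for (std::size_t i = 0; i < nrows; ++i)
        for (std::size_t p = A_rowptr[i]; p < A_rowptr[i + 1]; ++p) {
            const std::size_t k = A_colidx[p];
            bound += B_rowptr[k + 1] - B_rowptr[k];  // nnz of B(k,:)
        }
    return bound;  // nnz(C) never exceeds the number of partial products
}
```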
“…These methods do not simply partition the result matrix or particles/sequences over processors, but instead replicate them to the extent allowed by available memory. For sparse matrices and sparse interactions, the benefits depend more on the sparsity patterns [24,42,43,70], but are useful in clustering [40] and possibly alignment.…”
Section: Hardware and Software Support For Parallel Genome Analysis
mentioning (confidence: 99%)
“…The implementation of this method within our pipeline enables the use of high-performance techniques previously not applied in the context of long-read alignment. It also allows continuing performance improvements in this step due to the ever-improving optimized implementations of SpGEMM (Nagasaka et al., 2019; Deveci et al., 2017).…”
Section: Proposed Algorithm
mentioning (confidence: 99%)
“…More importantly, the computational problem of accumulating the contributions from multiple shared k-mers to each pair of reads is handled automatically by the choice of appropriate data structures within SpGEMM. Figure 2 illustrates the merging operation of BELLA, which uses a hash table data structure indexed by the row indexes of A, following the multi-threaded implementation proposed by Nagasaka et al. (2019). Finally, the contents of the hash table are stored into a column of the final matrix once all required nonzeros for that column are accumulated.…”
Section: Sparse Matrix Construction and Multiplication
mentioning (confidence: 99%)
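
The merging operation described here is the column-wise counterpart of the row-wise hash accumulation: for each column j of B, every nonzero b_kj pulls in column k of A, and partial products are merged in a hash table keyed by the row index of A before being flushed as column j of C. A hedged sketch on CSC inputs follows; BELLA accumulates k-mer position pairs rather than scalars, so the double values and std::unordered_map below are simplifications to keep the sketch self-contained.

```cpp
// Hedged sketch of column-wise hash merging for C = A * B on CSC matrices,
// keyed by row index of A as described in the excerpt. Scalar doubles
// stand in for BELLA's k-mer position pairs.
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Csc {
    std::vector<std::size_t> colptr;  // size ncols + 1
    std::vector<std::size_t> rowidx;  // size nnz
    std::vector<double>      values;  // size nnz
};

Csc spgemm_column_hash(const Csc& A, const Csc& B) {
    Csc C;
    C.colptr.push_back(0);
    const std::size_t ncols = B.colptr.size() - 1;
    for (std::size_t j = 0; j < ncols; ++j) {
        std::unordered_map<std::size_t, double> acc;  // keyed by row of A
        for (std::size_t p = B.colptr[j]; p < B.colptr[j + 1]; ++p) {
            const std::size_t k = B.rowidx[p];  // b_kj selects column k of A
            const double b_kj = B.values[p];
            for (std::size_t q = A.colptr[k]; q < A.colptr[k + 1]; ++q)
                acc[A.rowidx[q]] += A.values[q] * b_kj;
        }
        // Flush column j of C only after all its nonzeros are accumulated.
        for (const auto& [i, v] : acc) {
            C.rowidx.push_back(i);
            C.values.push_back(v);
        }
        C.colptr.push_back(C.rowidx.size());
    }
    return C;
}
```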