2015
DOI: 10.1007/978-3-319-20119-1_4
Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms

Cited by 47 publications (23 citation statements)
References 12 publications
“…Problems where B is larger than hbm require partitioning of B. Column-wise partitions have been explored in one-level memory before [29]. However, since our data is stored row-wise, finding column-wise partitions that will fit into hbm is usually prohibitively expensive.…”
Section: Chunking Methods for KNLs
confidence: 99%
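The chunking remark above concerns column-wise partitioning of B so that each partition fits in KNL's high-bandwidth memory while the data itself is stored row-wise. The following C++ sketch is an assumption for illustration only (the Csr struct and extract_column_slice are hypothetical names, not code from the cited papers): extracting one column range from a CSR matrix must visit every stored entry, which is why repeating that scan for each chunk is usually prohibitively expensive.

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR container used by the sketches in this report.
struct Csr {
    std::vector<int64_t> rowptr;  // size = number of rows + 1
    std::vector<int64_t> colidx;  // size = number of nonzeros
    std::vector<double>  val;     // size = number of nonzeros
};

// Copy the entries of B whose column index lies in [col_lo, col_hi).
// Because B is stored row-wise, every stored entry is visited for every
// partition, so chunking B column-wise costs O(nnz(B)) per chunk.
Csr extract_column_slice(const Csr& B, int64_t col_lo, int64_t col_hi) {
    Csr slice;
    slice.rowptr.push_back(0);
    for (std::size_t r = 0; r + 1 < B.rowptr.size(); ++r) {
        for (int64_t k = B.rowptr[r]; k < B.rowptr[r + 1]; ++k) {
            if (B.colidx[k] >= col_lo && B.colidx[k] < col_hi) {
                slice.colidx.push_back(B.colidx[k] - col_lo);
                slice.val.push_back(B.val[k]);
            }
        }
        slice.rowptr.push_back(static_cast<int64_t>(slice.colidx.size()));
    }
    return slice;
}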
“…An alternative is another format with random access such as a hash map. These result in slower execution [Patwary et al. 2015], but only use memory proportional to the number of nonzeros.…”
Section: Policy and Choice of Workspace
confidence: 99%
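A minimal sketch of the hash-map workspace mentioned above, reusing the hypothetical Csr struct from the previous sketch; it illustrates the general pattern, not the implementation measured in [Patwary et al. 2015]. Row i of C = A * B is accumulated in an unordered_map, so memory stays proportional to the row's nonzeros at the price of a hashed lookup on every update.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Accumulate row i of C = A * B in a hash map (Csr as defined above).
void spgemm_row_hash(const Csr& A, const Csr& B, int64_t i,
                     std::vector<int64_t>& out_cols,
                     std::vector<double>& out_vals) {
    std::unordered_map<int64_t, double> acc;
    for (int64_t ka = A.rowptr[i]; ka < A.rowptr[i + 1]; ++ka) {
        const int64_t k = A.colidx[ka];            // column of A = row of B
        const double  a = A.val[ka];
        for (int64_t kb = B.rowptr[k]; kb < B.rowptr[k + 1]; ++kb)
            acc[B.colidx[kb]] += a * B.val[kb];    // hashed insert or update
    }
    for (const auto& kv : acc) {                   // unsorted; sort if CSR output is needed
        out_cols.push_back(kv.first);
        out_vals.push_back(kv.second);
    }
}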
“…A workspace used for accumulating temporary values is referred to as an expanded real accumulator in [Pissanetzky 1984] and as an abstract sparse accumulator data structure in [Gilbert et al. 1992]. Dense workspaces and blocking are used to produce fast parallel code by Patwary et al. [Patwary et al. 2015]. They also tried a hash map workspace, but report that it did not have good performance for their use.…”
Section: Related Work
confidence: 99%
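For contrast, the dense workspace (the "abstract sparse accumulator" of Gilbert et al.) can be sketched roughly as below; this is again a hedged illustration built on the hypothetical Csr struct from the first sketch, not the authors' parallel code. A dense value array indexed directly by column gives O(1) accumulation, and only the entries touched by the current row are cleared afterwards.

#include <cstdint>
#include <vector>

// Accumulate row i of C = A * B in a dense workspace (Csr as defined above).
// The workspace is sized to the number of columns of B; only the slots
// touched by this row are reset before the next one.
void spgemm_row_dense(const Csr& A, const Csr& B, int64_t i, int64_t b_ncols,
                      std::vector<int64_t>& out_cols,
                      std::vector<double>& out_vals) {
    static thread_local std::vector<double> work;      // dense values
    static thread_local std::vector<char>   occupied;  // nonzero pattern
    if (static_cast<int64_t>(work.size()) < b_ncols) {
        work.assign(b_ncols, 0.0);
        occupied.assign(b_ncols, 0);
    }
    std::vector<int64_t> touched;                       // columns hit in this row

    for (int64_t ka = A.rowptr[i]; ka < A.rowptr[i + 1]; ++ka) {
        const int64_t k = A.colidx[ka];
        const double  a = A.val[ka];
        for (int64_t kb = B.rowptr[k]; kb < B.rowptr[k + 1]; ++kb) {
            const int64_t j = B.colidx[kb];
            if (!occupied[j]) { occupied[j] = 1; touched.push_back(j); }
            work[j] += a * B.val[kb];                   // O(1) direct-indexed accumulation
        }
    }
    for (int64_t j : touched) {
        out_cols.push_back(j);
        out_vals.push_back(work[j]);
        work[j] = 0.0;                                  // reset only touched slots
        occupied[j] = 0;
    }
}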
“…Optimizing sparse matrix-matrix multiplication is an active area of research [17], [18]; state-of-the-art implementations are bound by the memory bandwidth and heavily underutilize the compute resources.…”
Section: Optimizing Res = A^T AB
confidence: 99%