2013 IEEE High Performance Extreme Computing Conference (HPEC) 2013
DOI: 10.1109/hpec.2013.6670336

Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware

Abstract: This paper introduces a 3D-stacked logic-in-memory (LiM) system to accelerate the processing of sparse matrix data held in a 3D DRAM system. We build a customized content addressable memory (CAM) hardware structure to exploit the inherent sparse data patterns, and model the LiM-based hardware accelerator layers that are stacked between DRAM dies for efficient sparse matrix operations. Through-silicon vias (TSVs) provide the required high inter-layer bandwidth. Furthermore, we…
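A CAM returns every stored entry whose tag matches a search key; in hardware all slots are compared in parallel in a single cycle. The Python class below is only a behavioural sketch, assuming a fixed-capacity CAM whose tags are sparse indices (for example, row indices of non-zeros). The names `CAM`, `write`, and `search` are hypothetical and are not taken from the paper's hardware design.

```python
from typing import List, Optional


class CAM:
    """Behavioural model of a fixed-capacity content addressable memory.

    Hypothetical sketch: real CAM hardware compares the key against every
    slot in parallel in one cycle; here the comparison is a Python loop.
    """

    def __init__(self, capacity: int) -> None:
        # None marks an invalid (empty) slot, which never matches a key.
        self.tags: List[Optional[int]] = [None] * capacity
        self.data: List[float] = [0.0] * capacity

    def write(self, slot: int, tag: int, value: float) -> None:
        # Store a (tag, value) pair, e.g. (row index, non-zero value).
        self.tags[slot] = tag
        self.data[slot] = value

    def search(self, key: int) -> List[int]:
        # Return the "match lines": indices of all slots whose tag equals key.
        return [i for i, t in enumerate(self.tags) if t == key]


cam = CAM(capacity=4)
cam.write(0, tag=7, value=1.5)
cam.write(1, tag=2, value=-3.0)
cam.write(2, tag=7, value=0.25)
print(cam.search(7))  # [0, 2]
print(cam.search(5))  # []
```

Returning a list of matching slots, rather than a single hit, mirrors the fact that a CAM raises one match line per matching entry; a priority encoder would be needed in hardware to pick among them.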

Citations: cited by 83 publications (47 citation statements); references 23 publications.

“…A similar operation is performed for assembling C by using a single "vertical CAM," which activates individual horizontal CAM blocks only if their corresponding column indices are matched. High-level system simulations in [12] show that such a LiM based CAM-SpGEMM core can be used as a low-power hardware accelerator in 3D IC stacks. Sparse matrices are decomposed into sub-blocks and then mapped to DRAM rows for maximizing off-chip DRAM row buffer hit.…”
Section: LiM Synthesis Example (mentioning)
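The statement above notes that sparse matrices are decomposed into sub-blocks and mapped to DRAM rows to maximize row-buffer hits. The following is a minimal sketch of that idea, assuming COO (row, col, value) input, square `block x block` tiles, and a DRAM row that holds a fixed number of non-zeros (`row_capacity`). The function names, block size, and capacity are hypothetical illustrations, not parameters from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[int, int, float]   # (row, col, value) of one non-zero
TileKey = Tuple[int, int]          # (block_row, block_col)


def tile_coo(entries: List[Triplet], block: int) -> Dict[TileKey, List[Triplet]]:
    """Group non-zeros into block x block sub-matrices keyed by tile position."""
    tiles: Dict[TileKey, List[Triplet]] = defaultdict(list)
    for r, c, v in entries:
        tiles[(r // block, c // block)].append((r, c, v))
    return dict(tiles)


def map_tiles_to_rows(
    tiles: Dict[TileKey, List[Triplet]], row_capacity: int
) -> List[Tuple[TileKey, List[Triplet]]]:
    """Pack each tile's non-zeros into consecutive DRAM rows of row_capacity entries.

    Tiles never share a row, so streaming one tile touches only its own
    consecutive rows, which keeps the open row buffer useful.
    """
    rows: List[Tuple[TileKey, List[Triplet]]] = []
    for key in sorted(tiles):  # block-row-major tile order
        nz = tiles[key]
        for start in range(0, len(nz), row_capacity):
            rows.append((key, nz[start:start + row_capacity]))
    return rows


entries = [(0, 0, 1.0), (1, 3, 2.0), (5, 1, 3.0), (6, 6, 4.0), (7, 7, 5.0), (4, 5, 6.0)]
tiles = tile_coo(entries, block=4)
rows = map_tiles_to_rows(tiles, row_capacity=2)
print(len(rows))  # 4
```

Not letting tiles share a DRAM row wastes some capacity at each tile boundary, but it means one tile can be streamed to the accelerator without reopening rows that belong to other tiles.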
“…To improve the column-by-column algorithm for SpGEMM, Zhu et al explored the data storage and access patterns in [12] and showed that the SpGEMM operations can be effectively mapped to LiM based content addressable memory (CAM) blocks. As matrix sparsity requires storing only the non-zero elements that are accompanied by their row and column indices, the single cycle "matching" capability of CAMs facilitates index comparison and alignment.…”
Section: LiM Synthesis Example (mentioning)
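The column-by-column SpGEMM mentioned above (Gustavson's algorithm, column variant) builds each column of C as C[:, j] = sum over k of B[k, j] * A[:, k]. Each partial product carries a row index that must be aligned with existing entries of the output column, which is the index comparison a CAM performs in one cycle. In this sketch a Python dict is a hashed software stand-in for that CAM match; the dict-of-columns layout and the function name are assumptions for illustration, not the paper's design.

```python
from typing import Dict, List, Tuple

Column = List[Tuple[int, float]]   # [(row_index, value), ...] non-zeros of one column
SparseCols = Dict[int, Column]     # column_index -> Column (CSC-like)


def spgemm_column_by_column(A: SparseCols, B: SparseCols) -> SparseCols:
    """Compute C = A @ B one column at a time: C[:, j] = sum_k B[k, j] * A[:, k]."""
    C: SparseCols = {}
    for j, b_col in B.items():
        acc: Dict[int, float] = {}     # stand-in for the CAM, keyed by row index
        for k, b_kj in b_col:          # each non-zero B[k, j] selects column k of A
            for i, a_ik in A.get(k, []):
                # "Match" row index i: accumulate on a hit, allocate on a miss.
                acc[i] = acc.get(i, 0.0) + a_ik * b_kj
        if acc:
            C[j] = sorted(acc.items())  # emit the column in row-index order
    return C


# A = [[1, 0], [2, 3]], B = [[0, 4], [5, 0]] stored column by column.
A = {0: [(0, 1.0), (1, 2.0)], 1: [(1, 3.0)]}
B = {0: [(1, 5.0)], 1: [(0, 4.0)]}
C = spgemm_column_by_column(A, B)
print(C)  # {0: [(1, 15.0)], 1: [(0, 4.0), (1, 8.0)]}
```

Only non-zeros are ever touched: the work per output column is proportional to the number of partial products, which is why fast index matching, rather than dense arithmetic, dominates the cost.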
“…At the bottom of the Fig. 6, we show the structures of LiM core customized for the 2D FFT and SpGEMM respectively [4], [33]. As we can see, both LiM cores involve embedded memory arrays, on-chip buffers, arithmetic units, as well as the control models such as DRAM to Local Memory (D2L) and Local Memory to Core (L2C).…”
Section: D LiM Accelerated Data Intensive Applications (mentioning)
“…The CAM based SpGEMM is designed to match the specific sparse data access pattern, and it is able to process the sparse data in an extremely high throughput to match the TSV bandwidth. The design details are beyond the scope of this paper and can be found in another accompanying work [33].…”
Section: D LiM Accelerated Data Intensive Applications (mentioning)
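Matching the TSV bandwidth means the core must consume non-zeros as fast as the vertical link delivers them. The back-of-the-envelope helper below makes that requirement concrete; every number in it (a 128 GB/s link, 16 bytes per non-zero as an 8-byte value plus two 4-byte indices, a 1 GHz core clock) is a hypothetical assumption, not a figure from the paper.

```python
def required_nnz_per_cycle(bandwidth_gb_per_s: float, bytes_per_nnz: int, clock_ghz: float) -> float:
    """Non-zeros the core must process each clock cycle to keep pace with the link.

    bandwidth_gb_per_s: link bandwidth in gigabytes per second (hypothetical input).
    bytes_per_nnz: storage per non-zero, including its indices.
    clock_ghz: accelerator clock frequency in GHz.
    """
    nnz_per_second = bandwidth_gb_per_s * 1e9 / bytes_per_nnz
    return nnz_per_second / (clock_ghz * 1e9)


# Hypothetical: 128 GB/s TSV link, 16-byte non-zeros, 1 GHz core.
print(required_nnz_per_cycle(128.0, 16, 1.0))  # 8.0
```

Under these assumed numbers, the core would need to retire about eight non-zeros per cycle, which is why a single-cycle parallel index match, rather than a sequential search, is needed to keep up.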