2022
DOI: 10.1109/access.2022.3203051

Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling

Abstract: Processing-in-Memory (PIM) has been actively studied to overcome the memory bottleneck by placing computing units near or in memory, especially for efficiently processing low-locality, data-intensive applications. In-DRAM PIMs can be categorized by how many banks perform the PIM computation per DRAM command: per-bank and all-bank. The per-bank PIM operates only one bank, delivering low performance but preserving the standard DRAM interface and servicing non-PIM requests during PIM execution. The …
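As a rough illustration of the per-bank vs. all-bank distinction described in the abstract, the sketch below counts how many bank-level PIM operations a single DRAM command can trigger in each mode. The bank count, command latency, and helper names are hypothetical values chosen for illustration only; they are not figures or interfaces from the paper.

```python
# Minimal sketch (not from the paper): contrast per-bank and all-bank
# in-DRAM PIM by counting how many bank-level operations one DRAM
# command triggers. Bank count and timing below are assumptions.

BANKS_PER_CHANNEL = 16      # assumed number of banks per channel
COMMAND_LATENCY_NS = 10     # assumed cost of issuing one DRAM command

def pim_ops_issued(num_commands: int, mode: str) -> int:
    """Return how many bank-level PIM operations the commands trigger."""
    if mode == "per-bank":
        # One command activates the PIM unit of a single bank.
        return num_commands
    if mode == "all-bank":
        # One command is broadcast to every bank's PIM unit at once.
        return num_commands * BANKS_PER_CHANNEL
    raise ValueError(f"unknown mode: {mode}")

def ops_per_ns(num_commands: int, mode: str) -> float:
    """Bank-level PIM operations per nanosecond of command-bus time."""
    return pim_ops_issued(num_commands, mode) / (num_commands * COMMAND_LATENCY_NS)

if __name__ == "__main__":
    for mode in ("per-bank", "all-bank"):
        print(mode, pim_ops_issued(100, mode), "ops,",
              ops_per_ns(100, mode), "ops/ns")
```

Under these assumptions the all-bank mode issues 16x more bank-level operations per command, which is the performance gap the paper's decoupling approach aims to reach while keeping the standard per-bank interface.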

Cited by 4 publications (13 citation statements)
References 43 publications (114 reference statements)

“…Processing-in-Memory (PIM) architectures have been actively studied by placing computing units close to [9], [10], [11], [12], and [13] or inside memory [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26] to overcome the memory bandwidth limitation. PIM can maximize internal memory bandwidth for the computation using bank-level parallelism [14], [15], [17], [18], [22], [23], [24], [25], [26], thus providing high computation performance. For example, the decoupled PIM [26] achieved a speedup of 75.8x and 1.2x over CPU and GPU at the Level-3 BLAS, respectively.…”
Section: Introduction (mentioning, confidence: 99%)

“…PIM can maximize internal memory bandwidth for the computation using bank-level parallelism [14], [15], [17], [18], [22], [23], [24], [25], [26], thus providing high computation performance. For example, the decoupled PIM [26] achieved a speedup of 75.8x and 1.2x over CPU and GPU at the Level-3 BLAS, respectively. Samsung FIM [23] achieved a speedup of 11.2x and 3.5x over CPU for memory-bound neural network kernels and applications [27], [29], [32], respectively.…”
Section: Introduction (mentioning, confidence: 99%)