2020
DOI: 10.48550/arxiv.2012.03112
Preprint
A Modern Primer on Processing in Memory

Abstract: Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially …

Cited by 11 publications (17 citation statements)
References 251 publications
“…Synergy With PIM. Processing-in-memory (PIM) systems improve system performance and/or energy consumption by performing computations directly within a memory chip, thereby avoiding unnecessary data movement [25,26,57,58,60,116,118,137,139]. Prior works propose a broad range of PIM systems [5-8, 13, 22-24, 34, 38, 44, 48, 49, 54, 55, 58, 59, 65, 66, 71, 72, 89, 98, 100, 103, 107, 113, 115, 119, 120, 124, 133-135, 137-139, 142, 148, 164, 168] in the context of various workloads and memory devices.…”
Section: Motivation and Goal
confidence: 99%
“…Therefore, QUAC-TRNG offers a new design point that can enable new applications that were previously infeasible with alternative TRNGs, especially for systems where the costs of on-chip TRNGs may be prohibitive (e.g., heavily constrained embedded systems, processing-in-memory architectures). For example, QUAC-TRNG would enable processing-in-memory systems [62,116,137,157] to execute security workloads, as it enables true random number generation directly within a DRAM chip.…”
Section: Non-DRAM-Based TRNGs That Require Specialized Hardware
confidence: 99%
“…Stacked memory architectures vertically stack DRAM layers on top of each other and connect the vertical partitions of memory using high-bandwidth through-silicon vias (TSVs). A typical 3D-stacked memory configuration can employ thousands of TSVs [45], which makes its internal memory bandwidth far exceed that of traditional memory systems. At the bottom of the memory stack, there is a logic layer that can host hardware logic capable of interacting with both the host processor and the DRAM memory.…”
Section: Background and Assumptions, A. Processing-in-Memory
confidence: 99%
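As a rough illustration of the bandwidth argument in the statement above, the following Python sketch compares the aggregate internal bandwidth of thousands of TSVs against a single conventional DDR4 channel. The TSV count, per-TSV data rate, and DDR4 transfer rate are illustrative assumptions, not figures taken from the cited works.

```python
# Back-of-envelope comparison of internal 3D-stacked memory bandwidth
# versus a single conventional DDR4 channel.
# All figures below are illustrative assumptions.

def tsv_internal_bandwidth_gbs(num_tsvs: int, per_tsv_gbit_s: float) -> float:
    """Aggregate internal bandwidth in GB/s across all through-silicon vias."""
    return num_tsvs * per_tsv_gbit_s / 8.0  # Gbit/s -> GB/s

def ddr4_channel_bandwidth_gbs(transfer_mt_s: int, bus_width_bits: int = 64) -> float:
    """Peak bandwidth in GB/s of a single DDR4 channel."""
    return transfer_mt_s * 1e6 * bus_width_bits / 8.0 / 1e9

if __name__ == "__main__":
    internal = tsv_internal_bandwidth_gbs(num_tsvs=2048, per_tsv_gbit_s=2.0)  # assumed
    external = ddr4_channel_bandwidth_gbs(transfer_mt_s=3200)                 # DDR4-3200
    print(f"Assumed internal TSV bandwidth: {internal:.0f} GB/s")
    print(f"DDR4-3200 channel bandwidth:    {external:.1f} GB/s")
    print(f"Ratio: {internal / external:.1f}x")
```

With these assumed numbers the internal path delivers roughly 512 GB/s versus 25.6 GB/s for one DDR4-3200 channel, which is the kind of gap the citing paper alludes to when it says internal bandwidth "far exceeds" that of traditional memory systems.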
“…Because it sits near memory, an application offloaded to PIM gains high memory bandwidth, as its data does not have to move across the slow memory bus. Moreover, in 3D-stacked memories, the TSV connections between the layers naturally provide more internal bandwidth [45]. This makes PIM well suited for memory-bound workloads and for workloads with erratic memory access patterns.…”
Section: A. Processing-in-Memory
confidence: 99%
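The trade-off described in that statement can be made concrete with a simple roofline-style model: a kernel with low arithmetic intensity (memory-bound) benefits from the higher internal bandwidth behind the TSVs even if the PIM logic has a much lower compute peak. The sketch below is a minimal Python illustration; all bandwidth and peak-compute figures are hypothetical assumptions, not parameters of any cited system.

```python
# Roofline-style sketch: decide whether a kernel is a good PIM offload
# candidate based on its arithmetic intensity (FLOPs per byte moved).
# All bandwidth and peak-compute figures are illustrative assumptions.

HOST_PEAK_GFLOPS = 1000.0   # assumed host compute peak
HOST_BUS_GBS = 25.6         # assumed off-chip memory bandwidth
PIM_PEAK_GFLOPS = 100.0     # assumed (simpler) PIM compute peak
PIM_INTERNAL_GBS = 512.0    # assumed internal TSV bandwidth

def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Classic roofline: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)

def prefer_pim(intensity: float) -> bool:
    """True if the PIM side attains higher throughput than the host for this kernel."""
    host = attainable_gflops(intensity, HOST_PEAK_GFLOPS, HOST_BUS_GBS)
    pim = attainable_gflops(intensity, PIM_PEAK_GFLOPS, PIM_INTERNAL_GBS)
    return pim > host

if __name__ == "__main__":
    for intensity in (0.25, 1.0, 8.0, 64.0):
        side = "PIM" if prefer_pim(intensity) else "host"
        print(f"intensity {intensity:>5} FLOP/B -> run on {side}")
```

Under these assumptions, kernels below a few FLOPs per byte (typical of pointer-chasing and streaming workloads) land on the PIM side, while compute-heavy kernels stay on the host.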
“…This motivates using processing-in-memory (PIM) to gain the much-needed speedups in graph mining. While PIM is not the only potential solution for hardware acceleration of graph mining, we select PIM because (1) it represents one of the most promising trends to tackle the memory bottleneck [69,128], outperforming other approaches [153], (2) it offers well-understood designs [129], and (3) numerous works illustrate that it brings very large speedups in simple graph algorithms such as BFS or PageRank (see more than 15 works in Table 7), also using processing fully inside DRAM [10]. Yet, graph mining algorithms are much more complex: they employ deep recursion, create many intermediate data structures with non-trivial inter-dependencies, and have high load imbalance [62,186].…”
Section: Introduction
confidence: 99%