Proceedings of the 2015 International Symposium on Memory Systems (MEMSYS 2015)
DOI: 10.1145/2818950.2818986

Near memory data structure rearrangement

Abstract: As CPU core counts continue to increase, the gap between compute power and available memory bandwidth has widened. A larger and deeper cache hierarchy benefits locality-friendly computation, but offers limited improvement to irregular, data-intensive applications. In this work we explore a novel approach to accelerating these applications through in-memory data restructuring. Unlike other proposed processing-in-memory architectures, the rearrangement hardware performs data reduction, not compute offload. Using …
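The following is a minimal host-side sketch in C, not taken from the paper: it only illustrates the kind of index-driven gather that the proposed rearrangement hardware would perform in the memory stack's logic layer, packing scattered elements into a dense view so the processor streams contiguous data instead of pulling a whole cache line per element. The function name gather_view and the use of plain CPU code are assumptions for illustration.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical software model of the rearrangement step: elements selected
 * by an index list are packed into a dense buffer.  In a near-memory design
 * this gather would run in the logic layer, so only the packed view crosses
 * the memory bus and enters the cache hierarchy. */
static void gather_view(const double *src, const uint32_t *idx,
                        size_t n, double *packed)
{
    for (size_t i = 0; i < n; i++)
        packed[i] = src[idx[i]];        /* irregular reads stay near memory */
}

int main(void)
{
    double   src[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    uint32_t idx[3] = { 6, 1, 4 };      /* scattered access pattern */
    double   view[3];

    gather_view(src, idx, 3, view);     /* host then streams the dense view */
    printf("%.1f %.1f %.1f\n", view[0], view[1], view[2]);
    return 0;
}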

Cited by 30 publications (16 citation statements) | References 11 publications
“…Gokhale et al. [24] (2015) proposed to place a data rearrangement engine (DRE) in the logic layer of the HMC to accelerate data accesses while still performing the computation on the main CPU. The authors targeted cache-unfriendly applications with high memory latency due to irregular access patterns, e.g., sparse matrix multiplication.…”
Section: Re-configurable Unit
confidence: 99%
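As a concrete illustration of the irregular pattern such a DRE targets, here is a C sketch (not code from either paper) of a CSR sparse matrix-vector product: the dense vector is read through a column-index indirection, so consecutive iterations touch unrelated cache lines, and pre-gathering x[col[j]] into a packed stream near memory is exactly the kind of data reduction the engine is meant to provide. The helper spmv_csr and its signature are hypothetical.

#include <stddef.h>

/* Illustrative CSR sparse matrix-vector product (hypothetical helper).
 * The x[col[j]] indirect load is the cache-unfriendly access that a
 * near-memory rearrangement engine could serve as a pre-gathered stream. */
static void spmv_csr(size_t nrows, const size_t *row_ptr, const size_t *col,
                     const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double acc = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            acc += val[j] * x[col[j]]; /* indirect load: poor spatial locality */
        y[i] = acc;
    }
}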
“…First, NDP architectures typically do not have a shared level of cache memory [8, 19, 25, 38, 42-46, 49, 55, 67, 98, 110, 111, 113, 119, 155, 158], since the NDP-suited workloads usually do not benefit from deep cache hierarchies due to their poor locality [33, 43, 133, 143]. Second, NDP architectures do not typically support conventional hardware cache coherence protocols [8, 19, 25, 38, 42-45, 49, 55, 67, 82, 98, 111, 119, 155, 158], because they would add area and traffic overheads [46, 143], and would incur high complexity and latency [4], limiting the benefits of NDP. Third, communication across NDP units is expensive, because NDP systems are non-uniform distributed architectures.…”
Section: Memory Arrays
confidence: 99%
“…First, most NDP architectures [8, 19, 25, 38, 42-46, 49, 55, 67, 98, 110, 111, 113, 119, 155, 158] lack shared caches that can enable low-cost communication and synchronization among the NDP cores of the system. Second, hardware cache coherence protocols are typically not supported in NDP systems [8, 19, 25, 38, 42-45, 49, 55, 67, 82, 98, 111, 119, 155, 158], due to the high area and traffic overheads associated with such protocols [46, 143]. Third, NDP systems are non-uniform, distributed architectures, in which inter-unit communication is more expensive (in both performance and energy) than intra-unit communication [8, 20, 21, 38, 43, 83, 155, 158].…”
Section: Introduction
confidence: 99%
“…Other works couple GPU architectures with 3D-stacked memories [16], [17]. Still others utilize reconfigurable logic near the DRAM [18], [19], [20].…”
Section: Near Memory Processing (NMP)
confidence: 99%