2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
DOI: 10.1109/micro50266.2020.00040
Newton: A DRAM-maker’s Accelerator-in-Memory (AiM) Architecture for Machine Learning

Cited by 85 publications (48 citation statements)
References 35 publications
“…Hence, they act as hardware accelerators with high throughput for specific applications. Recently, the DRAM makers SK-Hynix (He et al, 2020) and Samsung (Kwon et al, 2021) introduced 16-bit floating-point processing units inside the DRAM. ePIM architectures have a high area overhead and have to reduce the size of memory arrays to accommodate the added digital logic.…”
Section: Processing In Memory
confidence: 99%
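The excerpt above describes in-DRAM FP16 processing units that act as high-throughput accelerators. As a minimal sketch only (the bank count and per-bank partitioning below are assumptions for illustration, not the actual Newton or HBM-PIM organization), the kind of bank-parallel FP16 multiply-accumulate such units perform can be emulated like this:

```python
import numpy as np

# Illustrative assumption: 16 "banks", each holding a slice of the
# matrix rows and computing its partial dot products locally, so the
# matrix operand never leaves the memory device.
NUM_BANKS = 16

def pim_gemv_fp16(matrix, vector):
    """Emulate a bank-parallel FP16 matrix-vector multiply.

    Each bank performs multiply-accumulates on its local row slice;
    only the small result vector is gathered at the end.
    """
    matrix = matrix.astype(np.float16)
    vector = vector.astype(np.float16)
    row_slices = np.array_split(matrix, NUM_BANKS, axis=0)
    partials = [np.dot(s, vector) for s in row_slices]  # per-bank MACs
    return np.concatenate(partials)

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 32))
x = rng.standard_normal(32)
y = pim_gemv_fp16(A, x)
print(y.shape)  # (64,)
```

The point of the sketch is the data-movement pattern: the large operand (the matrix) stays partitioned in place, and only the vector is broadcast and the result gathered.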
“…Hence, their throughput and energy benefits show a decreasing trend for higher bit precision. To overcome this shortcoming, the architectures with custom logic (large multipliers and accumulators) (He et al, 2020), programmable computing units (Kwon et al, 2021), and LUT-based designs LAcc (Deng et al, 2019), pPIM (Sutradhar et al, 2022), pLUTo (Ferreira et al, 2021) have been proposed. These architectures embed external logic to the DRAM outside the memory array, hence, referred to as ePIM architectures.…”
Section: Prior Work On Logic Operations (Ipim) and Arithmetic Operati...
confidence: 99%
“…Thus, the reduction in total runtime comes from the reduce operation, updating embedding tables, and its PCIe transfer time. The performance of the baseline can be improved by using Processing-in-Memory (PiM) instead of the NPU as proposed in [13]. By deploying a PiM device, the latency of a forward/backward propagation in the top MLP is minimized (Fig.…”
Section: Case Study II: Recommendation System
confidence: 99%
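The excerpt refers to the forward/backward propagation of a recommendation model's "top MLP", the step whose latency a PiM device minimizes. As a hedged sketch only (a DLRM-style top MLP with arbitrary, assumed layer widths; not the cited paper's code), the forward pass being offloaded looks like:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def top_mlp_forward(features, weights, biases):
    """Chain of affine layers with ReLU; final layer yields a sigmoid score.

    `features` would be the interaction of dense inputs and embedding-table
    lookups in a recommendation model.
    """
    h = features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    logits = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))  # click-through probability

rng = np.random.default_rng(1)
dims = [128, 64, 32, 1]  # assumed layer widths, for illustration only
Ws = [rng.standard_normal((dims[i], dims[i + 1])) * 0.1 for i in range(3)]
bs = [np.zeros(dims[i + 1]) for i in range(3)]
batch = rng.standard_normal((4, 128))
scores = top_mlp_forward(batch, Ws, bs)
print(scores.shape)  # (4, 1)
```

Each layer is a matrix multiply over a modest batch, which is exactly the memory-bandwidth-bound GEMV/GEMM work that motivates placing the computation in or near DRAM.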
“…Prior approaches to this challenge fall into one of three categories. The first avoids the challenge altogether by maintaining a copy of the data that is stored in a PIM-friendly layout and not accessed by the CPU [17], [22], [25]. This either duplicates substantial data arrays (possibly > 100GiB) [7], [32], [37] or prevents the CPU from assisting with requests that can tolerate higher response latency [15].…”
Section: Motivation and Challenges
confidence: 99%
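The first category above maintains a duplicate copy of the data in a PIM-friendly layout that the CPU never touches. A minimal sketch of what such duplication can look like (the bank count and round-robin interleaving here are illustrative assumptions, not a specific system's layout):

```python
import numpy as np

# Assumption for illustration: 8 banks, rows interleaved round-robin so
# each bank can stream its shard locally without crossing banks.
NUM_BANKS = 8

def make_pim_copy(table):
    """Duplicate `table` into per-bank shards: row i goes to bank i % NUM_BANKS."""
    return [table[b::NUM_BANKS].copy() for b in range(NUM_BANKS)]

def pim_read_row(shards, i):
    """Fetch row i from the bank-interleaved duplicate."""
    return shards[i % NUM_BANKS][i // NUM_BANKS]

table = np.arange(32 * 4).reshape(32, 4)   # CPU copy (row-major)
shards = make_pim_copy(table)              # duplicated PIM-friendly copy
print(np.array_equal(pim_read_row(shards, 13), table[13]))  # True
```

The sketch makes the cited trade-off concrete: every row now exists twice, so for the 100+ GiB embedding tables the excerpt mentions, the duplicated copy roughly doubles the capacity requirement.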