2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca45697.2020.00071

iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture

Cited by 50 publications (32 citation statements)
References 61 publications
“…A large body of prior work examines Processing-Near-Memory (PNM) [3, 4, 8, 9, 16, 30-32, 39, 47, 52, 57, 66, 68, 76-78, 81, 89, 90, 101, 102, 109, 110, 112, 126, 129, 130, 144, 151, 166, 167, 176, 177, 179-181, 191, 194, 199, 206, 212, 223, 224, 232, 240, 269, 271, 280, 281]. PNM integrates processing units near or inside the memory via a 3D PNM configuration (i.e., processing units are located at the logic layer of 3D-stacked memories) [3, 30-32, 47, 57, 76, 166, 180, 181, 206, 269, 271, 281], a 2.5D PNM configuration (i.e., processing units are located in the same package as the CPU connected via silicon interposers) [68,81,223], a 2D PNM configuration (i.e., processing units are placed inside DDRX DIMMs) [9,16,44,89,90,126,143,147,148,179,185,199,212,282], or at the memory controller of CPU systems [101,102,167]. These works propose hardware designs for irregular applications like graph processing [3,4,31,32,52,180,281], bioinformatics [39,81,130,147,148], neural networks [29,30,48,68,78,89,129,…”
Section: Related Work
confidence: 99%
“…Most near-bank PIM architectures [16,44,45,55,82,89,94,140,145,151,179,199,240] support several PIM-enabled memory chips connected to a host CPU via memory channels. Each memory chip comprises multiple PIM cores, which are low-area and low-power cores with relatively low computation capability [82,94], and each of them is located close to a DRAM bank [16,44,45,55,82,89,94,140,145,151,179,199,240]. Each PIM core can access data located on its local DRAM bank, and typically there is no direct communication channel among PIM cores.…”
Section: Introduction
confidence: 99%
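The execution model in the excerpt above (host-managed scatter and gather, per-bank compute, no core-to-core channel) is the defining constraint of near-bank PIM, and the minimal Python sketch below makes it concrete. Every name in it is hypothetical for illustration; this is not the iPIM ISA or any vendor's PIM SDK.

```python
# Minimal sketch of the near-bank PIM execution model quoted above.
# Each "PIM core" computes only on the shard in its local DRAM bank;
# the host scatters inputs and gathers results, because there is no
# direct communication channel among PIM cores. All names hypothetical.
from typing import Callable, List

def split_contiguous(data: List[int], n: int) -> List[List[int]]:
    """Split data into n contiguous shards (one per PIM core/bank)."""
    k, m = divmod(len(data), n)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

def run_on_pim(data: List[int], n_cores: int,
               kernel: Callable[[List[int]], List[int]]) -> List[int]:
    shards = split_contiguous(data, n_cores)        # host -> banks
    partials = [kernel(shard) for shard in shards]  # per-core, local data only
    out: List[int] = []
    for p in partials:                              # banks -> host
        out.extend(p)
    return out

if __name__ == "__main__":
    # Element-wise brighten: a memory-bound image operation that maps well
    # to near-bank PIM, since each pixel touches only its local bank.
    pixels = list(range(0, 256, 16))
    print(run_on_pim(pixels, n_cores=4,
                     kernel=lambda s: [min(v + 10, 255) for v in s]))
```

Note how any operation needing data from two shards would have to round-trip through the host, which is exactly why these works emphasize data placement.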
“…In an NMP system with 3D memory cubes, the processing capability is in the base logic die under a stack of DRAM layers to utilize the ample internal bandwidth [5]. Later research also proposes near-bank processing with logic near memory banks in the same DRAM layer to exploit even higher bandwidth [20,21], such as FIMDRAM [22] announced recently by Samsung. Recent proposals [23,24,25,26,27] have also explored augmenting traditional DIMMs with computation in the buffer die to provide low-cost but bandwidth-limited NMP solutions.…”
Section: Near-memory Processing
confidence: 99%
“…However, the elapsed time of operators from the R³ cluster takes 52% of total time, making R³-like operators (memory-intensive, highly parallel operators) the actual bottleneck, not Conv2D. Instead of accelerating Conv2D, which would require more computation resources or larger on-chip memory, our analysis recommends that the architecture be designed with higher effective memory bandwidth, such as processing-in-memory architectures [15,22,30,33], for R³-like operators, because they take the majority of the elapsed time.…”
Section: Application
confidence: 99%
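The 52%/48% split quoted above puts an Amdahl's-law ceiling on accelerating Conv2D alone, which is the heart of the argument; the quick check below works the numbers. The split comes from the quote, while the 4× PIM factor is an illustrative assumption.

```python
# Amdahl's-law check of the profile quoted above: R^3-like operators
# take 52% of elapsed time, everything else (including Conv2D) 48%.
# The 4x PIM speedup is an illustrative assumption, not a measured number.

def overall_speedup(f: float, s: float) -> float:
    """Amdahl's law: fraction f of runtime is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# Ceiling from accelerating the non-R^3 48% alone, even infinitely:
print(f"{overall_speedup(0.48, float('inf')):.2f}x")  # ~1.92x
# Accelerating the memory-bound 52% by an assumed 4x via PIM:
print(f"{overall_speedup(0.52, 4.0):.2f}x")           # ~1.64x
```

Even an infinitely fast Conv2D engine cannot beat roughly 1.92× end to end, whereas a modest bandwidth-driven speedup on the R³-like operators already approaches that bound, supporting the recommendation of PIM-style architectures.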