2020
DOI: 10.1109/access.2020.3011265

McDRAM v2: In-Dynamic Random Access Memory Systolic Array Accelerator to Address the Large Model Problem in Deep Neural Networks on the Edge

Abstract: The energy efficiency of accelerating deep neural networks (DNNs) hundreds of megabytes in size in a mobile environment is lower than that of a server-class big-chip accelerator because of the limited power budget, silicon area, and smaller static random access memory buffer sizes of mobile systems. To address this challenge and provide powerful computing capability for processing large DNN models in power/resource-limited mobile systems, we propose McDRAM v2, a novel in-dynamic random access memory…
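The abstract centers on an in-DRAM systolic array for large-model inference. As a rough illustration of the compute pattern such an array accelerates, the following is a minimal NumPy sketch of a weight-stationary systolic matrix multiply; the grid size, function names, and loop structure are our own assumptions for exposition, not the paper's design.

```python
# Minimal sketch (not from the paper): the weight-stationary systolic-array
# dataflow that accelerators like McDRAM v2 implement in hardware, emulated
# sequentially in plain Python. Array shape and naming are illustrative only.
import numpy as np

def systolic_matmul(weights: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """Emulate an R x C grid of multiply-accumulate (MAC) cells.

    Each cell holds one weight ("weight-stationary"); activations stream in
    from one edge and partial sums cascade through the grid, one hop per cycle.
    """
    rows, cols = weights.shape          # grid of MAC processing elements
    n_vecs = activations.shape[1]       # number of streamed activation vectors
    out = np.zeros((cols, n_vecs))
    # In hardware the work below happens in parallel, skewed in time;
    # here we only reproduce the arithmetic the array performs.
    for v in range(n_vecs):
        partial = np.zeros(cols)
        for r in range(rows):           # partial sums accumulate down the rows
            partial += weights[r, :] * activations[r, v]
        out[:, v] = partial
    return out

W = np.random.rand(4, 4)                # 4x4 PE grid, chosen for illustration
A = np.random.rand(4, 8)                # 8 streamed activation vectors
assert np.allclose(systolic_matmul(W, A), W.T @ A)
```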

Cited by 26 publications (16 citation statements) | References 47 publications
“…A recent work [145] presents a real-world PIM system with programmable near-bank computation units, called FIMDRAM, based on HBM technology [113,153]. The FIMDRAM architecture, designed specifically for machine learning applications, implements a SIMD pipeline with simple multiply-and-accumulate units [44,226]. Compared to the more general-purpose UPMEM PIM architecture, FIMDRAM is focused on a specific application domain (i.e., machine learning), and thus it may lack the flexibility to support a wider range of applications.…”
Section: Related Work
confidence: 99%
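The statement above attributes to FIMDRAM a near-bank SIMD pipeline built from simple multiply-and-accumulate units. Here is a minimal sketch of what one lane-parallel MAC step computes, assuming a hypothetical 16-lane FP16 datapath; the lane width and every name below are illustrative assumptions, not Samsung's specification.

```python
# Minimal sketch (assumption, not FIMDRAM's actual datapath): a lane-parallel
# multiply-and-accumulate step, with NumPy vectors standing in for SIMD lanes.
import numpy as np

LANES = 16  # hypothetical SIMD width chosen for illustration

def simd_mac(acc: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One pipeline step: acc[i] += a[i] * b[i] across all lanes at once."""
    assert acc.shape == a.shape == b.shape == (LANES,)
    return acc + a * b

# Dot product of two long vectors, consumed LANES elements per step.
x = np.random.rand(64).astype(np.float16)
y = np.random.rand(64).astype(np.float16)
acc = np.zeros(LANES, dtype=np.float16)
for i in range(0, x.size, LANES):
    acc = simd_mac(acc, x[i:i + LANES], y[i:i + LANES])
print(float(acc.sum()))  # approximately np.dot(x, y), up to FP16 rounding
```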
“…In addition, as TPU v4 [28] reuses the hardware designs of TPU v3 except for several components such as on-chip memory capacity, on-chip interconnect, and DMA, the VU of TPU v4 has the same structure as that of TPU v3. There have been processing-near-DRAM studies [10,14,31] that provide high off-chip memory bandwidth during inference. Because [10,14] use dataflow architectures such as Eyeriss v1 [7] and the systolic array, they still do not process DW-CONV efficiently. In contrast, [31] has advantages for memory-intensive operations but weaknesses for compute-intensive ST-CONV operations.…”
Section: Related Work
confidence: 99%
“…Note that DRAM has also been used for CIM [74]; however, that work targets high-performance server applications, which goes beyond the scope of this work.…”
Section: In-Memory Computing
confidence: 99%