In Deep Neural Network (DNN) applications, the energy and performance costs of moving data between the memory hierarchy and the computational units are significantly higher than those of the computation itself. Processing-in-memory (PIM) architectures, such as the Hybrid Memory Cube (HMC), are therefore excellent candidates for improving data locality for efficient DNN execution. However, it remains difficult to efficiently deploy the large-scale matrix computations of DNNs on HMC because of its coarse-grained packet protocol. In this work, we propose NeuralHMC, the first HMC-based accelerator tailored for efficient DNN execution. Experimental results show that NeuralHMC reduces data movement by 1.4× to 2.5× (depending on the DNN data reuse strategy) compared to a Von Neumann architecture. Furthermore, compared to a state-of-the-art PIM-based DNN accelerator, NeuralHMC improves system performance by 4.1× and reduces energy by 1.5×, on average.