2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps.2018.00077
CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading

Cited by 16 publications (22 citation statements); references 12 publications.
“…For DCC architectures, solutions can be divided into two main categories: (1) PIM systems, which perform computations using special circuitry inside the memory module or by taking advantage of particular aspects of the memory itself, e.g., simultaneous activation of multiple DRAM rows for logical operations [1, 11–25]; (2) NMP systems, which perform computations on a PE placed close to the memory module, e.g., CPU or GPU cores placed on the logic layer of 3D-stacked memory [26–42]. For the purposes of this survey, we classify systems that use logic layers in 3D-stacked memories as NMP systems, as these logic layers are essentially computational cores that are near the memory stack (directly underneath it).…”
Section: Data-centric Computing Architectures
confidence: 99%
“…Offloading can be performed at different granularities, e.g., instructions (including small groups of instructions) [1,13,16,19,24,25,28,32,37,39,40,42,57,91,92], threads [71], Nvidia's CUDA blocks/warps [27,29], kernels [26], and applications [38,41,73,74]. Instruction-level offloading is often used with a fixed-function accelerator and PIM systems [1,13,16,19,24,25,28,29,32,37,39,42,57,92]. For example, [42] offloads atomic instructions at instruction-level granularity to a fixed-function near-memory graph accelerator.…”
Section: Data Offloading Granularity
confidence: 99%
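The excerpt above describes offloading individual PIM-capable instructions, which is the granularity CoolPIM's source throttling operates at. As a minimal sketch only, the following hypothetical Python class illustrates the general idea of gating instruction-level offload with a token budget that shrinks on thermal feedback; all names, the token mechanism, and thresholds are illustrative assumptions, not the paper's actual design.

```python
class OffloadThrottler:
    """Hypothetical token-based gate for instruction-level PIM offloading.

    Each offloaded PIM instruction consumes a token; when the budget is
    exhausted, the instruction runs on the host instead. A thermal warning
    from the memory stack shrinks the budget (illustrative only).
    """

    def __init__(self, max_inflight=4):
        self.max_inflight = max_inflight  # assumed token budget
        self.inflight = 0                 # offloaded ops not yet completed

    def try_offload(self):
        """Return True if the next PIM-capable instruction may be offloaded."""
        if self.inflight < self.max_inflight:
            self.inflight += 1
            return True
        return False  # budget exhausted: fall back to host execution

    def complete(self):
        """Called when an offloaded instruction finishes, freeing a token."""
        self.inflight = max(0, self.inflight - 1)

    def on_thermal_warning(self):
        """Shrink the budget when the stack reports a thermal warning."""
        self.max_inflight = max(1, self.max_inflight - 1)
```

Under these assumptions, the host issues PIM instructions only while tokens remain, so raising the memory stack's temperature (via `on_thermal_warning`) automatically reduces the offload rate at the source.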