Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM

Liu, Ke; Zhang, Xuan; So, Jinin; Lee, Jong Geon; Kang, Shin Haeng; Lee, Sukhan; Han, Song-Yi; Cho, Yeongon; Kim, Jin Hyun; Kwon, Yongsuk; Kim, Kyung-Soo; Jung, Jin Chul; Yun, Ilkwon; Park, Sung Joo; Park, Hyun Sun; Song, Joonho; Cho, Jeonghyeon; Sohn, Kiwon; Kim, Nam Sung; Lee, Hsien Hsin Sean

doi:10.1109/mm.2021.3097700

Cited by 55 publications

(14 citation statements)

References 16 publications


“…Many works from academia [2, 10-12, 15-23, 25, 31, 35-39, 48, 81-83, 85, 86, 90, 99, 104-112] and industry [34, 41-43, 50-54] have shown the benefits of PnM and PuM for a wide range of workloads from different domains. However, fully adopting PIM in commercial systems is still very challenging due to the lack of tools and system support for PIM architectures across the computer architecture stack [4], which includes: (i) workload characterization methodologies and benchmark suites targeting PIM architectures; (ii) frameworks that can facilitate the implementation of complex operations and algorithms using the underlying PIM primitives (e.g., simple PIM arithmetic operations [19], bulk bitwise Boolean in-DRAM operations [83,84,92]); (iii) compiler support and compiler optimizations targeting PIM architectures; (iv) operating system support for PIM-aware virtual memory, memory management, data allocation and mapping; and (v) efficient data coherence and consistency mechanisms.…”

Section: Motivation and Problem (mentioning)

confidence: 99%

Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures

F., Gómez-Luna, Ghose et al. 2022

Preprint


“…Finally, the last two bars show the impact of applying various recent technologies such as DRAM refresh reduction [39], DRAM idle power-off [56], memory disaggregation [44], and near-memory processing [38]. We model the impact of these optimizations by employing them only on the appropriate components, following the typical power profile of data-center-class servers [26] and switches [4], [28].…”

Section: Motivation (mentioning)

confidence: 99%

“…To capture a wider range of projections, we also include two more sophisticated energy-efficiency optimizations that have been in active development in industry for more than a decade, and while they are not mainstream products yet, they are in advanced stages of development. These optimizations include near-memory processing [38] and disaggregation [44]. There is a large number of much more aggressive optimizations that we explicitly chose not to include, as their ability to scale up to production at reasonable cost is unknown, or they are not a good fit for hyperscale data centers, or simply because they are not commercially available yet, despite their high potential (e.g., STT-RAM, PCM, near-threshold-voltage processors, spintronics, neuromorphic processors, and chip- and board-level photonics).…”

Section: Motivation (mentioning)

confidence: 99%

Energy-Proportional Data Center Network Architecture Through OS, Switch and Laser Co-design

Han, Terzenidis, Syrivelis et al. 2021

Preprint


Optical interconnects are already the dominant technology in large-scale data center networks. However, the high optical loss of many optical components coupled with the low efficiency of laser sources result in high aggregate power requirements for the thousands of optical transceivers used by these networks. As optical interconnects stay always on even as traffic demands ebb and flow, most of this power is wasted. We present LC DC, a data center network system architecture in which the operating system, the switch, and the optical components are co-designed to achieve energy proportionality. LC DC capitalizes on the path divergence of data center networks to turn on and off redundant paths according to traffic demand, while maintaining full connectivity. Turning off redundant paths allows the optical transceivers and their electronic drivers to power down and save energy. Maintaining full connectivity hides the laser turn-on delay. At the node layer, intercepting send requests within the OS allows for the NIC's laser turn-on delay to be fully overlapped with TCP/IP packet processing, and thus egress links can remain powered off until needed with zero performance penalty. We demonstrate the feasibility of LC DC by i) implementing the necessary modifications in the Linux kernel and device drivers, ii) implementing a 10 Gbit/s FPGA switch, and iii) performing physical experiments with optical devices and circuit simulations. Our results on university data center traces and models of Facebook and Microsoft data center traffic show that LC DC saves on average 60% of the optical transceivers' power (68% max) at the cost of 6% higher packet delay.


“…In such a computing architecture, data can be processed directly inside the memory, minimizing the data movement between the CPU and the memory. Machine learning applications [26,29], databases [15,16], personalised recommendation systems [10,11], and genomics [2] benefit from the massive parallelization of in-memory computing.…”

Section: Introduction (mentioning)

confidence: 99%

FAT-PIM: Low-Cost Error Detection for Processing-In-Memory

Zubair, Jha, Mohaisen et al. 2022

Preprint


Processing In Memory (PIM) accelerators are promising because they can provide massive parallelization and high efficiency in multiple application domains. These architectures can produce near-instantaneous results over wide data streams, allowing for real-time performance in data-intensive workloads. For instance, Resistive Memory (ReRAM) based PIM architectures are widely known for their inherent dot-product computation capability. While the performance of these architectures is appealing, reliability and accuracy are also important, especially in mission-critical real-time systems. Unfortunately, PIM architectures have a fundamental limitation in guaranteeing error-free operation. As a result, the current methods must pay high implementation costs or performance penalties to achieve reliable execution in the PIM accelerator. In this paper, we make a fundamental observation of this reliability limitation of ReRAM-based PIM architectures. Accordingly, we propose a novel solution, Fault Tolerant PIM (FAT-PIM), that allows for low-cost error detection. Our simulation-based evaluation shows that we can detect all errors with only 4.9% performance cost and 3.9% storage overhead.
