Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021)
DOI: 10.1145/3445814.3446752
A hierarchical neural model of data prefetching

Cited by 46 publications (15 citation statements). References 60 publications.
“…Our work takes a different ML-based approach looking at memory disaggregation design at the platform-level and is generally orthogonal to these prior works. ML for systems: ML is increasingly applied to tackle systems problems, such as cloud efficiency [55,61], memory/storage optimizations [124,125], microservices [126], caching/prefetching policies [127,128]. We uniquely apply ML methods for frigid memory prediction to support pooled memory provisioning to VMs without jeopardizing QoS.…”
Section: Related Work
confidence: 99%
“…Ease of implementation. Prior works have evaluated many sophisticated machine-learning models, such as simple neural networks [104], LSTMs [61,113], and Graph Neural Networks (GNNs) [115], for hardware prefetching. Even though these techniques show encouraging results in accurately predicting memory accesses, they fall short in two major respects. First, these models' sizes often exceed even the largest caches in traditional processors [61,104,113,115], making them impractical (or at best very difficult) to implement. Second, because of the vast amount of computation they require for inference, their inference latency is far higher than what is acceptable for a prefetcher at any cache level.…”
Section: Why Is RL a Good Fit For Prefetching?
confidence: 99%
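To make the size objection concrete, here is a back-of-the-envelope sketch in PyTorch: a minimal LSTM that classifies the next address delta. The delta vocabulary and layer widths are illustrative assumptions, not parameters taken from any of the cited models. Even this small configuration carries tens of MiB of fp32 weights, well beyond a typical 1-2 MiB L2 cache.

```python
# Hypothetical sketch (not any cited model): a minimal LSTM next-delta
# predictor, used only to illustrate why such models dwarf on-chip storage.
import torch
import torch.nn as nn

VOCAB = 50_000   # assumed number of distinct address deltas (illustrative)
EMBED = 128      # assumed embedding width
HIDDEN = 256     # assumed LSTM hidden width

class DeltaLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)   # classify the next delta

    def forward(self, deltas):                 # deltas: (batch, seq_len)
        h, _ = self.lstm(self.embed(deltas))
        return self.head(h[:, -1])             # logits for the next delta

model = DeltaLSTM()
params = sum(p.numel() for p in model.parameters())
# 4 bytes per fp32 weight; compare against a typical 1-2 MiB L2 cache.
print(f"{params:,} params ~= {params * 4 / 2**20:.1f} MiB of weights")
```

Most of the weight budget sits in the embedding and output layers, which scale with the delta vocabulary rather than with the sequence model itself, so the size of the address space, not the LSTM, drives the storage problem.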
“…memory access history window) and the LSTM model size strongly affect the prefetcher's learning ability under different noise levels and workload patterns. To accommodate the large memory address space, Shi et al. [202] introduce a hierarchical neural sequence model that decouples page and offset predictions using two separate attention-based LSTM layers, although its hardware implementation is impractical for actual processors.…”
Section: Memory System Design
confidence: 99%
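The decomposition this excerpt describes can be sketched as two coupled classification problems. The PyTorch sketch below is an assumed reconstruction from the quoted description, not the authors' released Voyager architecture: the vocabulary sizes, widths, and exact attention wiring are illustrative.

```python
# Hedged sketch of the hierarchical page/offset decomposition described
# above (dimensions and wiring are assumptions, not the exact model of
# Shi et al.).
import torch
import torch.nn as nn

N_PAGES, N_OFFSETS = 100_000, 64   # 64 cache lines per 4 KiB page
EMBED, HIDDEN = 64, 128            # illustrative widths

class HierarchicalPrefetcher(nn.Module):
    def __init__(self):
        super().__init__()
        self.page_emb = nn.Embedding(N_PAGES, EMBED)
        self.off_emb = nn.Embedding(N_OFFSETS, EMBED)
        # Attention ties each offset embedding to its page context, so the
        # same offset can mean different things on different pages.
        self.attn = nn.MultiheadAttention(EMBED, num_heads=1,
                                          batch_first=True)
        # Two separate sequence models, one per sub-problem.
        self.page_lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
        self.off_lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
        self.page_head = nn.Linear(HIDDEN, N_PAGES)
        self.off_head = nn.Linear(HIDDEN, N_OFFSETS)

    def forward(self, pages, offsets):         # each: (batch, seq_len)
        p = self.page_emb(pages)
        o = self.off_emb(offsets)
        # Page-aware offset embedding: offsets attend to the page sequence.
        o_ctx, _ = self.attn(o, p, p)
        ph, _ = self.page_lstm(p)
        oh, _ = self.off_lstm(o_ctx)
        # Predict the next page and next offset separately; the prefetch
        # address is their concatenation.
        return self.page_head(ph[:, -1]), self.off_head(oh[:, -1])
```

Splitting an address into a page ID and a 6-bit cache-line offset keeps each softmax tractable: the offset head has only 64 classes and the page head covers only the pages actually observed, instead of a single softmax over the raw address space, which is the point of the hierarchy.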