Discrete Cache Insertion Policies for Shared Last Level Cache Management on Large Multicores

Sridharan, Aswinkumar; Seznec, André

doi:10.1109/ipdps.2016.30

Cited by 3 publications

(2 citation statements)

References 45 publications


“…Class 1b functions benefit from the NDP system, but primarily because of the lower memory access latency (and energy) that the NDP system provides for memory requests that need to be serviced by DRAM. These functions could benefit from other latency and energy reduction techniques, such as L2/L3 cache bypassing [151,189,190,205,247,269,354,356,365,378,395,396,403], low-latency DRAM [62-66, 75, 86, 163-165, 212, 236, 238-240, 256, 263, 271, 314, 352, 355, 358, 375, 417], and better memory access scheduling [24, 100, 102, 129, 173, 181, 221, 222, 275, 277, 294, 295, 343, 344, 384-387, 405, 412, 425, 438, 450]. However, they generally do not benefit significantly from prefetching (as seen in Figure 5(b)), since infrequent memory requests make it difficult for the prefetcher to successfully train on an access pattern.…”

Section: Class 1b (mentioning)

confidence: 99%

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Oliveira, Gómez-Luna, Orosa et al. 2021

Preprint


Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Prior NDP works investigate the root causes of data movement bottlenecks using different profiling methodologies and tools. However, there is still a lack of understanding about the key metrics that can identify different data movement bottlenecks and their relation to traditional and emerging data movement mitigation mechanisms. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques (e.g., caching and prefetching) to more memory-centric techniques (e.g., NDP), thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains.
Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV. CCS Concepts: • Hardware → Dynamic memory; • Computing methodologies → Model development and analysis; • Computer systems organization → Architectures.


“…Such methods work well for small number of cores say 2 or 4, but show limited performance for large number of cores. To address the issue, Sridharan and Seznec (2016) introduce a policy called adaptive discrete and deprioritised application prioritisation (ADAPT) for cores with large count. The scheme works on a metric termed as foot-print numbers.…”

Section: Optimising Eviction Using Partitioning Schemes (mentioning)

confidence: 99%

A review on shared resource contention in multicores and its mitigating techniques

Jain, Surve 2020

IJHPSA


Chip multiprocessor (CMP) systems have become inevitable to meet high computing demands. In such systems sharing of resources is imperative for better resource utilisation. The challenge arises when various application programs running on neighbouring cores compete for these resources concurrently and introduce contention. We aim to present in a simple, lucid and captivating manner a review of previous work on contention in multicores due to various shared resources like shared caches, main memory, memory bus bandwidth, prefetchers etc. The work investigates key ideas proposed by the research community to alleviate resource contention due to these various resources, under a single umbrella. The prime objective of the study is to throw light upon the fact that, alone a single shared component is not a dominant reason for performance degradation in CMPs, rather all elements in the memory hierarchy introduce resource contention thereby affecting performance cumulatively. The work presented would assist novice readers, researchers and academicians to further serve to propose optimal policies to address contention in designing multicore applications, considering the overall impact of these resources on the performance of multicore systems.


DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Oliveira, Gómez-Luna, Orosa et al. 2021

IEEE Access


