2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2015.7056045

High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches

Abstract: Increasing transistor density enables adding more on-die cache real estate. However, devoting more space to the shared last-level cache (LLC) causes the memory latency bottleneck to move from memory access latency to shared cache access latency. As such, applications whose working set is larger than the smaller caches spend a large fraction of their execution time on shared cache access latency. To address this problem, this paper investigates increasing the size of smaller private caches in the hierarchy as op…
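The trade-off the abstract describes can be sketched with a simple average-memory-access-time (AMAT) model: in an inclusive hierarchy, private-cache contents are duplicated in the LLC, so the same SRAM budget buys a smaller effective private cache than in an exclusive hierarchy. The latencies and hit rates below are illustrative assumptions for the sketch, not figures from the paper.

```python
# Minimal AMAT sketch contrasting an inclusive hierarchy (small L2,
# data duplicated in the LLC) with an exclusive one (larger L2 funded
# by the same total SRAM budget). All numbers are hypothetical.

def amat(l2_hit_rate, llc_hit_rate, l2_lat, llc_lat, mem_lat):
    """AMAT = L2 latency + miss fraction * (LLC latency + miss fraction * memory latency)."""
    return l2_lat + (1 - l2_hit_rate) * (llc_lat + (1 - llc_hit_rate) * mem_lat)

# Inclusive: duplication limits effective capacity; the small L2 hits less often.
inclusive = amat(l2_hit_rate=0.60, llc_hit_rate=0.80,
                 l2_lat=12, llc_lat=40, mem_lat=200)

# Exclusive: no duplication, so a larger L2 serves more accesses at private-cache latency.
exclusive = amat(l2_hit_rate=0.80, llc_hit_rate=0.80,
                 l2_lat=14, llc_lat=40, mem_lat=200)

print(f"inclusive AMAT = {inclusive:.1f} cycles")  # 44.0
print(f"exclusive AMAT = {exclusive:.1f} cycles")  # 30.0
```

Under these assumed parameters the exclusive design wins because a larger fraction of accesses is served at the fast private-cache latency, which is the latency benefit the paper's title refers to.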

Cited by 30 publications (18 citation statements)
References 29 publications (61 reference statements)
“…On the other hand, the Skylake architecture has support for AVX-512 instructions, more parallel cores, and larger L2 caches. Furthermore, Haswell and Broadwell implement an inclusive L2/L3 cache hierarchy, while Skylake implements a non-inclusive/exclusive cache hierarchy [34,35]. (For the remainder of this paper we will refer to Skylake's L2/L3 cache hierarchy as exclusive.)…”
Section: Machines
confidence: 99%
“…Xiao et al [24] presented a dual queues cache replacement algorithm based on sequentiality detection to improve the cache design. Jaleel et al [23] presented directions for further research to maximize performance of exclusive cache hierarchies. Chou et al [35] proposed CAMEO which not only makes stacked DRAM visible as part of the memory address space but also exploits data locality.…”
Section: Cache Architecture Design
confidence: 99%
“…Most recent research on hybrid SRAM and DRAM caches focuses mainly on enhancing the overall performance of SRAM (resp., DRAM) by utilizing the merits of DRAM (resp., SRAM). There are also many papers devoted to investigating workload performance: (1) for multi-programmed workloads, prior work discussed the issues of relieving memory contention [10,11], workload balance [12,13], and power-related optimization [14]; (2) to improve the performance of memory-intensive workloads, many solutions (e.g., architecture design [15][16][17], OS-level methods [18][19][20], and feedback control [21,22]) have also been proposed; (3) in the cache system, improved cache architectures [4,9,23,24] and 3D-stacked DRAM technologies [25][26][27] are used to achieve better workload performance; and so on (a broader overview of related work is covered in Section 2). In contrast, little attention has been paid to designing a last-level cache (LLC) scheduling scheme for multi-programmed workloads with different memory footprints.…”
Section: Introduction
confidence: 99%
“…Our work focuses on mitigating the power dissipation caused by the following cache coherence problems: a) Non-sequential data fetch: the cache prefetcher fetches data in a sequential manner, so randomly allocated data causes more cache misses. One way to ensure sequential data fetching is to redesign the cache hierarchy [10]. However, it is difficult to keep data allocation in sequence in an SMT CMP architecture, where the context switches regularly and memory is allocated randomly.…”
Section: A Cache Coherence In Multithreading
confidence: 99%