Ravishankar Iyer scite author profile

As chip multiprocessors (CMPs) become increasingly mainstream, architects have likewise become more interested in how best to share a cache hierarchy among multiple simultaneous threads of execution. The complexity of this problem is exacerbated as the number of simultaneous threads grows from two or four to the tens or hundreds. However, there is no consensus in the architectural community on what "best" means in this context. Some papers in the literature seek to equalize each thread's performance loss due to sharing, while others emphasize maximizing overall system performance. Furthermore, the specific effect of these goals varies depending on the metric used to define "performance".In this paper we label equal performance targets as Communist cache policies and overall performance targets as Utilitarian cache policies. We compare both of these models to the most common current model of a free-for-all cache (a Capitalist policy). We consider various performance metrics, including miss rates, bandwidth usage, and IPC, including both absolute and relative values of each metric. Using analytical models and behavioral cache simulation, we find that the optimal partitioning of a shared cache can vary greatly as different but reasonable definitions of optimality are applied. We also find that, although Communist and Utilitarian targets are generally compatible, each policy has workloads for which it provides poor overall performance or poor fairness, respectively. Finally, we find that simple policies like LRU replacement and static uniform partitioning are not sufficient to provide near-optimal performance under any reasonable definition, indicating that some thread-aware cache resource allocation mechanism is required.

show abstract

A Framework for Providing Quality of Service in Chip Multi-Processors

Guo

Solihin

Zhao

et al. 2007

View full text Add to dashboard Cite

Exploring DRAM cache architectures for CMP server platforms

Zhao

Iyer

Illikkal

et al. 2007

View full text Add to dashboard Cite

As dual-core and quad-core processors arrive in the marketplace, the momentum behind CMP architectures continues to grow strong. As more and more cores/threads are placed on-die, the pressure on the memory subsystem is rapidly increasing. To address this issue, we explore DRAM cache architectures for CMP platforms. In this paper, we investigate the impact of introducing a low latency, large capacity and high bandwidth DRAM-based cache between the last level SRAM cache and memory subsystem. We first show the potential benefits of large DRAM caches for key commercial server workloads. As the primary hurdle to achieving these benefits with DRAM caches is the tag space overheads associated with them, we identify the most efficient DRAM cache organization and investigate various options. Our results show that the combination of 8-bit partial tags and 2-way sectoring achieves the highest performance (20% to 70%) with the lowest tag space ( 25%) overhead.

show abstract

Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy

Madan

Zhao

Muralimanohar

et al. 2009

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.