Increasing transistor density enables adding more on-die cache real estate. However, devoting more space to the shared last-level cache (LLC) shifts the memory latency bottleneck from memory access latency to shared cache access latency. As such, applications whose working set is larger than the smaller caches spend a large fraction of their execution time on shared cache access latency. To address this problem, this paper investigates increasing the size of the smaller private caches in the hierarchy rather than increasing the shared LLC. Doing so improves average cache access latency for workloads whose working set fits into the larger private cache while retaining the benefits of a shared LLC. Increasing the size of the private caches requires relaxing inclusion and building an exclusive hierarchy. Thus, for the same total caching capacity, an exclusive cache hierarchy provides better cache access latency. We observe that server workloads benefit tremendously from an exclusive hierarchy with large private caches, primarily because large private caches accommodate the large code working sets of server workloads. For a 16-core CMP, an exclusive cache hierarchy improves server workload performance by 5-12% compared to an equal-capacity inclusive cache hierarchy. The paper also presents directions for further research to maximize the performance of exclusive cache hierarchies.
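The latency trade-off the abstract describes can be illustrated with a simple average-memory-access-time (AMAT) model. The sketch below is not from the paper; all latencies and hit rates are hypothetical values chosen only to show why a larger private cache that captures more of the working set can lower average access latency even when total capacity is unchanged.

```python
# Hedged sketch: a toy AMAT model contrasting an inclusive hierarchy
# (small private L2, large shared LLC) with an exclusive hierarchy
# (large private L2, same total capacity). All numbers are hypothetical.

def amat(levels, mem_latency):
    """levels: list of (local_hit_rate, access_latency) from innermost
    to outermost cache level. Each hit rate is the fraction of accesses
    reaching that level that hit there."""
    total, reach = 0.0, 1.0
    for hit_rate, latency in levels:
        total += reach * latency       # accesses reaching this level pay its latency
        reach *= (1.0 - hit_rate)      # misses continue to the next level
    return total + reach * mem_latency # remaining misses go to main memory

# Inclusive: small private L2 (low hit rate), slow shared LLC.
inclusive = amat([(0.95, 4), (0.60, 12), (0.50, 40)], mem_latency=200)

# Exclusive: larger private L2 captures more of the working set,
# so fewer accesses pay the shared-LLC access latency.
exclusive = amat([(0.95, 4), (0.80, 14), (0.50, 40)], mem_latency=200)

print(f"inclusive AMAT: {inclusive:.1f} cycles")  # 7.4 with these numbers
print(f"exclusive AMAT: {exclusive:.1f} cycles")  # 6.1 with these numbers
```

Under these assumed parameters the exclusive configuration wins because the extra private-cache hits avoid the longer shared-LLC access path, which is the effect the paper measures on server workloads.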
TODAY, THERE ARE MANY competing ideas about how to implement multiprocessor systems. Although some of these ideas have been prototyped in hardware, hardware prototypes take too long to build and are very expensive. Often, by the time a hardware prototype really works, it is obsolete. First, the prototype's absolute speed is no longer on a par with current hardware. Second, the technology trade-offs among components change, so that performance results obtained on the prototype become meaningless. Third, the new architectural ideas embodied in the prototype may become irrelevant. Moreover, hardware prototypes are often hard to observe. By contrast, software simulations are very flexible, observable, and relatively inexpensive to develop. However, software simulations often force a trade-off between speed and realism. Hardware emulation using FPGAs (field-programmable gate arrays) [1] is an intermediate approach between software simulation and hardware prototyping. We adopted this approach in a multiprocessor emulator called RPM (Rapid Prototyping Engine for Multiprocessor Systems). Because of its flexibility, the RPM hardware can adapt during its lifetime to the rapid evolution of technology trade-offs and new architectural ideas. RPM is also much more observable than typical hardware prototypes. RPM-2, the second RPM implementation, is up and running. Our first RPM-2 prototype is a cache-coherent nonuniform memory-access (CC-NUMA) multiprocessor [2].