Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Cho, Sangyeun; Jin, Lei

doi:10.1109/micro.2006.31

Cited by 234 publications

(205 citation statements)

References 25 publications

Supporting

Mentioning

202

Contrasting


“…With page-coloring, one can mitigate the contention problem [4,10,15,17,21,22,24,26] by modifying the kernel buddy system while avoiding expensive hardware changes to memory controllers or cache hierarchies.…”

Section: Page-coloring Based Memory Management (mentioning)

confidence: 99%
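The page-coloring idea in the statement above can be illustrated with a minimal sketch. It uses the common textbook definition of a page color (the physical frame number modulo the number of page-sized slices in one cache way) and a toy per-color free list in place of a real kernel buddy system. The 4 KiB page size, the names `num_colors`, `page_color`, and `ColoredFreeLists`, and the cache parameters are assumptions made for illustration; none of them come from the cited papers.

```python
PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def num_colors(cache_size: int, associativity: int, page_size: int = PAGE_SIZE) -> int:
    """Number of page colors: bytes in one cache way divided by the page size."""
    way_size = cache_size // associativity
    return max(1, way_size // page_size)


def page_color(phys_addr: int, colors: int, page_size: int = PAGE_SIZE) -> int:
    """Color of the physical page that contains phys_addr."""
    return (phys_addr // page_size) % colors


class ColoredFreeLists:
    """Toy color-aware allocator: one free list of frame numbers per color.

    A real kernel would restructure its buddy allocator instead; this class
    only shows how restricting a task to some colors limits the cache sets
    its pages can occupy.
    """

    def __init__(self, frames, colors: int):
        self.free = {c: [] for c in range(colors)}
        for frame in frames:
            self.free[frame % colors].append(frame)

    def alloc(self, allowed_colors):
        """Return a free frame whose color is allowed, or None if none is left."""
        for color in allowed_colors:
            if self.free[color]:
                return self.free[color].pop()
        return None
```

With a hypothetical 2 MiB, 16-way cache, `num_colors(2 * 1024 * 1024, 16)` gives 32 colors, and two tasks given disjoint color sets would not evict each other's lines from that cache.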

Going vertical in memory management: Handling multiplicity by multi-policy

Liu

Cui

et al. 2014

2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)


“…A third configuration uses private LLCs. Finally, we consider an S-NUCA configuration in which the blocks are mapped to the L2 banks using a first touch policy [29]. The first time a block is requested, the memory page containing that block is mapped to the L2 bank in the requestor's tile.…”

Section: Performance Evaluation (mentioning)

confidence: 99%
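The first-touch policy described in the statement above can be sketched as follows: the first tile to request any block of a page becomes the home L2 bank for the whole page, and later requests from other tiles reuse that mapping. The class name `FirstTouchMapper`, the 4 KiB page size, and the dictionary-based page table are assumptions for illustration, not the mechanism of any cited paper.

```python
class FirstTouchMapper:
    """Toy model of first-touch page-to-bank mapping in a tiled CMP."""

    def __init__(self, num_tiles: int, page_size: int = 4096):
        self.num_tiles = num_tiles
        self.page_size = page_size      # assumed page size in bytes
        self.page_to_bank = {}          # page number -> home L2 bank (tile id)

    def home_bank(self, addr: int, requesting_tile: int) -> int:
        """Return the home bank for addr, assigning it on the first touch."""
        if not 0 <= requesting_tile < self.num_tiles:
            raise ValueError("requesting_tile out of range")
        page = addr // self.page_size
        # setdefault stores the requester's tile only if the page is unmapped.
        return self.page_to_bank.setdefault(page, requesting_tile)
```

For example, if tile 5 touches address `0x1000` first, a later request from tile 9 to `0x1FC0` (the same page) is still served by bank 5, while tile 9's first touch of `0x2000` maps that page to bank 9.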

“…OS-based techniques to achieve a better mapping of the cache blocks to the LLC banks have been proposed by Cho et al [29], Ros et al [20], Das et al…”

Section: Related Work (mentioning)

confidence: 99%

Runtime home mapping for effective memory resource usage

Lodde

Flich

2014

Microprocessors and Microsystems


In tiled Chip Multiprocessors (CMPs), last-level cache (LLC) banks are usually shared but distributed among the tiles. A static mapping of cache blocks to the LLC banks leads to poor efficiency, since a block may be mapped away from the tiles actually accessing it. Dynamic policies either rely on the static mapping of blocks to a set of banks (D-NUCA) or rely on the OS to dynamically load pages to statically mapped addresses (first-touch). In this paper, we propose Runtime Home Mapping (RHM), a new dynamic approach where the LLC home bank is determined at runtime by the memory controller when the block is fetched from main memory, trying to map each block as close as possible to the requestor, thus speeding up execution and lowering message latencies. Block migration and replication provide further improvements to basic RHM. Also, in a further optimization, we eliminate the directory structure. All these optimizations involve specific NoC optimizations and co-designs. Results with PARSEC and SPLASH-2 applications show, when compared with alternative solutions, that RHM achieves 41% and 35% average reductions in load and store latencies, respectively, relative to static mapping. This leads to an average reduction of 28% in application execution time.


“…All results are normalized to that of an ideal interconnect, in which we do not model any routing delay, contention, or queuing delays. We model only the wire delay over the Manhattan distance between the sender and receiver node (30ps/mm [32]

[4], [15], [24] to reduce traffic

Benchmarks Used
Splash-2 [35]: barnes (ba), cholesky (ch), fft (ff), fmm (fm), lu (lu), ocean (oc), radiosity (rs), radix (rx), raytrace (ry), water-spatial (ws)
Parsec [7]: blackscholes (bl), fluidanimate (fl)
Other: em3d (em), ilink (il), jacobi (ja), mp3d (mp), shallow (sh), tsp (ts)

The reason for TLLB's performance is its latency. In a medium-scale CMP like the one simulated here, the overall throughput demand seldom overwhelms the shared bus.…”

Section: B Experimental Analysis (mentioning)

confidence: 99%
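The ideal-interconnect baseline in the statement above charges only wire delay over the Manhattan distance at 30 ps/mm. A minimal sketch of that arithmetic follows; the function name `wire_delay_ps` and the representation of node positions as `(x, y)` coordinates in millimetres are assumptions for illustration, since the statement does not give the floorplan.

```python
PS_PER_MM = 30.0  # wire delay per millimetre, as quoted in the statement


def wire_delay_ps(sender, receiver, ps_per_mm: float = PS_PER_MM) -> float:
    """Wire delay in picoseconds over the Manhattan distance between two nodes.

    sender and receiver are (x, y) positions in millimetres (assumed layout).
    Routing, contention, and queuing delays are deliberately not modelled.
    """
    (x0, y0), (x1, y1) = sender, receiver
    manhattan_mm = abs(x1 - x0) + abs(y1 - y0)
    return ps_per_mm * manhattan_mm
```

Under these assumptions, two nodes at `(0, 0)` and `(3, 4)` are 7 mm apart in Manhattan distance, so the modelled delay is 7 × 30 = 210 ps.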

A design space exploration of transmission-line links for on-chip interconnect

Carpenter

Hu

Huang

et al. 2011

IEEE/ACM International Symposium on Low Power Electronics and Design


With increasing core count, chip multiprocessors (CMPs) require a high-performance interconnect fabric that is energy-efficient. Well-engineered transmission line-based communication systems offer an attractive solution, especially for CMPs with a moderate number of cores. While transmission lines have been used for a wide variety of purposes, comprehensive studies are lacking to guide architects in navigating the circuit and physical design space and in making proper architecture-level analyses and tradeoffs. This paper makes a first-step effort in exploring part of the design space. Using detailed simulation-based analysis, we show that a shared-medium fabric based on transmission lines can offer better performance and a much better energy profile than a conventional mesh interconnect.
