Cache Hierarchy Optimization

Yavits, Leonid; Morad, Amir; Ginosar, Ran

doi:10.1109/l-ca.2013.18

Cited by 9 publications

(8 citation statements)

References 14 publications

(30 reference statements)

“…Larger caches are more expensive [26,27]: Area and static power increase roughly linearly with size, while access latency and energy scale roughly with its square root [65]. SRAM caches from 512 KB to 32 MB have access latencies from 9 to 45 cycles and access energies from 0.2 to 1.7 nJ, and stacked DRAM caches from 128 MB to 2 GB have access latencies from 42 to 74 cycles and energies from 4.4 nJ to 6 nJ.…”

Section: Motivation (mentioning)

confidence: 99%

Jenga

Tsai

Beckmann

Sánchez

2017

SIGARCH Comput. Archit. News

Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., fastest and most energy-efficient) level they fit in. However, rigid hierarchies also add overheads, because each level adds latency and energy even when it does not fit the working set. These overheads are expensive on emerging systems with heterogeneous memories, where the differences in latency and energy across levels are small. Significant gains are possible by specializing the hierarchy to applications. We propose Jenga, a reconfigurable cache hierarchy that dynamically and transparently specializes itself to applications. Jenga builds virtual cache hierarchies out of heterogeneous, distributed cache banks using simple hardware mechanisms and an OS runtime. In contrast to prior techniques that trade energy and bandwidth for performance (e.g., dynamic bypassing or prefetching), Jenga eliminates accesses to unwanted cache levels. Jenga thus improves both performance and energy efficiency. On a 36-core chip with a 1 GB DRAM cache, Jenga improves energy-delay product over a combination of state-of-the-art techniques by 23% on average and by up to 85%.

CCS Concepts: Computer systems organization → Multicore architectures
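The scaling figures quoted in the Motivation statement above can be checked with a little arithmetic. The Python sketch below is illustrative only: it assumes a pure power law (value proportional to size raised to an exponent p) between the two quoted SRAM endpoints, 512 KB at 9 cycles and 0.2 nJ and 32 MB at 45 cycles and 1.7 nJ. The helper names `implied_exponent` and `scale` are hypothetical and do not come from the cited papers.

```python
import math

KB, MB = 1024, 1024 ** 2


def implied_exponent(size_lo, size_hi, value_lo, value_hi):
    """Exponent p such that value ~ size**p passes through both endpoints."""
    return math.log(value_hi / value_lo) / math.log(size_hi / size_lo)


def scale(value_lo, size_lo, size, exponent):
    """Extrapolate from a reference point assuming value ~ size**exponent."""
    return value_lo * (size / size_lo) ** exponent


# SRAM endpoints as quoted in the citation statement (assumed to be the
# two ends of a single power-law trend, which the source does not claim).
sram_latency_p = implied_exponent(512 * KB, 32 * MB, 9, 45)
sram_energy_p = implied_exponent(512 * KB, 32 * MB, 0.2, 1.7)

# What a strict square-root rule would predict for latency at 32 MB.
sqrt_latency_at_32mb = scale(9, 512 * KB, 32 * MB, 0.5)

print(f"latency exponent ~ {sram_latency_p:.2f}")
print(f"energy exponent  ~ {sram_energy_p:.2f}")
print(f"sqrt-rule latency at 32 MB: {sqrt_latency_at_32mb:.0f} cycles")
```

Under these assumptions the endpoint energies imply an exponent of about 0.51, close to the square-root rule, while the endpoint latencies imply about 0.39. A strict square-root extrapolation from 9 cycles would give 72 cycles at 32 MB rather than the quoted 45, which is consistent with the statement's "roughly".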

“…Krishna et al [5] researched the optimal area allocation between cores and cache. Yavits et al [19] developed an analytical model for cache hierarchy levels.…”

Section: Related Work (mentioning)

confidence: 99%

“…This framework can be extended to any number of private, shared or hybrid levels. Following [19], we assume that the access time of the LLC is approximated by the power-law model (22). Both T and the exponent p are found by fitting the power-law curve (22) to the cache access time data generated by CACTI. For caches having several shared clients, the access time can be written as follows:…”

Section: B Processing Core (mentioning)

confidence: 99%

Convex optimization of resource allocation in asymmetric and heterogeneous SoC

Morad

Yavits

Ginosar

2014

2014 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)

Self Cite

Chip area, power consumption, execution time, off chip memory bandwidth, overall cache miss rate and Network on Chip (NoC) capacity are limiting the scalability of SoCs. Consider a workload comprising a sequential and multiple concurrent tasks and asymmetric or heterogeneous SoC architecture. A convex optimization framework is proposed, for selecting the optimal set of processing cores and allocating area and power resources among them, the NoC and the last level cache, under constrained total area, total average power, total execution time and off-chip bandwidth. The framework relies on analytical performance and power models of the processing cores, NoC and last level cache as a function of their allocated resources. Due to practical implementation of the cores, the optimal architecture under constraints may exclude several of the cores. Several asymmetric and heterogeneous configurations are explored. Convex optimization is shown to extend optimizations based on Lagrange multipliers. We find that our framework obtains the optimal chip resources allocation over a wide spectrum of parameters and constraints, and thus can automate complex architectural design, analysis and verification.
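The power-law fit mentioned in the citation statement above (a coefficient T and an exponent p fitted to access-time data) can be sketched as a least-squares line in log-log space. This is a minimal sketch under stated assumptions: the excerpt does not reproduce equation (22), so the form t(S) = T * S**p is assumed here; the function name `fit_power_law` is hypothetical; and the data points are synthetic, generated from known parameters, not CACTI output.

```python
import math


def fit_power_law(sizes, times):
    """Fit t = T * S**p by ordinary least squares on (log S, log t).

    Returns (T, p). Assumes all sizes and times are strictly positive.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                          # slope of the log-log line
    T = math.exp(mean_y - p * mean_x)      # intercept, mapped back from log space
    return T, p


# Synthetic access times (arbitrary units) from T = 2.0, p = 0.4.
# These are NOT CACTI results; they only exercise the fitting routine.
sizes_mb = [1, 2, 4, 8, 16, 32]
synthetic_times = [2.0 * s ** 0.4 for s in sizes_mb]

T_fit, p_fit = fit_power_law(sizes_mb, synthetic_times)
print(f"T ~ {T_fit:.3f}, p ~ {p_fit:.3f}")
```

Fitting in log space minimizes relative rather than absolute error, which is a design choice of this sketch; the cited work may have used a different fitting procedure.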

“…A classical CMP architecture paradigm includes design choices such as symmetric vs. asymmetric CMP [18], number of cores vs. core size [18], cores vs. cache [1] [14] etc. When designing a 3D CMP, the computer architect must address an additional question: How does the temperature affect the number of cores of 3D CMP and their size?…”

Section: Introduction and Related Work (mentioning)

confidence: 99%

The Effect of Temperature on Amdahl Law in 3D Multicore Era

Yavits

Morad

Ginosar

2016

IEEE Trans. Comput.

This work studies the influence of temperature on performance and scalability of 3D Chip Multiprocessors (CMP) from Amdahl's law perspective. We find that 3D CMP may reach its thermal limit before reaching its maximum power. We show that a high level of parallelism may lead to high peak temperatures even in small scale 3D CMPs, thus limiting 3D CMP scalability and calling for different, in-memory computing architectures.
