Published: 2015
DOI: 10.1109/tc.2014.2378291
An Energy-Efficient Last-Level Cache Architecture for Process Variation-Tolerant 3D Microprocessors

Abstract: As process technologies evolve, tackling process variation problems is becoming more challenging in 3D (i.e., die-stacked) microprocessors. Process variation adversely affects the performance, power, and reliability of 3D microprocessors, which in turn results in yield losses. In particular, last-level caches (LLCs: L2 or L3 caches) are known as the components most vulnerable to process variation in 3D microprocessors. In this paper, we propose a novel cache architecture that exploits narrow-width values for yie…


Cited by 8 publications (5 citation statements). References 40 publications (69 reference statements).
“…WBD disables the faulty cache lines (blocks). Note that this technique is widely used, simple, and practical; it is also introduced in [8] (similar to Intel Pellston Technology), [9], and [10] (as a naïve way-reduction scheme). Although this scheme can always achieve 100% cache yield when combined with adaptive cache bypassing (i.e., bypassing the cache when a set contains no non-faulty block), the effective cache capacity is significantly reduced, eventually causing performance losses.…”
Section: Performance (mentioning)
confidence: 99%
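The WBD scheme quoted above can be sketched as a toy set-associative cache model: faulty ways are permanently disabled, and a set whose ways are all faulty is bypassed so the request goes straight to the next memory level. This is a minimal illustration of the described behavior, not code from the paper; the class and method names are invented for this sketch.

```python
# Hedged sketch: faulty-block disabling (WBD) with adaptive cache
# bypassing, as described in the citation statement above.
# CacheSet and its fields are illustrative, not from the paper.

class CacheSet:
    def __init__(self, ways, faulty_ways):
        # A way marked faulty is permanently disabled (never allocated).
        self.usable = [w for w in range(ways) if w not in faulty_ways]
        self.tags = {}  # way -> tag currently stored there

    def access(self, tag):
        """Return 'hit', 'miss', or 'bypass' for this tag."""
        if not self.usable:
            # Adaptive bypassing: with every way faulty, the request
            # skips this set entirely instead of failing, which is how
            # the scheme still reaches 100% cache yield.
            return "bypass"
        for way, t in self.tags.items():
            if t == tag:
                return "hit"
        # Miss: fill a free usable way, or trivially evict the first one.
        victim = next((w for w in self.usable if w not in self.tags),
                      self.usable[0])
        self.tags[victim] = tag
        return "miss"

fully_faulty = CacheSet(ways=4, faulty_ways={0, 1, 2, 3})
print(fully_faulty.access(0x1A))  # prints "bypass"

partly_faulty = CacheSet(ways=4, faulty_ways={0})
print(partly_faulty.access(0x1A))  # prints "miss"
print(partly_faulty.access(0x1A))  # prints "hit"
```

The capacity loss the statement warns about is visible here: disabling ways shrinks `usable`, so fewer distinct tags can coexist in a set before evictions begin.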
“…For energy comparison, we further classify the WBD technique into the cases where Gated-Vdd [23] is applied (WBD w/ Gated-Vdd) and not applied (WBD w/o Gated-Vdd). In the WBD w/ Gated-Vdd case, the disabled cache block is powered off to reduce leakage power consumption (as in [9] and [10]), while WBD w/o Gated-Vdd does not apply Gated-Vdd. Compared to the ideal case (i.e., the baseline), VL_base and VL_mig show energy overheads of only 7.5% and 7.3%, respectively.…”
Section: Performance (mentioning)
confidence: 99%
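The Gated-Vdd comparison above boils down to a simple leakage-accounting difference: with Gated-Vdd, disabled blocks are powered off and contribute no leakage; without it, every block leaks regardless of state. A minimal model, with placeholder constants that are not figures from the paper:

```python
# Hedged sketch: leakage-energy accounting for disabled cache blocks,
# with and without Gated-Vdd, mirroring the comparison quoted above.
# LEAK_PER_BLOCK is an arbitrary illustrative constant.

LEAK_PER_BLOCK = 1.0  # leakage energy units per block per cycle

def leakage(total_blocks, disabled_blocks, cycles, gated_vdd):
    # With Gated-Vdd, disabled (faulty) blocks are powered off and
    # leak nothing; without it, all blocks leak for every cycle.
    live = total_blocks - disabled_blocks if gated_vdd else total_blocks
    return live * LEAK_PER_BLOCK * cycles

print(leakage(1024, 128, 1000, gated_vdd=True))   # prints 896000.0
print(leakage(1024, 128, 1000, gated_vdd=False))  # prints 1024000.0
```

The gap between the two calls is exactly the leakage of the 128 disabled blocks, which is the saving WBD w/ Gated-Vdd captures.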
“…Some researchers try to reduce energy consumption by applying dynamic voltage and frequency scaling to manage the shared cache network [20], or by proposing a novel tree-based directory to bridge many shared cache portions in a 3D network [21]. The research in [22] proposed a novel narrow-width-value-based stacked 3D cache architecture for both energy saving and yield improvement. The research in [23] used thermal information in the shared cache to adaptively balance runtime status.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although much recent research finds that stacked architectures are well suited to area saving, network interconnection, and layout optimization [8,21], those architectures are limited in their ability to match locality distributions among applications and to manage highly shared data efficiently, since each application exhibits different runtime behavior in latency, performance, and energy budget [29]. Moreover, the situation is more critical in a shared last-level cache, because the shared cache must serve many threads with heavy data sharing, resulting in serious efficiency and coherence problems [15,17,26]. For better management of the shared cache, partitioned cache methods [1,7,22] have been proposed, which allocate cache portions into groups corresponding to each thread.…”
Section: Introduction (mentioning)
confidence: 99%