StimulusCache: Boosting performance of chip multiprocessors with excess cache

Lee, Hyunjin; Cho, Sangyeun; Childers, Bruce R.

doi:10.1109/hpca.2010.5416644

Cited by 11 publications (6 citation statements)

References 16 publications


“…The state-of-the-art nonuniform cache architecture (NUCA) management schemes, such as RNUCA [Hardavellas et al. 2009] and the page-recoloring scheme [Cho and Jin 2006], intend to place cache blocks near their most frequent requestors by smart initial placement, dynamic migration, and replication. Moreover, cache partitioning schemes [Qureshi and Patt 2006; Lee et al. 2010; …] can be used so that a cluster of cores only accesses a locally allocated cache partition. Communication between these partitions is only required when there are coherence invalidations and fills.…”

Section: Hierarchical Stream Arbitration (mentioning)

confidence: 99%

Stream arbitration

Xiao, Chang, Cong, et al. 2013

ACM Trans. Archit. Code Optim.


Alternative interconnects are attractive for scaling on-chip communication bandwidth in a power-efficient manner. However, efficient utilization of the bandwidth provided by these emerging interconnects still remains an open problem due to the spatial and temporal communication heterogeneity. In this article, a Stream Arbitration scheme is proposed, where at runtime any source can compete for any communication channel of the interconnect to talk to any destination. We apply stream arbitration to radio frequency interconnect (RF-I). Experimental results show that compared to the representative token arbitration scheme, stream arbitration can provide an average 20% performance improvement and 12% power reduction.


“…It avoids excessive replication of shared data and places private data in local L2 banks. StimulusCache [19] introduced techniques to utilize "excess caches" when some cores are disabled to improve the chip yield. Lastly, Elastic Cooperative Caching (ECC) [20] uses a distributed coherence engine for scalability.…”

Section: Related Work (mentioning)

confidence: 99%

“…The thread on this core needs a capacity of 6, which can be provided locally. Lastly, the cores in the core ID list form a "virtual L2 cache chain," somewhat similar to [19]. For example, when core 4 has a miss, the access is directed to core 1, then to core 5, and so on (from the MRU position to later positions).…”

Section: CloudCache (mentioning)

confidence: 99%
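The lookup order in the "virtual L2 cache chain" excerpt above can be sketched in a few lines. This is a hypothetical illustration, not the implementation used in CloudCache or StimulusCache: the function name `chain_lookup`, the model of each cache as a set of block addresses, and the two-entry chain `[1, 5]` for core 4 (the excerpt's "and so on" is not filled in) are assumptions, and latency, replacement, and coherence are ignored.

```python
from typing import Dict, List, Optional, Set, Tuple


def chain_lookup(
    chain: List[int],
    caches: Dict[int, Set[int]],
    block: int,
) -> Tuple[Optional[int], int]:
    """Probe remote caches in chain order (MRU position first).

    Hypothetical model: each cache is a set of block addresses.
    Returns (core ID that hit, number of caches probed), or
    (None, len(chain)) when every cache in the chain misses and
    the request would go to memory.
    """
    for probes, core_id in enumerate(chain, start=1):
        if block in caches.get(core_id, set()):
            return core_id, probes
    return None, len(chain)


# Hypothetical example: after a local miss on core 4, the access is
# directed to core 1, then to core 5.
caches = {1: {0x100}, 5: {0x200}}
chain_for_core4 = [1, 5]
print(chain_lookup(chain_for_core4, caches, 0x200))  # (5, 2)
print(chain_lookup(chain_for_core4, caches, 0x300))  # (None, 2)
```

The sequential probe makes the design trade-off visible: a hit found later in the chain costs more probes, which is why the excerpt orders the chain from the MRU position to later positions.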


CloudCache: Expanding and shrinking private caches

Lee, Cho, Childers 2011

2011 IEEE 17th International Symposium on High Performance Computer Architecture

Self-citation


“…Another approach is to detect shared-resource contention dynamically and take countermeasures to mitigate the contention [8,11,15,21,24,30]. Several other recent studies address contention for shared LLC by using different hardware techniques [7,12,16,19,23,25].…”

Section: Introduction (mentioning)

confidence: 99%

Characterizing multi-threaded applications based on shared-resource contention

Dey, Wang, Davidson, et al. 2011

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Abstract: For higher processing and computing power, chip multiprocessors (CMPs) have become the new mainstream architecture. This shift to CMPs has created many challenges for fully utilizing the power of multiple execution cores. One of these challenges is managing contention for shared resources. Most recent research addresses contention for shared resources by single-threaded applications. However, as CMPs scale up to many cores, application design has shifted toward multi-threaded programming and new parallel models to fully utilize the underlying hardware. Single- and multi-threaded applications contend for shared resources differently. Therefore, to develop approaches that reduce shared-resource contention for emerging multi-threaded applications, it is crucial to understand how their performance is affected by contention for a particular shared resource. In this research, we propose and evaluate a general methodology for characterizing multi-threaded applications by determining the effect of shared-resource contention on performance. To demonstrate the methodology, we characterize the applications in the widely used PARSEC benchmark suite for shared-memory resource contention. The characterization reveals several interesting aspects of the benchmark suite. Three of twelve PARSEC benchmarks exhibit no contention for cache resources. Nine of the benchmarks exhibit contention for the L2 cache. Of these nine, only three exhibit contention between their own threads; most contention is due to competition with a co-runner. Interestingly, contention for the Front Side Bus is a major factor for all but two of the benchmarks and degrades performance by more than 11%.
