Alessandro Bardine scite author profile

NUCA caches are large on-chip L2 cache memories characterized by multi-bank partitioning and designed to hide wire delay effects. They exhibit high hit rates while keeping access latency low. Proposed designs for such caches are Static NUCA, in which data are statically allocated to the cache banks, and Dynamic NUCA, in which data may reside in different banks and a migration mechanism is introduced to better tolerate wire delay effects. The two architectures achieve different performance levels depending on architectural parameters and data management policies, at the cost of different balances between static and dynamic power consumption and energy dissipation. In this work, we present preliminary results of the characterization of such balances, through an evaluation of the performance and energy consumption of conventional UCAs and of Static and Dynamic NUCA caches. All the considered cache architectures are of equal size and are assumed to be used in an aggressive high-frequency system running applications from the SPEC CPU2000 and NAS Parallel Benchmarks suites. The experimental results indicate that, although data migration increases the dynamic energy consumption of Dynamic NUCA caches, the higher IPC achieved saves static energy, which dominates the power/energy balance in all the considered architectures. Consequently, these results would designate NUCA caches as the best-performing and most energy-saving architectures. Moreover, according to the results, future power improvements for NUCA caches should concentrate on static energy, while for dynamic energy the on-chip network is the most critical element. Data migration is acceptable, since it has a positive impact on performance and the increased dynamic energy is outweighed by the static energy savings resulting from the shorter execution time.
In order to give a general validity to such statements, we need to explore more design space points for each architecture (by varying the running clock rate and other design parameters) and to evaluate them considering a larger set of benchmarks.


Leveraging Data Promotion for Low Power D-NUCA Caches

Bardine

Comparetti

Foglia

et al. 2008


D-NUCA caches are cache memories that, thanks to banked organization, broadcast search and promotion/demotion mechanism, are able to tolerate the increasing wire delay effects introduced by technology scaling. As a consequence, they will outperform conventional caches (UCA, Uniform Cache Architectures) in future generation cores. Due to the promotion/demotion mechanism, we observed that the distribution of hits across the ways of a D-NUCA cache varies across applications as well as across different execution phases within a single application. In this work, we show how such a behavior can be leveraged to improve the D-NUCA power efficiency as well as to decrease its access latency. In particular, we propose: 1) A new microarchitectural technique to reduce the static power consumption of a D-NUCA cache by dynamically adapting the number of active (i.e. powered-on) ways to the need of the running application; our evaluation shows that a strong reduction of the average number of active ways (37.1%) is achievable, without significantly affecting the IPC (-2.25%), leading to a resultant reduction of the Energy Delay Product (EDP) of 30.9%. 2) A strategy to estimate the characteristic parameters of the proposed technique. 3) An evaluation of the effectiveness of the proposed technique in the multicore environment
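As a rough reading of the figures in this abstract, the sketch below backs out the energy change implied by the reported IPC and EDP numbers. It assumes EDP is the product of energy and execution time, and that instruction count and clock rate are unchanged so that execution time scales as 1/IPC. Neither assumption is stated in the abstract, and the variable names are illustrative only.

```python
# Back-of-envelope sketch. Only the two percentages below come from the abstract;
# the EDP definition and the fixed-workload assumption are ours.
ipc_change = -0.0225     # reported IPC change (-2.25%)
edp_reduction = 0.309    # reported EDP reduction (30.9%)

# Assumption: same instruction count and clock rate, so time scales as 1/IPC.
delay_ratio = 1.0 / (1.0 + ipc_change)

# Assumption: EDP = energy * delay, so the energy ratio is EDP ratio / delay ratio.
edp_ratio = 1.0 - edp_reduction
energy_ratio = edp_ratio / delay_ratio

print(f"delay ratio:  {delay_ratio:.4f}")
print(f"energy ratio: {energy_ratio:.4f}")
```

Under these assumptions the execution time grows by a little over 2% and the implied energy ratio is about 0.68, that is, roughly a one-third energy saving. This is an estimate derived from the abstract's numbers, not a result reported by the authors.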


A real-time configurable NURBS interpolator with bounded acceleration, jerk and chord error

Annoni

Bardine

Campanelli

et al. 2012

Computer-Aided Design


Evaluation of Leakage Reduction Alternatives for Deep Submicron Dynamic Nonuniform Cache Architecture Caches

Bardine

Comparetti

Foglia

et al. 2014

IEEE Trans. VLSI Syst.


Wire delays and leakage energy consumption are both growing problems in designing large on-chip caches. Nonuniform cache architecture (NUCA) is a wire-delay aware design paradigm based on the sub-banking of a cache, which allows the banks closer to the controller to be accessed with reduced latencies with respect to the other banks. This feature is leveraged by dynamic NUCA (D-NUCA) caches via a migration mechanism which speeds up frequently used data access, further reducing the effect wire delays have on performance. To reduce leakage power consumption of static random access memory caches, various micro-architectural techniques have been proposed. In this brief, we compare the benefits and limits of the application of some of these techniques to a D-NUCA cache memory, and propose a novel hybrid scheme based on the Drowsy and Way Adaptable techniques. Such a scheme allows further improvement in leakage reduction and limits the impact of process variation on the effectiveness of the Drowsy technique.


Way adaptable D-NUCA caches

Bardine

Comparetti

Foglia

et al. 2010

IJHPSA


Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last level caches: by partitioning a large cache into several banks, with the latency of each one depending on its physical location and by employing a scalable on-chip network to interconnect the banks with the cache controller, the average access latency can be reduced with respect to a traditional cache. The addition of a migration mechanism to move the most frequently accessed data towards the cache controller (D-NUCA) further improves the average access latency. In this work we propose a last-level cache design, based on the D-NUCA scheme, which is able to significantly limit its static power consumption by dynamically adapting to the needs of the running application: the way adaptable D-NUCA cache. This design leads to a fast and power-efficient memory hierarchy with an average reduction by 31.2% in energy-delay product (EDP) with respect to a traditional D-NUCA. We propose and discuss a methodology for tuning the intrinsic parameters of our design and investigate the adoption of the way adaptable D-NUCA scheme as a shared L2 cache in a chip multiprocessor (CMP) system (24% reduction of EDP).


Improving power efficiency of D-NUCA caches

Bardine

Foglia

Gabrielli

et al. 2007

SIGARCH Comput. Archit. News


D-NUCA caches are cache memories that, thanks to banked organization, broadcast search and promotion/demotion mechanism, are able to tolerate the increasing wire delay effects introduced by technology scaling. As a consequence, they will outperform conventional caches (UCA, Uniform Cache Architectures) in future generation cores. Due to the promotion/demotion mechanism, we have found that, in a D-NUCA cache, the distribution of hits on the ways varies across applications as well as across different execution phases within a single application. In this paper, we show how such a behavior can be utilized to improve D-NUCA power efficiency as well as to decrease its access latencies. In particular, we propose a new D-NUCA structure, called Way Adaptable D-NUCA cache, in which the number of active (i.e. powered-on) ways is dynamically adapted to the needs of the running application. Our initial evaluation shows that a consistent reduction of both the average number of active ways (42% on average) and the number of bank access requests (29% on average) is achieved, without significantly affecting the IPC.


NURBS interpolator with confined chord error and tangential and centripetal acceleration control

Bardine

Campanelli

Foglia

et al. 2010


Energy Behaviour of NUCA Caches in CMPs

Bardine

Foglia

Panicucci

et al. 2011


Advances in semiconductor technology now make it possible to design Chip Multiprocessor Systems equipped with huge on-chip Last Level Caches. Due to the wire delay problem, the use of traditional cache memories with a uniform access time would result in unacceptable response latencies. The NUCA (Non Uniform Cache Access) architecture has been proposed as a viable solution to hide the adverse impact of wire delay on performance. Many previous studies have focused on the effectiveness of NUCA architectures, but the study of the energy and power aspects of NUCA caches is still limited. In this work, we present an energy model specifically suited for NUCA-based CMP systems, together with a methodology to employ the model to evaluate the NUCA energy consumption. Moreover, we present a performance and energy dissipation analysis for two 8-core CMP systems with an S-NUCA and a D-NUCA, respectively. Experimental results show that, as in the single-core case, static power also dominates the total power budget in the CMP system.

