Jose Renau scite author profile

This paper presents CHeckpointed Early Resource RecYcling (Cherry), a hybrid mode of execution based on ROB and checkpointing that decouples resource recycling and instruction retirement. Resources are recycled early, resulting in a more efficient utilization. Cherry relies on state checkpointing and rollback to service exceptions for instructions whose resources have been recycled. Cherry leverages the ROB to (1) not require in-order execution as a fallback mechanism, (2) allow memory replay traps and branch mispredictions without rolling back to the Cherry checkpoint, and (3) quickly fall back to conventional out-of-order execution without rolling back to the checkpoint or flushing the pipeline. We present a Cherry implementation with early recycling at three different points of the execution engine: the load queue, the store queue, and the register file. We report average speedups of 1.06 and 1.26 in SPECint and SPECfp applications, respectively, relative to an aggressive conventional architecture. We also describe how Cherry and speculative multithreading can be combined and complement each other.

Positional adaptation of processors: application to energy reduction

Huang, Renau, Torrellas

120

Although adaptive processors can exploit application variability to improve performance or save energy, effectively managing their adaptivity is challenging. To address this problem, we introduce a new approach to adaptivity: the Positional approach. In this approach, both the testing of configurations and the application of the chosen configurations are associated with particular code sections. This is in contrast to the currently-used Temporal approach to adaptation, where both the testing and application of configurations are tied to successive intervals in time. We propose to use subroutines as the granularity of code sections in positional adaptation. Moreover, we design three implementations of subroutine-based positional adaptation that target energy reduction in three different workload environments: embedded or specialized server, general purpose, and highly dynamic. All three implementations of positional adaptation are much more effective than temporal schemes. On average, they boost the energy savings of applications by 50% and 84% over temporal schemes in two experiments.
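The positional idea in the abstract above (test configurations per subroutine, then reuse the chosen one whenever that subroutine runs again) can be illustrated with a minimal sketch. This is not the paper's implementation: the class `PositionalAdapter`, the configuration names, and the energy numbers are all hypothetical, and the sketch ignores any performance constraint a real scheme would enforce.

```python
from dataclasses import dataclass, field


@dataclass
class PositionalAdapter:
    """Toy positional adaptation: one chosen configuration per subroutine."""
    configs: list                                  # hypothetical config names
    trials: dict = field(default_factory=dict)     # subroutine -> {config: energy}
    chosen: dict = field(default_factory=dict)     # subroutine -> best config

    def config_for(self, subroutine):
        """Configuration to apply on entry to `subroutine`."""
        if subroutine in self.chosen:
            return self.chosen[subroutine]
        tested = self.trials.get(subroutine, {})
        # Still in the testing phase: try the next untested configuration.
        untested = [c for c in self.configs if c not in tested]
        return untested[0] if untested else min(tested, key=tested.get)

    def record(self, subroutine, config, energy):
        """Store the energy measured for one invocation run under `config`."""
        if subroutine in self.chosen:
            return
        tested = self.trials.setdefault(subroutine, {})
        tested[config] = energy
        # Once every configuration has been tested here, fix the cheapest one.
        if all(c in tested for c in self.configs):
            self.chosen[subroutine] = min(self.configs, key=tested.get)


# Usage with made-up energy numbers for a hypothetical subroutine "fft".
adapter = PositionalAdapter(["full", "half_cache", "low_issue"])
measured = {"full": 10.0, "half_cache": 7.0, "low_issue": 8.5}
for _ in range(3):
    cfg = adapter.config_for("fft")
    adapter.record("fft", cfg, measured[cfg])
print(adapter.config_for("fft"))
```

The key contrast with a temporal scheme is the lookup key: decisions are indexed by code section (here, the subroutine name) rather than by the current time interval, so a subroutine that has not been seen yet starts its own testing phase independently.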

Thread-Level Speculation on a CMP can be energy efficient

et al. 2005

Chip Multiprocessors (CMP) with Thread-Level Speculation (TLS) have become the subject of intense research. However, TLS is suspected of being too energy inefficient to compete against conventional processors. In this paper, we refute this claim. To do so, we first identify the main sources of dynamic energy consumption in TLS. Then, we present simple energy-saving optimizations that cut the energy cost of TLS by over 60% on average with minimal performance impact. The resulting TLS CMP, populated with four 3-issue cores, speeds up full SPECint 2000 codes by 1.27 on average, while keeping the fraction of the chip's energy consumption due to TLS to only 20%. Compared to a 6-issue superscalar at the same frequency, the TLS CMP is on average faster, while consuming only 85% of its total on-chip power.

Tasking with out-of-order spawn in TLS chip multiprocessors

Renau, Tuck, Liu, et al. 2005

Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since tasks are spawned out-of-order and unpredictably, maintaining key TLS basics such as task ordering and efficient resource allocation is challenging. While the concept of out-of-order spawning is not new, this paper is the first to propose a set of microarchitectural mechanisms that, altogether, fundamentally enable fast TLS with out-of-order spawn in a CMP. Moreover, we develop a fully-automated TLS compiler for aggressive out-of-order spawn. With our mechanisms, a TLS CMP with four 4-issue cores achieves an average speedup of 1.30 for full SPECint 2000 applications; the corresponding speedup for in-order-only spawn is 1.04. Overall, our mechanisms unlock the potential of TLS for the toughest applications.

Power model validation through thermal measurements

Mesa-Martinez, Nayfach-Battilana, Renau 2007

Simulation environments are an indispensable tool in the design, prototyping, performance evaluation, and analysis of computer systems. A simulator must be able to faithfully reflect the behavior of the system being analyzed. To ensure its accuracy, the simulator must be verified and shown to closely match empirical data. Modern processors provide enough performance counters to validate the majority of the performance models; nevertheless, the information provided is not enough to validate power and thermal models. In order to address some of the difficulties associated with the validation of power and thermal models, this paper proposes an infrared measurement setup to capture run-time power consumption and thermal characteristics of modern chips. We use infrared cameras with high spatial resolution (10x10μm) and high frame rate (125fps) to capture thermal maps. To generate a detailed power breakdown (leakage and dynamic) for each processor floorplan unit, we employ genetic algorithms. The genetic algorithm finds a power equation for each floorplan block that produces the measured temperature for a given thermal package. For the AMD Athlon analyzed in this paper, the predicted power differs from the externally measured power consumption by less than 1%. As an example of applicability, we compare the obtained measurements with CACTI power models, and propose extensions to existing thermal models to increase accuracy.

Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs

Mesa-Martinez, Renau 2007

131

A framework for dynamic energy efficiency and temperature management

Huang, Renau, Yoo, et al. 2000

While technology is delivering increasingly sophisticated and powerful chip designs, it is also imposing alarmingly high energy requirements on the chips. One way to address this problem is to manage the energy dynamically. Unfortunately, current dynamic schemes for energy management are relatively limited. In addition, they manage energy either for energy efficiency or for temperature control, but not for both simultaneously. In this paper, we design and evaluate for the first time an energy-management framework that tackles both energy efficiency and temperature control in a unified manner. We call this general approach Dynamic Energy Efficiency and Temperature Management (DEETM). Our framework combines many energy-management techniques and can activate them individually or in groups in a fine-grained manner according to a given policy. The goal of the framework is two-fold: maximize energy savings without extending application execution time beyond a given tolerable limit, and guarantee that the temperature remains below a given limit while minimizing any resulting slowdown. The framework successfully meets these goals. For example, it delivers a 40% energy reduction with only a 10% application slowdown.
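The first goal stated above (maximize energy savings while keeping slowdown within a tolerable limit) can be pictured as a budgeted selection problem. The sketch below is only an illustration under strong assumptions, not DEETM's actual policy: the function `pick_techniques`, the technique names, and the percentages are hypothetical, savings and slowdowns are assumed to add up linearly, and the greedy ranking is a stand-in for whatever policy the framework uses.

```python
def pick_techniques(techniques, slowdown_limit):
    """Greedily activate techniques within a slowdown budget.

    techniques: list of (name, energy_saving_pct, slowdown_pct) tuples.
    slowdown_limit: tolerable total slowdown, in percent.
    Assumes (unrealistically) that savings and slowdowns are additive.
    """
    def efficiency(t):
        _, saving, slowdown = t
        # Techniques with no slowdown are always worth activating first.
        return saving / slowdown if slowdown else float("inf")

    chosen, total_saving, total_slowdown = [], 0, 0
    for name, saving, slowdown in sorted(techniques, key=efficiency, reverse=True):
        if total_slowdown + slowdown <= slowdown_limit:
            chosen.append(name)
            total_saving += saving
            total_slowdown += slowdown
    return chosen, total_saving, total_slowdown


# Hypothetical techniques and numbers, for illustration only.
candidates = [
    ("issue_throttle", 6, 3),
    ("dvfs", 20, 5),
    ("cache_resize", 12, 4),
]
selected, saving, slowdown = pick_techniques(candidates, slowdown_limit=10)
print(selected, saving, slowdown)
```

A greedy ratio ordering is simple but not optimal in general (the underlying problem resembles a knapsack); it is used here only to make the trade-off between savings and the slowdown limit concrete.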

ESESC: A fast multicore simulator using Time-Based Sampling

Ardestani, Renau 2013

102

Architects rely on simulation in their exploration of the design space. However, slow simulation speed caps their productivity and limits the depth of their exploration. Sampling has been a commonly used remedy. While sampling is shown to be an effective technique for single core processors, its application has been limited to simulation of multiprogram, throughput applications only. This work presents Time-Based Sampling (TBS), a framework that is the first to enable sampling in simulation of multicore processors with virtually no limitation in terms of application type (multiprogrammed or multithreaded), number of cores, homogeneity or heterogeneity of the simulated configuration (4.99% error averaged across all the evaluated configurations). TBS also is the first to enable integrated power and temperature evaluation in statistically sampled simulation of multicore systems (with 5.5% and 2.4% error on average, respectively). We implement an architectural simulator based on TBS, called ESESC, that provides a holistic set of tools for a fair evaluation of different architectures.

Cherry: Checkpointed early resource recycling in out-of-order microprocessors