Hardware renaming schemes provide multiple physical locations (registers or memory) for each logical name. In current renaming schemes, a new physical location is allocated for each dispatched instruction regardless of its result value. However, these values exhibit a high degree of temporal locality (result redundancy). This paper proposes:

1. Physical Register Reuse. Reuse a physical location whenever an incoming result value is detected to match a previous one. This is performed during register renaming and requires some value-identity detection hardware. By mapping several logical registers holding the same value to the same physical register, Physical Register Reuse opens two opportunities:
• SHARING: exploit the high level of value redundancy in the register file either to reduce the file's size and complexity, or to effectively enlarge the active instruction window. Our results suggest reduction factors of 2 to 4 in some cases. Performance increases either through the enlarged instruction window or through the higher frequency enabled by a smaller register file requiring fewer ports.
• RESULT REUSE & DEPENDENCY REDIRECTION: move the responsibility for generating results (1) from the functional units to the register renamer, possibly eliminating processed instructions from the execution stream, and (2) from one instruction to an earlier one in the instruction stream, possibly allowing dependent instructions to be scheduled earlier. In this way, large performance speedups are achieved.

2. Unification. Combine the memory renamer with the register renamer in order to extend the above sharing and result reuse & dependency redirection ideas to both registers and memory locations. This allows even greater hardware savings and performance improvements, and also simplifies the processing of store instructions.

Keywords: Register and memory renaming, physical register reuse, value temporal locality, result reuse, dependency redirection.
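The sharing idea above can be illustrated with a minimal software sketch. The structures below (a value-to-physical-register table plus per-register reference counts) are illustrative assumptions for clarity, not the paper's actual hardware design: when an incoming result value matches one already held in a physical register, the renamer remaps the logical register to that existing physical register instead of allocating a new one.

```python
# Sketch of physical register reuse via a value-identity table.
# Hypothetical model for illustration only: each live value maps to at most
# one physical register, and logical registers producing an identical value
# share it (refcounted so the register is freed when the last mapping dies).

class ValueReuseRenamer:
    def __init__(self, num_pregs):
        self.free = list(range(num_pregs))   # free physical registers
        self.map = {}                        # logical reg -> physical reg
        self.value_of = {}                   # physical reg -> value held
        self.preg_of_value = {}              # value -> physical reg (value-identity table)
        self.refcount = {}                   # physical reg -> # logical regs mapped to it

    def _release(self, preg):
        self.refcount[preg] -= 1
        if self.refcount[preg] == 0:         # last mapping gone: recycle the register
            del self.preg_of_value[self.value_of.pop(preg)]
            del self.refcount[preg]
            self.free.append(preg)

    def rename(self, lreg, value):
        """Map logical register `lreg` to a physical register holding `value`."""
        old = self.map.get(lreg)
        preg = self.preg_of_value.get(value)
        if preg is not None:                 # value identity detected: share (reuse)
            self.refcount[preg] += 1
        else:                                # no match: allocate a fresh register
            preg = self.free.pop()
            self.value_of[preg] = value
            self.preg_of_value[value] = preg
            self.refcount[preg] = 1
        self.map[lreg] = preg
        if old is not None:                  # previous mapping of lreg is overwritten
            self._release(old)
        return preg

r = ValueReuseRenamer(8)
p0 = r.rename("r1", 42)
p1 = r.rename("r2", 42)   # identical value: r1 and r2 share one physical register
assert p0 == p1
```

The sketch makes the sharing benefit concrete: two logical registers holding the same value consume one physical register instead of two, which is what enables the smaller-file or larger-window trade-off described above.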
State-of-the-art microprocessors achieve high performance by executing multiple instructions per cycle. In an out-of-order engine, the instruction scheduler is responsible for dispatching instructions to execution units based on dependencies, latencies, and resource availability. Most existing instruction schedulers do a less-than-optimal job of scheduling memory accesses and the instructions dependent on them, for the following reasons:
• Memory dependencies cannot be resolved prior to execution, so loads are not advanced ahead of preceding stores.
• The dynamic latencies of load instructions are unknown, so dependent instructions are scheduled using either an optimistic load-use delay (which may cause re-scheduling and re-execution) or a pessimistic delay (which creates unnecessary stalls).
• Memory pipelines are more expensive than other execution units and are therefore a scarce resource. Currently, an increase in memory execution bandwidth is usually achieved through multi-banked caches, where bank conflicts limit efficiency.
In this paper we present three techniques to address these scheduler limitations. The first improves the scheduling of load instructions by using a simple memory disambiguation mechanism. The second improves the scheduling of load-dependent instructions by employing a Data Cache Hit-Miss Predictor to predict dynamic load latencies. The third improves the efficiency of load scheduling in a multi-banked cache through Cache-Bank Prediction.
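The hit-miss prediction idea can be sketched in a few lines. The abstract does not fix the predictor's internal structure, so the sketch below assumes a common realization: a table of 2-bit saturating counters indexed by the load's PC, queried by the scheduler to pick an optimistic (hit) or pessimistic (miss) latency for dependent instructions.

```python
# Sketch of a data-cache hit-miss predictor: a table of 2-bit saturating
# counters indexed by load PC. This structure is an assumption for
# illustration; the abstract does not specify the predictor's internals.

TABLE_SIZE = 1024

class HitMissPredictor:
    def __init__(self):
        self.counters = [3] * TABLE_SIZE     # start strongly predicting "hit"

    def _index(self, pc):
        return (pc >> 2) % TABLE_SIZE        # drop byte offset, fold PC into table

    def predict_hit(self, pc):
        """True -> schedule dependents with the optimistic (hit) latency."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, did_hit):
        """Train on the actual cache outcome once the load resolves."""
        i = self._index(pc)
        if did_hit:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

p = HitMissPredictor()
pc = 0x400010
assert p.predict_hit(pc)          # default prediction: hit
p.update(pc, False)
p.update(pc, False)
assert not p.predict_hit(pc)      # two observed misses flip the prediction
```

The 2-bit counter provides hysteresis: a single stray miss in a hit-dominated load does not immediately switch the scheduler to the pessimistic latency.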
As microprocessors become faster, the relative performance cost of memory accesses increases. Bigger and faster caches significantly reduce the absolute load-to-use delay. However, increasing processor frequencies worsen the relative load-to-use latency, measured in processor cycles (e.g., from two cycles on the Pentium® processor to three cycles or more in current designs). Load-address prediction techniques were introduced to partially hide the load-to-use latency. This paper focuses on advanced address-prediction schemes to further shorten program execution time.
Existing address-prediction schemes are capable of predicting simple address patterns, consisting mainly of constant or stride-based addresses. This paper explores the characteristics of the remaining loads and suggests new techniques to improve prediction effectiveness:
• Context-based prediction to tackle part of the remaining, difficult-to-predict load instructions.
• New prediction algorithms that take advantage of global correlation among different static loads.
• New confidence mechanisms to increase the correct-prediction rate and to eliminate costly mispredictions.
• Mechanisms to prevent long or random address sequences from polluting the predictor's data structures while providing some hysteresis in the predictions.
Such an enhanced address predictor accurately predicts 67% of all loads while keeping the misprediction rate close to 1%. We further show that the proposed predictor works reasonably well in a deeply pipelined architecture, where the predict-to-update delay may significantly impair both prediction rate and accuracy.
Keywords: Load-address prediction, context-based predictor, global correlation, predictor implementation, recursive data structures.
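The baseline this paper extends, stride-based address prediction with a confidence mechanism, can be sketched as follows. The table layout and thresholds below are illustrative assumptions, not the paper's proposed design: each static load tracks its last address and stride, and a saturating confidence counter gates predictions so that irregular loads stay silent rather than mispredict.

```python
# Sketch of a per-load stride address predictor with a confidence counter:
# the conventional baseline that the paper's context-based and globally
# correlated schemes extend. Field names and thresholds are illustrative.

class StrideEntry:
    def __init__(self):
        self.last_addr = None
        self.stride = 0
        self.conf = 0                        # saturating confidence counter

class StridePredictor:
    CONF_MAX = 3
    CONF_PREDICT = 2                         # predict only when confident

    def __init__(self):
        self.table = {}                      # load PC -> StrideEntry

    def predict(self, pc):
        """Return a predicted address, or None when confidence is too low."""
        e = self.table.get(pc)
        if e and e.last_addr is not None and e.conf >= self.CONF_PREDICT:
            return e.last_addr + e.stride
        return None

    def update(self, pc, addr):
        """Train on the actual address once the load executes."""
        e = self.table.setdefault(pc, StrideEntry())
        if e.last_addr is not None:
            stride = addr - e.last_addr
            if stride == e.stride:           # pattern confirmed: gain confidence
                e.conf = min(self.CONF_MAX, e.conf + 1)
            else:                            # pattern broken: retrain, drop confidence
                e.stride = stride
                e.conf = 0
        e.last_addr = addr

sp = StridePredictor()
for a in (0x1000, 0x1008, 0x1010, 0x1018):   # stride-8 access pattern
    sp.update(0x77, a)
assert sp.predict(0x77) == 0x1020
```

Loads whose addresses follow no constant stride never build confidence here and return no prediction; those are exactly the loads the paper's context-based and global-correlation mechanisms target.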