SeznecAndré scite author profile

As modern microprocessors employ deeper pipelines and issue multiple instructions per cycle, they are becoming increasingly dependent on accurate branch prediction. Because hardware resources for branch-predictor tables are invariably limited, it is not possible to hold all relevant branch history for all active branches at the same time, especially for large workloads consisting of multiple processes and operating-system code. The problem that results, commonly referred to as aliasing in the branch-predictor tables, is in many ways similar to the misses that occur in finite-sized hardware caches. In this paper we propose a new classification for branch aliasing based on the three-Cs model for caches, and show that conflict aliasing is a significant source of mispredictions. Unfortunately, the obvious method for removing conflicts (adding tags and associativity to the predictor tables) is not a cost-effective solution. To address this problem, we propose the skewed branch predictor, a multi-bank, tag-less branch predictor, designed specifically to reduce the impact of conflict aliasing. Through both analytical and simulation models, we show that the skewed branch predictor removes a substantial portion of conflict aliasing by introducing redundancy to the branch-predictor tables. Although this redundancy increases capacity aliasing compared to a standard one-bank structure of comparable size, our simulations show that the reduction in conflict aliasing overcomes this effect to yield a gain in prediction accuracy. Alternatively, we show that a skewed organization can achieve the same prediction accuracy as a standard one-bank organization but with half the storage requirements.
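The multi-bank idea in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: three banks, 2-bit saturating counters, a majority vote, updating every bank on every branch, and simple XOR-shift index functions. The class `SkewedPredictor` and its hash are hypothetical and are not the skewing functions of the paper.

```python
class SkewedPredictor:
    """Tag-less predictor: several banks of 2-bit counters, majority vote."""

    BANKS = 3  # assumption: an odd bank count so a majority always exists

    def __init__(self, log2_entries: int = 10) -> None:
        self.bits = log2_entries
        self.mask = (1 << log2_entries) - 1
        # Counters start at 1 (weakly not-taken); 2 and 3 mean "taken".
        self.tables = [[1] * (1 << log2_entries) for _ in range(self.BANKS)]

    def _index(self, bank: int, addr: int, hist: int) -> int:
        # Hypothetical per-bank hash, chosen only so that two keys that
        # collide in one bank usually land in different entries elsewhere.
        key = addr ^ (hist << 5)
        if bank == 0:
            return key & self.mask
        shift = self.bits if bank == 1 else self.bits - 1
        return (key ^ (key >> shift)) & self.mask

    def bank_vote(self, bank: int, addr: int, hist: int) -> bool:
        return self.tables[bank][self._index(bank, addr, hist)] >= 2

    def predict(self, addr: int, hist: int) -> bool:
        votes = sum(self.bank_vote(b, addr, hist) for b in range(self.BANKS))
        return 2 * votes > self.BANKS

    def update(self, addr: int, hist: int, taken: bool) -> None:
        # Assumed policy: update the counter in every bank.
        for b in range(self.BANKS):
            i = self._index(b, addr, hist)
            c = self.tables[b][i]
            self.tables[b][i] = min(3, c + 1) if taken else max(0, c - 1)


p = SkewedPredictor()
a, b = 5, 5 + 1024  # these two addresses collide in bank 0 only
for _ in range(4):
    p.update(a, 0, True)   # branch a is always taken
    p.update(b, 0, False)  # branch b is never taken
# Bank 0 alone mispredicts a; the majority vote is right for both branches.
print(p.bank_vote(0, a, 0), p.predict(a, 0), p.predict(b, 0))
```

In this toy run the two branches fight over one counter in bank 0, which is what a one-bank table of that size would see, while the other two banks outvote it. The cost is that each branch occupies three entries instead of one, which is the capacity-aliasing trade-off the abstract describes.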

Effective ahead pipelining of instruction block address generation

SeznecAndré

FrabouletAntony

2003

SIGARCH Comput. Archit. News

On an N-way issue superscalar processor, the front-end instruction fetch engine must deliver instructions to the execution core at a sustained rate higher than N instructions per cycle. This means that the instruction address generator/predictor (IAG) has to predict the instruction flow at an even higher rate, while prediction accuracy cannot be sacrificed. Achieving high accuracy on this prediction becomes more and more critical, since the overall pipeline grows deeper with each new generation of processors. As a result, very complex IAGs are used, featuring different predictors for jumps, returns, conditional and unconditional branches, together with complex logic. Usually, the IAG uses the information (branch histories, fetch addresses, ...) available in a given cycle to predict the next fetch address(es). Unfortunately, a complex IAG cannot deliver a prediction within a short cycle. Therefore, processors rely on a hierarchy of IAGs with increasing accuracies but also increasing latencies: the accurate but slow IAG is used to correct the fast but less accurate IAG. A significant part of the potential instruction bandwidth is often wasted in pipeline bubbles due to these corrections. As an alternative to a hierarchy of IAGs, it is possible to initiate the instruction address generation several cycles ahead of its use. In this paper, we explore such an ahead-pipelined IAG in detail. The example illustrated in this paper shows that, even when the instruction address generation is (partially) initiated five cycles ahead of its use, it is possible to reach approximately the same prediction accuracy as that of a conventional one-block-ahead complex IAG. The solution presented in this paper delivers a sustained address generation rate close to one instruction block per cycle with state-of-the-art accuracy.
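The bandwidth lost to corrections in a hierarchy of IAGs can be estimated with a back-of-the-envelope model. The sketch below is an illustrative assumption, not a model taken from the paper: it supposes that the slow IAG overrides the fast one on a fixed fraction of blocks, and that each override inserts a fixed number of bubble cycles. The function name and both parameters are hypothetical.

```python
def sustained_blocks_per_cycle(override_rate: float, bubble_cycles: int) -> float:
    """Average fetch-block rate when some predictions are corrected late.

    Each block costs one cycle, plus `bubble_cycles` extra cycles on the
    fraction `override_rate` of blocks where the slow IAG overrides the
    fast one (assumed model, for illustration only).
    """
    return 1.0 / (1.0 + override_rate * bubble_cycles)


# Hypothetical numbers: 10% of blocks corrected, 2 bubble cycles each.
print(sustained_blocks_per_cycle(0.10, 2))
```

Under these assumed numbers, about one sixth of the potential fetch bandwidth goes to bubbles. An ahead-pipelined IAG aims to avoid these overrides altogether, so its sustained rate stays close to one block per cycle.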

Skewed associativity enhances performance predictability

BodinFrançois

SeznecAndré

1995

SIGARCH Comput. Archit. News

Performance tuning becomes harder as computer technology advances. One of the factors is the increasing complexity of memory hierarchies. Most modern machines now use at least one level of cache memory. To reduce execution stalls, cache miss ratios must be very low. Software techniques for improving locality, such as loop blocking and copying, have been developed for numerical codes. Unfortunately, the behavior of direct-mapped and set-associative caches is still erratic when large numerical data sets are accessed. Execution time can vary drastically for the same loop kernel depending on uncontrolled factors such as array leading size. The only software method available to improve execution time stability is the copying of frequently used data, which is costly in execution time. Users are not usually cache organisation experts. They are not aware of such phenomena and have no control over them. In this paper, we show that the recently proposed four-way skewed-associative cache yields very stable execution times and good average miss ratios on blocked algorithms. As a result, execution is faster and much more predictable than with conventional caches. Because of this better behavior, it is possible to use larger block sizes with blocked algorithms, which further reduces blocking overhead costs.
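The sensitivity to array leading size mentioned in this abstract can be reproduced with a toy direct-mapped cache. The sketch below is a minimal illustration under assumed parameters (256 lines of 32 bytes, 8-byte elements, a walk that strides by the leading size); none of these values come from the paper, and the function `column_walk_misses` is hypothetical.

```python
def column_walk_misses(leading_size: int, rows: int = 64, passes: int = 4,
                       cache_lines: int = 256, line_bytes: int = 32,
                       elem_bytes: int = 8) -> int:
    """Count direct-mapped cache misses for repeated strided walks.

    The walk touches `rows` elements spaced `leading_size` elements apart
    and repeats `passes` times. All sizes are illustrative assumptions.
    """
    tags = [None] * cache_lines  # one resident memory line per cache slot
    misses = 0
    for _ in range(passes):
        for r in range(rows):
            line = (r * leading_size * elem_bytes) // line_bytes
            slot = line % cache_lines  # direct-mapped placement
            if tags[slot] != line:
                tags[slot] = line
                misses += 1
    return misses


print(column_walk_misses(1024))  # every access maps to the same slot
print(column_walk_misses(1028))  # accesses spread over distinct slots
```

With a leading size of 1024 elements, every access in this toy maps to slot 0 and all 256 accesses miss. Padding the leading size to 1028 spreads the 64 lines over 64 distinct slots, so only the 64 first-touch misses remain. This is the kind of erratic, layout-dependent behavior that the abstract says skewed associativity smooths out.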

SeznecAndré

A case for two-way skewed-associative caches

Trading conflict and capacity aliasing in conditional branch predictors

Effective ahead pipelining of instruction block address generation

Skewed associativity enhances performance predictability
