We present the PARROT concept that seeks to achieve higher performance with reduced energy consumption through gradual optimization of frequently executed code traces. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a selective approach for applying complex mechanisms only upon the most frequently used traces to maximize the performance gain at any given power constraint, thus attaining finer control of tradeoffs between performance and power awareness. We show that the PARROT-based microarchitecture can improve the performance of aggressively designed processors by providing the means to improve the utilization of their more elaborate resources. At the same time, rigorous selection of traces prior to storage and optimization provides the key to attenuating increases in the power budget. For resource-constrained designs, PARROT-based architectures deliver better performance (up to an average 16% increase in IPC) at a comparable energy level, whereas the conventional path to a similar performance improvement consumes an average 70% more energy. Meanwhile, for those designs which can tolerate a higher power budget, PARROT gracefully scales up to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the cubic-MIPS-per-WATT power awareness metric by over 50%.
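To make the last figure concrete, the cubic-MIPS-per-Watt metric can be read as MIPS^3 / W, so a performance gain is cubed before being divided by power. The sketch below works out how much extra power a 45% performance gain could absorb while the metric still improves by 50%. It assumes that MIPS scales directly with IPC (same clock frequency and instruction count), which the abstract does not state; the function names are hypothetical and chosen only for illustration.

```python
def cubic_mips_per_watt(mips: float, watts: float) -> float:
    """Cubic-MIPS-per-Watt: performance cubed, divided by power."""
    return mips ** 3 / watts


def max_power_ratio(perf_ratio: float, metric_ratio: float) -> float:
    """Largest power increase compatible with a given metric improvement.

    metric_ratio = perf_ratio**3 / power_ratio, solved for power_ratio.
    """
    return perf_ratio ** 3 / metric_ratio


# Assumption: IPC gain of 45% translates directly into a 1.45x MIPS gain.
perf_ratio = 1.45
# "Over 50%" improvement is treated here as its lower bound, 1.5x.
metric_ratio = 1.50

power_bound = max_power_ratio(perf_ratio, metric_ratio)
print(f"power may grow by at most about {power_bound:.2f}x")
```

Under these assumptions the wider machine could draw roughly twice the baseline power and still meet the quoted metric improvement, which shows how heavily the cubic metric weights performance over power.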
We introduce the Micro-Operation Cache (Uop Cache, UC), designed to reduce a processor's front-end power and energy consumption without performance degradation. The UC caches basic blocks of instructions pre-decoded into micro-operations (uops) and fetches a single basic block's worth of uops per cycle. Fetching complete pre-decoded basic blocks eliminates the need to repeatedly decode variable-length instructions and simplifies the process of predicting, fetching, rotating and aligning fetched instructions. The UC design enables even a small structure to be quite effective. Results: a moderate-sized UC eliminates about 75% of instruction decodes across a broad range of benchmarks, and over 90% in multimedia applications and high-power tests. For existing Intel P6 family processors, the eliminated work may save about 10% of the full-chip power consumption with no performance degradation.
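The caching idea described above can be sketched as a toy model: a small table keyed by basic-block start address that returns pre-decoded uops on a hit and invokes the decoder only on a miss. This is not the paper's design. The class name `ToyUopCache`, the LRU replacement policy, the string-based "decoding" and the block sizes are all assumptions made for illustration, and the resulting rate is unrelated to the measured results quoted in the abstract.

```python
from collections import OrderedDict


class ToyUopCache:
    """Toy basic-block uop cache with LRU replacement (illustrative only)."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block start address -> tuple of uops
        self.decoded = 0             # instructions sent through the decoder
        self.skipped = 0             # instruction decodes avoided by a hit

    def fetch(self, start_addr: int, instructions: list) -> tuple:
        """Return the uops of one basic block, decoding only on a miss."""
        if start_addr in self.blocks:
            self.blocks.move_to_end(start_addr)  # mark as most recently used
            self.skipped += len(instructions)
            return self.blocks[start_addr]
        # Miss: stand-in for the real variable-length instruction decoder.
        uops = tuple(f"uop({ins})" for ins in instructions)
        self.decoded += len(instructions)
        self.blocks[start_addr] = uops
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used block
        return uops

    def decode_elimination(self) -> float:
        """Fraction of instruction decodes avoided so far."""
        total = self.decoded + self.skipped
        return self.skipped / total if total else 0.0


# Hypothetical loop body made of two basic blocks, executed five times.
loop_body = [(0x100, ["add", "cmp", "jne"]), (0x120, ["inc"])]
uc = ToyUopCache(capacity_blocks=2)
for _ in range(5):
    for addr, instrs in loop_body:
        uc.fetch(addr, instrs)
rate = uc.decode_elimination()
```

In this toy run only the first loop iteration reaches the decoder, so the avoided fraction grows with the iteration count. That is the intuition behind a small structure being effective on loop-dominated code.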