The trace cache is becoming an important building block of modern, wide-issue, processors. So far, trace cache related research has been focused on increasing fetch bandwidth. Trace-caches have been shown to effectively increase the number of "useful" instructions that can be fetched into the machine, thus enabling more instructions to be executed each cycle. However, trace cache has another important benefit that got less attention in recent research: especially for variable length ISA, such as Intel's IA-32 architecture (X86), reducing instruction decoding power is particularly attractive. Keeping the instruction traces in decoded format, implies the decoding power is only paid upon the build of a trace, thus reducing the overall power consumption of the system. This paper has three main contributions: it indicates that trace cache optimizations directed to reducing power consumption are do not necessarily coincide with optimizations directed to increasing fetch bandwidth; it extends our understanding on how well the trace cache utilizes its resources and introduces a new trace-cache organization based on filtering techniques. The knowledge obtained from the analysis of the traces' behavioral patterns motivates the use of filtering techniques. The new trace-cache organization increases the effective instruction-fetch bandwidth in conjunction with reducing the power consumption of the trace-cache system. We observe that (1) the majority of traces that are inserted into the trace-cache are rarely used again before being replaced; (2) the majority of the instructions delivered for execution originate from the fewer traces that are heavily and repeatedly used; and that (3) techniques that aim to improve instruction-fetch bandwidth may increase the number of traces built during program execution. Based on these observations, we propose splitting the trace cache into two components: the filter trace-cache (FTC) and the main trace-cache (MTC). Traces are first inserted into the FTC that is used to filter out the infrequently used traces; traces that prove "useful" are later moved into the MTC itself. The FTC/MTC organization exhibits an important benefit: it decreases the number of traces built, thus reducing power consumption while improving overall performance. For medium-size applications, the FTC/MTC pair reduces the number of trace builds by 16% in average. An extension of the filtering concept involves adding a second level (L2) trace-cache that stores less frequent traces that are replaced in the FTC or the MTC. The extra level of caching allows for order-of-magnitude reduction in the number of trace builds. Second level trace cache proves particularly useful for applications with large instruction footprints.
We present the PARROT concept that seeks to achieve higher performance with reduced energy consumption through gradual optimization of frequently executed code traces. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a selective approach for applying complex mechanisms only upon the most frequently used traces to maximize the performance gain at any given power constraint, thus attaining finer control of tradeoffs between performance and power awareness.We show that the PARROT based microarchitecture can improve the performance of aggressively designed processors by providing the means to improve the utilization of their more elaborate resources. At the same time, rigorous selection of traces prior to storage and optimization provides the key to attenuating increases in the power budget.For resource-constrained designs, PARROT based architectures deliver better performance (up to an average 16% increase in IPC) at a comparable energy level, whereas the conventional path to a similar performance improvement consumes an average 70% more energy. Meanwhile, for those designs which can tolerate a higher power budget, PARROT gracefully scales up to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the cubic-MIPS-per-WATT power awareness metric by over 50%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.