Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the largest of scales (that is, over the program's complete execution). During one part of the execution, a program can be completely memory bound; in another, it can repeatedly stall on branch mispredicts. Average statistics gathered about a program might not accurately portray where the real problems lie. This realization has ramifications for many architecture and compiler techniques, from how best to schedule threads on a multithreaded machine, to feedback-directed optimizations, power management, and the simulation and testing of architectures. Taking advantage of time-varying behavior requires a set of automated analytic tools and hardware techniques that can discover similarities and changes in program behavior on the largest of time scales.

The challenge in building such tools is that during a program's lifetime it can execute billions or trillions of instructions. How can high-level behavior be extracted from this sea of instructions? The reality is this: the way a program's execution changes over time is not totally random; in fact, it often falls into repeating behaviors, called phases. Automatically identifying this phase behavior is the goal of our research and the key to unlocking many new optimizations. We define a phase as a set of intervals (or slices in time) within a program's execution that have similar behavior, regardless of temporal adjacency. Recent research has shown that it is indeed possible to accurately identify and predict these phases to capture meaningful program behavior [1-8].

The key observation for phase recognition is that any program metric is a direct function of the way a program traverses the code during execution. We can find this phase behavior and classify it by examining only the ratios in which different regions of code are executed over time.
We can simply and quickly collect this information using basic block vector profiles for offline classification [4, 6] or through dynamic branch profiling for online classification [7]. In addition, accurately capturing phase behavior through the computation of a single metric, independent of the underlying architectural details, means that it is possible to …
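As a concrete illustration of this idea (a minimal sketch, not the actual SimPoint profiler), the following code builds a basic block vector from a hypothetical trace of (block id, instruction count) pairs and compares intervals by the ratios of code they execute. The block names, trace format, and toy values are all assumptions for illustration.

```python
from collections import Counter

def bbv_for_interval(block_trace):
    """Build a normalized basic block vector (BBV) for one execution interval.

    block_trace: list of (block_id, instruction_count) pairs, one entry per
    basic block execution. Weighting by instruction count means longer
    blocks contribute proportionally more to the vector.
    """
    counts = Counter()
    total = 0
    for block_id, insns in block_trace:
        counts[block_id] += insns
        total += insns
    # Normalize so each entry is the fraction of the interval's instructions
    # spent in that basic block.
    return {b: c / total for b, c in counts.items()}

def manhattan_distance(v1, v2):
    """Distance between two BBVs; a small value means the two intervals
    exercised the same code in the same ratios."""
    keys = set(v1) | set(v2)
    return sum(abs(v1.get(k, 0.0) - v2.get(k, 0.0)) for k in keys)

# Two intervals that execute the same blocks in the same ratios belong to
# the same phase, no matter how far apart in time they occur.
early = bbv_for_interval([("B1", 4), ("B2", 4)])
late = bbv_for_interval([("B1", 40), ("B2", 40)])   # same ratios, later in time
other = bbv_for_interval([("B3", 8)])               # a different code region
assert manhattan_distance(early, late) == 0.0       # same phase
assert manhattan_distance(early, other) == 2.0      # maximally different
```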
Modern architecture research relies heavily on detailed pipeline simulation.

Keywords: SimPoint, Clustering, Simulation, Fast-forwarding, Sampling

SIMPOINT

Understanding the cycle-level behavior of a processor running an application is crucial to modern computer architecture research. To gain this understanding, detailed cycle-level simulators are typically employed. Unfortunately, this level of detail comes at the cost of speed, and simulating the full execution of an industry-standard benchmark on even the fastest simulator can take weeks to months to complete. This fact has not gone unnoticed in the academic community, and several researchers have started to develop techniques aimed at reducing simulation time.

For architecture research it is often necessary to take one instance of a program with a given input and simulate its performance over many different architecture configurations. The same program binary with the same input may be run hundreds or thousands of times to examine, for example, how the effectiveness of a given architecture changes with its size. Our goals in creating SimPoint [1, 2] are to (1) significantly reduce simulation time, (2) provide an accurate characterization of the full program, and (3) perform the analysis needed to accomplish the first two goals in a matter of minutes. These goals are met by simulating only a handful of intelligently chosen sections of the full program. When these sections (simulation points) are carefully chosen, they provide an accurate picture of the complete execution of the program and yield highly accurate estimates of performance.

The key to our approach is that for a given program and input, the simulation points only need to be chosen once, because we select them using a method that is independent of any particular architecture configuration. The simulation points are selected using a metric based only on the code that is executed over time for a program/input pair.
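To make the arithmetic behind "an accurate picture of the complete execution" concrete: once each simulation point has been simulated in detail, whole-program performance can be estimated as a weighted sum, where each weight is the fraction of execution that the point's cluster covers. The CPI values and weights below are invented for illustration.

```python
# Hypothetical simulation points: each carries the fraction of the program's
# execution its cluster represents (weights sum to 1) and the CPI measured
# by detailed simulation of just that interval. All numbers are made up.
sim_points = [
    {"weight": 0.55, "cpi": 1.20},
    {"weight": 0.30, "cpi": 2.10},
    {"weight": 0.15, "cpi": 0.90},
]

# Estimated whole-program CPI is the weighted sum over simulation points.
estimated_cpi = sum(p["weight"] * p["cpi"] for p in sim_points)
print(f"estimated CPI: {estimated_cpi:.3f}")
```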
Once the simulation points are chosen, they can be used for the hundreds or thousands of independent simulations that may be needed, significantly reducing simulation time.

To pick the simulation points in [1, 2], we introduced the concept of profiling Basic Block Vectors (BBVs) as a way of capturing the important behaviors of the program over time [1]. A basic block vector captures the relative frequency of the code blocks executed during a given portion of execution. After profiling a program with a particular input, we compare the basic block vectors to see how similar they are to one another. Intervals of execution that execute the same code blocks with the same frequency are grouped together into clusters using clustering algorithms from machine learning. We found that sections of execution (represented by basic block vectors) that are grouped into the same cluster have very similar behavior across all the architecture metrics we have examined. Once we break the program into clusters, we pick a single point from each cluster (appropriately weighted) to serve …
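The clustering and selection steps above can be sketched with a tiny k-means: group intervals by BBV similarity, then pick the interval closest to each cluster centroid as that cluster's simulation point, weighted by the cluster's share of all intervals. This is only a sketch under simplifying assumptions (dense toy BBVs, a fixed number of clusters, deterministic initialization); the released SimPoint tool also reduces BBV dimensionality and uses model selection to choose the number of clusters, neither of which is shown here.

```python
def dist(a, b):
    """Euclidean distance between two dense BBVs (lists of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean(vectors):
    """Component-wise mean of a list of vectors (the cluster centroid)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, iters=20):
    # Deterministic init for the sketch: first k distinct vectors.
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(v)
        if len(centroids) == k:
            break
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each interval to its nearest centroid, then recompute.
        assign = [min(range(k), key=lambda j: dist(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = mean(members)
    return assign, centroids

def pick_simulation_points(bbvs, k):
    """Return one (interval_index, weight) pair per cluster: the interval
    whose BBV is closest to the centroid, weighted by cluster size."""
    assign, centroids = kmeans(bbvs, k)
    points = []
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if not members:
            continue
        rep = min(members, key=lambda i: dist(bbvs[i], centroids[j]))
        points.append((rep, len(members) / len(bbvs)))
    return points

# Toy run: intervals 0-2 exercise one code region and intervals 3-5 another,
# so two phases emerge and one representative interval is picked from each.
bbvs = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.0],
        [0.0, 1.0], [0.1, 0.9], [0.0, 1.0]]
print(pick_simulation_points(bbvs, k=2))
```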