Michael Van Biesbrouck scite author profile

Unbounded page-based transactional memory

Chuang, Narayanasamy, Venkatesh et al. 2006

SIGARCH Comput. Archit. News

Exploiting thread level parallelism is paramount in the multicore era. Transactions enable programmers to expose such parallelism by greatly simplifying the multi-threaded programming model. Virtualized transactions (unbounded in space and time) are desirable, as they can increase the scope of transactions' use, and thereby further simplify a programmer's job. However, hardware support is essential to support efficient execution of unbounded transactions. In this paper, we introduce Page-based Transactional Memory to support unbounded transactions. We combine transaction bookkeeping with the virtual memory system to support fast transaction conflict detection, commit, abort, and to maintain transactions' speculative data.

Using SimPoint for accurate and efficient simulation

Perelman, Hamerly, Biesbrouck et al. 2003

Modern architecture research relies heavily on detailed pipeline simulation. Simulating ...

Keywords: SimPoint, Clustering, Simulation, Fast-forwarding, Sampling

SIMPOINT

Understanding the cycle level behavior of a processor running an application is crucial to modern computer architecture research. To gain this understanding, detailed cycle level simulators are typically employed. Unfortunately, this level of detail comes at the cost of speed, and simulating the full execution of an industry standard benchmark on even the fastest simulator can take weeks to months to complete. This fact has not gone unnoticed in the academic community, and several researchers have started to develop techniques aimed at reducing simulation time.

For architecture research it is often necessary to take one instance of a program with a given input, and simulate its performance over many different architecture configurations. The same program binary with the input may be run hundreds or thousands of times to examine how, for example, the effectiveness of a given architecture changes with its size. Our goal in creating SimPoint [1, 2] is to (1) significantly reduce simulation time, (2) provide an accurate characterization of the full program, and (3) perform the analysis to accomplish the first two goals in a matter of minutes. These goals are met by simulating only a handful of intelligently chosen sections of the full program. When these sections (simulation points) are carefully chosen, they provide an accurate picture of the complete execution of the program and result in highly accurate estimations of performance.

The key to our approach is that for a given program and input, the simulation points only need to be chosen once. This is because we select them using a method that is independent of any particular architecture configuration. The simulation points are selected using a metric that is only based on the code that is executed over time for a program/input pair. Once the simulation points are chosen they can be used for the hundreds or thousands of independent simulations that may be needed, significantly reducing simulation time.

To pick the simulation points in [1, 2], we introduce the concept of profiling Basic Block Vectors (BBV) as a way of capturing the important behaviors of the program over time [1]. A Basic Block Vector captures the relative frequency of the code blocks executed during a given portion of execution. After profiling a program with a particular input, we compare the basic block vectors to see how similar they are to one another. Intervals of execution that execute the same code blocks with the same frequency are grouped together into clusters using clustering algorithms from machine learning. We found that sections of execution (represented by basic block vectors) that are grouped into the same cluster have very similar behavior across all the architecture metrics we have examined. Once we break the program into clusters, we pick a single point from each cluster (appropriately weighted) to serve ...
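
The flow described in this abstract (profile basic block vectors per interval, cluster similar intervals, pick one weighted point per cluster) can be illustrated with a small sketch. This is not the SimPoint tool itself: the abstract only says "clustering algorithms from machine learning", so the use of k-means with a farthest-point initialization, the choice of the interval closest to each centroid, and every function name below are assumptions made for illustration. The trace is assumed to be a flat list of executed basic block IDs.

```python
from collections import Counter


def basic_block_vector(block_ids, num_blocks):
    """Relative frequency of each basic block within one interval."""
    counts = Counter(block_ids)
    total = len(block_ids)
    return [counts.get(b, 0) / total for b in range(num_blocks)]


def profile_bbvs(trace, interval_size, num_blocks):
    """Split a trace of executed block IDs into intervals, one BBV each."""
    return [
        basic_block_vector(trace[i:i + interval_size], num_blocks)
        for i in range(0, len(trace), interval_size)
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda c: squared_distance(v, centroids[c]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=20):
    """Plain k-means (assumed here; the abstract names no algorithm)."""
    # Deterministic farthest-point initialization (an illustrative choice).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign(vectors, centroids), centroids


def pick_simulation_points(vectors, labels, centroids):
    """One interval per non-empty cluster, weighted by cluster size."""
    points = []
    for c, centroid in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == c]
        if not members:
            continue
        nearest = min(members,
                      key=lambda i: squared_distance(vectors[i], centroid))
        points.append((nearest, len(members) / len(vectors)))
    return points


def estimate(points, per_interval_metric):
    """Weighted estimate of a whole-program metric from the chosen points."""
    return sum(weight * per_interval_metric[i] for i, weight in points)
```

Because the points depend only on the executed code, the same `(interval, weight)` list could be reused across many architecture configurations: only the chosen intervals would be simulated in detail, and `estimate` would combine their measured metric (for example CPI) using the cluster weights.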

Unbounded page-based transactional memory

Chuang, Narayanasamy, Venkatesh et al. 2006

SIGOPS Oper. Syst. Rev.
