The emerging hardware support for thread-level speculation opens new opportunities to parallelize sequential programs beyond the traditional limits. By speculating that many data dependences are unlikely during runtime, consecutive iterations of a sequential loop can be executed speculatively in parallel. Runtime parallelism is obtained when the speculation is correct. To take full advantage of this new execution model, a program needs to be programmed or compiled in such a way that it exhibits high degree of speculative thread-level parallelism. We propose a comprehensive cost-driven compilation framework to perform speculative parallelization. Based on a misspeculation cost model, the compiler aggressively transforms loops into optimal speculative parallel loops and selects only those loops whose speculative parallel execution is likely to improve program performance. The framework also supports and uses enabling techniques such as loop unrolling, software value prediction and dependence profiling to expose more speculative parallelism. The proposed framework was implemented on the ORC compiler. Our evaluation showed that the cost-driven speculative parallelization was effective. Our compiler was able to generate good speculative parallel loops in ten Spec2000Int benchmarks, which currently achieve an average 8% speedup. We anticipate an average 15.6% speedup when all enabling techniques are in place.
Thread-level speculation is a technique that brings thread-level parallelism beyond the data-flow limit by executing a piece of code ahead of time speculatively before all its input data are ready. This technique appears particularly appealing for speeding up hardto-parallelize applications. Although various threadlevel speculation architectures and compilation techniques have been proposed by the research community, scalar applications remain difficult to be parallelized. It has not yet shown how well these applications can actually be benefited from threadlevel speculation and if the performance gain is significant enough to justify the required hardware support. In an attempt to understand and realize the potential gain with thread-level speculation especially for scalar applications, we proposed an SPT (Speculative Parallel Threading) architecture and developed an SPT compiler to generate optimal speculatively parallelized code. Our evaluation showed that with our SPT approach 10 SPECint2000 programs can achieve an average of 15.6% speedup on a two-core SPT processor by exploiting only loop parallelism. This paper describes the SPT architecture and the SPT compiler which performs aggressive costdriven loop selection and transformation, and presents our performance evaluation results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.