Antonia Zhai scite author profile

While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how to easily create parallel software to effectively exploit all of this raw performance potential. One promising technique for overcoming this problem is Thread-Level Speculation (TLS), which enables the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent. In this paper, we propose and evaluate a design for supporting TLS that seamlessly scales to any machine size because it is a straightforward extension of writeback invalidation-based cache coherence (which itself scales both up and down). Our experimental results demonstrate that our scheme performs well on both single-chip multiprocessors and on larger-scale machines where communication latencies are twenty times larger.


The STAMPede approach to thread-level speculation

Steffan, Colohan, Zhai, et al. 2005. ACM Trans. Comput. Syst.



Multithreaded processor architectures are becoming increasingly commonplace: many current and upcoming designs support chip multiprocessing, simultaneous multithreading, or both. While it is relatively straightforward to use these architectures to improve the throughput of a multithreaded or multiprogrammed workload, the real challenge is how to easily create parallel software to allow single programs to effectively exploit all of this raw performance potential. One promising technique for overcoming this problem is Thread-Level Speculation (TLS), which enables the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent. In this article, we propose and evaluate a design for supporting TLS that seamlessly scales both within a chip and beyond because it is a straightforward extension of writeback invalidation-based cache coherence (which itself scales both up and down). Our experimental results demonstrate that our scheme performs well on single-chip multiprocessors where the first level caches are either private or shared. For our private-cache design, the program performance of two of 13 general purpose applications studied improves by 86% and 56%, four others by more than 8%, and the average across all applications by 16%, confirming that TLS is a promising way to exploit the naturally multithreaded processing resources of future computer systems.


Improving value communication for thread-level speculation

Steffan, Colohan, Zhai, et al.


Compiler optimization of scalar value communication between speculative threads

et al. 2002


While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly-aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.

Code fragment interleaved with the abstract in the extracted text:

    do {
        work1();
        wait(A);
        if (condition(A)) {
            A = A + *p;
        } else {
            a = 2;
            work2();
        }
        A = A + 1;
        signal(A);
        work3();
    } while (1);
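The cost of the critical forwarding path mentioned in this abstract can be illustrated with a small timing model. This is only a sketch under simplifying assumptions that are not from the paper: every speculative thread starts at time 0 on its own processor, speculation never fails, and each thread runs a fixed number of cycles before its wait (`pre`), between its wait and its signal (`critical`), and after its signal (`post`). The function name and parameters are hypothetical.

```python
def tls_finish_time(n_epochs, pre, critical, post):
    """Finish time of n_epochs speculative threads in a toy timing model.

    Assumptions (illustrative only): all threads start at t=0, each has
    its own processor, and the only ordering constraint is that a thread
    cannot pass its wait until its predecessor has signalled.
    """
    signal_time = 0  # time at which the previous thread forwarded the value
    finish = 0
    for _ in range(n_epochs):
        # Stall at the wait until the forwarded value has arrived.
        wait_done = max(pre, signal_time)
        # Work between wait and signal is serialized across threads.
        signal_time = wait_done + critical
        # Work after the signal overlaps with the successor threads.
        finish = max(finish, signal_time + post)
    return finish


# Same 60 cycles of work per thread in both cases; only the position
# of the signal changes.
late_signal = tls_finish_time(4, pre=10, critical=20, post=30)
early_signal = tls_finish_time(4, pre=10, critical=5, post=45)
print(late_signal, early_signal)
```

In this model the wait-to-signal segments of successive threads form a chain, so moving work out of that segment (scheduling the signal earlier) shortens the total time even though each thread does the same amount of work.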


Loop Selection for Thread-Level Speculation

Wang, Dai, Yellajyosula, et al. 2006


Abstract. Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. However, the high cost associated with unbalanced load, failed speculation, and inter-thread value communication makes it difficult to obtain the desired performance unless the speculative threads are carefully chosen. In this paper, we focus on extracting parallel threads from loops in general-purpose applications because loops, with their regular structures and significant coverage on execution time, are ideal candidates for extracting parallel threads. General-purpose applications, however, usually contain a large number of nested loops with unpredictable parallel performance and dynamic behavior, thus making it difficult to decide which set of loops should be parallelized to improve overall program performance. Our proposed loop selection algorithm addresses all these difficulties. We have found that (i) with the aid of profiling information, compiler analyses can achieve a reasonably accurate estimation of the performance of parallel execution, and that (ii) different invocations of a loop may behave differently, and exploiting this dynamic behavior can further improve performance. With a judicious choice of loops, we can improve the overall program performance of SPEC2000 integer benchmarks by as much as 20%.
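The problem of choosing which loops in a nest to parallelize, as described in this abstract, can be sketched as a simple tree search. This is not the paper's algorithm. It assumes that each loop already carries an estimated number of cycles saved if it is parallelized (a made-up, profile-style number here), and that at most one loop on any nesting path may be parallelized. The `Loop` class and `select_loops` function are hypothetical names introduced for illustration.

```python
class Loop:
    """A loop in a nesting tree with a hypothetical estimated saving."""

    def __init__(self, name, est_saving, inner=None):
        self.name = name
        self.est_saving = est_saving  # assumed estimate, e.g. from profiling
        self.inner = inner or []      # loops nested directly inside this one


def select_loops(loop):
    """Return (total_saving, chosen_names) for the nest rooted at `loop`.

    Assumption: parallelizing a loop excludes every loop nested inside
    it, so the choice is between this loop and the best picks among
    its inner loops.
    """
    inner_saving = 0.0
    inner_choice = []
    for child in loop.inner:
        saving, chosen = select_loops(child)
        inner_saving += saving
        inner_choice += chosen
    if loop.est_saving > inner_saving:
        return loop.est_saving, [loop.name]
    return inner_saving, inner_choice


nest = Loop("outer", 50, [
    Loop("a", 30),
    Loop("b", 25, [Loop("b1", 40)]),
])
print(select_loops(nest))
```

A loop whose estimated saving is zero or negative is never chosen in this sketch, which loosely mirrors the abstract's point that badly chosen speculative threads can cost more than they gain.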




Antonia Zhai

A scalable approach to thread-level speculation

The STAMPede approach to thread-level speculation

Improving value communication for thread-level speculation

Compiler optimization of scalar value communication between speculative threads

Loop Selection for Thread-Level Speculation
