Plasmon energy and lattice energy of binary tetrahedral semiconductors and I–VII ionic compounds


BulkCompactor: Optimized deterministic execution via Conflict-Aware commit of atomic blocks

Duan, Zhou, Ahn, et al. 2012


Recent proposals for determinism-enforcement architectures are able to honor the dependences between threads through a commit step that often becomes a performance bottleneck. As they commit code blocks (or chunks) in a round-robin order, if one chunk gets squashed due to a conflict, its successors also observe a stall. We call this effect transitive squash delay. This paper proposes a novel, high-performance approach to deterministic execution based on Conflict-Aware commit. Rather than committing chunks in strict round-robin order, the idea is to skip those chunks with conflicts and deterministically execute them slightly later. The scheme, called BulkCompactor, largely eliminates transitive squash delay, "compacts" the chunk commits, and substantially speeds up execution. With BulkCompactor, the squash overhead is O(N) rather than O(N^2) as in round-robin. We describe BulkCompactor designs for machines with centralized or distributed commit. Finally, a simulation-based evaluation shows that BulkCompactor delivers performance comparable to nondeterministic systems. For example, for 32 processors, BulkCompactor incurs an average execution overhead of 22% over a nondeterministic system. The round-robin scheme's average overhead is 133%.
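The idea of skipping conflicting chunks rather than stalling behind them can be illustrated with a toy model. The sketch below is not the BulkCompactor hardware design; the `Chunk` type, the `conflicts` rule, and the round structure are assumptions made for illustration, and it further assumes that a deferred chunk keeps the same read and write sets when it re-executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk: one atomic block with its read and write sets."""
    proc: int
    reads: frozenset
    writes: frozenset


def conflicts(earlier, later):
    """Assumed rule: 'later' is squashed if 'earlier' wrote data it read or wrote."""
    return bool(earlier.writes & (later.reads | later.writes))


def conflict_aware_rounds(chunks):
    """Return, per commit round, the processor ids whose chunks commit.

    Chunks are visited in a fixed processor order, so the outcome depends
    only on the chunk contents and is therefore deterministic. A chunk that
    conflicts with one already committed in this round is skipped and
    retried in the next round instead of stalling its successors.
    """
    pending = sorted(chunks, key=lambda c: c.proc)
    rounds = []
    while pending:
        committed, deferred = [], []
        for chunk in pending:
            if any(conflicts(done, chunk) for done in committed):
                deferred.append(chunk)   # skip it; later chunks still commit
            else:
                committed.append(chunk)
        rounds.append([c.proc for c in committed])
        pending = deferred
    return rounds


example = [
    Chunk(0, frozenset(), frozenset({"x"})),
    Chunk(1, frozenset({"x"}), frozenset()),   # reads what P0 writes
    Chunk(2, frozenset(), frozenset({"y"})),
    Chunk(3, frozenset({"z"}), frozenset()),
]
print(conflict_aware_rounds(example))  # [[0, 2, 3], [1]]
```

In this example the chunk on processor 1 conflicts with processor 0, yet processors 2 and 3 still commit in the first round. Under a strict round-robin order they would have waited behind the squashed chunk, which is the transitive squash delay the abstract describes.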


DeAliaser

Ahn, Duan, Torrellas. 2013. SIGARCH Comput. Archit. News


Alias analysis is a critical component in many compiler optimizations. A promising approach to reduce the complexity of alias analysis is to use speculation. The approach consists of performing optimizations assuming the alias relationships that are true most of the time, and repairing the code when such relationships are found not to hold through runtime checks. This paper proposes a general alias speculation scheme that leverages upcoming hardware support for transactions with the help of some ISA extensions. The ability of transactions to checkpoint and roll back frees the compiler to pursue aggressive optimizations without having to worry about recovery code. Also, exposing the memory conflict detection hardware in transactions to software allows runtime checking of aliases with little or no overhead. We test the potential of the novel alias speculation approach with Loop Invariant Code Motion (LICM), Global Value Numbering (GVN), and Partial Redundancy Elimination (PRE) optimization passes. On average, they are shown to reduce program execution time by 9% in SPEC FP2006 applications and 3% in SPEC INT2006 applications over the alias analysis of a state-of-the-art compiler.
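The speculate, check, and roll back pattern described in this abstract can be emulated in software to show the control flow. The sketch below is only an illustration, not the paper's ISA extensions: the function names are hypothetical, the list copy stands in for a hardware checkpoint, and the explicit address comparison stands in for the transactional conflict-detection hardware. The optimization shown is a loop-invariant load hoisted under the assumption that no store in the loop aliases it.

```python
def scale_safe(mem, dst, src, n, k):
    """Baseline: reload mem[src] every iteration; correct under any aliasing."""
    for i in range(n):
        mem[dst + i] = mem[src] * k


def scale_speculative(mem, dst, src, n, k):
    """Hoist the load of mem[src], assuming no store in the loop aliases it.

    Returns True if the speculation held, False if it was rolled back.
    """
    checkpoint = list(mem)        # stand-in for a hardware checkpoint
    hoisted = mem[src] * k        # invariant only if the assumption holds
    for i in range(n):
        addr = dst + i
        if addr == src:           # a store hits the monitored location
            mem[:] = checkpoint   # roll back all speculative stores
            scale_safe(mem, dst, src, n, k)  # re-run the unoptimized loop
            return False
        mem[addr] = hoisted
    return True


# No aliasing: the hoisted version commits.
mem = [2, 0, 0, 0]
print(scale_speculative(mem, dst=1, src=0, n=3, k=3), mem)  # True [2, 6, 6, 6]

# Aliasing: the store to index 1 overlaps the hoisted load, so it rolls back.
mem = [0, 2, 0, 0]
print(scale_speculative(mem, dst=0, src=1, n=3, k=3), mem)  # False [6, 6, 18, 0]
```

The check here is conservative: it aborts on any store to the monitored address, even when the result would have been the same. The optimized path itself contains no recovery logic beyond the rollback, which mirrors the abstract's point that checkpointing frees the compiler from generating recovery code.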


Asymmetric Memory Fences

Duan, Honarmand, Torrellas. 2015. SIGARCH Comput. Archit. News


There have been several recent efforts to improve the performance of fences. The most aggressive designs allow post-fence accesses to retire and complete before the fence completes. Unfortunately, such designs present implementation difficulties due to their reliance on global state and structures. This paper's goal is to optimize both the performance and the implementability of fences. We start off with a design like the most aggressive ones but without the global state. We call it Weak Fence or wF. Since the concurrent execution of multiple wFs can deadlock, we combine wFs with a conventional fence (i.e., Strong Fence or sF) for the less performance-critical thread(s). We call the result an Asymmetric fence group. We also propose a taxonomy of Asymmetric fence groups under TSO. Compared to past aggressive fences, Asymmetric fence groups are both substantially easier to implement and higher-performing on average. The two main designs presented (WS+ and W+) speed up workloads under TSO by an average of 13% and 21%, respectively, over conventional fences.


WeeFence

Duan, Muzahid, Torrellas. 2013


Although fences are designed for low-overhead concurrency coordination, they can be expensive in current machines. If fences were largely free, faster fine-grained concurrent algorithms could be devised, and compilers could guarantee Sequential Consistency (SC) at little cost. In this paper, we present WeeFence (or WFence for short), a fence that is very cheap because it allows post-fence accesses to skip it. Such accesses can typically complete and retire before the pre-fence writes have drained from the write buffer. Only when an incorrect reordering of accesses is about to happen does the hardware stall to prevent it. In the paper, we present the WFence design for TSO, and compare it to a conventional fence with speculation for 8-processor multicore simulations. We run parallel kernels that contain explicit fences and parallel applications that do not. For the kernels, WFence eliminates nearly all of the fence stall, reducing the kernels' execution time by an average of 11%. For the applications, a conservative compiler algorithm places fences in the code to guarantee SC. In this case, on average, WFences reduce the resulting fence overhead from 38% of the applications' execution time to 2% (in a centralized WFence design), or from 36% to 5% (in a distributed WFence design).
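To make the write-buffer behavior concrete, the following toy model shows why a conventional fence is needed under TSO and what it costs. It does not model WFence itself: the `TSOCore` class and the fixed interleaving are assumptions for illustration, and `fence` implements only the conventional behavior of draining the write buffer before any later access proceeds.

```python
class TSOCore:
    """Hypothetical core with a FIFO write buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = []            # pending (addr, value) pairs, oldest first

    def store(self, addr, value):
        self.write_buffer.append((addr, value))   # not yet visible to other cores

    def load(self, addr):
        # Forward from the core's own newest buffered store, else read memory.
        for a, v in reversed(self.write_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def fence(self):
        # Conventional fence: drain every pre-fence write before continuing.
        while self.write_buffer:
            a, v = self.write_buffer.pop(0)
            self.memory[a] = v


def store_buffering(use_fence):
    """One fixed interleaving of the classic store-buffering litmus test."""
    memory = {"x": 0, "y": 0}
    t0, t1 = TSOCore(memory), TSOCore(memory)
    t0.store("x", 1)
    t1.store("y", 1)
    if use_fence:
        t0.fence()
        t1.fence()
    r0 = t0.load("y")
    r1 = t1.load("x")
    t0.fence()                            # buffers drain eventually either way
    t1.fence()
    return r0, r1


print(store_buffering(use_fence=False))  # (0, 0): not allowed under SC
print(store_buffering(use_fence=True))   # (1, 1)
```

Without the fence, both loads bypass the buffered stores and return 0, an outcome that Sequential Consistency forbids. The draining loop in `fence` is the stall that the abstract's design tries to avoid in the common case where no incorrect reordering is about to occur.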



Yuelu Duan

Detecting and Eliminating Potential Violations of Sequential Consistency for Concurrent C/C++ Programs

An adaptive task creation strategy for work-stealing scheduling

SCsafe: Logging sequential consistency violations continuously and precisely
