Verifying Safety of a Token Coherence Implementation by Parametric Compositional Refinement

Burckhardt, Sebastian; Alur, Rajeev; Martin, Milo M. K.

doi:10.1007/978-3-540-30579-8_9

Cited by 7 publications

(9 citation statements)

References 29 publications

“…This gives us the option to leverage existing results on token coherence [4,5,12,16,25,26,27] in interesting ways. While the token cache coherence abstraction has these many nice properties, there are some road-blocks to its direct implementation in hardware.…”

Section: Analogy With Token Coherence (mentioning)

confidence: 99%

Scalable and reliable communication for hardware transactional memory

Pugsley, Awasthi, Madan, et al. 2008

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques

In a hardware transactional memory system with lazy versioning and lazy conflict detection, the process of transaction commit can emerge as a bottleneck. This is especially true for a large-scale distributed memory system where multiple transactions may attempt to commit simultaneously and coordination is required before allowing commits to proceed in parallel. In this paper, we propose novel algorithms to implement commit that are more scalable in terms of delay and are free of deadlocks/livelocks. We show that these algorithms have similarities with the token cache coherence concept and leverage these similarities to extend the algorithms to handle message loss and starvation scenarios. The proposed algorithms improve upon the state-of-the-art by yielding up to a 7X reduction in commit delay and up to a 48X reduction in network messages for commit. These translate into overall performance improvements of up to 66% (for synthetic workloads with average transaction length of 200 cycles), 35% (for average transaction length of 1000 cycles), and 8% (for average transaction length of 4000 cycles). For a small group of multi-threaded programs with frequent transaction commits, improvements of up to 8% were observed for a 32-node simulation.

“…These invariants have been formally proven to guarantee coherence in the fault-free scenario [3], and their simplicity makes them attractive for online checking. Each cache controller can locally check Invariants 2 and 3 by performing a redundant token check for every load and store.…”

Section: Coherence State Signature For Single Block (mentioning)

confidence: 99%

Error Detection via Online Checking of Cache Coherence with Token Coherence Signatures

Meixner, Sorin 2007

2007 IEEE 13th International Symposium on High Performance Computer Architecture

To provide high dependability in a multithreaded system despite hardware faults, the system must detect and correct errors in its shared memory system. Recent…

Introduction: Two trends motivate increased interest in fault tolerance for multithreaded shared-memory computer architectures. First, multithreaded systems (including traditional multiprocessors, chip multiprocessors, and simultaneously multithreaded processors) have come to dominate the commodity computing market. Second, the industrial roadmap [7] and recent research [17] forecast increases in hardware error rates due to decreasing transistor sizes and voltages. For example, smaller devices are more susceptible to having their charges disrupted by alpha particles or cosmic radiation [21].

Many researchers have developed effective fault tolerance measures for microprocessor cores, using techniques such as redundant multithreading [16,15,20] and DIVA [2]. However, to provide fault tolerance in a multithreaded system, the machine must also be able to detect and correct errors in its shared memory system, including errors in the cache coherence protocol. Whereas we can efficiently detect errors in data storage and transmission using error codes, it is far more difficult to ensure the correct execution of a complex, distributed coherence protocol with multiple interacting controllers. To provide comprehensive, end-to-end error detection, recent research has explored online (dynamic) checking of cache coherence. A coherence checker can either operate stand-alone [5,4] or as an integral part of an online memory consistency checker [12,13] that also detects errors in the interactions between the memory system and the processor cores. Once a coherence checker detects an error, the system can recover to a pre-fault state using one of several existing recovery mechanisms [19,14].

Coherence checking is a powerful error detection mechanism, but existing coherence checkers are costly to implement, introduce high interconnection network traffic overhead, and do not scale well to large systems. These costs and limitations preclude their use in low-cost commodity systems.

In this work, we develop the Token Coherence Signature Checker (TCSC), which is a low-cost, scalable alternative to prior cache coherence checkers. It can be used by itself to detect memory system errors, or it can be used as part of a memory consistency checker [12,13]. With TCSC, every cache and memory controller maintains a signature that represents its recent history of cache coherence events. Periodically, these signatures are gathered at a verifier which determines if an error has occurred. The cost advantages of signature-based error detection come at the expense of an arbitrarily small (but non-zero) probability of undetected errors. This paper makes three main contributions:

• TCSC is the first signature-based scheme that completely checks cache coherence and can detect all types of coherence errors with arbitrarily high probability. The use of signatures significantly lowers hardware costs...
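The citation statement for this entry mentions a redundant token check performed locally for every load and store. A minimal sketch of such a check follows, assuming the invariants in question are the usual token-counting rules (a store needs all tokens for the block, a load needs at least one). The invariants themselves are not spelled out on this page, so the rules, the names `CacheLine`, `check_access` and `CoherenceError`, and the constant `TOTAL_TOKENS` are all illustrative assumptions, not the authors' design.

```python
from dataclasses import dataclass

# Hypothetical constant: total tokens per memory block (not given in the source).
TOTAL_TOKENS = 4


class CoherenceError(Exception):
    """Raised when an access violates an assumed token-counting rule."""


@dataclass
class CacheLine:
    tokens: int = 0  # tokens this cache currently holds for the block


def check_access(line: CacheLine, is_store: bool) -> None:
    """Redundant local check run alongside a load or store.

    Assumed rules: a store requires all tokens, a load requires at least one.
    """
    if not 0 <= line.tokens <= TOTAL_TOKENS:
        raise CoherenceError(f"impossible token count: {line.tokens}")
    if is_store and line.tokens != TOTAL_TOKENS:
        raise CoherenceError("store attempted without all tokens")
    if not is_store and line.tokens < 1:
        raise CoherenceError("load attempted without any token")
```

Under these assumptions the check needs only the controller's own token count, which is what would make it cheap enough to run on every access.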

“…Methods like trace-driven or other simulation-based testing are insufficient for ensuring the correctness of such complex protocols since they often do not hit the difficult corner cases. Formal verification techniques have therefore been proposed and deployed to prove the correctness of these protocols [Abts et al 2000; Burckhardt et al 2005; Gjessing et al 1989; McMillan and J. 1991; Nanda and Bhuyan 1992; Park and Dill 1995; Pong et al 1998].…”

Section: Introduction (mentioning)

confidence: 99%

“…Although more advanced verification techniques exist, we chose Murϕ for its easy-to-use interface and robustness. Murϕ has also been the tool of choice for many hardware cache-related studies [Abts et al 2000; Burckhardt et al 2005; Park and Dill 1995; Pong et al 1998; Zhang et al 2010]. …”

Section: Introduction (mentioning)

confidence: 99%

Revisiting the Complexity of Hardware Cache Coherence and Some Implications

Komuravelli, Adve, Chou 2014

ACM Trans. Archit. Code Optim.

Cache coherence is an integral part of shared-memory systems but is also widely considered to be one of the most complex parts of such systems. Much prior work has addressed this complexity and the verification techniques to prove the correctness of hardware coherence. Given the new multicore era with increasing number of cores, there is a renewed debate about whether the complexity of hardware coherence has been tamed or whether it should be abandoned in favor of software coherence. This article revisits the complexity of hardware cache coherence by verifying a publicly available, state-of-the-art implementation of the widely used MESI protocol, using the Murϕ model checking tool. To our surprise, we found six bugs in this protocol, most of which were hard to analyze and took several days to fix. To compare the complexity, we also verified the recently proposed DeNovo protocol, which exploits disciplined software programming models. We found three relatively easy to fix bugs in this less mature protocol. After fixing these bugs, our verification experiments showed that, compared to DeNovo, MESI had 15X more reachable states leading to a 20X increase in verification (model checking) time. Although we were eventually successful in verifying the protocols, the tool required making several simplifying assumptions (e.g., two cores, one address). 
Our results have several implications: (1) they indicate that hardware coherence protocols remain complex; (2) they reinforce the need for protocol designers to embrace formal verification tools to demonstrate correctness of new protocols and extensions; (3) they reinforce the need for formal verification tools that are both scalable and usable by non-experts; and (4) they show that a system based on hardware-software co-design can offer a simpler approach for cache coherence, thus reducing the overall verification effort and allowing verification of more detailed models and protocol extensions that are otherwise limited by computing resources.
