Since counterexamples generated by model checking tools are only symptoms of faults in the model, a significant amount of manual work is required in order to locate the fault that is the root cause for the presence of counterexamples in the model. In this paper, we propose an automated method for explaining counterexamples that are symptoms of the occurrence of deadlocks in concurrent systems. Our method is based on an analysis of a set of counterexamples that can be generated by a model checking tool such as SPIN. By comparing the set of counterexamples with the set of correct traces that never deadlock, a number of sequences of actions are extracted that aid the model designer in locating the cause of the occurrence of a deadlock. We first argue that the obvious approach to extract such sequences which is by sequential pattern mining and by contrasting patterns that are typical •for the deadlocking counterexample traces but not typical for non-deadlocking traces, fails due to the inherent complexity of the problem. We then propose to extract substrings of specific length that only occur in the set of counterexamples for explaining the occurrence of deadlocks. We use a number of case studies to show the effectiveness of our approach and to compare it with an alternative approach to the counterexample explanation problem.
Abstract. Concurrent systems are often modeled using an interleaving semantics. Since system designers tend to think sequentially, it is highly probable that they do not foresee some interleavings that their model encompasses. As a consequence, one of the main sources of failure in concurrent systems is unforeseen interleavings. In this paper, we devise an automated method for revealing unforeseen interleavings in the form of sequences of actions derived from counterexamples obtained by explicit state model checking. In order to extract such sequences we use a data mining technique called sequential pattern mining. Our method is based on contrasting the patterns of a set of counterexamples with the patterns of a set of correct traces that do not violate a desired property. We first argue that mining sequential patterns from the dataset of counterexamples fails due to the inherent complexity of the problem. We then propose a reduction technique designed to reduce the length of the execution traces in order to make the problem more tractable. We finally demonstrate the effectiveness of our approach by applying it to a number of sample case studies.
Several recently proposed randomized testing tools for concurrent and distributed systems come with theoretical guarantees on their success. The key to these guarantees is a notion of bug depth—the minimum length of a sequence of events sufficient to expose the bug—and a characterization of d -hitting families of schedules—a set of schedules guaranteed to cover every bug of given depth d . Previous results show that in certain cases the size of a d -hitting family can be significantly smaller than the total number of possible schedules. However, these results either assume shared-memory multithreading, or that the underlying partial ordering of events is known statically and has special structure. These assumptions are not met by distributed message-passing applications. In this paper, we present a randomized scheduling algorithm for testing distributed systems. In contrast to previous approaches, our algorithm works for arbitrary partially ordered sets of events revealed online as the program is being executed. We show that for partial orders of width at most w and size at most n (both statically unknown), our algorithm is guaranteed to sample from at most w 2 n d −1 schedules, for every fixed bug depth d . Thus, our algorithm discovers a bug of depth d with probability at least 1 / ( w 2 n d −1 ). As a special case, our algorithm recovers a previous randomized testing algorithm for multithreaded programs. Our algorithm is simple to implement, but the correctness arguments depend on difficult combinatorial results about online dimension and online chain partitioning of partially ordered sets. We have implemented our algorithm in a randomized testing tool for distributed message-passing programs. We show that our algorithm can find bugs in distributed systems such as Zookeeper and Cassandra, and empirically outperforms naive random exploration while providing theoretical guarantees.
We propose an automated mining-based method for explaining concurrency bugs. We use a data mining technique called sequential pattern mining to identify problematic sequences of concurrent read and write accesses to the shared memory of a multithreaded program. Our technique does not rely on any characteristics specific to one type of concurrency bug, thus providing a general framework for concurrency bug explanation. In our method, given a set of concurrent execution traces, we first mine sequences that frequently occur in failing traces and then rank them based on the number of their occurrences in passing traces. We consider the highly ranked sequences of events that occur frequently only in failing traces an explanation of the system failure, as they can reveal its causes in the execution traces. Since the scalability of sequential pattern mining is limited by the length of the traces, we present an abstraction technique which shortens the traces at the cost of introducing spurious explanations. Spurious as well as misleading explanations are then eliminated by a subsequent filtering step, helping the programmer to focus on likely causes of the failure. We validate our approach using a number of case studies, including synthetic as well as real-world bugs.
Error invariants are assertions that over-approximate the reachable program states at a given position in an error trace while only capturing states that will still lead to failure if execution of the trace is continued from that position. Such assertions reflect the effect of statements that are involved in the root cause of an error and its propagation, enabling slicing of statements that do not contribute to the error. Previous work on error invariants focused on sequential programs. We generalize error invariants to concurrent traces by augmenting them with additional information about hazards such as write-after-write events, which are often involved in race conditions and atomicity violations. By providing the option to include varying levels of details in error invariants-such as hazards and branching information-our approach allows the programmer to systematically analyze individual aspects of an error trace. We have implemented a hazard-sensitive slicing tool for concurrent traces based on error invariants and evaluated it on benchmarks covering a broad range of real-world concurrency bugs. Hazard-sensitive slicing significantly reduced the length of the considered traces and still maintained the root causes of the concurrency bugs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.