This paper explores the predictability of SAT and SMT solvers in response to different kinds of changes to benchmarks. We consider both semantics-preserving and possibly semantics-modifying transformations, and provide preliminary data about solver predictability. We also propose carrying learned theory lemmas over from an original run to runs on similar benchmarks, and show the benefits of this idea as a heuristic for improving predictability of SMT solvers.
Motivation

SAT and SMT (Satisfiability Modulo Theories) solvers have enjoyed tremendous performance improvements in the past ten years, increasing the automated-reasoning power available for applications like algorithmic verification, combinatorial design, planning, and others (e.g., [7,5,4]). Most work in the field has focused just on performance-oriented quality metrics for solvers. For example, the basic measure used in both the most recent (at the time of writing) SAT Competition and SMT Competition was simply the pair of the number of benchmarks solved and running time to solve them, compared in the natural lexicographic order (for the competitions mentioned, see, e.g., [1,3]). While the SAT competition has also experimented recently with more complex measures, they are also centered on performance.

In this paper, we propose another property to consider when evaluating solvers, namely predictability. While users certainly require and benefit from improvements to raw performance, anecdotal evidence from end users of solvers suggests that in some cases unpredictability is at least as significant a concern. For example, Steve Miller, Principal Software Engineer in the Advanced Technology Center of Rockwell Collins, reported in his keynote presentation at Midwest Verification Day 2009 that unpredictability is a significant issue for his team in incorporating SAT/SMT solvers into their verification workflow. Unpredictability is a problem because a small change to a model can lead to an enormous change in the amount of time to solve the resulting verification condition. If the amount of time is enormously longer, the verification may become infeasible or unacceptably delayed. If it is enormously shorter, engineers may doubt the result, questioning if an error elsewhere in the workflow has led to such different system behavior.
It may improve the usability of such solvers to sacrifice a modest amount of performance for improved predictability.

In this paper we provide a preliminary study of predictability of SAT (Section 2) and SMT (Section 3) solvers under small mutations of standard benchmarks. We use the standard deviation of solver times on a collection of mutants as a measure of variability. In the case of SMT solvers, we also propose and study a technique for heuristically improving predictability, by carrying over a selection of learned theory lemmas from the original run of the solver to the runs on the mutants.
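
As an illustration of the variability measure just described, the following minimal sketch computes the standard deviation of solver running times over a collection of mutants. It rests on assumptions not stated in the text: the function name `variability` is hypothetical, the timings are invented for illustration only, and the sample standard deviation (dividing by n - 1) is used, although the population form would serve equally well if that is what is intended.

```python
import statistics


def variability(times_seconds):
    """Return the sample standard deviation of solver times over mutants.

    Assumption: the sample form (n - 1 denominator) is used; swap in
    statistics.pstdev for the population form.
    """
    if len(times_seconds) < 2:
        raise ValueError("need timings for at least two mutants")
    return statistics.stdev(times_seconds)


# Hypothetical timings in seconds, invented purely for illustration.
stable_solver = [10.0, 11.0, 9.0, 10.0]
erratic_solver = [1.0, 40.0, 2.0, 90.0]

if __name__ == "__main__":
    print("stable: ", variability(stable_solver))
    print("erratic:", variability(erratic_solver))
```

Under this measure, a solver whose times cluster tightly across mutants scores lower (more predictable) than one whose times swing widely, regardless of which solver is faster on average.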