We introduce the problem of learning SMT(LRA) constraints from data. SMT(LRA) extends propositional logic with (in)equalities between numerical variables. Many relevant formal verification problems can be cast as SMT(LRA) instances and SMT(LRA) has supported recent developments in optimization and counting for hybrid Boolean and numerical domains. We introduce SMT(LRA) learning, the task of learning SMT(LRA) formulas from examples of feasible and infeasible instances, and we contribute INCAL, an exact non-greedy algorithm for this setting. Our approach encodes the learning task itself as an SMT(LRA) satisfiability problem that can be solved directly by SMT solvers. INCAL is an incremental algorithm that achieves exact learning by looking only at a small subset of the data, leading to significant speed-ups. We empirically evaluate our approach on both synthetic instances and benchmark problems taken from the SMT-LIB benchmarks repository.
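The incremental strategy described above can be illustrated with a minimal sketch. It is not the INCAL implementation: a brute-force search over small integer coefficients stands in for the SMT solver call, the hypothesis class is reduced to a single half-plane `a*x + b*y <= c`, and all function names are hypothetical. Only the loop structure is the point: fit a formula to a small active subset, look for examples it misclassifies, add one, and repeat.

```python
from itertools import product

def satisfies(h, point):
    """Evaluate the half-plane a*x + b*y <= c on a 2D point."""
    a, b, c = h
    x, y = point
    return a * x + b * y <= c

def learn(subset):
    """Stand-in for the solver call (assumption): return the first half-plane
    with small integer coefficients that agrees with every labelled example."""
    for h in product(range(-2, 3), range(-2, 3), range(-4, 5)):
        if all(satisfies(h, p) == label for p, label in subset):
            return h
    return None

def incremental_learn(data):
    """Grow the active subset only with examples the current hypothesis gets wrong."""
    active = [data[0]]
    while True:
        h = learn(active)
        if h is None:
            return None, active
        violated = [ex for ex in data if satisfies(h, ex[0]) != ex[1]]
        if not violated:
            return h, active
        active.append(violated[0])

# Toy data: grid points labelled by the hidden constraint x + y <= 3.
data = [((x, y), x + y <= 3) for x in range(4) for y in range(4)]
hypothesis, active = incremental_learn(data)
print(hypothesis, len(active), "of", len(data), "examples used")
```

Each pass adds an example that the current hypothesis misclassifies, so the active set grows strictly and the loop terminates on finite data.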
Spreadsheets, comma-separated value files and other tabular data representations are in wide use today. However, writing, maintaining and identifying good formulas for tabular data and spreadsheets can be time-consuming and error-prone. We investigate the automatic learning of constraints (formulas and relations) in raw tabular data in an unsupervised way. We represent common spreadsheet formulas and relations through predicates and expressions whose arguments must satisfy the inherent properties of the constraint. The challenge is to automatically infer the set of constraints present in the data, without labeled examples or user feedback. We propose a two-stage generate-and-test method where the first stage uses constraint solving techniques to efficiently reduce the number of candidates, based on the predicate signatures. Our approach takes inspiration from inductive logic programming, constraint learning and constraint satisfaction. We show that we are able to accurately discover constraints in spreadsheets from various sources.
Weighted model integration (WMI) extends weighted model counting (WMC) to the integration of functions over mixed discrete-continuous probability spaces. It has shown tremendous promise for solving inference problems in graphical models and probabilistic programs. Yet, state-of-the-art tools for WMI are generally limited either by the range of amenable theories, or in terms of performance. To address both limitations, we propose the use of extended algebraic decision diagrams (XADDs) as a compilation language for WMI. Aside from tackling typical WMI problems, XADDs also enable partial WMI yielding parametrized solutions. To overcome the main roadblock of XADDs -- the computational cost of integration -- we formulate a novel and powerful exact symbolic dynamic programming (SDP) algorithm that seamlessly handles Boolean, integer-valued and real variables, and is able to effectively cache partial computations, unlike its predecessor. Our empirical results demonstrate that these contributions can lead to a significant computational reduction over existing probabilistic inference algorithms.
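As a worked illustration of what a WMI problem computes, the sketch below evaluates a toy instance by hand-enumerating the Boolean assignments and integrating a polynomial weight exactly over the feasible interval of the real variable. It does not use XADDs or the SDP algorithm from the abstract; the instance, the per-assignment intervals and the helper `integrate_poly` are all assumptions made for illustration.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, lo, hi):
    """Exact integral over [lo, hi] of the polynomial sum(coeffs[k] * x**k)."""
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

# Toy problem with one Boolean b and one real x in [0, 1]:
#   if b holds,     x must lie in [0, 1/2] and the weight is 2x;
#   if b is false,  x must lie in [1/4, 1] and the weight is 1.
cases = {
    True: (F(0), F(1, 2), [F(0), F(2)]),   # (lower bound, upper bound, weight coefficients)
    False: (F(1, 4), F(1), [F(1)]),
}

# WMI sums, over Boolean assignments, the integral of the weight over the feasible x.
per_case = {b: integrate_poly(w, lo, hi) for b, (lo, hi, w) in cases.items()}
wmi_total = sum(per_case.values())
prob_b = per_case[True] / wmi_total   # normalized probability that b holds

print(wmi_total, prob_b)
```

Using `Fraction` keeps the arithmetic exact, which mirrors the exact (rather than sampling-based) flavor of inference discussed above.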
Combinatorial optimization problems are ubiquitous in artificial intelligence. Designing the underlying models, however, requires substantial expertise, which is a limiting factor in practice. The models typically consist of hard and soft constraints, or combine hard constraints with a preference function. We introduce a novel setting for learning combinatorial optimization problems from contextual examples. These positive and negative examples show, in a particular context, whether the solutions are good enough or not. We develop our framework using the MAX-SAT formalism. We provide learnability results within the realizable and agnostic settings, as well as hassle, an implementation based on syntax-guided synthesis, and we showcase its promise on recovering synthetic and benchmark instances from examples.
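To make the notion of a contextual example concrete, here is a small brute-force sketch. It assumes one possible reading of the setting: a context is a set of literals that must hold, and an example is positive when it satisfies the context and the hard clauses and attains the best achievable soft-clause weight in that context. Equating "good enough" with "optimal" is an assumption of this sketch, and the clause encoding and function names are hypothetical rather than taken from the implementation mentioned above.

```python
from itertools import product

def clause_holds(clause, assignment):
    """DIMACS-style clause: literal k > 0 means variable k is true, k < 0 means false."""
    return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)

def value(soft, assignment):
    """Total weight of the satisfied soft clauses."""
    return sum(w for w, clause in soft if clause_holds(clause, assignment))

def is_positive(n_vars, hard, soft, context, example):
    """True if `example` is admissible in the context and optimal among admissible assignments."""
    def admissible(a):
        return (all(clause_holds([lit], a) for lit in context)
                and all(clause_holds(c, a) for c in hard))
    if not admissible(example):
        return False
    best = max(value(soft, a)
               for a in product([False, True], repeat=n_vars) if admissible(a))
    return value(soft, example) == best

hard = [[-1, -2]]                        # x1 and x2 cannot both be true
soft = [(2, [1]), (1, [2]), (1, [3])]    # (weight, clause) pairs

print(is_positive(3, hard, soft, [], (True, False, True)))    # empty context
print(is_positive(3, hard, soft, [2], (False, True, True)))   # context forces x2
```

The same assignment can be positive in one context and negative in another, which is what makes the context part of each example informative.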
Linear Programming lies at the core of mathematical modelling and optimization. Designing linear programs (LPs) is a difficult and expensive process, as it requires both mathematical programming and domain expertise, and it involves both designing an objective function and feasibility constraints. To support this design process, we propose INCALP, an algorithm for inducing linear programs from examples. Since the objective can often be learned with standard techniques (e.g. regression), INCALP learns the hard constraints only. It does so by encoding constraint learning as a mixed integer linear program. INCALP achieves significant efficiency gains by considering gradually larger subsets of examples, and terminating as soon as a suitable program is found. In addition, INCALP encourages both compactness and sparsity of the learned program. Our empirical analysis on synthetic data and textbook problems highlights the promise of the approach.
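The condition a learned set of hard constraints has to meet can be sketched as follows, assuming examples labelled as feasible or infeasible and constraints in the form `A x <= b`. This is only the consistency check, not the mixed integer encoding or the INCALP search itself; the function names and the toy data are hypothetical.

```python
def feasible(A, b, x):
    """True if the point x satisfies every row of A x <= b."""
    return all(sum(a_i * x_i for a_i, x_i in zip(row, x)) <= b_j
               for row, b_j in zip(A, b))

def consistent(A, b, positives, negatives):
    """Feasible examples must satisfy all constraints; infeasible ones must violate at least one."""
    return (all(feasible(A, b, x) for x in positives)
            and not any(feasible(A, b, x) for x in negatives))

positives = [(1, 1), (0, 4), (2, 2)]
negatives = [(3, 3), (-1, 0)]

# Candidate program: x + y <= 4, x >= 0, y >= 0.
A = [[1, 1], [-1, 0], [0, -1]]
b = [4, 0, 0]
print(consistent(A, b, positives, negatives))

# A looser candidate that drops the non-negativity rows admits the point (-1, 0).
print(consistent([[1, 1]], [4], positives, negatives))
```

A check like this can be run against gradually larger subsets of the examples, so that a search stops as soon as a candidate agrees with all of them.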
Spreadsheets are arguably the most accessible data-analysis tool and are used by millions of people. Although they lie at the core of most business practices, working with spreadsheets can be error-prone, using formulas requires training and, crucially, spreadsheet users do not have access to state-of-the-art analysis techniques offered by machine learning. To tackle these issues, we introduce the novel task of predictive spreadsheet autocompletion, where the goal is to automatically predict the missing entries in the spreadsheets. This task is highly non-trivial: cells can hold heterogeneous data types and there might be unobserved relationships between their values, such as constraints or probabilistic dependencies. Critically, the exact prediction task itself is not given. We consider a simplified, yet non-trivial, setting and propose a principled probabilistic model to solve it. Our approach combines black-box predictive models specialized for different predictive tasks (e.g., classification, regression) with constraints and formulas detected by a constraint learner, and produces a maximally likely prediction for all target cells that is consistent with the constraints. Overall, our approach brings us one step closer to allowing end users to leverage machine learning in their workflows without writing a single line of code.
A simple but non-trivial setting for automating data science is introduced. Given a set of worksheets in a spreadsheet, the goal is to automatically complete some values. We also outline elements of the Synth framework that tackles this task: Synth-a-Sizer, an automated data wrangling system for automatically transforming the problem into attribute-value format; TacLe, an inductive constraint learning system for inducing formulas in spreadsheets; Mercs, a versatile predictive learning system; as well as the autocompletion component that integrates these systems.
Weighted Model Integration (WMI) is a popular technique for probabilistic inference that extends Weighted Model Counting (WMC) -- the standard technique for inference in discrete domains -- to domains with both discrete and continuous variables. However, existing WMI solvers each have different interfaces and use different formats for representing WMI problems. To address these shortcomings, we introduce pywmi (http://pywmi.org), an open source framework and toolbox for probabilistic inference using WMI. Crucially, pywmi fixes a common internal format for WMI problems and introduces a common interface for WMI solvers. To assist users in modeling WMI problems, pywmi introduces modeling languages based on SMT-LIB.v2 or MiniZinc and parsers for both. To assist users in comparing WMI solvers, pywmi includes implementations of several state-of-the-art solvers, a fast approximate WMI solver, and a command-line interface to solve WMI problems. Finally, to assist developers in implementing new solvers, pywmi provides Python implementations of commonly used subroutines.