The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches either involve solving computationally difficult combinatorial problems, or performing parameter estimation in complex statistical models. In this paper, we present DIFFLOG, a technique to extend the logic programming language Datalog to the continuous setting. By attaching real-valued weights to individual rules of a Datalog program, we naturally associate numerical values with individual conclusions of the program. Analogous to the strategy of numerical relaxation in optimization problems, we can now first determine the rule weights which cause the best agreement between the training labels and the induced values of output tuples, and subsequently recover the classical discrete-valued target program from the continuous optimum. We evaluate DIFFLOG on a suite of 34 benchmark problems from recent literature in knowledge discovery, formal verification, and database query-byexample, and demonstrate significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.
No abstract
We propose an interactive approach to resolve static analysis alarms. Our approach synergistically combines a sound but imprecise analysis with precise but unsound heuristics, through user interaction. In each iteration, it solves an optimization problem to find a set of questions for the user such that the expected payoff is maximized. We have implemented our approach in a tool, Ursa, that enables interactive alarm resolution for any analysis specified in the declarative logic programming language Datalog. We demonstrate the effectiveness of Ursa on a state-of-the-art static datarace analysis using a suite of 8 Java programs comprising 41-194 KLOC each. Ursa is able to eliminate 74% of the false alarms per benchmark with an average payoff of 12× per question. Moreover, Ursa prioritizes user effort effectively by posing questions that yield high payoffs earlier.
We propose a general end-to-end deep learning framework Code2Inv, which takes a verification task and a proof checker as input, and automatically learns a valid proof for the verification task by interacting with the given checker. Code2Inv is parameterized with an embedding module and a grammar: the former encodes the verification task into numeric vectors while the latter describes the format of solutions Code2Inv should produce. We demonstrate the flexibility of Code2Inv by means of two small-scale yet expressive instances: a loop invariant synthesizer for C programs, and a Constrained Horn Clause (CHC) solver.
Access control vulnerabilities within web applications pose serious security threats to the sensitive information stored at back-end databases. Existing approaches are limited from several aspects, including the coarse granularity at which the access control is modeled, the incapability of handling complex relationship between data entities and the requirement of source code and the specific application platform. In this paper, we present an automated black-box technique for identifying a broad range of access control vulnerabilities, which can be applied to applications that are developed using different languages and platforms. We model the access control policy based on a novel virtual SQL query concept, which captures both the database access operations (i.e., through SQL queries) and the post-processing filters within the web application. We leverage a crawler to automatically explore the application and collect execution traces. From the traces, we identify the set of database access operations that are allowed for each role (i.e., role-level policy inference) and extract the constraints over the operation parameters to characterize the relationship between the users and the accessed data (i.e., user-level policy inference). Based on the inferred policy, we construct test inputs to exploit the application for potential access control flaws. We implement a prototype system BATMAN and evaluate it over a set of PHP and JSP web applications. The experiment results demonstrate the effectiveness and accuracy of our approach.
No abstract
Programs often evolve by continuously integrating changes from multiple programmers. The effective adoption of program analysis tools in this continuous integration setting is hindered by the need to only report alarms relevant to a particular program change. We present a probabilistic framework, Drake, to apply program analyses to continuously evolving programs. Drake is applicable to a broad range of analyses that are based on deductive reasoning. The key insight underlying Drake is to compute a graph that concisely and precisely captures differences between the derivations of alarms produced by the given analysis on the program before and after the change. Performing Bayesian inference on the graph thereby enables to rank alarms by likelihood of relevance to the change. We evaluate Drake using SparrowÐa static analyzer that targets buffer-overrun, format-string, and integer-overflow errorsÐon a suite of ten widely-used C programs each comprising 13kś112k lines of code. Drake enables to discover all true bugs by inspecting only 30 alarms per benchmark on average, compared to 85 (3× more) alarms by the same ranking approach in batch mode, and 118 (4× more) alarms by a differential approach based on syntactic masking of alarms which also misses 4 of the 26 bugs overall. CCS Concepts • Software and its engineering → Automated static analysis; Software evolution; • Mathematics of computing → Bayesian networks.
Conventional program analyses have made great strides by leveraging logical reasoning. However, they cannot handle uncertain knowledge, and they lack the ability to learn and adapt. This in turn hinders the accuracy, scalability, and usability of program analysis tools in practice. We seek to address these limitations by proposing a methodology and framework for incorporating probabilistic reasoning directly into existing program analyses that are based on logical reasoning. We demonstrate that the combined approach can benefit a number of important applications of program analysis and thereby facilitate more widespread adoption of this technology.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.