Relational learning and Inductive Logic Programming (ILP) commonly use the θ-subsumption test defined by Plotkin as their covering test. Based on a reformulation of θ-subsumption as a binary constraint satisfaction problem (CSP), this paper describes a novel θ-subsumption algorithm named Django, which combines well-known CSP procedures and θ-subsumption-specific data structures. Django is validated using the stochastic complexity framework developed for CSPs and imported into ILP by Giordana and Saitta. Principled and extensive experiments within this framework show that Django improves on earlier θ-subsumption algorithms by several orders of magnitude, and that different procedures perform best in different regions of the stochastic complexity landscape. These experiments allow for building a control layer over Django, termed Meta-Django, which determines the best procedures to use depending on the order parameters of the θ-subsumption problem instance. The performance gains and good scalability of Django and Meta-Django are finally demonstrated on a real-world ILP task (emulating the search for frequent clauses in the mutagenesis domain), though the smaller size of the problems results in smaller gain factors (ranging from 2.5 to 30).
The covering test used intensively in Inductive Logic Programming, i.e. θ-subsumption, is formally equivalent to a Constraint Satisfaction Problem (CSP). This paper presents a general reformulation of θ-subsumption into a binary CSP, and a new θ-subsumption algorithm, termed Django, which combines some mainstream CSP heuristics and other heuristics specifically designed for θ-subsumption. Django is evaluated according to CSP standards, shifting from a worst-case complexity perspective to a statistical framework centered on the notion of Phase Transition (PT). The CSP instances that are hardest on average lie in the PT region, and this region has been shown to be of utmost relevance to ILP [4]. Experiments on artificial θ-subsumption problems designed to illustrate the phase transition phenomenon show that Django is faster by several orders of magnitude than previous θ-subsumption algorithms, both within and outside the PT region.
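To make the link between θ-subsumption and constraint satisfaction concrete, the following is a minimal Python sketch of a naive backtracking test. It is not the Django algorithm and uses none of its heuristics or data structures. It assumes one plausible reading of the reformulation: each literal of the hypothesis clause C acts as a CSP variable, its domain is the set of literals of D with the same predicate symbol and arity, and the constraints require that variables shared between literals of C receive the same value. The literal encoding, the uppercase-variable convention, and the names `is_variable`, `extend`, and `theta_subsumes` are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding: a literal is (predicate_symbol, tuple_of_terms).
Literal = Tuple[str, Tuple[str, ...]]
Substitution = Dict[str, str]


def is_variable(term: str) -> bool:
    # Assumption: Prolog-style convention, variables start with an uppercase letter.
    return term[:1].isupper()


def extend(theta: Substitution, pattern: Literal, target: Literal) -> Optional[Substitution]:
    """Extend theta so that pattern, under theta, equals target; None if impossible."""
    if pattern[0] != target[0] or len(pattern[1]) != len(target[1]):
        return None
    new_theta = dict(theta)
    for p, t in zip(pattern[1], target[1]):
        if is_variable(p):
            # A shared variable must keep the value it was bound to earlier.
            if new_theta.setdefault(p, t) != t:
                return None
        elif p != t:
            # A constant in C only matches the identical term in D.
            return None
    return new_theta


def theta_subsumes(c: List[Literal], d: List[Literal],
                   theta: Optional[Substitution] = None) -> Optional[Substitution]:
    """Return a substitution theta with C.theta included in D, or None if none exists.

    Terms of D are treated as fixed symbols. Plain chronological backtracking,
    exponential in the worst case.
    """
    theta = {} if theta is None else theta
    if not c:
        return theta  # every literal of C has been mapped into D
    first, rest = c[0], c[1:]
    for candidate in d:  # domain of the CSP variable associated with `first`
        extended = extend(theta, first, candidate)
        if extended is not None:
            result = theta_subsumes(rest, d, extended)
            if result is not None:
                return result
    return None
```

For example, with C = p(X, Y), q(Y) and D = p(a, b), p(b, c), q(c), the search first tries X = a, Y = b, fails on q(Y), backtracks, and returns the substitution X = b, Y = c. The order in which literals and candidate matches are tried is exactly the kind of choice that CSP heuristics are meant to improve.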