The chase is a family of algorithms used in a number of data management tasks, such as data exchange, answering queries under dependencies, query reformulation with constraints, and data cleaning. It is well established as a theoretical tool for understanding these tasks, and in addition a number of prototype systems have been developed. While individual chase-based systems and particular optimizations of the chase have been experimentally evaluated in the past, we provide the first comprehensive and publicly available benchmark (test infrastructure and a set of test scenarios) for evaluating chase implementations across a wide range of assumptions about the dependencies and the data. We used our benchmark to compare chase-based systems on data exchange and query answering tasks with one another, as well as with systems that can solve similar tasks developed in closely related communities. Our evaluation provided us with a number of new insights concerning the factors that impact the performance of chase implementations.
Nested relations, built up from atomic types via tupling and set types, form a rich data model. Over the last decades the nested relational calculus, NRC, has emerged as a standard language for defining transformations on nested collections. NRC is a strongly-typed functional language which allows building up transformations using products and projections, a singleton-former, and a map operation that lifts transformations on tuples to transformations on sets. In this work we show that NRC has a strong connection with first-order logic: it contains exactly the transformations that are implicitly definable by a theory Σ in first-order logic with quantification suited for nested collections. We also prove an effective variant of our result, providing a procedure that synthesizes an NRC expression in polynomial time from a proof witnessing that Σ provides an implicit definition for one subset of its free variables in terms of another subset of the variables. This synthesis result works off of proofs within an intuitionistic calculus that captures a natural style of reasoning about implicit definability in the context of nested collections.
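As a rough illustration of the constructs named in this abstract, the following minimal Python sketch models pairing, projections, the singleton-former, and the map operation over immutable sets. The function names (`pair`, `proj1`, `proj2`, `singleton`, `set_map`) and the sample relation are assumptions made for illustration; they are not NRC syntax, and the sketch covers only the constructs listed above rather than the full calculus.

```python
from typing import Callable, FrozenSet, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def pair(x: A, y: B) -> Tuple[A, B]:
    """Product: build a tuple from two components."""
    return (x, y)


def proj1(p: Tuple[A, B]) -> A:
    """First projection of a pair."""
    return p[0]


def proj2(p: Tuple[A, B]) -> B:
    """Second projection of a pair."""
    return p[1]


def singleton(x: A) -> FrozenSet[A]:
    """Singleton-former: wrap one value in a set."""
    return frozenset([x])


def set_map(f: Callable[[A], B], s: FrozenSet[A]) -> FrozenSet[B]:
    """Map: lift a transformation on elements to a transformation on sets."""
    return frozenset(f(x) for x in s)


# A hypothetical nested relation: a set of (key, set of values) tuples.
rel = frozenset({
    (1, frozenset({"a", "b"})),
    (2, frozenset({"c"})),
})

# Keep only the keys of each tuple.
keys = set_map(proj1, rel)

# Re-nest: pair each key with a singleton containing its original value set.
renested = set_map(lambda p: pair(proj1(p), singleton(proj2(p))), rel)
```

Using `frozenset` rather than `set` keeps every value hashable, so sets can themselves be members of sets, which is what nesting requires.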
Acknowledgements. We are very grateful to Szymon Toruńczyk, who outlined a route to show that implicitly definable transformations over nested relations can be defined via interpretations, in the process conjecturing a more general result concerning definability in multi-sorted logic. Szymon also helped in simplifying the mapping of NRC to interpretations, a basic component in one of our characterizations. We also thank Ehud Hrushovski, who sketched a proof of the Beth-style result for multi-sorted logic that serves as another component. His proof proceeds along very similar lines to the one we present in the appendix of this submission, but makes use of a prior Beth-style result in classical model theory [32].
We consider a setting where a user wants to pose a query against a dataset where background knowledge, expressed as logical sentences, is available, but only a subset of the information can be used to answer the query. We thus want to reformulate the user query against the subvocabulary, arriving at a query equivalent to the user's query assuming the background theory, but using only the restricted vocabulary. We consider two variations of the problem, one where we want any such reformulation and another where we restrict the size. We present a classification of the complexity of the problem, then provide algorithms for solving the problems in practice and evaluate their performance.