Since 2005, significant progress has been made on the problem of Consistent Query Answering (CQA) with respect to primary keys. In this problem, the input is a database instance that may violate one or more primary key constraints. A repair is defined as a maximal subinstance that satisfies all primary keys. Given a Boolean query q, the question is whether q holds true in every repair. So far, theoretical research in this field has not addressed the combination of primary key and foreign key constraints, despite the importance of referential integrity in database systems. This paper addresses the problem of CQA with respect to both primary keys and foreign keys. In this setting, it is natural to adopt the notion of symmetric-difference repairs, because violations of foreign keys can be repaired by inserting new tuples. We consider the case where foreign keys are unary and queries are conjunctive queries without self-joins. For this case, we characterize the boundary between those CQA problems that admit a consistent first-order rewriting and those that do not.
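The primary-key-only setting can be made concrete with a small brute-force sketch. The encoding is an assumption: a database is a set of facts `(relation, values)`, the primary key of each relation is taken to be its first `key_len[relation]` positions, and query terms prefixed with `?` are variables. The helper names `blocks`, `repairs`, `satisfies` and `certainty` are illustrative only. The sketch covers primary keys alone, where a repair keeps exactly one tuple per block. It does not model foreign keys or the tuple insertions that ⊕-repairs may then require. Its enumeration of repairs also takes exponential time, unlike a consistent first-order rewriting.

```python
from itertools import product

# Hypothetical encoding: a fact is (relation, tuple_of_values); the primary key of
# relation `rel` is assumed to be its first key_len[rel] positions.


def blocks(db, key_len):
    """Group facts into blocks: maximal sets of facts of the same relation
    that agree on their primary-key positions."""
    groups = {}
    for rel, values in db:
        key = (rel, values[: key_len[rel]])
        groups.setdefault(key, []).append((rel, values))
    # Sort for a deterministic enumeration order.
    return [sorted(facts) for _, facts in sorted(groups.items())]


def repairs(db, key_len):
    """Enumerate all repairs w.r.t. primary keys only: pick exactly one fact per block."""
    for choice in product(*blocks(db, key_len)):
        yield set(choice)


def satisfies(instance, query, binding=None):
    """Backtracking check that a Boolean conjunctive query holds in `instance`.
    `query` is a list of atoms (relation, terms); terms starting with '?' are variables."""
    binding = {} if binding is None else binding
    if not query:
        return True
    (rel, terms), rest = query[0], query[1:]
    for fact_rel, values in instance:
        if fact_rel != rel or len(values) != len(terms):
            continue
        extended = dict(binding)
        matches = True
        for term, value in zip(terms, values):
            if term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if extended.setdefault(term, value) != value:
                    matches = False
                    break
            elif term != value:  # a constant must match exactly
                matches = False
                break
        if matches and satisfies(instance, rest, extended):
            return True
    return False


def certainty(db, key_len, query):
    """Brute-force CERTAINTY(q): True iff q holds in every repair (exponential time)."""
    return all(satisfies(r, query) for r in repairs(db, key_len))


if __name__ == "__main__":
    db = {("R", ("a", "1")), ("R", ("a", "2")), ("S", ("1",))}
    key_len = {"R": 1, "S": 1}
    q = [("R", ("?x", "?y")), ("S", ("?y",))]
    print(certainty(db, key_len, q))
```

Consider the facts R(a, 1), R(a, 2) and S(1), with the first position of each relation as its key. The two R-facts form one block, so there are two repairs. The query R(x, y) ∧ S(y) holds only in the repair that keeps R(a, 1), so `certainty` returns False. The query R(x, y) alone holds in both repairs and is therefore certain.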
Keywords: consistent query answering • primary key • foreign key • conjunctive query

1 Introduction

Consistent query answering (CQA) was introduced in [1] as a principled semantics for answering queries on inconsistent databases. A symmetric-difference repair (or ⊕-repair) of a database db is defined as a consistent database r that ⊆-minimizes the symmetric difference with db. Informally, a ⊕-repair r becomes inconsistent as soon as we insert into it more tuples of db, or delete from it tuples not in db. Then, given a query q(x⃗), an answer a⃗ is called consistent if q(a⃗) holds true in every repair. The problem is often studied for Boolean queries q, where the question is to determine whether q holds true in every repair of a given input database.

CQA has been studied in depth for the case where the only constraints are primary keys, one per relation. In [2], this problem was named CERTAINTY(q), where it is understood that every relation name in q has a predefined primary key. More than a decade of research eventually resulted in the following complexity classification [3]: for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either in FO, L-complete, or coNP-complete. Now that this classification has been settled, it is natural to ask what happens when foreign key constraints are added. Indeed, every relational database textbook is likely to introduce the notion of referential integrity (i.e., foreign keys referencing primary keys) early on. One may therefore wonder why referential integrity in CQA has so far received little theoretical attention. One plausible explanation is that ⊕-repairs with respect to primary keys are easy to characterize: every repair has to delete, in every block, all tuples but one, where a block is a maximal set of tuples of the same relation that agree on their primary key. In contrast, ⊕-repairs with respect to foreign keys can introduce new tuples, as illustrated next. It will ...