In this work we address the problem of handling data inconsistencies when integrating data sets derived from multiple autonomous relational databases. The classical relational model rests on the fundamental assumption that data is consistent, and hence it provides no support for dealing with inconsistent data. Because of this limitation, the semantics for detecting, representing, and manipulating inconsistent data must be encoded explicitly in each application by its developer. In this paper, we propose the flexible relational model, which extends the classical relational model by providing support for inconsistent data. We present a flexible relational algebra, which provides semantics for database operations in the presence of potentially inconsistent data. Finally, we discuss the issues that potentially inconsistent data raise for query optimization.
Introduction

Advances in computer networking technology and the availability of economical computing hardware have led to a proliferation of autonomous databases connected by high-speed communication networks. As a result of this greatly increased access to remote databases, a growing number of database applications need to jointly manipulate data located in multidatabases [Litwin89,Litwin90,Breitbart90,Sheth90,Bright92,Scheuermann94]. Because the component databases of a multidatabase are typically autonomous, they tend to be heterogeneous with respect to one another. Further, the distribution of data among such databases is likely to be arbitrary, often redundant, and possibly inconsistent. Hence, developing and maintaining applications that manipulate data from multiple databases is generally expensive and difficult: these applications must explicitly resolve any heterogeneities, especially inconsistencies, among the data sets derived from these databases. This paper focuses on the problems of manipulating data from multiple autonomous databases that may be mutually inconsistent. For the purposes of this work, we assume that all other kinds of heterogeneity, such as differences in hardware, operating system, network, or SQL dialect, have been resolved by a homogenizing veneer over each individual database, and that each database presents a relational interface.
Definition of Consistency

The term inconsistency has been used in the literature for several specific cases. The detection of such inconsistencies is, by itself, a difficult problem. A probabilistic reasoning approach for detecting such inconsistencies using data associated with non-key attributes is presented in [Chatterjee91].

In this paper we consider the first type of inconsistency, i.e., where tuples with matching values for their primary key attributes conflict in their non-key attribute values. This notion of conflicting tuples is formalized in Definition 1.

Definition 1. Two tuples $t_f$ and $t_g$ associated with a relational schema $(K, Z)$, where $K$ is the entity-identifying attribute set and $Z$ is the non-entity-identifying attribute set, are non-conflic...
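The informal notion of conflict stated above (tuples that match on the primary key attributes but disagree on non-key attributes) can be sketched in Python. This is a minimal illustration, not the paper's formal definition, which is truncated here: it assumes tuples are attribute-to-value mappings, treats two tuples as conflicting when they agree on every attribute in $K$ and differ on at least one attribute in $Z$, and does not model any special treatment of nulls. The function names `conflicting` and `find_conflicts` and the attributes `ssn`, `name`, and `salary` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence, Tuple

# A tuple of a relation with schema (K, Z), represented as attribute -> value.
Row = Mapping[str, Hashable]


def conflicting(t_f: Row, t_g: Row, key: Sequence[str], non_key: Sequence[str]) -> bool:
    """Assumed rule: True if t_f and t_g agree on all of K but differ on some attribute of Z."""
    same_key = all(t_f[a] == t_g[a] for a in key)
    return same_key and any(t_f[a] != t_g[a] for a in non_key)


def find_conflicts(
    rows: Iterable[Row], key: Sequence[str], non_key: Sequence[str]
) -> Dict[Tuple, List[Tuple]]:
    """Group rows by their K-values; report keys that carry more than one distinct Z-projection."""
    groups: Dict[Tuple, Dict[Tuple, None]] = {}
    for row in rows:
        k = tuple(row[a] for a in key)
        z = tuple(row[a] for a in non_key)
        # A dict serves as an insertion-ordered set of distinct Z-projections.
        groups.setdefault(k, {})[z] = None
    return {k: list(zs) for k, zs in groups.items() if len(zs) > 1}


# Hypothetical rows merged from two autonomous sources.
source_rows = [
    {"ssn": 1, "name": "Ann", "salary": 50},  # from database A
    {"ssn": 1, "name": "Ann", "salary": 55},  # from database B: same key, different salary
    {"ssn": 2, "name": "Bob", "salary": 40},  # from database A
    {"ssn": 2, "name": "Bob", "salary": 40},  # from database B: redundant but consistent
]

print(find_conflicts(source_rows, key=["ssn"], non_key=["name", "salary"]))
# {(1,): [('Ann', 50), ('Ann', 55)]}
```

Grouping by key reports exactly the keys for which some pair of tuples satisfies the pairwise predicate, while merely redundant copies (as for `ssn` 2) are not flagged.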