Computing Optimal Repairs for Functional Dependencies

Livshits, Ester; Kimelfeld, Benny; Roy, Sudeepa

doi:10.1145/3360904

Cited by 36 publications

(49 citation statements)

References 37 publications

(66 reference statements)


“…The problem has been studied extensively in database theory for various classes of constraints Γ. It is NP-hard even when D consists of a single relation (as it does in our paper) and Γ consists of functional dependencies [27]. In our setting, Γ consists of conditional independence statements, and it remains NP-hard, as we show in Sec.…”

Section: Preliminaries (mentioning)

confidence: 68%

Capuchin: Causal Database Repair for Algorithmic Fairness

Salimi, Rodriguez, Howe et al. (2019)

Preprint


Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflect discrimination, suggesting a database repair problem. Existing treatments of fairness rely on statistical correlations that can be fooled by statistical anomalies, such as Simpson's paradox. Proposals for causality-based definitions of fairness can correctly model some of these situations, but they require specification of the underlying causal models. In this paper, we formalize the situation as a database repair problem, proving sufficient conditions for fair classifiers in terms of admissible variables as opposed to a complete causal model. We show that these conditions correctly capture subtle fairness violations. We then use these conditions as the basis for database repair algorithms that provide provable fairness guarantees about classifiers trained on their training labels. We evaluate our algorithms on real data, demonstrating improvement over the state of the art on multiple fairness metrics proposed in the literature while retaining high utility.


“…Regarding methods of data repair, previous works have considered two main approaches: (1) repairing attribute values in cells [6,11,29,33,44] and (2) tuple deletion [10,33,34]; our work focuses on the latter. A major advantage of our approach is the ability to perform cascade deletions over multiple relations in the database while following different well-defined semantics (and the admin may choose which one to follow based on the application scenario).…”

Section: Related Work (mentioning)

confidence: 99%

“…A major advantage of our approach is the ability to perform cascade deletions over multiple relations in the database while following different well-defined semantics (and the admin may choose which one to follow based on the application scenario). Similar to our independent semantics, a common objective for data repairs is to change the database in the minimal way that will make it consistent with the constraints [5,19,33]. In some scenarios a good repair can be obtained by changing values in the database and the metric of minimal changes may not work well [44].…”

Section: Related Work (mentioning)

confidence: 99%

“…Many of these have focused on the desideratum of minimum cardinality, i.e., repairing the database while making the minimum number of changes [5,19,34]. In particular, when the repair only involves tuple deletion [10,33,34], this desideratum takes center stage since a naïve repair could simply delete the entire database in order to repair it. Such repairs are commonly used with classes of constraints such as Denial Constraints (DCs) [10,11], SQL deletion triggers [22], and causal dependencies [46].…”

Section: Introduction (mentioning)

confidence: 99%


On Multiple Semantics for Declarative Database Repairs

Gilad, Deutch, Roy (2020)

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Self Cite


We study the problem of database repairs through a rule-based framework that we refer to as Delta Rules. Delta rules are highly expressive and allow specifying complex, cross-relations repair logic associated with Denial Constraints, Causal Rules, and allowing to capture Database Triggers of interest. We show that there are no one-size-fits-all semantics for repairs in this inclusive setting, and we consequently introduce multiple alternative semantics, presenting the case for using each of them. We then study the relationships between the semantics in terms of their output and the complexity of computation. Our results formally establish the tradeoff between the permissiveness of the semantics and its computational complexity. We demonstrate the usefulness of the framework in capturing multiple data repair scenarios for an academic search database and the TPC-H databases, showing how using different semantics affects the repair in terms of size and runtime, and examining the relationships between the repairs. We also compare our approach with SQL triggers and a state-of-the-art data repair system.


“…To capture partial knowledge of the rules and the data, we clean data by providing probabilistic fixes. Then, using our solution once all rules are known and given the probabilistic suggestions, we can either use inference [23,29,36] when master data exist, or have humans fix the errors in the query results. Inference approaches over the probabilistic data are complementary and out of the scope of this work.…”

Section: From Offline To Online Data Cleaning (mentioning)

confidence: 99%

Cleaning Denial Constraint Violations through Relaxation

Giannakopoulou, Karpathiotakis, Ailamaki (2020)

Preprint


Data cleaning is a time-consuming process that depends on the data analysis that users perform. Existing solutions treat data cleaning as a separate offline process that takes place before analysis begins. Applying data cleaning before analysis assumes a priori knowledge of the inconsistencies and the query workload, thereby requiring effort on understanding and cleaning the data that is unnecessary for the analysis. We propose an approach that performs probabilistic repair of denial constraint violations on-demand, driven by the exploratory analysis that users perform. We introduce Daisy, a system that seamlessly integrates data cleaning into the analysis by relaxing query results. Daisy executes analytical query workloads over dirty data by weaving cleaning operators into the query plan. Our evaluation shows that Daisy adapts to the workload and outperforms traditional offline cleaning on both synthetic and real-world workloads.
