2020
DOI: 10.48550/arxiv.2005.08540
Preprint

Approximate Denial Constraints

Abstract: The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs (i.e., DCs that are "almost" satisfied) from data. Considering approximate constraints allows us to discover more accurate constraints in inconsistent databases, detect rules that are generally correc…
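To make the notion of an "almost satisfied" DC concrete, here is a minimal, hypothetical sketch (not the paper's algorithm): a DC forbids a predicate from holding on any pair of tuples, and an approximate DC tolerates a small fraction of violating pairs. The column names and the threshold below are illustrative assumptions.

```python
from itertools import permutations

# Hypothetical DC: no two tuples t, u may satisfy
# t.salary > u.salary AND t.tax < u.tax (higher earners pay less tax).
def violates(t, u):
    return t["salary"] > u["salary"] and t["tax"] < u["tax"]

def violation_ratio(rows):
    """Fraction of ordered tuple pairs that violate the DC."""
    pairs = list(permutations(rows, 2))
    if not pairs:
        return 0.0
    bad = sum(1 for t, u in pairs if violates(t, u))
    return bad / len(pairs)

rows = [
    {"salary": 50, "tax": 10},
    {"salary": 60, "tax": 12},
    {"salary": 70, "tax": 9},   # violates the DC against both other rows
]

ratio = violation_ratio(rows)   # 2 of 6 ordered pairs violate -> 1/3
# The DC is an *approximate* DC at threshold eps if ratio <= eps.
```

An exact DC is the special case `eps = 0`; a mining algorithm searches for predicates whose violation ratio stays below the chosen threshold.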

Cited by 2 publications (3 citation statements)
References 34 publications
“…In those cases, we ignore this feature and include other features. The next information that might be available is a set of data rules in the form of denial constraints [21]. Before we discuss how we capture this information, let's formally define a denial constraint.…”
Section: Database-level Features
confidence: 99%

Record fusion: A learning approach
Heidari, Michalopoulos, Kushagra et al. 2020
Preprint | Self Cite
“…For example, consider the problem of estimating the mean and/or variance of some column of a table. The presence of duplicates can lead to inaccurate estimates [2]. Another example is unsupervised learning tasks, such as k-means clustering, where the presence of duplicates can perturb the computed centres and drastically change the clustering output.…”
Section: Introduction
confidence: 99%
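The point quoted above can be seen in a two-line illustration (hypothetical data, not from the cited work): duplicate records bias the sample mean of a column until they are removed.

```python
# Entity with value 10.0 was recorded three times; 50.0 appears once.
values_with_dups = [10.0, 10.0, 10.0, 50.0]

naive_mean = sum(values_with_dups) / len(values_with_dups)  # biased toward 10.0

dedup = set(values_with_dups)            # one value per entity
dedup_mean = sum(dedup) / len(dedup)     # the entity-level mean
```

Here the naive estimate is 20.0 while the deduplicated mean is 30.0, which is why duplicate detection matters before computing column statistics.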
“…In this paper, we investigate certain properties of the data under which it is possible to construct efficient (both statistical and computational) procedures that can sample uniformly at random from the set of entities E. We consider three categories of datasets/methods: (1) in Section 3, we consider datasets that are 'balanced' (Defn. 2) and show how that can help us estimate the frequencies of all the entities from a 'small' sample; (2) in Section 4, we consider datasets that can be successfully partitioned into hashing blocks, and we show how access to such blocks can help us estimate the frequencies and then sample uniformly from the set of entities E; and (3) in Section 5, we consider the case when the dataset is generated by a mixture of k spherical Gaussian distributions. For all three cases, we provide mathematical bounds to prove the correctness of our approach.…”
Section: Introduction
confidence: 99%
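The core idea in the quoted passage — that frequency estimates enable uniform sampling over entities — can be sketched as follows. This is an assumed illustration, not the cited paper's procedure: weighting each record by the reciprocal of its entity's frequency makes a record-level draw uniform over entities.

```python
import random

# Records with duplicates: entity 'a' appears 3 times, 'c' twice, 'b' once.
records = ["a", "a", "a", "b", "c", "c"]

# Estimate (here: count exactly) each entity's frequency.
freq = {}
for r in records:
    freq[r] = freq.get(r, 0) + 1

# Weight every record by 1/frequency, so each entity's total weight is 1.
weights = [1.0 / freq[r] for r in records]

random.seed(0)
draws = [random.choices(records, weights=weights)[0] for _ in range(30000)]
counts = {e: draws.count(e) / len(draws) for e in set(records)}
# Each of the three entities is drawn with probability close to 1/3.
```

In practice the frequencies are only estimated (e.g., from a small sample or from hashing blocks, per the quote), so the resulting distribution is approximately rather than exactly uniform.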