2015
DOI: 10.14778/2752939.2752946

Divide & conquer-based inclusion dependency discovery

Abstract: The discovery of all inclusion dependencies (INDs) in a dataset is an important part of any data profiling effort. Apart from the detection of foreign key relationships, INDs can help to perform data integration, query optimization, integrity checking, or schema (re-)design. However, the detection of INDs gets harder as datasets become larger in terms of number of tuples as well as attributes. To this end, we propose Binder, an IND detection system that is capable of detecting both unary and n-ary INDs. It is …

Cited by 62 publications
(23 citation statements)
References 16 publications
“…In order to compute the above optimization problem, we try to extract the roots of the first derivative function of Equation 33 (i.e., f(r, α₁, α₂, b)) with respect to r. However, the derivative function is a polynomial function with degree of r larger than four. According to Abel's impossibility theorem [39], there is no algebraic solution, thus we try to give the numerical solution.…”
Section: Then the Variance Of Gb-kmv Methods By Equation 32 Is
Citation type: mentioning
confidence: 99%
“…In a dataset, the discovery of all inclusion dependencies is a crucial part of data profiling efforts. It has many applications such as foreign-key detection and data integration (e.g., [22], [31], [8], [33], [30]).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Additionally, hybrid algorithms have been proposed in [87,102] that combine bottom-up and top-down traversal for additional pruning. The Binder algorithm uses divide and conquer principles to handle larger datasets than related work [114]. In the divide step, it splits the input dataset horizontally into partitions and vertically into buckets with the goal to fit each partition into main memory.…”
Section: Generating N-ary Inclusion Dependencies
Citation type: mentioning
confidence: 99%
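
The divide step described in the statement above can be illustrated with a minimal sketch for unary INDs only. It is not Binder's implementation: it keeps everything in memory, and the function names `bucketize` and `unary_inds` and the `num_partitions` parameter are hypothetical. It assumes column values are hashable. Because equal values always land in the same partition, a candidate A ⊆ B holds exactly when it holds in every partition, so each partition can be checked on its own.

```python
from collections import defaultdict
from itertools import permutations


def bucketize(columns, num_partitions):
    """Divide step: split each column's values into hash partitions.

    Returns a mapping (partition index, column name) -> set of values.
    Equal values share a hash, so they fall into the same partition
    regardless of which column they come from.
    """
    buckets = defaultdict(set)
    for name, values in columns.items():
        for value in values:
            buckets[(hash(value) % num_partitions, name)].add(value)
    return buckets


def unary_inds(columns, num_partitions=4):
    """Conquer step: validate candidates one partition at a time.

    Starts from all ordered column pairs (dependent, referenced) and
    discards a pair as soon as one partition contains a dependent
    value that is missing from the referenced column.
    """
    buckets = bucketize(columns, num_partitions)
    candidates = set(permutations(columns, 2))
    for part in range(num_partitions):
        for dep, ref in list(candidates):
            dep_values = buckets.get((part, dep), set())
            ref_values = buckets.get((part, ref), set())
            if not dep_values <= ref_values:
                candidates.discard((dep, ref))
    return candidates


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    columns = {
        "orders.customer_id": [1, 2, 2, 3],
        "customers.id": [1, 2, 3, 4],
        "customers.zip": [10, 20],
    }
    print(unary_inds(columns))
```

In a setting where the data does not fit into memory, each partition would presumably be loaded and checked separately; the sketch only shows why per-partition checks are sufficient.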
“…If they are not available in the schema, one can extract them from the database content. AutoMode uses the Binder algorithm [9] to discover INDs from the database, shown by the Exact IND discovery box in Figure 1, and generates all unary INDs implied by them.…”
Section: Generating Predicate Definitions
Citation type: mentioning
confidence: 99%
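
The statement above mentions generating the unary INDs implied by discovered INDs. The excerpt does not say how AutoMode does this, so the following is only a sketch of the standard projection rule: if R[A1, ..., An] ⊆ S[B1, ..., Bn] holds, then R[Ai] ⊆ S[Bi] holds for every position i. The function name `implied_unary_inds` and the tuple-based representation of an IND are assumptions made for illustration.

```python
def implied_unary_inds(nary_inds):
    """Project each n-ary IND onto its attribute positions.

    Each IND is a pair (dependent_attrs, referenced_attrs) of
    equal-length tuples; position i of one side is matched with
    position i of the other.
    """
    implied = set()
    for dependent, referenced in nary_inds:
        if len(dependent) != len(referenced):
            raise ValueError("both sides of an IND need the same arity")
        implied.update(zip(dependent, referenced))
    return implied


if __name__ == "__main__":
    # Hypothetical attribute names, used only as an example.
    nary = [
        (("orders.cust", "orders.region"),
         ("customers.id", "customers.region")),
    ]
    print(implied_unary_inds(nary))
```

The converse does not hold in general: two valid unary INDs do not guarantee that the combined binary IND is valid, which is why n-ary candidates still need to be checked against the data.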