Mining Functional Dependency from Relational Databases Using Equivalent Classes and Minimal Cover

Atoum, Jalal Omer; Bader, Dojanah; Awajan, Arafat

doi:10.3844/jcssp.2008.421.426

Cited by 8 publications

(4 citation statements)

References 9 publications


“…(2) Rely on information-theoretic measures [30] by considering the ratio […]. Both of the above approaches have a fundamental flaw: given a finite sample of tuples, as the number of attributes in X increases, it is more likely that the empirical ratio PR(x|y) = |(x, y)|/|x| is 1.0, leading both approaches to determine that Equation 1 is satisfied. This behavior leads to overfitting to spurious dependencies and the discovery of complex (dense) structures across attributes.…”

Section: Functional Dependencies (mentioning)

confidence: 99%
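The finite-sample effect described in this citation statement can be illustrated with a minimal Python sketch. It is not code from either paper: the function names, the synthetic data generator, and the parameters (200 rows, 7 columns, 4 values per column) are assumptions chosen for illustration. The columns are generated independently, so no functional dependency truly holds, yet the share of observed (x, y) pairs whose ratio |(x, y)|/|x| equals 1.0 grows as more attributes are added to X.

```python
import random
from collections import defaultdict


def group_ratios(rows, x_cols, y_col):
    """For every observed pair (x, y), return the ratio |(x, y)| / |x|."""
    x_counts = defaultdict(int)
    xy_counts = defaultdict(int)
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        x_counts[x] += 1
        xy_counts[(x, row[y_col])] += 1
    return {key: n / x_counts[key[0]] for key, n in xy_counts.items()}


def share_of_ratios_equal_to_one(rows, x_cols, y_col):
    """Fraction of observed (x, y) pairs whose empirical ratio is exactly 1.0."""
    ratios = group_ratios(rows, x_cols, y_col)
    return sum(1 for r in ratios.values() if r == 1.0) / len(ratios)


def make_independent_rows(n_rows, n_cols, domain_size, seed=0):
    """Hypothetical generator: every column is drawn independently at random."""
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(domain_size) for _ in range(n_cols))
        for _ in range(n_rows)
    ]


if __name__ == "__main__":
    rows = make_independent_rows(n_rows=200, n_cols=7, domain_size=4)
    y_col = 6  # the last column plays the role of Y
    for k in range(1, 7):
        # X is made of the first k columns
        share = share_of_ratios_equal_to_one(rows, list(range(k)), y_col)
        print(f"|X| = {k}: share of ratios equal to 1.0 = {share:.3f}")
```

With a wide X, most x values occur only once in the sample, so their ratio is trivially 1.0. That is the kind of spurious dependency the statement warns about.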

“…We build upon recent works that observe that in the presence of strong structured dependencies automated data cleaning can be effective [17,40] and perform the following experiment: For each data set in Table 3, we separate its attributes into two groups (1) attributes that participate in an FD based on FDX's output, and (2) attributes that are independent according to FDX. We measure the median imputation accuracy for each group for AimNet and XGBoost and examine if the constraints discovered by FDX can be used as a proxy to identify if automated cleaning will be accurate.…”

Section: Using FDX in Data Preparation (mentioning)

confidence: 99%

“…These applications include data cleaning [4,9,10,40,41,44], schema normalization [15], and query optimization [20]. Numerous algorithms have been proposed for discovering syntactically valid FDs in a data set [1,19,25,30,34,35]. We review the works related to ours:…”

Section: Related Work (mentioning)

confidence: 99%


A Statistical Perspective on Discovering Functional Dependencies in Noisy Data

Zhang

Guo

Ρεκατσίνας

2020

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data


We study the problem of discovering functional dependencies (FDs) from a noisy data set. We adopt a statistical perspective and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy data set is equivalent to learning the structure of a model over binary random variables, where each random variable corresponds to a functional of the data set attributes. We build upon this observation to introduce FDX, a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that FDX can recover true functional dependencies across a diverse array of real-world and synthetic data sets, even in the presence of noisy or missing data. We find that FDX scales to large data instances with millions of tuples and hundreds of attributes, while it yields an average F1 improvement of 2× against state-of-the-art FD discovery methods.


“…In general, there is an effort to recover as much of the semantics as possible from many different source genres. For example, researchers have investigated semantic recovery from relational databases [5,6], XML [1,38], human-readable tables [24,25,34], forms [22,31], and free-running text [10].…”

Section: Theorem 3 Let S Be a Relational Database With Its Schema Re (mentioning)

confidence: 99%

Theoretical Foundations for Enabling a Web of Knowledge

Embley

Zitzelberger

2010

Lecture Notes in Computer Science


Abstract. The current web is a web of linked pages. Frustrated users search for facts by guessing which keywords or keyword phrases might lead them to pages where they can find facts. Can we make it possible for users to search directly for facts embedded in web pages? Instead of a web of human-readable pages containing machine-inaccessible facts, can the web be a web of machine-accessible facts superimposed over a web of human-readable pages? Ultimately, can the web be a web of knowledge that can provide direct answers to factual questions and support these answers by referencing and highlighting relevant base facts embedded in source pages? Answers to these questions call for distilling knowledge from the web's wealth of heterogeneous digital data into a web of knowledge. But how? Or, even more fundamentally, what, precisely, is this web of knowledge, and what is required to enable it? To answer these questions, we proffer a theoretical foundation for a web of knowledge: We formally define a computational view of knowledge in a way that enables practical construction and use of a web of knowledge.


Topology of Generalized Classifications

Parrochia

Neuville

2013

Towards a General Theory of Classifications
