Proceedings of the 2016 International Conference on Management of Data (SIGMOD 2016)
DOI: 10.1145/2882903.2915242

Interactive and Deterministic Data Cleaning

Abstract: We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses SQL update queries as the language to repair data. Falcon does not rely on the existence of a set of pre-defined data quality rules. On the contrary, it encourages users to explore the data, identify possible problems, and make updates to fix them. Bootstrapped by one user update, Falcon guesses a set of possible SQL update queries that can be used to repair the data. The main technical challenge addressed in thi…
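The abstract says Falcon starts from a single user update and guesses a set of SQL update queries that could repair the data. The visible part of the abstract does not define that set of queries, so the Python sketch below is only an illustration under a stated assumption: each candidate keeps the user's SET clause and uses a subset of the edited tuple's original values as equality predicates in the WHERE clause, which arranges the candidates in a powerset lattice. The helpers `sql_literal`, `candidate_updates`, and `rows_affected` and the `Loc` example table are hypothetical.

```python
import sqlite3
from itertools import combinations


def sql_literal(value):
    """Render a Python value as a SQL literal (for display; bind parameters in real code)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def candidate_updates(table, row, attr, new_value):
    """Generalize one cell edit (row[attr] := new_value) into candidate UPDATE statements.

    Hypothetical sketch: every subset of the edited tuple's attributes becomes a
    conjunction of equality predicates on the tuple's original values, so the
    candidates form a powerset lattice ordered by predicate inclusion. The empty
    subset is the most general query (it touches every row); the full subset is
    the most specific one.
    """
    attrs = sorted(row)  # fixed attribute order keeps the output deterministic
    base = f"UPDATE {table} SET {attr} = {sql_literal(new_value)}"
    candidates = []
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            preds = " AND ".join(f"{a} = {sql_literal(row[a])}" for a in subset)
            candidates.append(f"{base} WHERE {preds}" if preds else base)
    return candidates


def rows_affected(conn: sqlite3.Connection, stmt: str) -> int:
    """Dry-run an UPDATE: count the rows it changes, then roll it back."""
    conn.commit()  # persist earlier work so the rollback undoes only this statement
    count = conn.execute(stmt).rowcount
    conn.rollback()
    return count
```

For a `Loc(city, state, zip)` row `('Cicago', 'IL', '60601')` edited to `city = 'Chicago'`, `candidate_updates` returns 2^3 = 8 statements, ranging from `UPDATE Loc SET city = 'Chicago'` (every row) to one whose WHERE clause pins all three original values (that row only). Dry-running them with `rows_affected` shows how far each guess would reach; because the number of candidates grows exponentially with the number of attributes, a system has to choose among them rather than apply them all.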

Citations: cited by 63 publications (43 citation statements)
References: 43 publications (58 reference statements)

“…Despite all these efforts, detecting data errors with high accuracy is far from automatic [3,18], and almost all practical tools (heavily) involve users to properly tune the parameters and provide feedback. PFDs shine some light on automatically detecting data errors by carefully examining the correlation of partial values across different attributes.…”
Section: Related Work (mentioning)
confidence: 99%
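The statement above describes pattern functional dependencies (PFDs) as relating partial values of one attribute to the values of another. The Python sketch below is a deliberately simplified, hypothetical reading of that idea, not the formalism of the citing work: a regex capture group extracts a partial value from attribute `lhs` (for example, a phone area code), rows that share that partial value are expected to agree on `rhs`, and rows that disagree with their group's majority value are flagged. The function name, the majority-vote rule, and the example data are assumptions.

```python
import re
from collections import Counter, defaultdict


def partial_value_violations(rows, lhs, pattern, rhs):
    """Return indices of rows whose `rhs` disagrees with the majority among rows
    sharing the same partial value of `lhs`.

    `pattern` must contain one capture group; it is matched at the start of
    str(row[lhs]). Rows where it does not match are ignored.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)  # partial value -> row indices
    for i, row in enumerate(rows):
        m = regex.match(str(row[lhs]))
        if m:
            groups[m.group(1)].append(i)

    violations = []
    for idxs in groups.values():
        counts = Counter(rows[i][rhs] for i in idxs)
        if len(counts) < 2:
            continue  # the group is consistent
        # most_common orders equal counts by first occurrence, so ties are deterministic
        majority, _ = counts.most_common(1)[0]
        violations.extend(i for i in idxs if rows[i][rhs] != majority)
    return sorted(violations)
```

With the rows `312-555-0100 / IL`, `312-555-0101 / IL`, `312-555-0102 / NY`, and `617-555-0199 / MA` and the pattern `(\d{3})-` on `phone`, the three `312` rows form one group whose majority state is `IL`, so only the `NY` row (index 2) is flagged.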
“…In fact, standardizing the vocabulary of pathologies and pathology indicators is crucial in the early stages of data preparation. To this end we used a consolidated suite of data-cleaning tools [8][9][10].…”
Section: The Greg ML Ecosystem (mentioning)
confidence: 99%
“…Technically speaking, there are similarities between the algorithms provided in this paper and the classical powerset lattice and combinatorial set enumeration problems [11], such as data cube modeling [26], frequent item-sets and association rule mining [12], data profiling [27], recommendation systems [28], and data cleaning [29]. While such work, and the algorithms such as apriori, traverse over the powerset lattice, our problem is modeled as the traversal over the pattern graph which has a different structure (and properties) compared to a powerset lattice.…”
Section: Related Work (mentioning)
confidence: 99%
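The last statement contrasts a pattern-graph traversal with algorithms such as apriori that walk the powerset lattice. For reference, here is a minimal Python sketch of that classic baseline (frequent-itemset mining, not the citing paper's algorithm). It visits the lattice level by level, joins frequent (k-1)-itemsets into k-item candidates, and prunes any candidate with an infrequent (k-1)-subset. The pruning is sound because support can only shrink as an itemset grows. The `apriori` name and its `min_support` parameter (an absolute transaction count) are our choices.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining over the powerset lattice of items.

    Returns a dict mapping each frequent itemset (a frozenset) to its support,
    i.e. the number of transactions that contain it.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    level = {}
    for item in sorted({i for t in transactions for i in t}):
        s = frozenset([item])
        n = support(s)
        if n >= min_support:
            level[s] = n

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        level = {}
        for c in candidates:
            n = support(c)
            if n >= min_support:
                level[c] = n
    return frequent
```

With the transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} and `min_support=3`, every single item (support 4) and every pair (support 3) is frequent, while {a,b,c} appears in only two transactions and is dropped.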