Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data 2014
DOI: 10.1145/2588555.2610520

Descriptive and prescriptive data cleaning

Abstract: Data cleaning techniques usually rely on some quality rules to identify violating tuples, and then fix these violations using some repair algorithms. Oftentimes, the rules, which are related to the business logic, can only be defined on some target report generated by transformations over multiple data sources. This creates a situation where the violations detected in the report are decoupled in space and time from the actual source of errors. In addition, applying the repair on the report would need to be rep…
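The setting the abstract describes can be sketched in a few lines: a report is produced by a transformation over several source tables, a quality rule is evaluated on the report, and each violation is traced back to the source tuples that produced it. This is a minimal illustration under assumed names and data; none of the tables, rules, or identifiers below come from the paper itself.

```python
# Hypothetical sources: monthly sales feeds, each a list of (item, quantity).
sources = {
    "jan": [("widgets", 10), ("gadgets", -5)],
    "feb": [("widgets", 7), ("gadgets", 2)],
}

def build_report():
    """Aggregate per-item totals, keeping lineage to the contributing tuples."""
    totals, lineage = {}, {}
    for src, rows in sources.items():
        for i, (item, qty) in enumerate(rows):
            totals[item] = totals.get(item, 0) + qty
            # remember which source tuple (source name, row index) fed this total
            lineage.setdefault(item, []).append((src, i))
    return totals, lineage

totals, lineage = build_report()

# Quality rule defined on the report, not on the sources: totals must be >= 0.
violations = {item for item, t in totals.items() if t < 0}

# Descriptive step: explain each report-level violation by its source tuples,
# since that is where any repair would actually have to be applied.
explanations = {item: lineage[item] for item in violations}
```

The point of carrying `lineage` through the transformation is exactly the decoupling the abstract mentions: the rule fires on the report, but the tuples that need fixing live in the sources.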

Cited by 44 publications (29 citation statements). References 22 publications.
“…In this paper, the discovery of errors after some processing and integration is of particular importance given the use of large-scale cross comparisons and the integration of external sources. Such "delayed" cleaning has been analyzed and formally described in the literature [12]. Data integration, and in particular the analytical querying of an integrated set of data sources, is normally addressed through data warehousing (DW) and online analytical processing (OLAP) systems.…”
Section: G. Scalia et al. / Towards a Scientific Data Framework to Su…
confidence: 99%
“…Here, provenance would be useful in investigating the range of options from which the resolution module made its selection and why one particular value was selected and the others rejected. Chalamalla et al [2014] describe how data identified as low quality can be repaired after isolating the causes of the quality issue. The example given describes shops run by managers and employees and a quality metric stating that, per shop, managers must have larger salaries.…”
Section: Related Work
confidence: 99%
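The quality rule in the example quoted above (per shop, managers must earn more than employees) is easy to make concrete. The sketch below uses assumed column names and toy rows, not data from the cited paper, and simply reports which shops violate the rule.

```python
# Toy staff table; "shop", "role", and "salary" are assumed column names.
staff = [
    {"shop": "A", "role": "manager",  "salary": 100},
    {"shop": "A", "role": "employee", "salary": 60},
    {"shop": "B", "role": "manager",  "salary": 50},
    {"shop": "B", "role": "employee", "salary": 70},  # violates the rule
]

def violating_shops(rows):
    """Return shops where some employee earns at least as much as the manager."""
    by_shop = {}
    for r in rows:
        by_shop.setdefault(r["shop"], []).append(r)
    bad = []
    for shop, rs in by_shop.items():
        mgr = max(r["salary"] for r in rs if r["role"] == "manager")
        if any(r["role"] == "employee" and r["salary"] >= mgr for r in rs):
            bad.append(shop)
    return bad
```

Detecting the violating shop is the easy half; as the quote notes, the interesting part is isolating which source value (the employee's salary, the manager's salary, or the shop assignment) caused it before repairing.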
“…This is a more restricted model of data cleaning than SVC, where the authors only consider changes to existing rows in an MV (no insertion or deletion) and do not handle the same generality of relational expressions (e.g., nested aggregates). Chalamalla et al. [6] proposed an approximate technique for specifying errors as constraints on a materialized view and proposing changes to the base data such that these constraints can be satisfied. While complementary, one major difference between the three works [6,36,50] and SVC is that they require an explicit specification of erroneous rows in a materialized view. Identifying whether a row is erroneous requires materialization and thus specifying the errors is equivalent to full incremental maintenance.…”
Section: Related Work
confidence: 99%
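The idea in the quote above, stating an error as a constraint on a materialized view and proposing a change to the base data so the recomputed view satisfies it, can be sketched as follows. The repair policy here (shift a single base tuple by the needed delta) is a deliberately naive assumption for brevity, not the cited technique.

```python
# Base table of (group, value) pairs; the view is a per-group SUM.
base = [("A", 10), ("A", 5), ("B", 3)]

def view(rows):
    """Materialize the aggregate view: group -> sum of values."""
    out = {}
    for g, v in rows:
        out[g] = out.get(g, 0) + v
    return out

def repair(rows, group, target):
    """Shift the first base tuple of `group` so the view meets `target`."""
    delta = target - view(rows)[group]
    fixed, done = [], False
    for g, v in rows:
        if g == group and not done:
            fixed.append((g, v + delta))  # apply the whole correction here
            done = True
        else:
            fixed.append((g, v))
    return fixed

# Constraint expressed on the view: the SUM for group "A" must equal 12.
repaired = repair(base, "A", 12)
```

Even this toy version shows the maintenance issue the quote raises: checking whether the constraint holds requires computing the view, so specifying view-level errors already costs as much as refreshing it.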