2017
DOI: 10.1007/978-3-319-71273-4_29

An AI Planning System for Data Cleaning

Abstract: Data Cleaning represents a crucial and error-prone activity in KDD that might have unpredictable effects on data analytics, affecting the believability of the whole KDD process. In this paper we describe how a bridge between the AI Planning and Data Quality communities has been made, by expressing both the data quality and cleaning tasks in terms of AI planning. We also report a real-life application of our approach.

citations
Cited by 5 publications
(17 citation statements)
references
References 14 publications
“…Nearly a quarter of these job vacancies were not used since the process was not able to extract relevant information such as sector of economic activity, or the required education title (103,094 vacancies were analysed for both), or the required occupation (110,950 vacancies were analysed) from the web pages due to null values. It is well known that quality issues might have unpredictable effects on the reliability and believability of the generated analyses and results (see refs [51][52][53][65,66]). All in all, 67% of the vacancies analysed were concentrated in the services sector, 33% in manufacturing, and only 0.5% in construction (32.6%).…”
Section: Results (mentioning)
confidence: 99%
“…In this way the graph can be used as a filter for classifying documents; (ii) Second, we apply our approach to a real-life problem, framed within an EU Project [10,11] in the context of Labor Market Intelligence [5]. Specifically, we show that our approach really can compete with classical machine-learning algorithms in terms of classification accuracy, by comparing our results with the ones presented in [1] on the same dataset. Moreover, we show that the explainable nature of our approach automatically provides explanations to motivate its behavior in an interpretable and reliable manner, while classical machine-learning approaches do not.…”
Section: Introduction (mentioning)
confidence: 89%
“…Finally, the information can be classified with standard taxonomies, which act like a lingua franca to overcome linguistic boundaries, such as (i) ISCO (The International Standard Classification of Occupations) [16], a four-level classification that represents a standardized system for organizing labor market occupations, and (ii) ESCO [17], the multilingual classification system of European Skills, Competences, Qualifications and Occupations, which is the European standard supporting all labor market intelligence over 28 EU languages. 1 The Relevance of LMI. In 2016, the EU and Eurostat launched the ESSnet Big Data project [18], involving 22 EU member states with the aim of "integrating big data in the regular production of official statistics, through pilots exploring the potential of selected big data sources and building concrete applications".…”
Section: Background on LMI (mentioning)
confidence: 99%