DeepClean: Data Cleaning via Question Asking

Zhang, Xinyang; Ji, Yujie; Nguyen, Chanh; Wang, Ting

doi:10.1109/dsaa.2018.00039

Cited by 2 publications (1 citation statement)

References 33 publications

“…By surveying experts and using the weighted mean of their evaluations, we limit the external threat of the evaluation of the difficulty to do elementary tasks. Moreover, the elementary tasks were obtained by decomposing repairing methods from multiple papers [7,14,10,5,16,8,4,2,11,18,1,13]. For the generation of errors, we generated them randomly by means of a uniform distribution in datasets and repeated the process 30 times to reduce bias.…”

Section: Threats To Validity

mentioning

confidence: 99%

On Studying the Effect of Data Quality on Classification Performances

Jouseau, Salva, Samir

2022

Lecture Notes in Computer Science

During the last decade, data have played a key role for learning and decision making models. Unfortunately, the quality of data has been ignored or partially investigated as a pre-processing step. Motivated by applications in various fields, we propose to study data quality and its impact on the performance of several learning models. In this work, we first introduce a list of elementary repairing tasks ranging from easy to complex with an increasing level. Then, we form categories from the state-of-the-art cleaning and repairing methods. We also investigate if it is always efficient to repair data. By including standard classifications models and public dataset, our work enables their use in different contexts and can be extended to other machine learning applications.
