We study the problem of self-supervised and interpretable data cleaning, which automatically extracts interpretable data repair rules from dirty data. In this paper, we propose a novel framework, namely Garf, based on sequence generative adversarial networks (SeqGAN). A key piece of information that Garf tries to capture is data repair rules (for example, if the city is "Dothan", then the county should be "Houston"). Garf employs a SeqGAN consisting of a generator G and a discriminator D that trains G to learn the dependency relationships (e.g., given a city value "Dothan" as input, the county can be determined as "Houston"). After training, the generator G can be used to generate data repair rules, but the generated set may contain both trusted and untrusted rules, especially when learning from dirty data. To mitigate this problem, Garf further updates the learned relationships with another discriminator D' to iteratively improve the quality of both rules and data. Garf combines the advantages of logical and learning-based methods, which allows it to clean dirty data with high interpretability while requiring neither prior knowledge nor training data. Extensive experiments on real-world and synthetic datasets demonstrate the effectiveness of Garf. Garf achieves new state-of-the-art data cleaning results with high accuracy by learning from dirty datasets without human supervision.
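To make the notion of a data repair rule concrete, the following is a minimal sketch of how a rule of the form "if the city is "Dothan", then the county should be "Houston"" could be represented and applied to a table. It is not Garf's implementation and involves no SeqGAN: the `RepairRule` class, the `apply_rules` function, and the sample rows (including the misspelled county) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairRule:
    """Hypothetical rule: if cond_attr == cond_value, then target_attr should be target_value."""
    cond_attr: str
    cond_value: str
    target_attr: str
    target_value: str


def apply_rules(rows, rules):
    """Return repaired copies of the rows and a log of (row_index, attribute, old, new)."""
    repaired, log = [], []
    for i, row in enumerate(rows):
        fixed = dict(row)  # copy so the input rows are left untouched
        for rule in rules:
            matches = fixed.get(rule.cond_attr) == rule.cond_value
            if matches and fixed.get(rule.target_attr) != rule.target_value:
                log.append((i, rule.target_attr, fixed.get(rule.target_attr), rule.target_value))
                fixed[rule.target_attr] = rule.target_value
        repaired.append(fixed)
    return repaired, log


# The rule quoted in the abstract; the rows below are made-up illustrative data.
rules = [RepairRule("city", "Dothan", "county", "Houston")]
rows = [
    {"city": "Dothan", "county": "Houston"},  # already consistent with the rule
    {"city": "Dothan", "county": "Huston"},   # hypothetical dirty value
    {"city": "Boaz", "county": "Marshall"},   # rule does not apply
]
repaired, log = apply_rules(rows, rules)
```

Because every change is recorded together with the rule's target value, each repair can be traced back to an explicit, human-readable rule, which is the sense of interpretability the abstract refers to.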
Abstract: "Probability Theory and Mathematical Statistics" is a common foundational mathematics course for all science and engineering specialties in colleges and universities, because it is widely applied in many other specialized courses in those fields. However, the course is heavy on theory and includes almost no practical experiments, so students show little interest in it. In this paper, we first analyze the disadvantages of the classic teaching method in "Probability Theory and Mathematical Statistics". Then, drawing on the practical nature of the subject, we propose a novel evaluation method developed through long-term teaching practice in this course. We improve on the classic evaluation method, which relies on a written examination only, by incorporating formative evaluation. Finally, experimental results show the advantages and effectiveness of the improved evaluation method.