In concept learning and data mining tasks, the learner typically faces a choice among many possible hypotheses characterizing the data. If the training data can be assumed to be noise-free, the generated hypothesis should be complete and consistent with regard to the data. In real-world problems, however, data are often noisy, and insisting on full completeness and consistency is no longer appropriate. The problem then is to determine a hypothesis that represents the "best" trade-off between completeness and consistency. This paper presents an approach to this problem in which a learner seeks rules that optimize a description quality criterion. The criterion combines completeness with consistency gain, a consistency-based measure that reflects the rule's benefit. The method has been implemented in the AQ18 learning and data mining system and compared with several other methods. Experiments have indicated the flexibility and power of the proposed method.
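To make the trade-off concrete, the following is a minimal sketch of one way such a description quality criterion could be computed. The abstract does not give the formula, so everything here is an assumption for illustration: consistency gain is taken as the rule's consistency improvement over the class prior, normalized to at most 1, and the two terms are combined as a weighted geometric mean with a hypothetical weight `w`. The names `p`, `n`, `P`, `N`, and `quality` are likewise illustrative and are not taken from AQ18.

```python
def completeness(p: int, P: int) -> float:
    """Fraction of all positive examples (P) that the rule covers (p)."""
    return p / P


def consistency(p: int, n: int) -> float:
    """Fraction of the examples covered by the rule that are positive."""
    covered = p + n
    return p / covered if covered else 0.0


def consistency_gain(p: int, n: int, P: int, N: int) -> float:
    """Consistency improvement over always guessing the positive class.

    Assumed form: 0 when the rule is no better than the class prior,
    1 when it covers no negatives. Requires N > 0.
    """
    prior = P / (P + N)
    return (consistency(p, n) - prior) / (1.0 - prior)


def quality(p: int, n: int, P: int, N: int, w: float = 0.5) -> float:
    """Assumed combination: completeness**w * consistency_gain**(1 - w).

    w near 1 favors completeness; w near 0 favors consistency gain.
    A negative gain (rule worse than the prior) is clipped to 0.
    """
    gain = max(consistency_gain(p, n, P, N), 0.0)
    return completeness(p, P) ** w * gain ** (1.0 - w)


# Two hypothetical rules on a data set with 100 positives and 100 negatives.
broad_rule = dict(p=80, n=10, P=100, N=100)   # covers more, slightly inconsistent
narrow_rule = dict(p=40, n=0, P=100, N=100)   # covers less, fully consistent

for w in (0.0, 0.5, 1.0):
    print(w, quality(**broad_rule, w=w), quality(**narrow_rule, w=w))
```

Under these assumptions, a low `w` ranks the fully consistent but narrow rule higher, while a balanced or high `w` ranks the broader, slightly inconsistent rule higher, which is the kind of tunable trade-off the abstract describes.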