This paper presents an algorithm for discovering surprising exception rules from data sets. An exception rule, defined as a pattern that deviates from common sense, exhibits unexpectedness and is sometimes extremely useful. A domain-independent approach, PEDRE, exists for the simultaneous discovery of exception rules and their common sense rules. However, PEDRE is too conservative and has difficulty discovering surprising rules. Historic exception discoveries show that surprise is often linked with interestingness. To formalize this notion, we propose a novel approach that improves on PEDRE. First, we reformalize the problem and set looser constraints on the reliability of an exception rule. Then, to screen out uninteresting rules, we introduce an evaluation criterion of surprise for an exception rule, obtained by modifying intensity of implication, a measure based on significance. Our approach has been validated using data sets from the UCI repository.
This paper presents an efficient algorithm for discovering exception rules from a data set without domain-specific information. An exception rule, defined as a pattern that deviates from a strong rule, exhibits unexpectedness and is sometimes extremely useful. Previous discovery approaches for this type of knowledge fall into two classes: a directed approach, which obtains exception rules that each deviate from a set of user-prespecified strong rules, and an undirected approach, which typically discovers a set of rule pairs, each consisting of an exception rule and its corresponding strong rule. It has been pointed out that unexpectedness is often related to interestingness. In this sense, an undirected approach is promising, since its discovery outcome is free from human prejudice and thus tends to be highly unexpected. However, this approach is prohibitively expensive because of the extra search for strong rules, and its output can contain unreliable patterns. To circumvent these difficulties, we propose a method based on sound pruning and probabilistic estimation. The sound pruning reduces search time to a reasonable amount and enables exhaustive search for rule pairs. Normal approximations of multinomial distributions are used to evaluate the reliability of a rule pair. Our method has been validated using two medical data sets under the supervision of a physician and two benchmark data sets from the machine learning community.
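To make the notion of a rule pair concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a strong rule of the form Y -> x and an exception rule of the form Y and Z -> x', where x' contradicts x, and it scores each rule by plain empirical confidence. The rule form, the toy records, the function names, and the thresholds are all hypothetical illustrations; the pruning and the normal-approximation reliability test described above are not implemented here.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def confidence(records: List[Record], premise: Predicate, conclusion: Predicate) -> float:
    """Empirical confidence: fraction of records matching the premise
    that also match the conclusion (0.0 if no record matches the premise)."""
    covered = [r for r in records if premise(r)]
    if not covered:
        return 0.0
    return sum(1 for r in covered if conclusion(r)) / len(covered)


def is_rule_pair(strong_conf: float, exception_conf: float,
                 min_strong: float = 0.7, min_exception: float = 0.9) -> bool:
    """Hypothetical acceptance test: both rules must exceed
    user-chosen confidence thresholds (values here are arbitrary)."""
    return strong_conf >= min_strong and exception_conf >= min_exception


# Toy data (invented for illustration only).
records: List[Record] = (
    [{"bird": "yes", "penguin": "no", "flies": "yes"}] * 8
    + [{"bird": "yes", "penguin": "yes", "flies": "no"}] * 2
    + [{"bird": "no", "penguin": "no", "flies": "no"}] * 5
)

# Strong rule: bird=yes -> flies=yes
strong_conf = confidence(
    records,
    lambda r: r["bird"] == "yes",
    lambda r: r["flies"] == "yes",
)

# Exception rule: bird=yes and penguin=yes -> flies=no
exception_conf = confidence(
    records,
    lambda r: r["bird"] == "yes" and r["penguin"] == "yes",
    lambda r: r["flies"] == "no",
)

pair_found = is_rule_pair(strong_conf, exception_conf)
```

On this toy data the strong rule holds for 8 of 10 birds, while the exception rule holds for both penguins. A count of two is exactly the kind of small-sample pattern that a reliability evaluation is meant to guard against, which raw confidence alone cannot do.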