In essence, the goal of data mining is to discover knowledge which is highly accurate, comprehensible and "interesting" (surprising, novel). Although the literature emphasizes predictive accuracy and comprehensibility, the discovery of interesting knowledge remains a formidable challenge for data mining algorithms. In this paper we present a genetic algorithm designed from scratch to discover interesting rules. Our GA addresses the dependence modelling task, where different rules can predict different goal attributes. This task can be regarded as a generalization of the classification task, where all rules predict the same goal attribute.
Abstract. In the last few years, the data mining community has proposed a number of objective rule interestingness measures to select the most interesting rules, out of a large set of discovered rules. However, it should be recalled that objective measures are just an estimate of the true degree of interestingness of a rule to the user, the so-called real human interest. The latter is inherently subjective. Hence, it is not clear how effective, in practice, objective measures are. More precisely, the central question investigated in this paper is: "how effective objective rule interestingness measures are, in the sense of being a good estimate of the true, subjective degree of interestingness of a rule to the user?" This question is investigated by extensive experiments with 11 objective rule interestingness measures across eight real-world data sets.
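As a rough illustration of what an objective rule interestingness measure computes, the sketch below derives three common textbook measures (support, confidence and lift) for a rule "IF antecedent THEN consequent" from simple counts. The abstract does not name the 11 measures used in the study, so the choice of these three, the function name `rule_measures` and the example counts are all assumptions made purely for illustration.

```python
from typing import Dict


def rule_measures(n_total: int, n_antecedent: int,
                  n_consequent: int, n_both: int) -> Dict[str, float]:
    """Compute support, confidence and lift for a rule A -> C.

    Illustrative only: these measures are NOT taken from the abstract.
    n_total      -- number of records in the data set
    n_antecedent -- records satisfying the antecedent A
    n_consequent -- records satisfying the consequent C
    n_both       -- records satisfying both A and C
    """
    support = n_both / n_total                 # P(A and C)
    confidence = n_both / n_antecedent         # P(C | A)
    lift = confidence / (n_consequent / n_total)  # P(C | A) / P(C)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical counts, not data from the paper.
measures = rule_measures(n_total=1000, n_antecedent=200,
                         n_consequent=400, n_both=150)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

A lift above 1 says the consequent is more frequent among records matching the antecedent than in the data set overall. Whether such a score matches what a user actually finds interesting is the question the abstract raises.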
Summary. Evolutionary Algorithms (EAs) are stochastic search algorithms inspired by the process of neo-Darwinian evolution. The motivation for applying EAs to data mining is that they are robust, adaptive search techniques that perform a global search in the solution space. This chapter first presents a brief overview of EAs, focusing mainly on two kinds of EAs, viz. Genetic Algorithms (GAs) and Genetic Programming (GP). Then the chapter reviews the main concepts and principles used by EAs designed for solving several data mining tasks, namely: discovery of classification rules, clustering, attribute selection and attribute construction. Finally, it discusses Multi-Objective EAs, based on the concept of Pareto dominance, and their use in several data mining tasks.
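The concept of Pareto dominance mentioned above can be sketched in a few lines. This is a minimal illustration, assuming every objective is to be maximized. The function names `dominates` and `pareto_front` and the example objective values (imagined as accuracy and simplicity scores) are hypothetical and not taken from the chapter.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if solution a Pareto-dominates solution b.

    Assumes all objectives are maximized: a must be at least as good
    as b on every objective and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored as (accuracy, simplicity).
candidates = [(0.90, 0.40), (0.85, 0.60), (0.80, 0.55), (0.70, 0.90)]
front = pareto_front(candidates)
print(front)
```

Here `(0.80, 0.55)` is dropped because `(0.85, 0.60)` is better on both objectives. The remaining three candidates each trade one objective against the other, so none dominates another and all stay on the front.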
There exist numerous systems for mining the web in search of relevant information but few exist for the discovery of interesting information. The discovery of interesting information is an advance on basic text mining in that it aims to identify text that is novel, unexpected or surprising to a user, whilst still being relevant. This article investigates the use of Artificial Immune Systems (AIS) applied to discovery of interesting information. AIS are thought to confer the adaptability and learning required for this task. AISIID (Artificial Immune system for Interesting Information Discovery) is described in some detail, then an evaluative study is undertaken involving the subjective evaluation of the results by users. AISIID is found to discover pages rated more interesting by users than a comparative system.
In Machine Learning and Data Mining, most of the works in classification problems deal with flat classification, where each instance is classified in one of a set of possible classes and there is no hierarchical relationship between the classes. There are, however, more complex classification problems where the classes to be predicted are hierarchically related. This chapter presents a tutorial on the hierarchical classification techniques found in the literature. We also discuss how hierarchical classification techniques have been applied to the area of Bioinformatics (particularly the prediction of protein function), where hierarchical classification problems are often found.