The majority of contributions in the domain of rule mining overemphasize maximizing the predictive accuracy of the discovered patterns, while user-oriented criteria such as comprehensibility and interestingness have been given secondary importance. Recently, it has been widely acknowledged that even highly accurate discovered knowledge might be worthless if it scores low on the qualitative parameters of comprehensibility and interestingness. This paper presents a classification algorithm based on an evolutionary approach that discovers comprehensible and interesting rules in CNF form, in which attributes are joined by conjunction and the values of each attribute are joined by disjunction. A flexible encoding scheme, genetic operators with appropriate syntactic constraints, and a suitable fitness function to measure the goodness of rules are proposed for effective evolution of rule sets. The proposed genetic algorithm is validated on several datasets from the UCI data set repository, and the experimental results indicate lower error rates and greater comprehensibility across a range of datasets. Some of the rules reveal interesting and valuable nuggets of knowledge discovered from small disjuncts of high accuracy and low support, which are very difficult to capture otherwise.
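To make the CNF rule form concrete, the following is a minimal sketch, not the encoding used in the paper. It assumes a rule antecedent can be held as a mapping from attribute name to a set of permitted values; the names `Rule`, `matches`, `support_and_accuracy`, and the toy weather records are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: attribute -> set of allowed values.
# Values inside a set are OR-ed; the attribute clauses are AND-ed.
Rule = Dict[str, Set[str]]
Record = Dict[str, str]


def matches(rule: Rule, record: Record) -> bool:
    """Return True if the record satisfies every attribute clause of the rule."""
    return all(record.get(attr) in allowed for attr, allowed in rule.items())


def support_and_accuracy(
    rule: Rule, target: str, records: List[Record]
) -> Tuple[float, float]:
    """Fraction of records covered, and fraction of covered records in the target class."""
    covered = [r for r in records if matches(rule, r)]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for r in covered if r["class"] == target)
    return len(covered) / len(records), correct / len(covered)


# Hypothetical toy data, for illustration only.
records = [
    {"outlook": "sunny", "windy": "false", "class": "play"},
    {"outlook": "overcast", "windy": "false", "class": "play"},
    {"outlook": "rainy", "windy": "false", "class": "play"},
    {"outlook": "sunny", "windy": "true", "class": "no"},
]

# IF (outlook = sunny OR overcast) AND (windy = false) THEN class = play
rule = {"outlook": {"sunny", "overcast"}, "windy": {"false"}}
support, accuracy = support_and_accuracy(rule, "play", records)
```

A set per attribute keeps the disjunction compact, so a single rule can cover several values of an attribute without being split into multiple purely conjunctive rules.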
Feature selection is an important pre-processing task for building accurate and comprehensible classification models. Several researchers have applied filter, wrapper, or hybrid approaches using genetic algorithms, which are good candidates for optimization problems that involve large search spaces, as is the case in feature selection. Moreover, feature selection is an inherently multi-objective problem with many competing objectives involving the size, predictive power, and redundancy of the feature subset under consideration. Hence, Multi-Objective Genetic Algorithms (MOGAs) are a natural choice for this problem. In this paper, we propose a hybrid approach (a wrapper guided by a filter) for feature selection, which employs a MOGA at the filter phase and a simple GA at the wrapper phase. The MOGA at the filter phase provides a non-dominated set of feature subsets, optimized on several criteria, as input to the wrapper phase. The genetic algorithm at the wrapper phase then performs the classifier-dependent optimization. We have used a support vector machine (SVM) as the classification algorithm in the wrapper phase. The proposed hybrid approach has been validated on ten datasets from the UCI Machine Learning Repository. A comparison is presented in terms of predictive accuracy, feature subset size, and running time among the pure filter approach, the pure wrapper approach, an earlier hybrid approach based on a genetic algorithm, and the proposed approach.
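The notion of a non-dominated set of feature subsets can be illustrated with a small sketch. This is not the MOGA from the paper; it only shows Pareto filtering over candidates that have already been scored. The three objectives (subset size, negated relevance, redundancy), the scores themselves, and the names `dominates` and `non_dominated` are assumptions made for illustration, with every objective expressed so that smaller is better.

```python
from typing import FrozenSet, List, Tuple

Scores = Tuple[float, ...]  # assumed convention: every objective is minimized
Candidate = Tuple[FrozenSet[int], Scores]


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse on all objectives and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates (the Pareto front)."""
    return [
        c
        for c in candidates
        if not any(dominates(o[1], c[1]) for o in candidates if o is not c)
    ]


# Hypothetical scores: (subset size, -relevance, redundancy).
candidates = [
    (frozenset({0, 1}), (2, -0.8, 0.1)),
    (frozenset({0, 1, 2}), (3, -0.9, 0.3)),
    (frozenset({1, 2, 3}), (3, -0.7, 0.4)),  # dominated by both subsets above
    (frozenset({0}), (1, -0.5, 0.0)),
]

front = non_dominated(candidates)
front_subsets = [subset for subset, _ in front]
```

In a hybrid scheme of the kind described above, a front such as `front_subsets` would be what the filter phase hands to the wrapper phase, which then evaluates subsets with the chosen classifier.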