Feature construction transforms the input space of a classification problem in order to improve classification performance. It is particularly important for classifier inducers that cannot transform their input space intrinsically. This paper proposes GPMFC, a multiple-feature construction system for classification problems using genetic programming (GP). The paper takes a non-wrapper approach by introducing a filter-based measure of goodness for constructed features. The constructed, high-level features are functions of the original input features. These functions are evolved by GP using an entropy-based fitness function that maximizes the purity of class intervals. A decomposable objective function is proposed so that the system can construct multiple high-level features for each problem. The constructed features are used to transform the original input space into a new space with better separability. Extensive experiments are conducted on a number of benchmark problems and symbolic learning classifiers. The results show that, in most cases, the new approach is highly effective in increasing the classification performance of rule-based and decision tree classifiers. The constructed features help improve the learning performance of symbolic learners, although they may lack intelligibility.
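The abstract does not give the exact fitness function, so the following is only a minimal sketch of what an entropy-based, filter-style purity measure over class intervals could look like. The function name `interval_entropy`, the equal-width binning, and the `n_bins` parameter are assumptions made for illustration and are not taken from GPMFC. A lower score means purer intervals, so a GP search would try to minimize it.

```python
import math
from collections import Counter


def interval_entropy(values, labels, n_bins=4):
    """Weighted class entropy over equal-width intervals of one constructed feature.

    Hypothetical stand-in for an entropy-based filter measure: 0.0 means every
    interval contains a single class; higher values mean more mixing.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(y)

    total = len(values)
    score = 0.0
    for members in bins.values():
        counts = Counter(members)
        h = -sum((c / len(members)) * math.log2(c / len(members))
                 for c in counts.values())
        score += (len(members) / total) * h
    return score


# A constructed feature is just a function of the original inputs.
rows = [(5, 5), (6, 5), (7, 6), (8, 7), (15, 5), (16, 5), (18, 6), (20, 7)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
constructed = [x1 - x2 for x1, x2 in rows]   # illustrative candidate feature
purity_score = interval_entropy(constructed, labels)
```

Because the measure is computed from the feature values and class labels alone, no classifier has to be trained during evolution, which is the practical appeal of a filter approach over a wrapper.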
Image classification is a complex but important task, especially in areas of machine vision and image analysis such as remote sensing and face recognition. One of the challenges in image classification is finding an optimal set of features for a particular task, because the choice of features has a direct impact on classification performance. However, the goodness of a feature is highly problem dependent, and domain knowledge is often required. To address these issues, we introduce a Genetic Programming (GP) based image classification method, Two-Tier GP, which operates directly on raw pixels rather than on features. The first tier of a classifier automatically defines features from the raw image input, while the second tier makes the decision. Compared to conventional feature-based image classification methods, Two-Tier GP achieved better accuracies on a range of different tasks. Furthermore, by using the features defined by the first tier of these Two-Tier GP classifiers, conventional classification methods obtained higher accuracies than when classifying on manually designed features. Analysis of evolved Two-Tier image classifiers shows that the programs capture genuine features and that the mechanism by which they achieve high accuracy can be revealed. The Two-Tier GP method has clear advantages in image classification, such as high accuracy, good interpretability, and the removal of an explicit feature extraction process.
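The two-tier structure described above can be illustrated with a small hand-written program. This is a sketch under stated assumptions, not an evolved classifier from the paper: the region-mean aggregation, the region coordinates, the decision rule, and the names `region_mean` and `classify` are all hypothetical. In the actual method both tiers would be evolved by GP rather than written by hand.

```python
from statistics import mean


def region_mean(image, top, left, height, width):
    """Tier 1 (hypothetical): mean pixel value of a rectangular image region."""
    pixels = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return mean(pixels)


def classify(image):
    """Tier 2 (hypothetical): arithmetic over tier-1 features; the sign picks the class."""
    upper = region_mean(image, 0, 0, 2, 4)   # feature from the top half
    lower = region_mean(image, 2, 0, 2, 4)   # feature from the bottom half
    return "class_a" if upper - lower > 0 else "class_b"


# Two toy 4x4 grayscale images given as raw pixel rows.
bright_top = [[9] * 4, [9] * 4, [1] * 4, [1] * 4]
bright_bottom = [[1] * 4, [1] * 4, [9] * 4, [9] * 4]
predictions = [classify(bright_top), classify(bright_bottom)]
```

Reading the tier-1 calls directly shows which image regions the program relies on, which is one way to picture the interpretability claim made in the abstract.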
The high number of features in many machine vision applications has a major impact on the performance of machine learning algorithms. Feature selection (FS) is one avenue to dimensionality reduction. Evolutionary search techniques have been very promising in finding solutions in the exponentially growing search space of FS problems. This paper proposes a genetic programming (GP) approach to FS in which the building blocks are subsets of features and set operators. We use a bit-mask representation for subsets and a collection of set operators as primitive functions. The GP search then combines these subsets and set operations to find an optimal subset of features. The task we study is a highly imbalanced face detection problem. A modified version of the Naïve Bayes classification model is used as the fitness function. Our results show that the proposed algorithm can achieve a significant reduction in dimensionality and processing time. Using the GP-selected features, the performance of certain classifiers can also be improved.
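The bit-mask representation and set operators can be sketched as follows. The abstract does not list the operators or the tree encoding, so the choice of union, intersection, and difference, the nested-tuple tree format, and all function names here are assumptions for illustration. Bit `i` of an integer mask is set when feature `i` belongs to the subset.

```python
def mask_from_indices(indices):
    """Build an integer bit mask with bit i set for each feature index i."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask


def indices_from_mask(mask):
    """Recover the sorted list of feature indices stored in a bit mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Set operators as primitive functions (an assumed, illustrative function set).
def union(a, b):
    return a | b


def intersection(a, b):
    return a & b


def difference(a, b):
    return a & ~b


def evaluate(tree):
    """Evaluate a GP tree given as a leaf mask or an (operator, left, right) tuple."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return op(evaluate(left), evaluate(right))


a = mask_from_indices([0, 1, 2])
b = mask_from_indices([2, 3])
c = mask_from_indices([1])

# (a UNION b) MINUS c
tree = (difference, (union, a, b), c)
selected = indices_from_mask(evaluate(tree))
```

With this encoding each set operation is a single bitwise instruction on an integer, so evaluating a candidate tree stays cheap even when the number of features is large.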