The quality of the feature space is an important factor influencing the performance of any machine learning algorithm, including classification. The attributes that define the feature space may be poorly understood or inadequate, making it difficult to discover high-quality knowledge. Feature construction (FC) and feature selection (FS) are two pre-processing steps that can improve the quality of the feature space and thereby enhance classifier performance in terms of accuracy, complexity, speed and interpretability. While FS aims to choose a subset of informative features, FC improves classification performance by evolving new features from the original ones; the constructed features are expected to have greater predictive value than the originals from which they are built. Over the past few decades, several evolutionary computation (EC) methods have been proposed for FC. This paper gives an overview of the literature on EC for FC, focusing mainly on filter, wrapper and embedded methods and identifying the contributions of each. Furthermore, open challenges and current issues are discussed in order to identify promising areas for future research.
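To make the idea of feature construction concrete, the following minimal sketch (a hypothetical illustration, not an algorithm from the surveyed literature) shows a case where neither original feature separates the classes on its own, but a constructed product feature does:

```python
# Minimal illustration of feature construction (FC): a new feature built
# from the originals can have more predictive value than either original.
# The dataset and the product-term construction below are hypothetical
# examples chosen for illustration.

# XOR-like data: neither x1 nor x2 alone separates the two classes.
samples = [
    ((-1.0, -1.0), 0),
    ((-1.0,  1.0), 1),
    (( 1.0, -1.0), 1),
    (( 1.0,  1.0), 0),
]

def construct(x1, x2):
    """Constructed feature: the product of the two original features."""
    return x1 * x2

# A single threshold on the constructed feature classifies every sample:
# product > 0 -> class 0, product < 0 -> class 1.
predictions = [0 if construct(x1, x2) > 0 else 1 for (x1, x2), _ in samples]
print(predictions)  # matches the class labels [0, 1, 1, 0]
```

Here the constructed feature makes a non-linearly separable problem linearly separable, which is the kind of gain FC methods search for automatically.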
Feature construction (FC) refers to a process that uses the original features to construct new features with better discriminating ability. Particle swarm optimisation (PSO) is an effective search technique that has been successfully utilised in FC. However, applying PSO to feature construction on high-dimensional data is challenging due to the large search space and high computational cost. Moreover, when PSO is applied to the whole feature set, irrelevant, redundant and noisy features may be constructed. Therefore, the main purpose of this paper is to select the most informative features and construct new features from the selected features for better classification performance. Feature clustering methods aggregate similar features into clusters, lowering the dimensionality of the data by choosing representative features from each cluster to form the final feature subset. Feature clustering has proven accurate in feature selection (FS); however, only one study has investigated its application in FC for classification, and that study identified limitations such as handling only binary classification problems and decreasing accuracy on some data. This paper proposes a cluster-based PSO feature construction approach called ClusPSOFC. The Redundancy-Based Feature Clustering (RFC) algorithm is applied to choose the most informative features
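The clustering-then-selection step described above can be sketched as follows. This is a simplified, greedy correlation-based grouping offered only as an illustrative stand-in for the idea of grouping redundant features and keeping one representative per cluster; it is not the RFC algorithm from the paper, and the threshold and synthetic data are assumptions:

```python
# Sketch of cluster-based feature selection: group redundant (highly
# correlated) features and keep one representative per cluster, reducing
# dimensionality before any feature construction step. The greedy
# correlation grouping here is a simplified stand-in, not the actual
# Redundancy-Based Feature Clustering (RFC) algorithm.
import numpy as np

def cluster_select(X, threshold=0.9):
    """Greedily cluster the columns of X by absolute Pearson correlation
    and return the index of one representative feature per cluster."""
    n_features = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = set(range(n_features))
    representatives = []
    while unassigned:
        seed = min(unassigned)            # representative of a new cluster
        representatives.append(seed)
        cluster = {j for j in unassigned if corr[seed, j] >= threshold}
        unassigned -= cluster             # the seed belongs to its own cluster
    return representatives

# Synthetic data: six features, where columns 0-2 are noisy copies of one
# latent variable and columns 3-5 of another, so only two representatives
# should remain.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, i // 3] + 0.01 * rng.normal(size=200)
                     for i in range(6)])
print(cluster_select(X))  # one representative per redundant group
```

A subsequent construction stage (PSO in the paper) would then operate only on the selected representatives, shrinking the search space it must explore.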