Abstract: Irrelevant and redundant features may not only degrade the performance of classifiers but also slow the prediction process. Another problem in prediction is the large number of classification models available: how to choose a satisfactory classifier is an important yet understudied task. This paper proposes an integrated scheme for feature selection and classifier evaluation in the context of prediction. The scheme combines traditional feature selection techniques with multi-criteria decision making (MCDM) methods in an attempt to increase the accuracy of classification models and to identify appropriate classifiers for different types of data sets.
Keywords: multi-criteria decision making (MCDM);
... space, the algorithm space, and the performance measures. Algorithm selection was presented as a learning task by the machine learning community [7]. The data mining field suggested multi-criteria based metrics to compare classification algorithms [8]. Rokach [9] suggested that algorithm selection can be considered a multiple criteria decision making (MCDM) problem and that MCDM methods can be used to systematically choose the appropriate algorithm. Peng et al. [4] applied a set of MCDM methods to rank classification algorithms for the task of software defect detection.
Research Methodology
This section presents the research scheme and its major components, including feature selection methods, MCDM methods, and classification algorithms.
Based on the findings of Myrtveit et al. [2], this study designs the research scheme with careful consideration of these three factors. First, multiple datasets, representing different sizes and domains, are selected for the experimental study. Second, five accuracy indicators are used to evaluate classifiers. Third, the 10-fold cross-validation technique is applied to the sample datasets to select features. The research scheme is summarized in Figure 1.
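The 10-fold cross-validation step can be illustrated with a minimal sketch. The function names `k_fold_indices` and `cross_validate`, the `evaluate` callback, and the fixed random seed are assumptions introduced here for illustration; the paper does not specify how the folds are built or which five accuracy indicators are computed on each fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Call evaluate(train_idx, test_idx) once per fold and average the scores.

    `evaluate` is a hypothetical callback: it would fit feature selection and
    a classifier on the training indices and return one score on the test
    indices.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


# Usage with a placeholder scorer that only reports the test-set fraction.
mean_score = cross_validate(100, lambda train, test: len(test) / 100, k=10)
```

Running feature selection inside `evaluate`, on the training indices only, keeps the held-out fold from influencing which features are chosen; whether the paper follows this convention is not stated in the text.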
Figure 1. Research scheme
The datasets are handled by two different approaches. The first approach applies traditional feature selection and classification algorithms to the datasets to obtain prediction results. In the second approach, feature selection and classification are conducted in four steps. First, feature selection is conducted using traditional techniques. Second, features are ranked using the proposed feature selection method. Third, MCDM methods are employed to evaluate the feature selection techniques and choose the better-performing ones. Finally, the selected features are used in the classification. The classification results of the two approaches are compared to examine whether the proposed feature selection and MCDM methods can improve prediction accuracy. The performance of the classifiers is also evaluated using MCDM methods, and a recommendation of classifiers for prediction is made based on their accuracy and reliability.
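The MCDM evaluation step can be sketched as a ranking of alternatives (feature selection techniques or classifiers) over several criteria. The sketch below uses TOPSIS, one widely used MCDM method, purely as an illustration; the text above does not say which MCDM methods the scheme applies, and the alternative names, criteria, weights, and scores are hypothetical values, not results from the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per alternative (higher is better).

    matrix[i][j] is the score of alternative i on criterion j;
    benefit[j] is True when larger values of criterion j are better.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ideal)))
        d_neg = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, anti)))
        total = d_pos + d_neg
        # If all alternatives are identical, both distances are zero.
        scores.append(d_neg / total if total else 0.5)
    return scores


# Hypothetical decision matrix: accuracy, AUC, training time in seconds.
names = ["clf_A", "clf_B", "clf_C"]
matrix = [
    [0.90, 0.92, 1.0],
    [0.85, 0.88, 2.0],
    [0.80, 0.80, 3.0],
]
weights = [0.4, 0.4, 0.2]        # assumed criterion weights, summing to 1
benefit = [True, True, False]    # training time is a cost criterion
scores = topsis(matrix, weights, benefit)
ranking = [n for _, n in sorted(zip(scores, names), reverse=True)]
```

Because different MCDM methods can produce different rankings from the same decision matrix, a fuller version would likely compare several methods rather than rely on a single one.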
Proposed feature selection methods
As discussed in Section 2, filter and wrapper are tw...