Abstract: Feature selection is considered a key factor in classification/decision problems. It is currently used in designing intelligent decision systems to choose the features that allow the best performance. This paper proposes a regression-based approach to select the most important predictors in order to significantly increase classification performance. Application to breast cancer detection and recurrence using publicly available datasets proved the efficiency of this technique.

Key words: classification, feature selection, regression, neural networks, breast cancer detection and recurrence
Introduction

Recent advances in Artificial Intelligence and Data Mining have allowed researchers to develop so-called intelligent decision models, which behave smartly, like human beings. Most of them are based on a standalone machine-learning algorithm, such as artificial neural networks (NNs), genetic algorithms (GAs), support vector machines (SVMs), swarm intelligence (SI), random forests (RF), etc. [1]

The use of intelligent decision models in medical research is current practice, supporting doctors' decisions in a wide range of specialities. Artificial neural networks have been applied to predict severe acute pancreatitis at admission to hospital [2]. A support vector machine approach has been used in epilepsy, for seizure prediction based on the spectral power of EEG [3]. Support vector machines with a linear kernel, along with classification trees, have been used for the early diagnosis of Alzheimer-type dementia [4]. A competitive/collaborative neural computing system has been considered for the early detection of pancreatic cancer [5]. A hybrid neural network-genetic algorithm has been applied to breast cancer detection and recurrence [6].

The direct use of a medical database for decision/classification purposes, without a prior analysis and pre-processing step, is often counterproductive. It is noteworthy that even the best classifier will perform poorly if the features are not chosen well. Feature selection (FS) is the process of choosing the most relevant attributes from a dataset in order to improve the performance of the models used in the decision/classification process. There are many methodologies focused on FS, seen from different points of view [7][8].
The association rules technique [9], a hill climbing algorithm [10], particle swarm optimization [11], and genetic algorithms [12] are among the most widely used tools for FS in computer-aided medical diagnosis.

This study proposes an FS procedure based on a preliminary regression step, using the decision classes as the output of a multiple linear regression model and the features as predictors. A common approach is based on the exploratory examination of the correlation matrix involving all variables, in the expectation of highlighting the underlying correlation between the features and the decision class. However, in such an attempt, there is no "automatic" way to weed out the "false" correlations. Different from this approach, th...