Analyzing gene expression data is challenging because the large number of features, set against the small number of available examples, makes classifiers prone to overfitting. To avoid this pitfall and achieve high performance, some approaches construct complex classifiers using new or well-established strategies. In medical decision making (classification, diagnosis) there are many situations where decisions must be made effectively and reliably. Decision tree induction is one of the most widely employed methods for extracting knowledge from data, since the resulting representation of knowledge is intuitive and easily understood by humans. Conceptually simple decision-making models with the possibility of automatic learning are the most appropriate for such tasks. Decision trees are a reliable and effective decision-making technique that provides high classification accuracy with a simple representation of the gathered knowledge, and they have been used in different areas of medical decision making. Following recent breakthroughs in the automatic design of machine learning algorithms, HEAD-DT, a novel hyper-heuristic evolutionary algorithm that evolves the design components of top-down decision tree induction algorithms, can be used to improve classification accuracy.
Keywords: Automatic algorithm design, Decision trees, Evolutionary algorithms, Genetic programming, Hyper-heuristics, Machine learning, Random forest
I. Introduction: DNA microarray technology allows the expression of thousands of genes to be monitored simultaneously [1]. It can therefore lead to a better understanding of many biological processes and to improved diagnosis and treatment of several diseases. However, data collected by DNA microarrays are not suitable for direct human analysis, since a single experiment contains thousands of measured expression values. Several approaches have been suggested for mining microarray data [2], including supervised and unsupervised machine learning algorithms. Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples, each of which is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples [3]. A decision tree is a classifier depicted as a flowchart-like tree structure. It has been widely used to represent classification models because of its comprehensible nature, which resembles human reasoning. Decision tree induction algorithms present several advantages over other learning algorithms, such as robustness to noise, low computational cost for generating the model, and the ability to deal with redundant attributes. In addition, the induced model usually generalizes well [3] [4]. Most decision tree induction algorithms are based on a greedy top...
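To make the supervised-learning setting and greedy top-down induction concrete, the following is a minimal sketch and not the HEAD-DT method. It assumes information gain (entropy reduction) as the split criterion and binary threshold splits on numeric features. The function names (`entropy`, `best_split`, `build_tree`, `predict`) and the four-sample toy data set are hypothetical and invented purely for illustration; they are not real microarray measurements.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y):
    """Greedy step: return (gain, feature, threshold) with the highest
    information gain, or None if no feature has two distinct values."""
    base = entropy(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = base - weighted
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Top-down induction: split on the best test, then recurse on each subset."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority  # leaf node: predict the majority class
    split = best_split(X, y)
    if split is None or split[0] <= 0:
        return majority  # no useful split exists
    _, j, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[j] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: each row is one sample (input vector of three
# made-up expression values); each label is the desired output value.
X = [[2.1, 0.3, 5.0], [1.9, 0.4, 4.8], [2.0, 3.1, 5.1], [2.2, 2.9, 4.9]]
y = ["healthy", "healthy", "tumor", "tumor"]

tree = build_tree(X, y)
print(tree)
print(predict(tree, [2.0, 3.5, 5.0]))
```

Each pair `(X[i], y[i])` plays the role of a training example, and `predict` is the inferred function applied to a new example. The algorithm is greedy in the sense that every split is chosen for its immediate gain and is never revisited. The choice of split criterion and stopping rule here are examples of the design components that an approach such as HEAD-DT is described as evolving.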