This paper proposes a new constrained-syntax genetic programming (GP) algorithm for discovering classification rules in medical data sets. The proposed GP enforces several syntactic constraints using a disjunctive normal form representation, so that individuals represent valid rule sets that are easy to interpret. The GP is compared with C4.5, a well-known decision-tree-building algorithm, and with another GP that uses boolean inputs (BGP), on five medical data sets: Chest pain, Ljubljana breast cancer, Dermatology, Wisconsin breast cancer, and Pediatric Adrenocortical Tumor. For the last of these data sets, a new preprocessing step was devised for survival prediction. Computational experiments show that, overall, the GP algorithm obtained good results with respect to predictive accuracy and rule comprehensibility, by comparison with C4.5 and BGP.
This work aims at discovering classification rules for diagnosing certain pathologies. The rules discriminate among 12 different pathologies whose main symptom is chest pain. To discover these rules, we used genetic programming together with some concepts of data mining, with emphasis on the discovery of comprehensible knowledge. The fitness function combines a measure of rule comprehensibility with two indicators commonly used in the medical domain: sensitivity and specificity. Results regarding the predictive accuracy of the discovered rule set as a whole and the predictive accuracy of individual rules are presented and compared to other approaches.
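As an illustration of how such a fitness function might be assembled, the following minimal Python sketch multiplies sensitivity, specificity, and a simplicity term that penalizes large rules. The abstract does not give the formula, so the product form, the linear `simplicity` function, and the names `num_nodes` and `max_nodes` are assumptions made for this example only.

```python
def sensitivity(tp, fn):
    """Fraction of positive cases that the rule correctly covers."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of negative cases that the rule correctly excludes."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def simplicity(num_nodes, max_nodes):
    """Assumed comprehensibility measure: falls linearly from 1.0 for a
    one-node rule to 0.5 for a rule of the maximum allowed size."""
    return (max_nodes - 0.5 * num_nodes - 0.5) / (max_nodes - 1)


def fitness(tp, fp, tn, fn, num_nodes, max_nodes):
    """Assumed combination: product of the three terms, so a rule scores
    well only if it is sensitive, specific, and small."""
    return (sensitivity(tp, fn)
            * specificity(tn, fp)
            * simplicity(num_nodes, max_nodes))
```

A product, rather than a sum, is used in this sketch so that a rule with near-zero sensitivity or specificity cannot compensate with a high value on the other terms.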
Abstract. This paper proposes a constrained-syntax genetic programming (GP) algorithm for discovering classification rules in medical data sets. The proposed GP enforces several syntactic constraints using a disjunctive normal form representation, so that individuals represent valid rule sets that are easy to interpret. The GP is compared with C4.5 on a real-world medical data set. This data set represents a difficult classification problem, and a new preprocessing method was devised for mining the data.
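To make the disjunctive normal form representation concrete, the sketch below encodes a rule antecedent as a list of conjunctions, each a list of (attribute, operator, value) conditions. It is not the paper's implementation: the encoding, the attribute names, and the single constraint checked (ordering operators allowed only on numeric attributes) are hypothetical choices for illustration, since the abstract does not list the actual constraints.

```python
import operator

# Operators a condition may use; ordering operators are assumed to be
# restricted to numeric attributes.
OPS = {"==": operator.eq, "!=": operator.ne,
       "<=": operator.le, ">": operator.gt}
ORDERING_OPS = {"<=", ">"}


def rule_covers(record, dnf):
    """True if at least one conjunction has all of its conditions satisfied."""
    return any(
        all(OPS[op](record[attr], value) for attr, op, value in conjunction)
        for conjunction in dnf
    )


def is_valid_dnf(dnf, attr_types):
    """Check the assumed syntactic constraints: no empty rule or conjunction,
    only known attributes and operators, and ordering operators only on
    attributes declared 'numeric'."""
    if not dnf or any(not conjunction for conjunction in dnf):
        return False
    for conjunction in dnf:
        for attr, op, _value in conjunction:
            if attr not in attr_types or op not in OPS:
                return False
            if op in ORDERING_OPS and attr_types[attr] != "numeric":
                return False
    return True


# Hypothetical attributes and thresholds, not taken from any data set.
attr_types = {"age": "numeric", "pain_type": "categorical", "bp": "numeric"}
rule = [
    [("age", ">", 50), ("pain_type", "==", "typical")],  # first conjunction
    [("bp", ">", 160)],                                  # OR second conjunction
]
```

Read as text, `rule` says: IF (age > 50 AND pain_type = typical) OR (bp > 160). Keeping every individual in this flat OR-of-ANDs shape is what makes the resulting rules straightforward to read.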