Abstract. Symbolic induction is a promising approach to constructing decision models by extracting regularities from a data set of examples. The predominant type of model is a classification rule (or set of rules) that maps a set of relevant environmental features into specific categories or values. Classifying loan risk based on borrower profiles, consumer choice from purchase data, or supply levels based on operating conditions are all examples of this type of model-building task. Although current inductive approaches, such as ID3 and CN2, perform well on certain problems, their potential is limited by the incremental nature of their search. Genetic algorithms (GA) have shown great promise on complex search domains, and hence suggest a means for overcoming these limitations. However, effective use of genetic search in this context requires a framework that promotes the fundamental model-building objectives of predictive accuracy and model simplicity. In this article we describe COGIN, a GA-based inductive system that exploits the conventions of induction from examples to provide this framework. The novelty of COGIN lies in its use of training set coverage to simultaneously promote competition in various classification niches within the model and constrain overall model complexity. Experimental comparisons with NewID and CN2 provide evidence of the effectiveness of the COGIN framework and the viability of the GA approach.
Promoting and maintaining diversity is a critical requirement of search in learning classifier systems (LCSs). What is required of the genetic algorithm (GA) in an LCS context is not convergence to a single global maximum, as in the standard optimization framework, but the generation of individuals (i.e., rules) that collectively cover the overall problem space. COGIN (COverage-based Genetic INduction) is a system designed to exploit genetic recombination for the purpose of constructing rule-based classification models from examples. The distinguishing characteristic of COGIN is its use of coverage of training set examples as an explicit constraint on the search, which acts to promote appropriate diversity in the population of rules over time. By treating training examples as limited resources, COGIN creates an ecological model that simultaneously accommodates a dynamic range of niches while encouraging superior individuals within a niche, leading to concise and accurate decision models. Previous experimental studies with COGIN have demonstrated its performance advantages over several well-known symbolic induction approaches. In this paper, we examine the effects of two modifications to the original system configuration, each designed to inject additional diversity into the search: increasing the carrying capacity of training set examples (i.e., increasing coverage redundancy) and increasing the level of disruption in the recombination operator used to generate new rules. Experimental results show that both modifications yield substantial improvements over previously published results.
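The idea of treating training examples as limited resources can be illustrated with a small sketch. It is not COGIN's actual implementation: the `Rule` class, the `fitness` score, the `coverage_filter` function, and the loan-style toy data are all hypothetical names and values chosen for illustration. The sketch assumes that rules are considered in order of decreasing fitness, that each example can be claimed by at most `capacity` rules, and that a rule survives only if it can still claim at least one example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Example = Dict[str, str]


@dataclass
class Rule:
    name: str
    matches: Callable[[Example], bool]  # condition part of the rule
    fitness: float                      # hypothetical quality score; higher is better


def coverage_filter(rules: List[Rule], examples: List[Example],
                    capacity: int = 1) -> List[Rule]:
    """Keep only rules that can still claim at least one training example.

    Assumption: each example is a resource that at most `capacity`
    rules may claim; fitter rules claim first.
    """
    remaining = [capacity] * len(examples)
    survivors = []
    for rule in sorted(rules, key=lambda r: r.fitness, reverse=True):
        claimed = [i for i, ex in enumerate(examples)
                   if remaining[i] > 0 and rule.matches(ex)]
        if claimed:  # the rule fills a niche that is not yet exhausted
            for i in claimed:
                remaining[i] -= 1
            survivors.append(rule)
    return survivors


# Toy data (invented for illustration only).
examples = [
    {"income": "high", "debt": "low"},
    {"income": "high", "debt": "high"},
    {"income": "low", "debt": "high"},
]
rules = [
    Rule("high_income", lambda ex: ex["income"] == "high", 0.9),
    Rule("low_debt", lambda ex: ex["debt"] == "low", 0.6),
    Rule("high_debt", lambda ex: ex["debt"] == "high", 0.5),
]

strict = [r.name for r in coverage_filter(rules, examples, capacity=1)]
redundant = [r.name for r in coverage_filter(rules, examples, capacity=2)]
```

With `capacity=1`, the `low_debt` rule is dropped because its only matching example has already been claimed by a fitter rule. Raising `capacity` to 2 lets it survive, which is one way to picture how greater coverage redundancy admits more diversity into the rule population.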