This paper presents a novel approach to clustering that uses an accuracy-based learning classifier system, exploiting the generalization mechanisms inherent to such systems. The aim of the work is to learn rules that accurately describe clusters in a given dataset without prior assumptions about their number. Favourable comparisons with the commonly used k-means algorithm are demonstrated on a number of synthetic datasets.
Learning classifier systems (LCSs) are rule-based inductive learning systems that have been widely used for supervised and reinforcement learning in recent years. This paper employs the sUpervised Classifier System (UCS), a supervised learning classifier system introduced in 2003 for classification tasks in data mining. We present an adaptive framework that places UCS on top of a self-organizing map (SOM) neural network. The SOM decomposes the overall classification problem adaptively and in real time into subproblems, each of which is handled by a separate UCS. The framework is also tested with UCS replaced by a feedforward artificial neural network (ANN). Experiments on several synthetic and real data sets, including a very large real data set, show that classification accuracy in the proposed distributed environment is as good as or better than in the non-distributed environment, and that execution is faster. In general, each UCS attached to a cell in the SOM has a much smaller population than a single UCS working on the overall problem; because each data instance is matched against a smaller population, the throughput of the overall system increases. The experiments show that the proposed framework can decompose a problem adaptively into subproblems while maintaining or improving accuracy and increasing speed.
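The decomposition described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 1-D SOM, the two-blob data, and all parameter values are hypothetical, and since UCS is not available in standard libraries, a simple nearest-class-mean rule stands in for the per-cell UCS (the point here is the SOM-based routing, not the local learner).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs (a hypothetical stand-in
# for the paper's synthetic data sets).
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Minimal 1-D SOM: a row of k cells, each holding a weight vector.
k = 4
W = rng.normal(0, 1, (k, 2))
for epoch in range(50):
    for x in X[rng.permutation(len(X))]:
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
        for j in range(k):                              # neighbourhood update
            h = np.exp(-((j - bmu) ** 2) / 2.0)
            W[j] += 0.1 * h * (x - W[j])

# Route each instance to its best-matching cell; train one local
# classifier per cell (nearest class mean, standing in for UCS).
cells = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
local = {}
for c in range(k):
    mask = cells == c
    if mask.any():
        local[c] = {cls: X[mask & (y == cls)].mean(axis=0)
                    for cls in np.unique(y[mask])}

def predict(x):
    """Route x to its SOM cell, then apply that cell's local classifier."""
    c = np.argmin(np.linalg.norm(W - x, axis=1))
    means = local[c]
    return min(means, key=lambda cls: np.linalg.norm(x - means[cls]))

acc = np.mean([predict(x) == t for x, t in zip(X, y)])
```

Because each local model only ever sees the instances routed to its cell, its population (here, its set of class means) stays small, which is the source of the throughput gain the abstract describes.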
A great deal of research has been undertaken on pruning features and hidden units in order to reduce the size of artificial neural networks (ANNs). However, none of these methods considers the relationship between the pruned unit and the number of epochs needed to retrain the network once the unit is removed. In this paper, we present two heuristics for determining pruning orders that lead to nearly the smallest number of retraining epochs. The heuristics are based on a modified information gain calculated from all features in the training data. We test the proposed heuristics on an exclusive-or data set. The experimental results show the success of using information gain as a criterion for determining pruning orders.
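The abstract does not define its modified information gain, so the sketch below shows only the standard entropy-based information gain on the exclusive-or data set the paper uses. Note that on XOR each feature taken alone has zero gain, which hints at why a modified measure over all features would be needed; the function and data here are illustrative, not the paper's.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """Standard information gain of one feature: H(y) - H(y | feature)."""
    gain = entropy(labels)
    n = len(rows)
    for v in set(r[feature] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[feature] == v]
        gain -= (len(sub) / n) * entropy(sub)
    return gain

# The exclusive-or data set used in the paper's experiments.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

g0 = info_gain(X, y, 0)
g1 = info_gain(X, y, 1)
# On XOR, each feature alone carries zero information gain (g0 == g1 == 0.0),
# even though the two features together determine the label exactly.
```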