In many data mining applications that address classification problems, feature selection and model selection are considered key tasks. That is, appropriate input features of the classifier must be selected from a given (and often large) set of possible features, and structure parameters of the classifier must be adapted with respect to these features and a given data set. This paper describes an evolutionary algorithm (EA) that performs feature and model selection simultaneously for radial basis function (RBF) classifiers. To reduce the optimization effort, several techniques are integrated that significantly accelerate and improve the EA: hybrid training of RBF networks, lazy evaluation, consideration of soft constraints by means of penalty terms, and temperature-based adaptive control of the EA. The feasibility and benefits of the approach are demonstrated on four data mining problems: intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. It is shown that, compared to earlier EA-based RBF optimization techniques, the runtime is reduced by up to 99% while error rates are lowered by up to 86%, depending on the application. The algorithm is independent of specific applications, so many of its ideas and solutions can be transferred to other classifier paradigms.
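The idea of handling soft constraints by penalty terms can be illustrated with a minimal sketch. It assumes a genome made of a binary feature mask plus the number of RBF hidden neurons; the names `Genome` and `penalized_fitness`, the limits, and the penalty weight are hypothetical and not taken from the paper. A constraint violation is not rejected outright; it only worsens the fitness in proportion to how far the limit is exceeded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    """Hypothetical encoding: which features are used, and the network size."""
    feature_mask: tuple  # 1 = feature is used as a classifier input, 0 = unused
    n_hidden: int        # number of RBF hidden neurons (a structure parameter)


def penalized_fitness(genome, error_rate, max_features, max_hidden, weight=0.05):
    """Fitness to be minimized: error rate plus a penalty for soft-constraint violations.

    Exceeding max_features or max_hidden is allowed but penalized linearly
    (an assumed penalty form, chosen for illustration only).
    """
    n_features = sum(genome.feature_mask)
    violation = max(0, n_features - max_features) + max(0, genome.n_hidden - max_hidden)
    return error_rate + weight * violation
```

Because `Genome` is frozen (and therefore hashable), already-evaluated genomes could also be cached in a dictionary to avoid retraining; this is a generic shortcut and not necessarily what the paper means by lazy evaluation.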
In this paper, we propose a novel design of evolving fuzzy classifiers (EFCs) to handle online multiclass classification problems in a data-streaming context. To this end, we exploit the concept of all-pairs (AP), a.k.a. all-versus-all, classification, using a binary classifier for each pair of classes. This approach benefits from the less complex decision boundaries to be learned, as opposed to a direct multiclass approach, and achieves higher efficiency in terms of incremental training time than one-versus-rest classification techniques. For the binary classifiers, we apply fuzzy classifiers with singleton class labels in the consequents, as well as Takagi-Sugeno (T-S) fuzzy models that perform regression on [0, 1] for each class pair. Both are evolved and incrementally trained in a data-streaming context, which yields a permanent update of the whole AP collection of classifiers and thus enables them to react properly to dynamic changes in the streams. The classification phase uses a novel strategy: the preference levels of each pair of classes are collected in a preference relation matrix, and a weighted voting scheme is performed on this matrix. The weighting reflects the reliability of the classifiers in their predictions by 1) integrating the degree of ignorance on the samples to be classified as weights for the preference levels and 2) applying new conflict models, both in the single binary classifiers and when calculating the final class response from the preference relation matrix. The advantage of the new EFC concept over single-model architectures (using a direct multiclass classification concept) and multimodel architectures (using a one-versus-rest classification concept) is demonstrated by empirical evaluations and comparisons on high-dimensional real-world multiclass classification problems at the end of this paper. The results also show that integrating conflict and ignorance concepts into the preference relations can boost classifier accuracy.
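Weighted voting on a preference relation matrix can be sketched as follows, under simplifying assumptions: `pref[i][j]` is the preference of class i over class j delivered by the binary classifier for that pair, and `ignorance[i][j]` is a degree of ignorance in [0, 1] that is turned into the weight `1 - ignorance[i][j]`. The function names are hypothetical, and the paper's conflict models are not reproduced here.

```python
def n_pairwise_classifiers(k):
    """An all-pairs scheme needs one binary classifier per unordered class pair."""
    return k * (k - 1) // 2


def weighted_all_pairs_vote(pref, ignorance=None):
    """Return (winning class index, per-class scores) from a preference relation matrix.

    pref[i][j]      -- preference of class i over class j (diagonal is ignored)
    ignorance[i][j] -- assumed degree of ignorance for the pair; None means full trust
    """
    k = len(pref)
    scores = []
    for i in range(k):
        score = 0.0
        for j in range(k):
            if i == j:
                continue
            # Down-weight preferences from classifiers that know little about the sample
            weight = 1.0 if ignorance is None else 1.0 - ignorance[i][j]
            score += weight * pref[i][j]
        scores.append(score)
    winner = max(range(k), key=lambda c: scores[c])
    return winner, scores


pref = [[0.0, 0.6, 0.4],
        [0.4, 0.0, 0.3],
        [0.6, 0.7, 0.0]]
ignorance = [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.9],
             [0.0, 0.9, 0.0]]
print(weighted_all_pairs_vote(pref))             # unweighted vote
print(weighted_all_pairs_vote(pref, ignorance))  # pair (1, 2) is largely ignored
```

In this toy matrix, class 2 wins the plain vote, but once the classifier for the pair (1, 2) is marked as highly ignorant, its strong preference for class 2 counts for little and class 0 wins instead.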
Humans learn not only from their own experience but also from rules obtained from other humans. It is a challenging idea to enable distributed, intelligent computer systems to follow this human archetype. A basic technique needed for such an "organic" system is the fusion of functional knowledge in the form of symbolic rules that are gained from several sources (nodes of the distributed system). We assume that these nodes are equipped with self-learning classifiers based on a hybrid radial basis function network / fuzzy system paradigm. We provide methods for the fusion of fuzzy-type rules extracted from such classifiers. By means of a regularization approach, these methods aim to preserve the consistency and comprehensibility of the resulting rule set (e.g., a low number of rules and distinguishable membership functions).
Radial basis function (RBF) networks are used in many applications, e.g., for pattern classification or nonlinear regression. For a given application, the parameters of an RBF network, such as the centers and radii of the basis functions or the output weights, must be adapted. Typically, either stochastic, iterative training algorithms (e.g., gradient-based methods such as backpropagation or second-order techniques such as scaled conjugate gradients) or clustering methods in combination with a linear optimization technique (e.g., c-means and singular value decomposition for a linear least-squares problem) are used for this task. This article shows that a combination of the two approaches leads to significant improvements in training time as well as in the approximation and generalization properties of the networks. In the marketing application investigated here (prediction of customer behavior), the overall training time could be reduced compared to backpropagation, and the prediction accuracy could be increased compared to c-means plus singular value decomposition. This article also describes a new idea for the initialization of basis function centers. Basically, this approach is a modification of the standard c-means algorithm that leads to a linear least-squares problem whose solvability can be guaranteed. This idea raises the reliability of the training procedure without additional costs in terms of run time or quality of results.
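The clustering-plus-linear-optimization stage can be sketched with NumPy, assuming Gaussian basis functions with one shared, fixed radius and a bias term. Centers are placed by plain (hard) c-means, and the output weights are obtained from the linear least-squares problem with `numpy.linalg.lstsq`, which uses an SVD-based LAPACK solver. The helper names are hypothetical; the article's modified c-means initialization and the subsequent iterative refinement of the hybrid scheme are not shown.

```python
import numpy as np


def c_means(X, c, n_iter=50, seed=0):
    """Standard (hard) c-means clustering, used here to place the RBF centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(c):
            members = X[labels == k]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[k] = members.mean(axis=0)
    return centers


def design_matrix(X, centers, radius):
    """Gaussian basis function activations plus a constant bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * radius ** 2))
    return np.hstack([H, np.ones((len(X), 1))])


def train_rbf(X, y, c, radius):
    """Centers by c-means, output weights by SVD-based linear least squares."""
    centers = c_means(X, c)
    H = design_matrix(X, centers, radius)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, w


def predict(X, centers, w, radius):
    """Network output; radius must match the value used in training."""
    return design_matrix(X, centers, radius) @ w


# Usage: approximate sin(x) on [0, 2*pi] with 8 basis functions
X = np.linspace(0.0, 2.0 * np.pi, 50).reshape(-1, 1)
y = np.sin(X[:, 0])
centers, w = train_rbf(X, y, c=8, radius=0.7)
print(float(np.mean((predict(X, centers, w, radius=0.7) - y) ** 2)))
```

In a hybrid scheme along the lines of the article, a solution like this would serve as the starting point for an iterative (e.g., gradient-based) optimization of all network parameters, rather than as the final network.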
Knowledge transfer is one of the most important mechanisms of human evolution. The ontogeny of humans enables them to act efficiently in a very dynamic environment. Thus, it would be highly desirable to enable "intelligent" artificial systems to behave in a similar way. This article introduces basic technologies that are needed for that purpose. With these technologies (components of a future knowledge transfer toolbox), it is possible to detect novel concepts that arise in the input space of a classifier, as well as existing classification rules that become obsolete. Prototypes of new rules can then be created automatically using an online clustering mechanism. These prototypes are compared to existing rules, rated, and eventually accepted or discarded. If a prototype is accepted, a human expert labels the new rule, which is then both integrated into the "own" classifier and sent to other classifiers. Thus, knowledge transfer between "intelligent" artificial systems becomes possible, and the overall system gains a new kind of self-optimization ability.
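The pipeline from novelty detection to rule prototypes can be sketched under explicit assumptions: a sample is treated as novel when its Gaussian activation for every existing rule falls below a threshold, and novel samples are grouped by a simple leader-style online clustering with incremental mean updates. The names, the shared `sigma`, the `threshold`, and the clustering `radius` are hypothetical; the article's actual novelty criteria and its clustering mechanism may differ.

```python
import math


def activation(x, center, sigma):
    """Gaussian activation of a rule with the given center and a shared width."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def is_novel(x, rule_centers, sigma, threshold):
    """A sample is 'novel' if no existing rule is activated above the threshold."""
    return all(activation(x, c, sigma) < threshold for c in rule_centers)


class LeaderClustering:
    """Online clustering: join the nearest prototype within radius, else open a new one."""

    def __init__(self, radius):
        self.radius = radius
        self.prototypes = []  # each entry: [center as list, number of assigned samples]

    def update(self, x):
        best, best_d = None, float("inf")
        for proto in self.prototypes:
            d = math.dist(x, proto[0])
            if d < best_d:
                best, best_d = proto, d
        if best is None or best_d > self.radius:
            self.prototypes.append([list(x), 1])  # candidate for a new rule
        else:
            best[1] += 1
            # Incremental mean: move the center toward the new sample
            best[0] = [c + (xi - c) / best[1] for c, xi in zip(best[0], x)]


rules = [(0.0, 0.0)]  # centers of existing rules (hypothetical)
clusterer = LeaderClustering(radius=1.0)
for sample in [(0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (10.0, 10.0)]:
    if is_novel(sample, rules, sigma=1.0, threshold=0.1):
        clusterer.update(sample)
print(clusterer.prototypes)
```

The resulting prototypes would then be the candidates that are compared with existing rules, rated, and passed to a human expert for labeling.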
The understandability of rule sets is an important issue in knowledge discovery, where, for example, classification rules are extracted from large data sets. An important criterion in this context is the goodness of fit of a given classifier, i.e., a measure that gives a quantitative answer to the question of how well a classifier fits the data it has to classify. In this article, we provide an appropriate measure for a Mamdani-type fuzzy classifier with Gaussians and singletons as membership functions, sum-prod inference, and the height method for defuzzification. That is, goodness of fit must be measured for multivariate Gaussian mixture models. To this end, we adopt conventional test methods for univariate, unimodal probability distributions (e.g., Kolmogorov-Smirnov or chi-square), provide a measure for the goodness of fit of our fuzzy classifier, and discuss its properties. In a second step, we go beyond this point by showing how the measure can be extended into an analysis tool that gives detailed hints as to which rules or membership functions are not suitably realized.
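As a univariate illustration of the conventional starting point, the Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)| can be computed against the CDF of a one-dimensional Gaussian mixture. This is a minimal sketch only: the mixture parameters are made up, the function names are hypothetical, and the article's measure for multivariate mixtures is not reproduced.

```python
import math
import random


def mixture_cdf(x, weights, means, sigmas):
    """CDF of a univariate Gaussian mixture (weights are assumed to sum to 1)."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, sigmas)
    )


def ks_statistic(samples, cdf):
    """Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides of the jump
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Usage: draw 2000 samples from a two-component mixture and test the true model
rng = random.Random(1)
weights, means, sigmas = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]
components = rng.choices([0, 1], weights=weights, k=2000)
samples = [rng.gauss(means[k], sigmas[k]) for k in components]
print(ks_statistic(samples, lambda x: mixture_cdf(x, weights, means, sigmas)))
```

A well-fitting model yields a small D, whereas a clearly misplaced model (for example, a single Gaussian centered far from the data) drives D toward 1.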