This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining, a TWS determines how documents are represented in a vector space model before a classifier is applied. Although acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), defining TWSs has traditionally been an art. Further, it is still difficult to determine the best TWS for a particular problem, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. We propose in this article a genetic program that aims at learning effective TWSs that can improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to give rise to discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. Our study shows the validity of the proposed method; in fact, we show that TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent works. Further, we show that TWSs learned from a specific domain can be effectively used for other tasks.
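To make the notion of a TWS concrete, the following is a minimal sketch of three standard schemes (Boolean, term frequency, and TF-IDF) applied to tokenized documents. The function name `term_weights`, the `scheme` argument, and the unsmoothed IDF formula are illustrative assumptions, not the basic units or learned schemes of the paper.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tf"):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper for illustration only; `scheme` is one of
    "boolean", "tf", or "tfidf".
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        if scheme == "boolean":
            w = {t: 1.0 for t in tf}
        elif scheme == "tf":
            w = {t: float(c) for t, c in tf.items()}
        elif scheme == "tfidf":
            # Assumed unsmoothed IDF: log(N / df(t)).
            w = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weighted.append(w)
    return weighted


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
boolean_vectors = term_weights(docs, "boolean")
tf_vectors = term_weights(docs, "tf")
tfidf_vectors = term_weights(docs, "tfidf")
```

In this sketch, TF-IDF is itself a combination of two simpler quantities (a term count and a document-frequency factor), which is the kind of composition the abstract describes learning automatically.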
Nearest-neighbor (NN) methods are highly effective and widely used pattern classification techniques. There are, however, some issues that hinder their application to large-scale and noisy data sets, including their high storage requirements, their sensitivity to noisy instances, and the fact that test cases must be compared to all of the training instances. Prototype generation (PG) and feature generation (FG) techniques aim at alleviating these issues to some extent; traditionally, the two techniques have been implemented separately. This paper introduces a genetic programming approach to tackle the simultaneous generation of prototypes and features to be used for classification with an NN classifier. The proposed method learns to combine instances and attributes to produce a set of prototypes and a new feature space for each class of the classification problem via genetic programming. A heterogeneous representation is proposed together with ad hoc genetic operators. The proposed approach overcomes some limitations of NN without degradation in its classification performance. Experimental results are reported and compared with several other techniques. The empirical assessment provides evidence of the effectiveness of the proposed approach in terms of classification accuracy and instance/feature reduction.
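The storage and comparison savings that prototypes offer can be illustrated with a minimal sketch in which each class is reduced to a single prototype and test cases are compared only against those prototypes. Class means are used here purely as a stand-in; they are not the genetic programming procedure of the paper, and the names `class_mean_prototypes` and `predict_1nn` are hypothetical.

```python
import math


def class_mean_prototypes(X, y):
    """Build one prototype per class as the mean of its training instances.

    Stand-in for prototype generation; assumed for illustration only.
    """
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_1nn(prototypes, x):
    """Return the label of the prototype closest to x (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], x))


X = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
y = ["a", "a", "b", "b"]
prototypes = class_mean_prototypes(X, y)
label_near_a = predict_1nn(prototypes, [1.0, 1.0])
label_near_b = predict_1nn(prototypes, [9.0, 9.0])
```

With two prototypes instead of four training instances, each prediction needs two distance computations rather than four; the same reduction applies, at larger scale, to any generated prototype set.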
Surrogate-based methods aim at reducing the number of evaluations of expensive fitness functions in optimization processes. Several surrogate-based methods for evolutionary optimization have been proposed so far, including those based on granular computing and clustering. Granular computing provides granules: assemblages of entities grouped together by their similarity, functional or physical adjacency, indistinguishability, coherency, or the like. Such techniques avoid repeated, unnecessary evaluations of individuals. In this paper, granular computing is used as a method of grouping data, and the resulting information is exploited to obtain knowledge of the structure and parameters of individuals and then to design a neuro-fuzzy network that adapts the granules' parameters, providing convergence to acceptable solutions with a reduced number of evaluations of the fitness function. We implement this adaptive surrogate in a genetic algorithm and show its performance using benchmark functions.
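The basic idea of reusing fitness values within a granule can be sketched as follows. This is a deliberately simplified, fixed-radius version: the class name `GranularSurrogate`, the Euclidean similarity test, and the constant `radius` are assumptions for illustration, whereas the paper adapts the granules' parameters with a neuro-fuzzy network.

```python
import math


class GranularSurrogate:
    """Reuse the fitness of a nearby, already evaluated individual.

    Simplified sketch: granules have a fixed radius and store a single
    fitness value taken from the individual that created them.
    """

    def __init__(self, fitness, radius):
        self.fitness = fitness        # expensive true fitness function
        self.radius = radius          # assumed fixed granule radius
        self.granules = []            # list of (center, fitness value)
        self.true_evaluations = 0     # count of expensive calls made

    def evaluate(self, x):
        # If x falls inside an existing granule, reuse that granule's fitness.
        for center, value in self.granules:
            if math.dist(center, x) <= self.radius:
                return value
        # Otherwise pay for a true evaluation and open a new granule at x.
        value = self.fitness(x)
        self.true_evaluations += 1
        self.granules.append((tuple(x), value))
        return value


def sphere(x):
    """Sphere benchmark function: sum of squared coordinates."""
    return sum(v * v for v in x)


surrogate = GranularSurrogate(sphere, radius=0.5)
first = surrogate.evaluate([1.0, 1.0])    # true evaluation
second = surrogate.evaluate([1.1, 1.0])   # reused from the first granule
third = surrogate.evaluate([3.0, 0.0])    # true evaluation, new granule
```

Inside a genetic algorithm, `surrogate.evaluate` would replace direct calls to the fitness function; a larger radius saves more evaluations at the cost of coarser fitness estimates.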