This is an author produced version of a conference paper published in:
Abstract: We present a novel ensemble pruning method based on reordering the classifiers obtained from bagging and then selecting a subset for aggregation. Ordering the classifiers generated in bagging makes it possible to build subensembles of increasing size by including first those classifiers that are expected to perform best when aggregated. Ensemble pruning is achieved by halting the aggregation process before all the classifiers generated are included in the ensemble. Pruned subensembles containing between 15% and 30% of the initial pool of classifiers, besides being smaller, improve the generalization performance of the full bagging ensemble on the classification problems investigated.
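The ordering-and-early-stopping idea above can be illustrated with a minimal, self-contained sketch. The toy 1D data, the stump base learners, and the greedy reduce-error ordering criterion are illustrative assumptions, not the paper's exact procedure:

```python
import random

random.seed(0)

# Toy 1D binary problem, separable at x = 0.5.
X = [i / 20 for i in range(20)]
y = [1 if x > 0.5 else 0 for x in X]

def train_stump(xs, ys):
    """Threshold classifier minimising training error on (xs, ys)."""
    best = (0.0, 1, 2.0)  # (threshold, polarity, error)
    for t in xs:
        for pol in (0, 1):
            err = sum((pol if x > t else 1 - pol) != yy
                      for x, yy in zip(xs, ys)) / len(ys)
            if err < best[2]:
                best = (t, pol, err)
    return best[:2]

def predict(stump, x):
    t, pol = stump
    return pol if x > t else 1 - pol

def vote_error(ens, xs, ys):
    """Majority-vote error of an ensemble (ties broken towards class 0)."""
    errs = 0
    for x, yy in zip(xs, ys):
        pred = 1 if sum(predict(s, x) for s in ens) * 2 > len(ens) else 0
        errs += pred != yy
    return errs / len(ys)

# 1) Bagging: a pool of stumps trained on bootstrap samples.
pool = []
for _ in range(25):
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    pool.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

# 2) Ordering: greedily append the stump that most reduces the
#    training error of the growing subensemble (reduce-error style).
ordered, remaining = [], list(pool)
while remaining:
    nxt = min(remaining, key=lambda s: vote_error(ordered + [s], X, y))
    ordered.append(nxt)
    remaining.remove(nxt)

# 3) Pruning: halt aggregation at the prefix with lowest training error.
best_k = min(range(1, len(ordered) + 1),
             key=lambda k: vote_error(ordered[:k], X, y))
sub = ordered[:best_k]
print(best_k, vote_error(sub, X, y), vote_error(pool, X, y))
```

Since majority voting is order-independent, the full prefix coincides with the original bagging ensemble, so the selected prefix can never have higher training error than full bagging; the gains reported in the abstract, of course, refer to generalization performance on unseen data.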
Access to the published version may require subscription.
Switching Class Labels to Generate Classification Ensembles
Gonzalo Martínez-Muñoz* and Alberto Suárez
Escuela Politécnica Superior, Universidad Autónoma de Madrid, C/ Francisco Tomás y Valiente, 11, Madrid E-28049, Spain
Abstract: Ensembles that combine the decisions of classifiers trained on perturbed versions of the training set, in which the class labels of the training examples are randomly switched, can produce a significant error reduction, provided that large numbers of units and high class-switching rates are used. The classifiers generated by this procedure have statistically uncorrelated errors on the training set. Hence, the ensembles they form exhibit a similar dependence of the training error on ensemble size, independently of the classification problem. In particular, for binary classification problems, the classification performance of the ensemble on the training data can be analysed in terms of a Bernoulli process. Experiments on several UCI datasets demonstrate the improvements in classification accuracy that can be obtained using these class-switching ensembles.
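A minimal sketch of the class-switching procedure follows. The toy 1D data, the stump base learners, the switching rate p = 0.3, and the ensemble size of 101 units are illustrative choices, not the paper's experimental setup:

```python
import random

random.seed(1)

# Toy 1D binary problem, separable at x = 0.5.
X = [i / 20 for i in range(20)]
y = [1 if x > 0.5 else 0 for x in X]

def train_stump(xs, ys):
    """Threshold classifier minimising training error on (xs, ys)."""
    best = (0.0, 1, 2.0)  # (threshold, polarity, error)
    for t in xs:
        for pol in (0, 1):
            err = sum((pol if x > t else 1 - pol) != yy
                      for x, yy in zip(xs, ys)) / len(ys)
            if err < best[2]:
                best = (t, pol, err)
    return best[:2]

def predict(stump, x):
    t, pol = stump
    return pol if x > t else 1 - pol

p = 0.3            # class-switching rate (illustrative)
n_units = 101      # large, odd ensemble size

# Each unit sees the full training set, but with a random fraction p
# of the class labels flipped independently for every unit.
ensemble = []
for _ in range(n_units):
    switched = [1 - yy if random.random() < p else yy for yy in y]
    ensemble.append(train_stump(X, switched))

def vote(x):
    return 1 if sum(predict(s, x) for s in ensemble) * 2 > n_units else 0

train_error = sum(vote(x) != yy for x, yy in zip(X, y)) / len(y)
print(train_error)
```

Because the label flips are drawn independently for every unit, whether a given unit misclassifies a fixed training point behaves roughly like an independent Bernoulli trial across units, which is what makes the binomial analysis of the training error mentioned in the abstract possible.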
Abstract: The performance of m-out-of-n bagging with and without replacement is analyzed in terms of the sampling ratio (m/n). Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set (m_wr = n). Without-replacement methods typically use half samples (m_wor = n/2). These choices of sampling size are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose to use the out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio. Ensembles of classifiers trained on independent samples whose size is such that the out-of-bag error of the ensemble is as low as possible generally improve on the performance of standard bagging and can be built efficiently.
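A compact sketch of the proposed selection step follows. The toy data, the stump base learners, the candidate ratio grid, and the 31 units per ensemble are illustrative assumptions, not the paper's configuration:

```python
import random

random.seed(2)

# Toy 1D binary problem, separable at x = 0.5.
X = [i / 20 for i in range(20)]
y = [1 if x > 0.5 else 0 for x in X]
n = len(X)

def train_stump(xs, ys):
    """Threshold classifier minimising training error on (xs, ys)."""
    best = (0.0, 1, 2.0)  # (threshold, polarity, error)
    for t in xs:
        for pol in (0, 1):
            err = sum((pol if x > t else 1 - pol) != yy
                      for x, yy in zip(xs, ys)) / len(ys)
            if err < best[2]:
                best = (t, pol, err)
    return best[:2]

def predict(stump, x):
    t, pol = stump
    return pol if x > t else 1 - pol

def oob_error(ratio, n_units=31):
    """Build an m-out-of-n ensemble (without replacement, m = ratio * n)
    and estimate its error from out-of-bag votes only."""
    members = []
    for _ in range(n_units):
        idx = random.sample(range(n), max(2, int(ratio * n)))
        members.append((set(idx),
                        train_stump([X[i] for i in idx], [y[i] for i in idx])))
    errs, counted = 0, 0
    for i in range(n):
        # Only members that did NOT train on point i may vote on it.
        votes = [predict(s, X[i]) for idx, s in members if i not in idx]
        if not votes:   # point appeared in every sample: no OOB estimate
            continue
        counted += 1
        errs += (1 if sum(votes) * 2 > len(votes) else 0) != y[i]
    return errs / max(counted, 1)

# Pick the sampling ratio whose ensemble has the lowest OOB error.
ratios = [0.1, 0.25, 0.5, 0.75]
best_ratio = min(ratios, key=oob_error)
print(best_ratio)
```

In practice the grid of candidate ratios can be refined around the best value, since each OOB estimate is obtained for free from the samples already drawn.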
This is an author produced version of a paper published in: Pattern Recognition Letters 28.1 (2007)
Abstract: Boosting is used to determine the order in which classifiers are aggregated in a bagging ensemble. Early stopping in the aggregation of the classifiers in the ordered bagging ensemble allows the identification of subensembles that require less memory for storage, classify faster and can improve the generalization accuracy of the original bagging ensemble. In all the classification problems investigated, pruned ensembles containing 20% of the original classifiers show statistically significant improvements over bagging. In problems where boosting is superior to bagging, these improvements are not sufficient to reach the accuracy of the corresponding boosting ensembles. However, ensemble pruning preserves the performance of bagging in noisy classification tasks, where boosting often has larger generalization errors. Therefore, pruned bagging should generally be preferred to complete bagging and, if no information about the level of noise is available, it is a robust alternative to AdaBoost.
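The boosting-based ordering can be sketched as follows. The toy data, the stump base learners, the AdaBoost-style reweighting rule, and the fixed 20% cut-off are illustrative assumptions; note that the pool members are never retrained, only reordered:

```python
import random

random.seed(3)

# Toy 1D binary problem, separable at x = 0.5.
X = [i / 20 for i in range(20)]
y = [1 if x > 0.5 else 0 for x in X]
n = len(X)

def train_stump(xs, ys):
    """Threshold classifier minimising training error on (xs, ys)."""
    best = (0.0, 1, 2.0)  # (threshold, polarity, error)
    for t in xs:
        for pol in (0, 1):
            err = sum((pol if x > t else 1 - pol) != yy
                      for x, yy in zip(xs, ys)) / len(ys)
            if err < best[2]:
                best = (t, pol, err)
    return best[:2]

def predict(stump, x):
    t, pol = stump
    return pol if x > t else 1 - pol

# Bagging: a pool of stumps trained on bootstrap samples.
pool = []
for _ in range(25):
    idx = [random.randrange(n) for _ in range(n)]
    pool.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

# Order the fixed pool with boosting: at each step pick the member with
# the lowest weighted training error, then reweight as in AdaBoost.
w = [1.0 / n] * n
ordered, remaining = [], list(pool)
while remaining:
    def weighted_error(s):
        return sum(wi for wi, x, yy in zip(w, X, y) if predict(s, x) != yy)
    nxt = min(remaining, key=weighted_error)
    e = min(max(weighted_error(nxt), 1e-9), 1 - 1e-9)
    if e < 0.5:  # upweight the examples the selected member gets wrong
        beta = e / (1 - e)
        w = [wi * (beta if predict(nxt, x) == yy else 1.0)
             for wi, x, yy in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    ordered.append(nxt)
    remaining.remove(nxt)

# Early stopping: keep only the first 20% of the ordered ensemble.
sub = ordered[: max(1, len(ordered) // 5)]

def vote_error(ens):
    errs = 0
    for x, yy in zip(X, y):
        pred = 1 if sum(predict(s, x) for s in ens) * 2 > len(ens) else 0
        errs += pred != yy
    return errs / n

print(len(sub), vote_error(sub), vote_error(pool))
```

The reweighting drives the selection towards members that complement the errors of those already chosen, which is why a short prefix of the ordered sequence can match or beat the full pool.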
Abstract: This paper proposes an image classification method based on extracting image features using Haar random forests and combining them with a spatial matching kernel SVM. The method works by combining multiple efficient, yet powerful, learning algorithms at every stage of the recognition process. On the task of identifying aquatic stonefly larvae, the method achieves state-of-the-art or better performance, with much higher efficiency.
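As a flavour of the feature-extraction stage, here is a minimal sketch of a two-rectangle Haar-like response computed with an integral image, the kind of quantity a Haar random forest node can threshold cheaply. The function names and the 4x4 test image are illustrative; the paper's actual feature set and forest construction are not reproduced here:

```python
def integral_image(img):
    """Summed-area table with a zero row/column of padding, so any
    rectangle sum costs four lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = (img[r][c] + ii[r][c + 1]
                                + ii[r + 1][c] - ii[r][c])
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in rows [top, top+height), cols [left, left+width)."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])

def haar_horizontal(ii, top, left, height, width):
    """Left half minus right half: responds to vertical edges."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

# 4x4 image: bright left half, dark right half.
img = [[9, 9, 0, 0]] * 4
ii = integral_image(img)
print(haar_horizontal(ii, 0, 0, 4, 4))  # prints 72 (4*2*9 - 0)
```

Because every rectangle sum is four table lookups regardless of rectangle size, a forest can evaluate many such features per window at very low cost, which is the source of the efficiency claimed in the abstract.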