1997
DOI: 10.1080/02664769723729
Identification of outlier bootstrap samples

Abstract: We define a variation of Efron's method II based on the outlier bootstrap sample concept. A criterion for the identification of such samples is given, with which a variation in the bootstrap sample generation algorithm is introduced. The results of several simulations are analyzed in which, in comparison with Efron's method II, a higher degree of closeness to the estimated quantities can be observed.

Cited by 9 publications (6 citation statements). References 1 publication.
“…As the number of unique items in a bootstrap sample is an important determinant of the behaviour of prediction rules learned on it, the distribution of this quantity should be of interest to researchers working on their development and validation. While related distributions have long been studied in a purely mathematical context [11], and this distribution has been identified before in this setting [16,1], nowhere were we able to find a concise and accessible summary of the relevant information for the benefit of researchers in machine learning. Our aim here is to fill this gap by presenting this distribution along with its key properties, and to make it easier for others who wish to understand or modify resampling techniques in a machine learning context.…”
Section: Introduction
confidence: 98%
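The quantity discussed in the citation above — the number of unique items in a bootstrap sample — has a distribution that is easy to estimate empirically. The following minimal Monte Carlo sketch (function name and parameters are illustrative, not from the cited works) draws bootstrap index sets and tallies how many distinct items each contains; the expected count is n·(1 − (1 − 1/n)^n), which approaches n·(1 − 1/e) ≈ 0.632·n for large n.

```python
import random

def unique_count_distribution(n, trials=100_000, seed=0):
    """Empirically estimate the distribution of the number of unique
    items in a bootstrap sample of size n drawn (with replacement)
    from n items."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        # One bootstrap sample = n draws with replacement from {0..n-1};
        # the set of drawn indices gives the unique-item count.
        u = len({rng.randrange(n) for _ in range(n)})
        counts[u] = counts.get(u, 0) + 1
    return {u: c / trials for u, c in sorted(counts.items())}

dist = unique_count_distribution(20)
mean_unique = sum(u * p for u, p in dist.items())
# mean_unique should be close to 20 * (1 - (1 - 1/20)**20)
```

This estimates the distribution by simulation only; closed-form expressions (via Stirling numbers of the second kind) exist in the literature the citing paper surveys.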
“…Several empirical studies carried out in [7] showed closer estimations of the parameters under study and a reduction of the standard deviations of such estimations. These results were theoretically confirmed in [10].…”
Section: Reduced Bootstrap
confidence: 99%
“…However, this simulation process is affected by a series of errors and variabilities, as is formalized in [7]. For this reason, several alternative techniques have been proposed, as those recorded by [4], [8], [9].…”
Section: Reduced Bootstrap
confidence: 99%
“…The method is an extension of the one introduced by Muñoz-García et al [5], that takes k2 = n. Note that the ordinary bootstrap is a particular case of the reduced bootstrap with k1 = 1 and k2 = n.…”
Section: Introduction
confidence: 99%
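The constraint described in the quotation above — restricting the number of distinct items in each bootstrap sample to the range [k1, k2] — can be realized with a simple rejection-sampling loop. This is an illustrative sketch of the general idea, not the exact generation algorithm of the cited papers; the function name and interface are assumptions.

```python
import random

def reduced_bootstrap_sample(data, k1, k2, rng=None):
    """Draw one bootstrap sample (with replacement, same size as data)
    whose number of *distinct* items lies in [k1, k2].

    Ordinary resamples are drawn until one satisfies the constraint.
    With k1 = 1 and k2 = len(data) every resample is accepted,
    recovering the ordinary bootstrap as a special case.
    """
    rng = rng or random.Random()
    n = len(data)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        if k1 <= len(set(idx)) <= k2:
            return [data[i] for i in idx]

rng = random.Random(1)
sample = reduced_bootstrap_sample(list(range(10)), k1=7, k2=10, rng=rng)
```

Rejection sampling is the simplest way to impose the constraint; more efficient direct-generation schemes are what the reduced-bootstrap literature develops.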