Optimal aggregation of classifiers in statistical learning

Tsybakov, Alexandre B.

doi:10.1214/aos/1079120131

Cited by 461 publications

(550 citation statements)

References 28 publications

Supporting

Mentioning

524

Contrasting

“…Lugosi (2002) and Devroye and Lugosi (1995) then confirmed these reservations by studying the interpolation case where the best classification error inf_{t∈S} P[t(X) ≠ Y] of a given class S is nonzero but small. By further analyzing the problem, Mammen and Tsybakov (1999), Tsybakov (2004) and Massart and Nédélec (2005) show that the behavior of the regression function η : x → P[Y = 1 | X = x] around 1/2 is crucial. They indeed introduce some margin conditions that can be written in the following general way:…”

Section: Results (mentioning)

confidence: 99%

Model selection by bootstrap penalization for classification

Fromont

2006

Mach Learn

We consider the binary classification problem. Given an i.i.d. sample drawn from the distribution of an X × {0, 1}−valued random pair, we propose to estimate the so-called Bayes classifier by minimizing the sum of the empirical classification error and a penalty term based on Efron's or i.i.d. weighted bootstrap samples of the data. We obtain exponential inequalities for such bootstrap type penalties, which allow us to derive non-asymptotic properties for the corresponding estimators. In particular, we prove that these estimators achieve the global minimax risk over sets of functions built from Vapnik-Chervonenkis classes. The obtained results generalize Koltchinskii (2001) and Bartlett et al.'s (2002) ones for Rademacher penalties that can thus be seen as special examples of bootstrap type penalties. To illustrate this, we carry out an experimental study in which we compare the different methods for an intervals model selection problem.
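As a rough illustration of the penalized selection rule this abstract describes, the sketch below picks a model by minimizing the empirical classification error plus a Monte Carlo Rademacher penalty. It is not the paper's procedure: the histogram classifiers on [0, 1], the dyadic bin counts, the synthetic data, the unscaled form of the penalty, and all function names are assumptions made for the example.

```python
import random


def bin_index(x, m):
    """Bin of x in [0, 1] among m equal-width bins."""
    return min(int(x * m), m - 1)


def erm_risk(xs, ys, m):
    """Empirical 0-1 risk of the best histogram classifier with m bins.

    Within each bin the risk minimizer predicts the majority label,
    so the bin contributes min(#zeros, #ones) errors.
    """
    ones = [0] * m
    zeros = [0] * m
    for x, y in zip(xs, ys):
        k = bin_index(x, m)
        if y == 1:
            ones[k] += 1
        else:
            zeros[k] += 1
    return sum(min(o, z) for o, z in zip(ones, zeros)) / len(xs)


def rademacher_penalty(xs, ys, m, rng, draws=50):
    """Monte Carlo estimate of E_sigma sup_t (1/n) sum_i sigma_i 1[t(X_i) != Y_i].

    For histogram classifiers the supremum splits over bins: in each bin we
    take the larger of the signed error sums for "predict 0" and "predict 1".
    """
    n = len(xs)
    total = 0.0
    for _ in range(draws):
        if_predict_0 = [0] * m  # signed errors on points with Y = 1
        if_predict_1 = [0] * m  # signed errors on points with Y = 0
        for x, y in zip(xs, ys):
            sigma = rng.choice((-1, 1))
            k = bin_index(x, m)
            if y == 1:
                if_predict_0[k] += sigma
            else:
                if_predict_1[k] += sigma
        total += sum(max(a, b) for a, b in zip(if_predict_0, if_predict_1)) / n
    return total / draws


def select_model(xs, ys, candidates, rng):
    """Return the bin count minimizing empirical risk + penalty, and all scores."""
    scores = {
        m: erm_risk(xs, ys, m) + rademacher_penalty(xs, ys, m, rng)
        for m in candidates
    }
    return min(scores, key=scores.get), scores


rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
# Assumed toy distribution: label 1 on [0.25, 0.75), flipped with probability 0.1.
ys = [int((0.25 <= x < 0.75) != (rng.random() < 0.1)) for x in xs]
candidates = [1, 2, 4, 8, 16]
best, scores = select_model(xs, ys, candidates, rng)
print(best, scores)
```

The empirical risk alone can only decrease as the nested models grow, which is why a complexity penalty is needed to keep the selection from always choosing the largest model.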

“…First, we recall the definition of the margin assumption introduced in [30]. Margin Assumption (MA): The probability measure π satisfies the margin assumption MA(κ), where κ ≥ 1, if we have…”

Section: Suboptimality of Penalized ERM Procedures (mentioning)

confidence: 99%

Suboptimality of Penalized Empirical Risk Minimization in Classification

Lecué

Learning Theory

Abstract. Let F be a set of M classification procedures with values in [−1, 1]. Given a loss function, we want to construct a procedure which mimics at the best possible rate the best procedure in F. This fastest rate is called optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that optimal rates of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all the M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
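To make the contrast between empirical risk minimization and exponential weights concrete, here is a minimal sketch for M binary classifiers with values in {-1, +1}. The weighting exp(-n * risk / temperature), the temperature parameter, the toy data, and the function names are illustrative assumptions, not the exact procedure analyzed in the paper.

```python
import math


def empirical_risks(predictions, labels):
    """Average 0-1 loss of each classifier.

    predictions[j][i] is classifier j's label in {-1, +1} for sample i.
    """
    n = len(labels)
    return [sum(p != y for p, y in zip(preds, labels)) / n for preds in predictions]


def erm_index(risks):
    """Index of a classifier with the smallest empirical risk (first on ties)."""
    return min(range(len(risks)), key=lambda j: risks[j])


def exponential_weights(risks, n, temperature=1.0):
    """Weights proportional to exp(-n * risk_j / temperature), normalized to sum to 1."""
    r_min = min(risks)  # shift by the minimum for numerical stability
    raw = [math.exp(-n * (r - r_min) / temperature) for r in risks]
    total = sum(raw)
    return [w / total for w in raw]


def aggregate(predictions, weights):
    """Convex combination of the classifiers; each output lies in [-1, 1]."""
    n = len(predictions[0])
    return [sum(w * preds[i] for w, preds in zip(weights, predictions)) for i in range(n)]


# Toy data (assumed): three classifiers evaluated on four labelled samples.
labels = [1, -1, 1, 1]
predictions = [
    [1, -1, 1, -1],
    [1, 1, 1, 1],
    [-1, 1, -1, -1],
]

risks = empirical_risks(predictions, labels)
chosen = erm_index(risks)
weights = exponential_weights(risks, n=len(labels))
mixture = aggregate(predictions, weights)
print(risks, chosen, weights, mixture)
```

ERM commits to a single classifier, while the exponentially weighted aggregate spreads mass over all classifiers with comparable empirical risk and returns a value in [−1, 1] rather than a binary label.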

“…The assumption that the noise rates are not equal to 1/2 can be relaxed (at the cost of error values no longer approaching zero) if we assume the weight of the area with noise rate close to 1/2 is bounded (e.g., by applying Tsybakov's noise condition [15]). …”

Section: Noise Rates Different From 1/2 (mentioning)

confidence: 99%

PAC-Learning with General Class Noise Models

Jabbari

Holte

Zilles

2012

Lecture Notes in Computer Science

Abstract. We introduce a framework for class noise, in which most of the known class noise models for the PAC setting can be formulated. Within this framework, we study properties of noise models that enable learning of concept classes of finite VC-dimension with the Empirical Risk Minimization (ERM) strategy. We introduce simple noise models for which classical ERM is not successful. Aiming at a more general-purpose algorithm for learning under noise, we generalize ERM to a more powerful strategy. Finally, we study general characteristics of noise models that enable learning of concept classes of finite VC-dimension with this new strategy.
