Suboptimality of Penalized Empirical Risk Minimization in Classification

Lecué, Guillaume

doi:10.1007/978-3-540-72927-3_12

Cited by 17 publications

(20 citation statements)

References 29 publications

“…Let us finally mention a related result in a close but slightly different framework. In the classification framework, under a global margin condition with ϕ(x) ∝ x^{2κ} with κ ≥ 1, Theorem 3 in [18] shows that for any M_n ≥ 2, a family (u_m)_{m ∈ ℳ_n} of M_n classifiers exists for which, for any selection procedure m, some distribution P exists such that…”

Section: Lower Bound For Some Non-nested Models

mentioning

confidence: 99%

“…This result and Theorem 2 focus on different problems. In [18], the margin condition is only assumed to hold globally, and the focus is on the dependence of the remainder term on the cardinality M_n of ℳ_n. Therefore, the counterexample given in [18] implies nothing about local margin conditions for (f_m)_{m ∈ ℳ_n}.…”

Section: Lower Bound For Some Non-nested Models

mentioning

confidence: 99%

Margin-adaptive model selection in statistical learning

Arlot

Bartlett

2011

Bernoulli

A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. In this paper we tackle the problem of adaptivity to this condition in the context of model selection, in a general learning framework. More precisely, we consider a weaker version of this condition that takes into account that learning within a small model can be much easier than within a large one. Requiring this "strong margin adaptivity" makes the model selection problem more challenging. We first prove, in a general framework, that some penalization procedures (including local Rademacher complexities) exhibit this adaptivity when the models are nested. Contrary to previous results, this holds with penalties that only depend on the data. Our second main result is that strong margin adaptivity is not always possible when the models are not nested: for every model selection procedure (even a randomized one), there is a problem for which it does not demonstrate strong margin adaptivity.

Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) at http://dx.doi.org/10.3150/10-BEJ288 by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).

“…One can show [10,12,15,17] that, if ℓ(x, y) = (x − y)^2, then for every random map A there exists a constant c depending only on the map and on δ, such that for every n, H(n, M, δ) ≥ c/√n. In fact, the result is even stronger: this lower bound holds for every individual class F that satisfies certain conditions (rather than a lower bound for the "worst" case) and for a wider class of loss functions.…”

Section: Introduction

mentioning

confidence: 98%

Aggregation via empirical risk minimization

Lecué

Mendelson

2008

Probab. Theory Relat. Fields

Self Cite

Given a finite set F of estimators, the problem of aggregation is to construct a new estimator whose risk is as close as possible to the risk of the best estimator in F. It was conjectured that empirical minimization performed in the convex hull of F is an optimal aggregation method, but we show that this conjecture is false. Despite that, we prove that empirical minimization in the convex hull of a well chosen, empirically determined subset of F is an optimal aggregation method.
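
As a minimal illustration of the setting described in this abstract, the sketch below runs plain empirical risk minimization over a finite dictionary F under squared loss. It is not the procedure of the cited paper (which works in the convex hull of an empirically chosen subset of F); the dictionary `F`, the synthetic data, and the helper names `empirical_risk` and `erm_select` are all assumptions made for the example.

```python
import random


def empirical_risk(f, sample):
    """Average squared loss of predictor f over a list of (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in sample) / len(sample)


def erm_select(dictionary, sample):
    """Return the index of the dictionary element with the smallest empirical risk."""
    risks = [empirical_risk(f, sample) for f in dictionary]
    return min(range(len(dictionary)), key=risks.__getitem__)


# Hypothetical finite dictionary F of three fixed predictors.
F = [lambda x: 0.0, lambda x: x, lambda x: 2.0 * x]

# Synthetic data (an assumption): y = x + small Gaussian noise,
# so the second predictor has the smallest risk within F.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
sample = [(x, x + rng.gauss(0.0, 0.1)) for x in xs]

best = erm_select(F, sample)
print("selected index:", best)
print("empirical risk of selected predictor:", empirical_risk(F[best], sample))
```

Here the selection step only ever returns an element of F itself, which is the restriction the citing papers point to when they discuss the suboptimality of ERM for aggregation.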

“…It has been proved that ERM is suboptimal for the aggregation problem (cf. Proposition 2.1 in [19] or Chapter 3.5 in [7], Theorem 1.1 in [24], Theorem 3 in [22], Theorem 2 in [26] and Theorem 2.1 in [29]). Somehow, this procedure does not take advantage of the convexity of the loss since the class of functions on which the empirical risk is minimized to construct the ERM is F , a finite set.…”

mentioning

confidence: 99%

Optimal learning with Q-aggregation

Lecué

Rigollet

2014

Ann. Statist.

Self Cite

We consider a general supervised learning problem with strongly convex and Lipschitz loss and study the problem of model selection aggregation. In particular, given a finite dictionary of functions (learners) together with the prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann. Statist. 40 (2012) 1878-1905] for Gaussian regression with squared loss and fixed design to this learning setup. Specifically, we prove that the Q-aggregation procedure outputs an estimator that satisfies optimal oracle inequalities both in expectation and with high probability. Our proof techniques somewhat depart from traditional proofs by making most of the standard arguments on the Laplace transform of the empirical process to be controlled.
