Learning Theory
DOI: 10.1007/978-3-540-72927-3_12
Suboptimality of Penalized Empirical Risk Minimization in Classification

Abstract: Let F be a set of M classification procedures with values in [−1, 1]. Given a loss function, we want to construct a procedure which mimics, at the best possible rate, the best procedure in F. This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that optimal rates of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all the M classifiers are binary, the (penalized) Empirical Risk Minimization…
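The setting in the abstract can be illustrated by a minimal sketch of (unpenalized) empirical risk minimization over a finite class: given M procedures with values in [−1, 1] and a loss, select the one with smallest empirical risk. The function names and toy data below are illustrative, not from the paper:

```python
import numpy as np

def erm_select(F, X, y, loss):
    """Pick the procedure in the finite class F with smallest empirical risk.

    F: list of M prediction functions mapping inputs to [-1, 1].
    loss: a loss function loss(prediction, label).
    """
    risks = [np.mean([loss(f(x), yi) for x, yi in zip(X, y)]) for f in F]
    return F[int(np.argmin(risks))]

# Toy usage: two constant classifiers and one smooth one, squared loss.
rng = np.random.default_rng(0)
X = rng.normal(size=20)
y = np.sign(X)  # labels in {-1, +1}
F = [lambda x: 1.0, lambda x: -1.0, lambda x: np.tanh(x)]
sq_loss = lambda p, yl: (p - yl) ** 2
best = erm_select(F, X, y, sq_loss)  # tanh tracks sign(x) most closely
```

The paper's point is that this selector, while natural, cannot in general attain the fast (log M)/n aggregation rate that convex-combination procedures achieve.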

Cited by 17 publications (20 citation statements)
References 29 publications
“…Let us finally mention a related result in a close but slightly different framework. In the classification framework, under a global margin condition with ϕ(x) ∝ x^{2κ} with κ ≥ 1, Theorem 3 in [18] shows that for any M_n ≥ 2, a family (u_m)_{m∈M_n} of M_n classifiers exists for which, for any selection procedure m̂, some distribution P exists such that…”
Section: Lower Bound for Some Non-nested Models
confidence: 99%
“…This result and Theorem 2 focus on different problems. In [18], the margin condition is only assumed to hold globally, and the focus is on the dependence of the remainder term on the cardinality M_n of M_n. Therefore, the counterexample given in [18] implies nothing about local margin conditions for (f_m)_{m∈M_n}.…”
Section: Lower Bound for Some Non-nested Models
confidence: 99%
“…One can show [10, 12, 15, 17] that, if ℓ(x, y) = (x − y)^2, then for every random map A there exists a constant c, depending only on the map and on δ, such that for every n, H(n, M, δ) ≥ c/√n. In fact, the result is even stronger: this lower bound holds for every individual class F that satisfies certain conditions (rather than a lower bound for the "worst" case) and for a wider class of loss functions.…”
Section: Introduction
confidence: 98%
“…It has been proved that ERM is suboptimal for the aggregation problem (cf. Proposition 2.1 in [19] or Chapter 3.5 in [7], Theorem 1.1 in [24], Theorem 3 in [22], Theorem 2 in [26] and Theorem 2.1 in [29]). In a sense, this procedure does not take advantage of the convexity of the loss, since the class of functions over which the empirical risk is minimized to construct the ERM is the finite set F.…”
confidence: 99%
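The last statement observes that ERM over the finite set F ignores the convexity of the loss. A standard remedy in the aggregation literature is to average the M procedures with exponential weights driven by their empirical risks, producing an element of the convex hull of F rather than of F itself. A minimal sketch, with an illustrative temperature parameter that the quoted works do not specify:

```python
import numpy as np

def exp_weights_aggregate(F, X, y, loss, temperature):
    """Aggregate a finite class F by exponentially weighting empirical risks.

    Returns the convex combination sum_j w_j * f_j with
    w_j proportional to exp(-n * empirical_risk(f_j) / temperature).
    """
    n = len(X)
    risks = np.array([np.mean([loss(f(x), yi) for x, yi in zip(X, y)])
                      for f in F])
    logits = -n * risks / temperature
    logits -= logits.max()          # shift for numerical stability
    w = np.exp(logits)
    w /= w.sum()
    return lambda x: sum(wj * f(x) for wj, f in zip(w, F))

# Toy usage: with a small temperature, nearly all weight falls on the
# empirically best classifier.
F = [lambda x: 1.0, lambda x: -1.0]
X = np.zeros(10)
y = np.ones(10)
sq_loss = lambda p, yl: (p - yl) ** 2
agg = exp_weights_aggregate(F, X, y, sq_loss, temperature=0.1)
```

Unlike the ERM selector, the output here need not lie in F: it is a convex combination, which is exactly the extra structure the suboptimality results above say ERM fails to exploit.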