2007
DOI: 10.1214/009053606000001217
Fast learning rates for plug-in classifiers

Abstract: It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) the plug-in classifiers generally converge more slowly than the classifiers based on empirical risk minimization. We show that both conjectures are not correct. In particular…
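For readers without the full text, here is a minimal sketch of the standard setup behind these statements, written in conventional notation that is an assumption of this sketch rather than a quotation from the paper:

% Sketch of the standard plug-in classification setup (conventional notation, not quoted from the paper)
\[
  \eta(x) = \mathbb{P}(Y = 1 \mid X = x), \qquad
  f^*(x) = \mathbf{1}\{\eta(x) \ge 1/2\} \quad \text{(Bayes classifier)}
\]
\[
  \hat f_n(x) = \mathbf{1}\{\hat\eta_n(x) \ge 1/2\} \quad \text{(plug-in rule built from an estimator } \hat\eta_n \text{ of } \eta\text{)}
\]
\[
  \mathcal{E}(\hat f_n) = \mathbb{P}\bigl(Y \ne \hat f_n(X)\bigr) - \mathbb{P}\bigl(Y \ne f^*(X)\bigr) \quad \text{(excess Bayes risk)}
\]
\[
  \mathbb{P}\bigl(0 < |\eta(X) - \tfrac12| \le t\bigr) \le C\, t^{\alpha}, \quad 0 < t \le t_0 \quad \text{(margin assumption)}
\]

A rate is "fast" when $\mathbb{E}\,\mathcal{E}(\hat f_n)$ decays faster than $n^{-1/2}$; the abstract's point is that both conjectures fail, in that suitable plug-in rules attain fast rates, in some cases even faster than $n^{-1}$, under this margin condition.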

Cited by 287 publications (489 citation statements: 28 supporting, 461 mentioning, 0 contrasting); references 21 publications (35 reference statements).
“…This flavor of result is "nonasymptotic" in that it can be phrased in a way that gives the probability of misclassification for any training data set size; we do not need an asymptotic assumption that the amount of training data goes to infinity. Chaudhuri and Dasgupta's result subsumes or matches classical results by Fix and Hodges (1951), Devroye et al (1994), Cérou and Guyader (2006), and Audibert and Tsybakov (2007), while providing a perhaps more intuitive explanation for when nearest neighbor classification works, accounting for the metric used and the distribution from which the data are sampled. Moreover, we show that their analysis can be translated to the regression setting, yielding theoretical guarantees that nearly match the best of existing regression results.…”
Section: Nearest Neighbor Methods in Theory (supporting)
confidence: 66%
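Purely as an illustration of the kind of rule these guarantees cover (not code from the cited analysis), a minimal k-nearest-neighbor classifier under the Euclidean metric might be sketched as follows; the function name and the choice of metric are assumptions of the sketch.

import numpy as np

def knn_classify(X_train, y_train, x, k=5):
    # Distances from the query point x to every training point (Euclidean metric).
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels (assumed to be 0/1); ties go to class 1.
    return int(y_train[nearest].mean() >= 0.5)

The excerpt above concerns bounds, valid for any fixed sample size, on how often a vote of this kind disagrees with the Bayes classifier, as a function of the metric and the sampling distribution.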
“…In the supplementary material, we prove that, for a submanifold M with a bounded second fundamental form and for a reasonably general assumption on the underlying probability distribution P on M ⊂ R^n (see [1] for details), the estimated second fundamental form and the shape operator converge point-wise to the true second fundamental form and the shape operator as the number of data points tends to infinity, while the diameter of the neighborhood N_k(X_α) tends to zero. We also demonstrate this with a toy example.…”
Section: Regularization on a Point Cloud in R^n (mentioning)
confidence: 99%
“…This similarity allows us to draw on recent theoretical results for classification by Devroye et al (1996), Tsybakov (2004), Massart and Nédélec (2006), Audibert and Tsybakov (2007), and Kerkyacharian et al (2014), among others, and to adapt them to the treatment choice problem. The minimax rate optimality of the EWM treatment choice rule (proved in Theorems 2.1 and 2.2 below) is analogous to the minimax rate optimality of the Empirical Risk Minimization classifier in the classification problem shown by Devroye and Lugosi (1995).…”
Section: Related Literature (mentioning)
confidence: 96%
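To make the analogy in this excerpt concrete (a hypothetical sketch, not taken from the cited work), an empirical welfare maximization (EWM) rule with known propensity scores can be written in the same shape as empirical risk minimization; the names empirical_welfare, ewm_rule, and candidate_rules are placeholders.

import numpy as np

def empirical_welfare(rule, X, Y, T, e):
    # Treatment assignment the candidate rule would make for each observation.
    d = np.array([rule(x) for x in X])
    # Inverse-propensity weight: an observation counts only when the observed
    # arm T matches the rule's assignment d (propensity scores e assumed known).
    w = d * T / e + (1 - d) * (1 - T) / (1 - e)
    # Estimated mean outcome if everyone were treated according to the rule.
    return np.mean(w * Y)

def ewm_rule(candidate_rules, X, Y, T, e):
    # Mirror empirical risk minimization: pick the rule maximizing empirical welfare.
    scores = [empirical_welfare(g, X, Y, T, e) for g in candidate_rules]
    return candidate_rules[int(np.argmax(scores))]

For instance, candidate_rules could be a small set of threshold rules of the form lambda x: int(x[0] > c) over a grid of cutoffs c, which is the finite-class setting where the ERM-style rate arguments carry over most directly.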