2017
DOI: 10.48550/arxiv.1704.00642
Preprint

Local nearest neighbour classification with applications to semi-supervised learning

Abstract: We derive a new asymptotic expansion for the global excess risk of a local-k-nearest neighbour classifier, where the choice of k may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the decision boundary of the optimal Bayes classifier, but we also show that if these conditions are not satisfied, then the dominant contribution may arise from the tails of the marginal distribution of the features. Moreover, we prove that, provid…
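
The classifier described in the abstract chooses the number of neighbours separately at each test point, using more neighbours where the marginal feature density is high and fewer in the tails. The following is a minimal sketch of that idea, assuming a kernel density estimate fitted on unlabelled data and an illustrative power-law rule k(x) ∝ (n·f̂(x))^β; the function names, the bandwidth, and the constants B and β are placeholders for demonstration, not the tuning derived in the paper.

```python
# Minimal sketch of a local-k-nearest-neighbour classifier (illustrative only,
# not the authors' exact procedure): k is chosen per test point from a kernel
# density estimate of the marginal feature density, fitted on unlabelled data.
import numpy as np
from sklearn.neighbors import KernelDensity, KNeighborsClassifier

def local_knn_predict(X_lab, y_lab, X_unlab, X_test, B=1.0, beta=0.5, bandwidth=0.5):
    """Predict labels with a point-dependent choice of k.

    B, beta and bandwidth are placeholder tuning constants (assumptions),
    not the values prescribed by the paper.
    """
    n = len(X_lab)
    # Kernel density estimate of the marginal density f of the features,
    # fitted on the additional (unlabelled) semi-supervised sample.
    kde = KernelDensity(bandwidth=bandwidth).fit(X_unlab)
    f_hat = np.exp(kde.score_samples(X_test))  # score_samples returns log-density

    preds = np.empty(len(X_test), dtype=y_lab.dtype)
    for i, x in enumerate(X_test):
        # Use more neighbours where the estimated density is high,
        # fewer in the tails; always keep 1 <= k <= n.
        k = int(np.clip(np.floor(B * (n * f_hat[i]) ** beta) + 1, 1, n))
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_lab, y_lab)
        preds[i] = clf.predict(x.reshape(1, -1))[0]
    return preds
```

Refitting the classifier at every test point is done only for clarity; in practice one would build a single neighbour index and vary k at query time.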

Cited by 8 publications (19 citation statements)
References 23 publications

“…The purpose of adding 1 to Kn^q in (13) is to ensure that k is at least 1. Our method shares some similarity with [19], which uses a kernel density estimate to determine k. However, [19] requires a sufficiently large number of unlabeled data points to ensure that the estimated density is sufficiently close to the true density, so that the adaptive kNN algorithm converges as fast as in the case where f(x) is known and k is selected based on the true f(x). In contrast, our method does not require unlabeled data, and we do not need an accurate estimate of the density.…”
Section: B Proposed Adaptive kNN Methods (mentioning)
confidence: 99%
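
As a concrete reading of the rule quoted above (a sketch assuming Kn^q is rounded down before adding 1; K and q are the citing paper's tuning parameters, and the values below are placeholders):

```python
# Hypothetical illustration of the "add 1" rule quoted above: k = floor(K * n**q) + 1.
# Adding 1 guarantees k >= 1 even when K * n**q < 1. The placeholder values of
# K and q are assumptions, not the citing paper's recommended choices.
import math

def choose_k(n: int, K: float = 1.0, q: float = 0.5) -> int:
    return math.floor(K * n ** q) + 1

print(choose_k(1000))  # 32 with the placeholder K = 1.0, q = 0.5
```
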
“…Here, we use (17) to replace the Hölder condition, so that it is possible to impose an assumption that is approximately the same as requiring the second-order smoothness of η. Assumption 1 (d) is the minimum probability mass assumption, which was already used in existing works [18,19].…”
Section: Classification (mentioning)
confidence: 99%
“…In the context of binary classification, estimation properties through various forms of dimension reduction have been studied by, among others, Chapelle et al. (2009); Grandvalet and Bengio (2005); Zhu (2005); Zhu and Goldberg (2009). For example, Cannings et al. (2017) recently proposed a local k-nearest neighbour classifier that uses fewer neighbours by utilizing the additional data to estimate the marginal density of the features. They provide asymptotic results for the excess risk over R^p of O(n^{-4/(p+4)}) when E||X||_2^{4+c} < ∞ and n^{2+p/r} = O(m), where c > 0 and r ∈ (0, 2] is the smoothness of the covariates' density.…”
Section: Asymptotic Theory (mentioning)
confidence: 99%
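
To make the quoted rate concrete, here is a worked instance; the particular values p = 4 and r = 2 are chosen purely for illustration, and m is read as the size of the additional (unlabelled) sample.

```latex
% Worked instance of the quoted excess-risk rate (illustrative values only).
\[
  \text{excess risk} = O\!\bigl(n^{-4/(p+4)}\bigr)
  \quad\text{provided } \mathbb{E}\|X\|_2^{4+c} < \infty \text{ and } n^{2+p/r} = O(m).
\]
\[
  p = 4,\ r = 2:\qquad O\!\bigl(n^{-4/8}\bigr) = O\!\bigl(n^{-1/2}\bigr),
  \qquad n^{2 + 4/2} = n^{4} = O(m).
\]
```
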
“…Second, although the regret of k-NN deteriorates polynomially with respect to the corruption level, it actually achieves the best possible accuracy for testing randomly perturbed data (under a fine-tuned choice of k). Hence the k-NN classifier is rate-minimax for both the clean-data testing task [2,22,4] and the randomly perturbed data testing task.…”
Section: Introduction (mentioning)
confidence: 99%