2016
DOI: 10.1186/s40064-016-2941-7

The distance function effect on k-nearest neighbor classification for medical datasets

Abstract: Introduction: K-nearest neighbor (k-NN) classification is a conventional non-parametric classifier that has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Case description: Since the Euclidean distance function is the most widely used distance metric in k-NN, no study examines the classification performance of k-NN by different distance functions, espec…
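The decision rule the abstract describes (measure the distance to every training point, then vote among the nearest) is small enough to sketch. The following is an illustrative Python reconstruction, not code from the paper; `knn_predict` and its parameters are names chosen here, and the pluggable `distance` argument mirrors the paper's focus on swapping distance functions:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3, distance=None):
    """Classify x_test by majority vote among its k nearest training points."""
    if distance is None:
        # Default to Euclidean distance, the metric the paper calls most widely used.
        distance = lambda a, b: np.sqrt(np.sum((a - b) ** 2))
    dists = [distance(x, x_test) for x in X_train]   # distance to every training point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)     # tally class labels among them
    return votes.most_common(1)[0][0]

# Toy usage: two classes in 2-D, query point near class 0.
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0]])
y = np.array([0, 0, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))  # -> 0
```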

Cited by 326 publications (178 citation statements). References 6 publications.
“…The basis of kNN is the assumption that biologically similar samples will have similar measured values across most of their metabolites (Troyanskaya et al., 2001). To impute a missing value for one target sample, the k most similar samples are found based on a defined distance metric calculated using the values of metabolites that are present in both the target sample and a candidate neighbor sample (Hu et al., 2016; Kim et al., 2005). Here we test kNN with Euclidean distance in depth for all methods of missing value generation and also examine kNN with Pearson correlation when comparing NS-kNN to KNN-TN using the more realistic MM approach.…”
Section: Methods
confidence: 99%
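As a concrete reading of the imputation scheme this excerpt describes, here is a minimal Python sketch. It is not the NS-kNN or KNN-TN implementation; the conventions are assumptions made here: NaN marks a missing value, distances are root-mean-square over the shared metabolites so overlaps of different sizes stay comparable, and the imputed value is the plain mean over the k neighbors:

```python
import numpy as np

def knn_impute_value(data, target, feature, k=5):
    """Impute data[target, feature] from the k most similar samples.

    Similarity uses only metabolites observed (non-NaN) in BOTH the
    target sample and each candidate neighbor, as the excerpt describes.
    """
    t = data[target]
    candidates = []
    for c in range(data.shape[0]):
        if c == target or np.isnan(data[c, feature]):
            continue  # a neighbor must itself have the value we want to impute
        shared = ~np.isnan(t) & ~np.isnan(data[c])  # metabolites present in both
        if not shared.any():
            continue
        d = np.sqrt(np.mean((t[shared] - data[c, shared]) ** 2))
        candidates.append((d, data[c, feature]))
    candidates.sort(key=lambda pair: pair[0])        # closest samples first
    return float(np.mean([v for _, v in candidates[:k]]))
```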
“…Finally, for many applications, it is important to define a similarity or distance measure between two data points in the feature space. The simplest distance measure would be the Euclidean distance

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

between the numerical feature vectors of two data points A and B, for features i = 1 … n, but depending on the type of data we are dealing with there can be many other and sometimes much more complex distance or similarity measures, such as cosine similarity or similarity scores of two biological sequences.…”
Section: Data and Features
confidence: 99%
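The two measures this excerpt names are one-liners in practice; a sketch (the function names are ours):

```python
import numpy as np

def euclidean(a, b):
    # d(A, B) = sqrt(sum_i (a_i - b_i)^2), the formula quoted above
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_similarity(a, b):
    # Angle-based similarity in [-1, 1]; it ignores vector magnitude,
    # which is why it can behave very differently from Euclidean distance.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```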
“…There are different ways in which the distance (degree of similarity) can be computed, and this often depends on the nature of the data. The Euclidean distance measure is the most popular, but others such as the Chi-square distance, the Minkowski distance, and the cosine similarity measure also exist [7].…”
Section: Introduction
confidence: 99%
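For reference, sketches of the two less common measures the excerpt lists. The Chi-square form below is one common variant intended for non-negative features such as histograms; the function names and the eps guard are choices made here:

```python
import numpy as np

def minkowski(a, b, p=2):
    # Generalizes several metrics: p = 1 is Manhattan, p = 2 is Euclidean.
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def chi_square(a, b, eps=1e-12):
    # 0.5 * sum_i (a_i - b_i)^2 / (a_i + b_i); eps avoids division by zero.
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))
```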
“…The kNN algorithm is often said to be a lazy machine learning classifier, as there is no training per se [7]; that is, no learning takes place and no model is actually built, since it is an example-based classifier.…”
Section: Introduction
confidence: 99%
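Because the classifier is lazy, "fitting" amounts to storing (or at most indexing) the training examples. scikit-learn's KNeighborsClassifier, used here only to illustrate the excerpt's point, makes this visible: all distance computation is deferred to predict() time:

```python
from sklearn.neighbors import KNeighborsClassifier

# fit() just stores/indexes the training data; no model parameters are learned.
clf = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
clf.fit([[0, 0], [1, 1], [5, 5]], [0, 0, 1])
print(clf.predict([[0.5, 0.5]]))  # -> [0]
```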