2006
DOI: 10.1007/11671299_58
Improving kNN Text Categorization by Removing Outliers from Training Set

Abstract: We show that excluding outliers from the training data significantly improves the kNN classifier, which in this case performs about 10% better than the best known method, the Centroid-based classifier. Outliers are the elements whose similarity to the centroid of the corresponding category is below a threshold.
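The outlier-removal rule in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of cosine similarity, and the threshold value are all assumptions for the sake of the example.

```python
import numpy as np

def remove_outliers(X, y, threshold=0.5):
    """Drop training vectors whose cosine similarity to their own
    category centroid falls below `threshold` (hypothetical value).

    X: 2-D array of document vectors, y: category labels.
    Returns the filtered (X, y) pair.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        centroid /= np.linalg.norm(centroid)          # unit-length centroid
        norms = np.linalg.norm(X[idx], axis=1)
        sims = (X[idx] @ centroid) / norms            # cosine similarity
        keep[idx] = sims >= threshold                 # keep only non-outliers
    return X[keep], y[keep]
```

The filtered training set would then be fed to an ordinary kNN classifier; the paper's point is that pruning these low-similarity elements improves its accuracy.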


Cited by 21 publications (12 citation statements)
References 6 publications (7 reference statements)
“…Many of them focus on reducing classification time [3,4]. Other algorithms focus on increasing classification rates, either changing the method to find nearest neighbors [5], varying the voting schema [6] or improving the training data [7].…”
Section: Introduction
confidence: 99%
“…to construct a cross-lingual feature space and uniformly represent different language texts. Secondly, they use traditional monolingual text classification methods to classify, e.g., K-nearest Neighbor [Shin, Abraham and Han (2006)], Naive Bayes [Kim, Han, Rim et al (2006)], Support Vector Machines [Martens, Huysmans, Setiono et al (2008)] and so on. The main difference between different methods is the construction of cross-lingual feature space.…”
Section: Related Work
confidence: 99%
“…• k-NN classifier is noise tolerant since it uses all training data as relevant, even when training documents contain noise or unbalanced data [28,37].…”
Section: k-NN Improvements for TC
confidence: 99%