2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)
DOI: 10.1109/icmla.2018.00189

Centroid Estimation Based on Symmetric KL Divergence for Multinomial Text Classification Problem

Abstract: We define a new method to estimate class centroids for text classification, based on the symmetric KL divergence between the distribution of words in training documents and their class centroids. Experiments on several standard data sets indicate that the new method achieves substantial improvements over traditional classifiers.
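The abstract gives only a high-level description of the approach. As a rough illustration of the idea, nearest-centroid classification under a symmetrized KL divergence could look like the following Python sketch; all names (symmetric_kl, word_distribution, classify), the toy centroids, and the smoothing constant are illustrative assumptions, not details taken from the paper.

import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence: KL(p||q) + KL(q||p)."""
    p = p + eps  # avoid log(0); assumed smoothing, not from the paper
    q = q + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def word_distribution(counts):
    """Normalize a word-count vector into a probability distribution."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def classify(doc_counts, centroids):
    """Assign a document to the class whose centroid is closest in symmetric KL."""
    p = word_distribution(doc_counts)
    divergences = {label: symmetric_kl(p, c) for label, c in centroids.items()}
    return min(divergences, key=divergences.get)

# Toy example: two class centroids over a 3-word vocabulary.
centroids = {
    "sports": np.array([0.7, 0.2, 0.1]),
    "politics": np.array([0.1, 0.3, 0.6]),
}
print(classify([8, 1, 1], centroids))  # -> "sports"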

Cited by 12 publications (9 citation statements). References 7 publications (6 reference statements).
“…For example, we can treat each class as a multinomial distribution, and the corresponding documents as samples generated by that distribution. Under this assumption, we want to find the centroid of every class, either by maximizing the likelihood function or by defining other objective functions [2], in both supervised and unsupervised settings [7]. Although this assumption is not exact for the task, Naive Bayes achieves high accuracy in practical problems.…”
Section: Related Work
confidence: 99%
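The statement above refers to estimating a class centroid by maximizing the multinomial likelihood. Under that model, the maximum-likelihood centroid is simply the relative word frequencies pooled over the class's documents; the sketch below illustrates this under assumed names and with Laplace smoothing, which is a common choice rather than something specified in the quoted text.

import numpy as np

def mle_centroid(class_doc_counts, alpha=1.0):
    """Maximum-likelihood centroid of one class under the multinomial model.

    class_doc_counts: (n_docs, vocab_size) word-count matrix for the class.
    alpha: Laplace smoothing constant (an assumption, not from the cited text).
    """
    totals = class_doc_counts.sum(axis=0) + alpha
    return totals / totals.sum()

# Toy example: three documents from one class over a 4-word vocabulary.
docs = np.array([[3, 0, 1, 0],
                 [2, 1, 0, 0],
                 [4, 0, 2, 1]])
print(mle_centroid(docs))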
“…KL divergence is asymmetric; however, a symmetric version of the KL divergence is often used [18], [19].…”
Section: B. Kullback-Leibler Divergence
confidence: 99%
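For reference, one common symmetrization of the KL divergence (and the one assumed in the earlier sketch) adds the two directed divergences; other works instead use their average or the Jensen-Shannon divergence, and the cited references [18], [19] may define it differently:

\[
D_{\mathrm{sym}}(P,Q) \;=\; D_{\mathrm{KL}}(P \,\|\, Q) + D_{\mathrm{KL}}(Q \,\|\, P)
\;=\; \sum_i p_i \log\frac{p_i}{q_i} \;+\; \sum_i q_i \log\frac{q_i}{p_i}
\]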
“…1a, b, we see that the testing error decreases only slightly as t increases from 0.1 to 2. We summarize this fact as follows. [Figure caption spilled into the extracted text: the 10 largest groups of the Reuters-21578 dataset (a) and the 20 Newsgroups dataset (b), with 90% of the data as the training set; the y-axis is the accuracy and the x-axis is the class index.] Proposition 6.1: For prediction purposes, the correlation factor t can take values in the interval…”
Section: Robustness of t for Prediction
confidence: 99%
“…There is some research on how to relax this restriction, such as the feature-weighting approach [12,33] and the instance-weighting approach [32]. [3] proposed a method that finds a better estimate of the centroid, which helps improve the accuracy of Naive Bayes estimation. To tackle the situation where there is not enough labelled data for each class, we propose a novel estimation method.…”
Section: Introduction
confidence: 99%