2021
DOI: 10.1007/s10115-021-01581-5
On entropy-based term weighting schemes for text categorization

Abstract: Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifiers and SVMs. The term weighting scheme most widely used in text categorization, tf.idf, originated in the information retrieval (IR) field. The intuition behind idf seems less well founded for text categorization than for IR. In this paper, we introduce inverse category frequency (icf) into term weighting and propose two novel approaches, i.e., tf.icf and icf-based supervised term weighting schemes…
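The tf.icf scheme the abstract describes replaces the document-based idf factor with a class-based one. A minimal sketch of the idea on a toy labeled corpus (the corpus, labels, and function names below are illustrative, not the authors' implementation):

```python
import math
from collections import defaultdict

# Toy labeled corpus: (tokens, class label).
corpus = [
    (["goal", "match", "league"], "sports"),
    (["election", "vote", "league"], "politics"),
    (["chip", "goal", "startup"], "tech"),
]

# c_i: number of distinct classes in which each term occurs.
classes_per_term = defaultdict(set)
for tokens, label in corpus:
    for term in tokens:
        classes_per_term[term].add(label)

num_classes = len({label for _, label in corpus})  # C

def tf_icf(tokens, term):
    """tf.icf weight of `term` in the document `tokens`:
    raw term frequency times log(1 + C / c_i)."""
    tf = tokens.count(term)
    icf = math.log(1 + num_classes / len(classes_per_term[term]))
    return tf * icf

doc = corpus[0][0]
# "match" occurs in only one class, so it is weighted higher than
# "goal" or "league", each of which is spread across two classes.
```

Unlike idf, which only counts how many documents contain a term, icf rewards terms concentrated in few classes, which is the discriminative signal a classifier actually needs.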

Cited by 15 publications (17 citation statements)
References 73 publications (65 reference statements)
“…In the supervised text mining task of document classification, different approaches utilizing class information have been proposed to estimate collection-based term weighting factors (Wang and Zhang, 2013; Debole and Sebastiani, 2003; Lan et al., 2009). Inverse category frequency (icf) (Wang and Zhang, 2013) has been shown to produce better classification results than the traditional idf factor with the cosine similarity measure. It considers the distribution of a term among classes rather than among documents in the given collection.…”
Section: Discussion
confidence: 99%
“…It considers the distribution of a term among classes rather than among documents in the given collection. The intuition behind icf is that the fewer classes a term t_i occurs in, the more discriminating power t_i contributes to classification (Wang and Zhang, 2013). If C is the total number of classes and c_i is the number of classes in which t_i occurs in at least one document, then the icf factor is estimated as icf(t_i) = log(1 + C / c_i).…”
Section: Discussion
confidence: 99%
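The icf factor quoted above is a one-line computation. A small sketch following the formula exactly (the function name and example values are illustrative):

```python
import math

def icf(num_classes: int, classes_containing_term: int) -> float:
    """Inverse category frequency: icf(t_i) = log(1 + C / c_i),
    where C is the total number of classes and c_i is the number
    of classes in which term t_i occurs in at least one document."""
    return math.log(1 + num_classes / classes_containing_term)

# A term confined to 1 of 10 classes gets a much larger weight
# than a term that appears in all 10 classes.
rare = icf(10, 1)     # log(11) ≈ 2.398
common = icf(10, 10)  # log(2)  ≈ 0.693
```

The `1 +` inside the logarithm keeps the factor strictly positive even for a term that occurs in every class (c_i = C), where a plain log(C/c_i) would collapse to zero.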