2009
DOI: 10.1109/tpami.2008.110

Supervised and Traditional Term Weighting Methods for Automatic Text Categorization

Abstract: In the vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e., words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text. Term weighting methods assign appropriate weights to the terms to improve the performance of text categorization. In this study, we inves…

Cited by 426 publications (74 citation statements)
References 18 publications
“…Even so, STW seems to be more reasonable and promising than TF-IDF. Later, STW has attracted a lot of interest from researchers, e.g., Lan, Tan, Su, and Lu (2009), Altinçay and Erenel (2010), Liu, Loh and Sun (2009), Wang and Zhang (2013), Ren and Sohrab (2013), Nguyen, Chang, and Hui (2013), Peng, Liu, and Zuo (2014), and Deng et al. (2014), etc. Term weighting becomes one of the hot research topics in text classification and various new STW schemes have been proposed from time to time.…”
Section: Introduction
confidence: 99%
“…Due to the binarization of the original multi-class problems, we need a way to average the multiple F-scores from the individual problems. One reasonable approach introduced in [19], also adopted in this paper, is in two folds: (i) micro-averaged F-score defined as $\frac{2PR}{P+R}$, where $P = \frac{\sum_c TP(c)}{\sum_c PP(c)}$ and $R = \frac{\sum_c TP(c)}{\sum_c RP(c)}$ are based on the averaged positives (here $TP(c)$, $PP(c)$, and $RP(c)$ are from the $c$-th binary problem) and (ii) macro-averaged F-score defined to be $\frac{2PR}{P+R}$, where $P = \frac{1}{K}\sum_c p(c)$ and $R = \frac{1}{K}\sum_c r(c)$ with the averaged precision and recall (here $p(c)$ and $r(c)$ are the precision and the recall for the $c$-th problem).…”
Section: Discussion
confidence: 99%
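
The two averaging schemes described in this statement can be sketched as follows. This is a minimal illustration, not code from the citing paper: the function name `micro_macro_f` is hypothetical, and it assumes that PP(c) counts predicted positives and RP(c) counts real (actual) positives for the c-th binary problem, which the excerpt does not spell out.

```python
from typing import Sequence, Tuple


def micro_macro_f(
    tp: Sequence[int], pp: Sequence[int], rp: Sequence[int]
) -> Tuple[float, float]:
    """Return (micro-averaged F, macro-averaged F) over K binary problems.

    tp[c], pp[c], rp[c] are assumed to be the true positives, predicted
    positives, and real positives of the c-th binary problem.
    """

    def safe_div(num: float, den: float) -> float:
        # Treat an undefined ratio as 0.0 (an assumption for empty classes).
        return num / den if den else 0.0

    def f_score(p: float, r: float) -> float:
        return safe_div(2 * p * r, p + r)

    k = len(tp)

    # Micro: pool the counts over all problems, then compute P, R, and F.
    micro_p = safe_div(sum(tp), sum(pp))
    micro_r = safe_div(sum(tp), sum(rp))

    # Macro (as defined in the statement): average the per-problem precision
    # and recall first, then combine the two averages into one F-score.
    macro_p = safe_div(sum(safe_div(t, q) for t, q in zip(tp, pp)), k)
    macro_r = safe_div(sum(safe_div(t, q) for t, q in zip(tp, rp)), k)

    return f_score(micro_p, micro_r), f_score(macro_p, macro_r)
```

Note that this macro variant combines the averaged precision and recall, which can differ from the also common convention of averaging the per-class F-scores directly.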
“…Lan et al (2009) suggested a measure named Relevance Frequency (RF) and proposed a supervised term weighting method tf * rf by considering the distribution of relevant documents in the collection. The basic idea of tf * rf is that, if a high frequency term is more concentrated in the positive category than in the negative category, then it makes more contributions in selecting the positive samples than the negative samples.…”
Section: Jcs
confidence: 99%
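
A small sketch of the tf * rf idea follows. The excerpt above does not state the rf formula; the code assumes the definition commonly attributed to Lan et al. (2009), rf = log2(2 + a / max(1, c)), where a is the number of positive-category documents containing the term and c is the number of negative-category documents containing it. The function names are hypothetical.

```python
import math


def relevance_frequency(a: int, c: int) -> float:
    """rf under the assumed definition log2(2 + a / max(1, c)).

    a: documents in the positive category that contain the term
    c: documents in the negative category that contain the term
    max(1, c) avoids division by zero when the term never occurs
    in the negative category.
    """
    return math.log2(2 + a / max(1, c))


def tf_rf(tf: float, a: int, c: int) -> float:
    """Weight of a term in one document: term frequency times rf."""
    return tf * relevance_frequency(a, c)
```

Under this definition a term concentrated in the positive category (large a relative to c) receives a larger weight, while a term that never occurs in the positive category still keeps the baseline factor log2(2) = 1.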
“…But for categorization tasks, training data is available with class labels and this rich source of information can be utilized in weighting the features. For text categorization, different supervised term weighting methods are suggested in the literature (Debole and Sebastiani, 2003; Lan et al., 2009). By assigning higher weights for relevant terms, the performance of classification can be improved (Lan et al., 2009).…”
Section: Introduction
confidence: 99%