Proceedings. International Conference on Machine Learning and Cybernetics
DOI: 10.1109/icmlc.2002.1167443

Relative term-frequency based feature selection for text categorization

Cited by 373 publications (555 citation statements)
References 9 publications

“…Yang et al. [5] have investigated several feature selection methods for text classification. They found that information gain and the χ² statistic were the most effective of five feature selection methods on English text datasets.…”
Section: Modified χ² Statistic (mentioning; confidence: 99%)
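
For reference, the two measures this excerpt singles out are defined in Yang and Pedersen (1997) as follows; this is a standard restatement added for context, not text from the citing paper:

    IG(t) = -\sum_{i=1}^{m} P(c_i)\log P(c_i)
            + P(t)\sum_{i=1}^{m} P(c_i \mid t)\log P(c_i \mid t)
            + P(\bar{t})\sum_{i=1}^{m} P(c_i \mid \bar{t})\log P(c_i \mid \bar{t})

    \chi^2(t, c) = \frac{N\,(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}

where, for term t and class c, A counts the documents of class c containing t, B the documents of other classes containing t, C the documents of class c without t, D the documents of other classes without t, and N = A + B + C + D.
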
“…A sophisticated methodology for reducing feature dimensionality is feature selection [5], using measures such as the χ² statistic, mutual information, and information gain. In [6], they show … We introduce our term projection method in Section 3.…”
Section: Introduction (mentioning; confidence: 99%)
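
As an illustration of the χ²-based selection these excerpts discuss, here is a minimal scikit-learn sketch; the corpus, labels, and k are placeholders, and scikit-learn's chi2 is one standard implementation, not the modified statistic of the citing paper:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2

    texts = ["stocks fell on wall street today",
             "the striker scored a late goal"]   # placeholder documents
    labels = [0, 1]                              # placeholder class labels

    X = CountVectorizer().fit_transform(texts)   # bag-of-words term counts
    selector = SelectKBest(chi2, k=3)            # keep the 3 highest-scoring terms
    X_reduced = selector.fit_transform(X, labels)
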
“…When classifying texts, words included in them are used as classification features [21]. Undoubtedly, Markovian models are now regarded as one of the most significant state-of-the-art approaches for sequence learning.…”
Section: Hidden Markov Model (mentioning; confidence: 99%)
“…In addition, most studies do not use SVM as the classification algorithm. For instance, Yang [12] and Pedersen [21] use kNN, and Mladenic and Grobelnik [22] use Naive Bayes [31] in their studies on keyword selection metrics. Later studies reveal that SVM performs consistently better than these classification algorithms.…”
Section: Hidden Markov Model (mentioning; confidence: 99%)
“…Therefore, following Yang and Pedersen (1997), for each question we calculate the information gain of each feature of these types on the training set. We then remove those features having the lowest information gain as well as those features occurring less than ten times in the dataset.…”
Section: Feature Selection (mentioning; confidence: 99%)
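
A rough illustration of the filtering recipe this excerpt describes — a sketch of my own, not the authors' code, with docs, labels, min_count, and top_k as hypothetical names — computing information gain over binary term presence and applying both cutoffs:

    import math
    from collections import Counter

    def information_gain(docs, labels, term):
        """IG(t) = H(C) - H(C | presence of t); docs are token lists."""
        n = len(docs)
        def entropy(ys):
            if not ys:
                return 0.0
            return -sum((c / len(ys)) * math.log2(c / len(ys))
                        for c in Counter(ys).values())
        with_t = [y for d, y in zip(docs, labels) if term in d]
        without_t = [y for d, y in zip(docs, labels) if term not in d]
        p_t = len(with_t) / n
        return entropy(labels) - (p_t * entropy(with_t)
                                  + (1 - p_t) * entropy(without_t))

    def select_features(docs, labels, min_count=10, top_k=2000):
        """Drop terms occurring fewer than min_count times, keep top_k by IG."""
        occurrences = Counter(t for d in docs for t in d)
        candidates = [t for t, c in occurrences.items() if c >= min_count]
        candidates.sort(key=lambda t: information_gain(docs, labels, t),
                        reverse=True)
        return candidates[:top_k]

For realistic corpora one would precompute a term-document incidence matrix rather than rescanning every document per term; the quadratic loop here is kept for clarity only.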