MSVM-kNN: Combining SVM and k-NN for Multi-class Text Classification

Yuan, Pingpeng; Chen, Yuqin; Jin, Hai; Huang, Li

doi:10.1109/wscs.2008.36

Cited by 33 publications

(15 citation statements)

References 13 publications


“…The performance measures used are Recall, Precision, F1 -measure and Accuracy [28]. These can be calculated in following equations 7, 8 and 9 respectively:…”

Section: Results and Performance Evaluation (mentioning)

confidence: 99%

A Novel Weighted Classification Approach using Linguistic Text Mining

Jindal, Taneja

2017

IJCA


Text categorization is the process of automatically assigning labels or categories to new or previously unseen text documents. The text documents may be unstructured or semi-structured in nature. In our work, we have used concepts of natural language processing for text categorization, that is, a lexical approach for text categorization. We have developed an algorithm which automatically classifies articles into their categories. The algorithm identifies tokens and assigns them weights in the abstracts of journal articles. We have implemented our approach using K Nearest Neighbor (KNN) classifier as it is the most widely used classifier in research. The proposed algorithm Lexical KNN (LKNN) has been evaluated on two datasets. One is a set of journal articles of computer science discipline and the other is a collection of medical documents (Ohsumed collection). The experimental results show that our proposed algorithm Lexical KNN (LKNN) performs better than the other existing classifiers.
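The citation statement above names Recall, Precision, F1-measure and Accuracy but does not reproduce the citing paper's equations. As a minimal sketch, the following uses the standard textbook definitions of these measures for one class of a multi-class problem; the function names and the toy labels are illustrative assumptions, not taken from either paper.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with hypothetical category labels.
y_true = ["cs", "cs", "med", "med"]
y_pred = ["cs", "med", "med", "med"]
precision, recall, f1 = per_class_metrics(y_true, y_pred, "med")
overall_accuracy = accuracy(y_true, y_pred)
```

Per-class values computed this way are usually averaged (macro or micro) to give a single score for a multi-class classifier; which averaging the citing paper used is not stated here.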


“…In another approach [6], four different accuracy measurements are utilized to compare three different algorithms (Shereen Khoja Stemmer, Tim Buckwalter Morphological analyzer and Tri-literal Root Extraction Algorithm) with gold standard. The methods of each stemmer to remove affixes are different, for example, Khoja extracted the word to get the stem by removing the longest affixes whilst Buckwalter used all prefixes to compile only one lexicon and Tri-literal used weighting of word depending on their position.…”

Section: Related Work (mentioning)

confidence: 99%

“…In the best state, the registered accuracy algorithm was 75%. According to the results, the accuracy of the Khoja stemmer had the highest ranked place, then tri-literal algorithm and then followed by Buckwalter morphological analyzer [6].…”

Section: Related Work (mentioning)

confidence: 99%


An evaluation of Reber stemmer with longest match stemmer technique in Kurdish Sorani text classification

et al. 2018


Stemming is one of the most significant preprocessing stages in text categorization that most of the academic investigators aim to improve and optimize the accuracy of the classification task. High dimensionality of feature space is one of the challenges in text classification that can be decreased by many techniques. In stemming, high dimensionality of feature space is decreased by grouping those words that have the same grammatical forms and then getting their root. This work is dedicated to build an approach for Kurdish language classification using Reber Stemmer. Thus, an innovative approach is investigated to get the stem of words in Kurdish language by removing longest suffix and prefixes of words. This approach has a strong capability and meets the requirements in responding to the process of deleting as many of the required affixes as possible to get the stem of words in Kurdish language. The advantage of this stemmer is that it ignores the ordering list of affixes that receives correct stem for more than one word that have the same format. The stemming technique is implemented on KDC-4007 dataset that consists of eight classes. Support Vector Machine (SVM) and Decision Tree (DT or C 4.5) are used for the classification. This stemmer has been successfully compared with the Longest-Match stemmer technique. According to results, the F-measure of Reber stemmer and Longest-Match method in SVM is higher than DT. Reber stemmer in SVM for classes (religion, sport, health and education) obtained higher F-measure, while the rest of classes are lower in Longest-Match. Reber stemmer in DT for classes (religion, sport and art) had higher F-measure for Reber stemmer while in Longest match the rest of classes showed lower F-measure.
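The abstract above describes stemming by removing the longest matching prefix and suffix of a word. The sketch below illustrates that general longest-match idea only; it is not the Reber stemmer, and the function name, the `min_stem` guard, and the English placeholder affix lists are assumptions made for illustration (no Kurdish affix inventory is given in the text).

```python
def strip_longest_affixes(word, prefixes, suffixes, min_stem=2):
    """Remove the longest matching prefix, then the longest matching suffix.

    An affix is only stripped if at least `min_stem` characters remain,
    a hypothetical guard against reducing short words to nothing.
    """
    # Try prefixes from longest to shortest and strip the first match.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if prefix and word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    # Same longest-first rule for suffixes.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[:-len(suffix)]
            break
    return word


# Placeholder English affixes, used only to show the mechanics.
PREFIXES = ["un", "re"]
SUFFIXES = ["s", "ing", "ings"]

stems = [strip_longest_affixes(w, PREFIXES, SUFFIXES) for w in ["unfoldings", "rereading"]]
```

Because words that share a stem collapse to one feature after this step, the vocabulary (and so the dimensionality of the feature space) shrinks before classification.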


“…Various supervised machine learning techniques have been proposed in literature for the automatic classification of text documents such as Naïve Bayes [1] [17], Neural Networks [20], SVM (Support Vector Machine) [22] [23] [24], Decision Tree and also by combining approaches [12] [21] [25].…”

Section: Modeling: Selection Of Appropriate Machine Learning Techniqu… (mentioning)

confidence: 99%

Automatic Text Classification: A Technical Review

Dalal, Zaveri

2011

IJCA

111


Automatic Text Classification is a semi-supervised machine learning task that automatically assigns a given document to a set of pre-defined categories based on its textual content and extracted features. Automatic Text Classification has important applications in content management, contextual search, opinion mining, product review analysis, spam filtering and text sentiment mining. This paper explains the generic strategy for automatic text classification and surveys existing solutions to major issues such as dealing with unstructured text, handling large number of attributes and selecting a machine learning technique appropriate to the text-classification application.
