2014
DOI: 10.1155/2014/717092

A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

Abstract: With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available; naïve Bayes remains one of the oldest and most popular. On the one hand, naïve Bayes is simple to implement; on the other hand, it requires only a small amount of training data. From the literature review, it is found that naïve B…
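Since the abstract centers on naïve Bayes for text classification, a minimal sketch may help fix ideas. It assumes scikit-learn is available; the toy spam/ham corpus and labels are invented for illustration and are not from the paper.

```python
# Minimal naive Bayes text classification sketch (assumes scikit-learn).
# The corpus, labels, and spam/ham task are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "cheap pills buy now",       # spam
    "meeting agenda attached",   # ham
    "win a free prize today",    # spam
    "quarterly report draft",    # ham
]
labels = ["spam", "ham", "spam", "ham"]

# Bag-of-words counts feed the multinomial model, which estimates
# P(term | class) from training counts and applies Bayes' rule at prediction.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)

print(model.predict(["free pills today"]))  # expected: ['spam']
```

The small training requirement mentioned in the abstract follows from the model's independence assumption: only per-class term frequencies are estimated, never term interactions.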

Cited by 54 publications (24 citation statements)
References 8 publications (11 reference statements)
“…Otherwise, the MI value reaches its maximum when the feature distribution is intra-class only. The work presented in [26] suggested that features may convey similar information in the feature space. In line with that, features conveying similar information are grouped to select the most representative features from each group.…”
Section: Feature Subset Selection (FSS)
Citation type: mentioning
confidence: 99%
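To make the quoted mutual-information criterion concrete, here is a hedged sketch, assuming scikit-learn, of scoring each term by its mutual information with the class label; MI peaks for terms concentrated in a single class, matching the intra-class observation above. The tiny corpus and labels are invented for illustration.

```python
# Hedged sketch: rank terms by mutual information with the class label
# (assumes scikit-learn; corpus and labels are illustrative only).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

docs = ["buy cheap pills", "meeting agenda", "cheap prize inside", "agenda attached"]
y = np.array([1, 0, 1, 0])  # 1 = spam, 0 = ham

vec = CountVectorizer()
X = vec.fit_transform(docs)

# MI between each term's occurrence counts and the class label; a term that
# appears in only one class gets the highest score.
scores = mutual_info_classif(X, y, discrete_features=True)
for term, s in sorted(zip(vec.get_feature_names_out(), scores), key=lambda p: -p[1]):
    print(f"{term}: {s:.3f}")
```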
“…The k-means clustering algorithm works iteratively to assign each feature to one of the k clusters based on the similarity of the information the features convey. To determine the optimal number of clusters (k), as mentioned in [26]…”
Section: Feature Subset Selection (FSS)
Citation type: mentioning
confidence: 99%
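The quoted feature-clustering step can be sketched as follows, assuming features are represented by their document-occurrence vectors and clustered with scikit-learn's KMeans; the choice k=2 and the nearest-to-centroid representative rule are illustrative stand-ins, not the exact procedure of [26].

```python
# Hedged sketch: cluster terms with k-means and keep one representative per
# cluster (assumes scikit-learn; k=2 and the corpus are illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

docs = ["buy cheap pills now", "meeting agenda attached",
        "cheap pills prize", "agenda for the meeting"]

vec = CountVectorizer()
X = vec.fit_transform(docs).toarray()
terms = vec.get_feature_names_out()

# Each feature (term) is described by its column of the term-document
# matrix, i.e. its occurrence pattern across documents.
term_vectors = X.T

k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(term_vectors)

# Keep the term nearest each centroid as that group's representative.
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(term_vectors[members] - km.cluster_centers_[c], axis=1)
    print(f"cluster {c}: representative = {terms[members[np.argmin(dists)]]}")
```

Keeping one representative per cluster shrinks the vocabulary from |V| terms to k while retaining a spokesman for each group of similarly distributed features.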
“…A number of works addressing the problem of text classification through feature selection can be traced in recent years. Although feature selection algorithms such as chi-square, information gain, and mutual information (Yang and Pedersen, 1997) seem to be powerful techniques for text data, a number of novel feature selection algorithms have been proposed, based on genetic algorithms (Bharti and Singh, 2016; Ghareb et al., 2016), ant colony optimization (Dadaneh et al., 2016; Moradi and Gholampour, 2016; Uysal, 2016; Meena et al., 2012), the Bayesian principle (Zhang et al., 2016; Feng et al., 2012; Fenga et al., 2015; Sarkar et al., 2014), clustering of features (Bharti and Singh, 2015), global information gain (Shang et al., 2013), adaptive keywords (Tasci and Gungor, 2013), and global ranking (Pinheiro et al., 2012; Pinheiro et al., 2015).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…To achieve a high classification result in a Web Page Classification (WPC) system, an excellent representation of the textual data (Preprocessing/DR) should retain as much information as possible from the original document [8]. Also, the accuracy of most classification algorithms depends on the quality and size of the training data, which is inherently dependent on the document representation technique [9]. Several researchers have contributed to the document representation stage of web page classification systems because irrelevant and redundant features often degrade the performance of classification algorithms in both speed and classification accuracy, and removing them also tends to reduce overfitting [10].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
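The excerpt's point that irrelevant and redundant features slow classifiers and hurt accuracy is commonly handled by a filter step inside the document representation stage. Below is a minimal sketch, assuming scikit-learn, that uses chi-square selection (one of the filters named in the Related Work excerpt above); the corpus, labels, and k=4 are illustrative only.

```python
# Hedged sketch: filter weakly relevant terms with chi-square before training
# (assumes scikit-learn; corpus, labels, and k=4 are illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["cheap pills buy now", "meeting agenda attached",
        "free pills prize today", "agenda quarterly report"]
labels = ["spam", "ham", "spam", "ham"]

pipe = make_pipeline(
    CountVectorizer(),        # document representation stage
    SelectKBest(chi2, k=4),   # keep the 4 terms most associated with labels
    MultinomialNB(),
)
pipe.fit(docs, labels)
print(pipe.predict(["free prize pills"]))  # expected: ['spam']
```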