Classification and Optimization Scheme for Text Data using Machine Learning Naïve Bayes Classifier

Venkatesh; Ranjitha, K. V.

doi:10.1109/wsce.2018.8690536

Cited by 26 publications

(9 citation statements)

References 7 publications


“…For supervised classification, if we assume all the categories follow independent multinomial distribution and each document is a sample generated by the distribution, a straight-forward idea would be applying some linear model to do classification, such as Support Vector Machine [4,13], which is used to find the maximum-margin hyper-plane that divides the documents with different labels. Under these assumptions, another important method is Naive Bayes (NB) [7,15,26,31], which uses scores based on the 'probabilities' of each document conditioned on the categories. NB classifier learns from training data to estimate the distribution of each category, then computes the conditional probability of each document given the class label by applying Bayes rule.…”

Section: Introduction (mentioning)

confidence: 99%

Improved Naive Bayes with optimal correlation factor for text classification

et al. 2019


The Naive Bayes (NB) estimator is widely used in text classification problems. However, it does not perform well with small training datasets. Most previous literature focuses on either creating and modifying features or combining clustering to improve the performance of NB. We directly tackle the problem by constructing a new estimator, called Naive Bayes with correlation factor. We introduce a correlation factor to the NB estimator that incorporates overall correlation among the different classes. This effectively exploits the idea of bootstrapping, which reuses data for all classes even if they only belong to one class. Moreover, we obtain a formula for the optimal correlation factor by balancing the bias and variance of the estimator. Experimental results on real-world data show that our estimator achieves better accuracy than traditional Naive Bayes while maintaining the simplicity of NB.
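The procedure described in the citation statement above (estimate a word distribution for each category from training data, then score a document per class with Bayes' rule) can be sketched as follows. This is a minimal illustration of a standard multinomial Naive Bayes classifier with Laplace smoothing, not the method of the cited paper or of the citing paper; the function names, the `alpha` parameter, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods for each class.

    docs   : list of tokenized documents (lists of words)
    labels : list of class labels, one per document
    alpha  : Laplace smoothing constant (assumed value, not from the source)
    """
    vocab = sorted({word for doc in docs for word in doc})
    docs_per_class = Counter(labels)

    # Count how often each word occurs in the documents of each class.
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in docs_per_class.items()}
    log_lik = {}
    for c in docs_per_class:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_lik


def predict(doc, log_prior, log_lik):
    """Return the class with the highest posterior score for a tokenized doc."""
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are skipped.
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in doc if w in log_lik[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
train_docs = [
    ["fever", "cough"],
    ["fever", "rash"],
    ["healthy", "active"],
    ["active", "rested"],
]
train_labels = ["ill", "ill", "well", "well"]

log_prior, log_lik = train_multinomial_nb(train_docs, train_labels)
print(predict(["fever", "active", "cough"], log_prior, log_lik))
print(predict(["rested", "active"], log_prior, log_lik))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing constant keeps a word that was never seen in a class from driving that class's score to zero.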



“…The research in [8] showed that in the classification of those three different datasets, the accuracy of the proposed method is higher than the accuracy of the classical Gaussian Naïve Bayes classifier. While explaining the drawbacks of Hadoop MapReduce in performing text classification, the authors in [7] argued that their proposed machine learning approach to classifying text data is less time consuming than that achieved with Hadoop MapReduce. Classification in Hadoop uses K-Means Clustering, which requires a large amount of time to perform the classification, thereby increasing the latency [7].…”

Section: Text Classification Using Machine Learning (mentioning)

confidence: 99%

“…While explaining the drawbacks of Hadoop MapReduce in performing text classification, the authors in [7] argued that their proposed machine learning approach to classifying text data is less time consuming than that achieved with Hadoop MapReduce. Classification in Hadoop uses K-Means Clustering, which requires a large amount of time to perform the classification, thereby increasing the latency [7]. Motivated by this, the authors in [7] proposed a machine learning method based on a Naïve Bayes classifier.…”

Section: Text Classification Using Machine Learning (mentioning)

confidence: 99%

“…Classification in Hadoop uses K-Means Clustering, which requires a large amount of time to perform the classification, thereby increasing the latency [7]. Motivated by this, the authors in [7] proposed a machine learning method based on a Naïve Bayes classifier. A medical dataset was used for classification.…”

Section: Text Classification Using Machine Learning (mentioning)

confidence: 99%

“…A medical dataset was used for classification. Based on the disease, the proposed method checks for a class label that indicates whether a person is suffering from a disease or not [7]. The research presented in [28] proposed a novel classification method by improving the Naïve Bayes algorithm based on Improving Term Frequency-Inverse Document Frequency (ITF-IDF).…”

Section: Text Classification Using Machine Learning (mentioning)

confidence: 99%


A Parallel Processing Technique for Filtering and Storing User Specified Data

Chanda


Users are often interested in a specific type of data (user-preferred data) from a large-volume dataset. An efficient system that stores only user-preferred data from the large dataset can reduce search latency, allowing users to find relevant information in a timely manner. The motivation behind this thesis is to devise a technique that filters a large dataset and stores only the filtered data, thereby saving storage space for the user. Running the filtering operation can be CPU-intensive, which can lead to high latency in extracting preferred data from the dataset. To solve this problem, the technique employs parallel processing and machine learning. A proof-of-concept prototype for this technique has been built on Apache Spark. The performance of the prototype on synthetic datasets is analyzed. The analysis of experimental results shows the viability of this technique and provides insights into the system's behavior and performance.


Evaluating Binary Classifiers with Word Embedding Techniques for Public Grievances

Shah, Joshi 2022

Communications in Computer and Information Science

