Optimization of Feature Selection Using Genetic Algorithm in Naïve Bayes Classification for Incomplete Data

Khotimah, Bain Khusnul; Miswanto, Miswanto; Suprajitno, Herry

doi:10.22266/ijies2020.0229.31

Cited by 17 publications

(14 citation statements)

References 27 publications


“…This is due to many reasons, which can be summarized as follows; (1) NB can provide fast predictions rather than other classification algorithms because the training time has an order O(N) with the dataset, (2) it can be easily trained with small amount of input training dataset and it can be used also for large datasets as well, (3) the simplicity and easy implementation with the ability of real-time training for new items, (4) the implementation of this classifier has no required adjusting parameters or domain knowledge, (5) It handles both continuous and discrete data, (6) NB is less sensitive to missing data, (7) NB has high capability to handle the noise in the dataset, (8) NB is an Incremental learning approach because its functions work from an approximation of low-order probabilities which are extracted from the training data. Hence, these can be quickly updated as new training data are obtained, (9) If the Naive Bayes conditional independence assumption holds, then it will converge quicker than discriminative models like logistic regression, (10) NB can be used for both binary and multiclass classification problems and (11) NB is sufficient for real-time applications such as diseases diagnoses because it relies on a set of pre-computed probabilities that make the classification done in a very short time (Khotimah et al 2020;Kaur and Oberoi 2020), Although NB has proven efficiency with real-time applications, its performance is sometimes thumping in many cases because of the unrealistic assumption that all features have the same degree of importance and are independent of the given class value. Hence, this unrealistic assumption should be mitigated to overcome such hurdles.…”

Section: Introduction (mentioning)

confidence: 99%

Accurate detection of Covid-19 patients based on Feature Correlated Naïve Bayes (FCNB) classification strategy

Mansour, Saleh, Badawy et al. 2021. J Ambient Intell Human Comput


The outbreak of Coronavirus has spread between people around the world at a rapid rate so that the number of infected people and deaths is increasing quickly every day. Accordingly, it is a vital process to detect positive cases at an early stage for treatment and controlling the disease from spreading. Several medical tests had been applied for COVID-19 detection in certain injuries, but with limited efficiency. In this study, a new COVID-19 diagnosis strategy called Feature Correlated Naïve Bayes (FCNB) has been introduced. The FCNB consists of four phases, which are; Feature Selection Phase (FSP), Feature Clustering Phase (FCP), Master Feature Weighting Phase (MFWP), and Feature Correlated Naïve Bayes Phase (FCNBP). The FSP selects only the most effective features among the extracted features from laboratory tests for both COVID-19 patients and non-COVID-19 people by using the Genetic Algorithm as a wrapper method. The FCP constructs many clusters of features based on the selected features from FSP by using a novel clustering technique. These clusters of features are called Master Features (MFs) in which each MF contains a set of dependent features. The MFWP assigns a weight value to each MF by using a new weight calculation method. The FCNBP is used to classify patients based on the weighted Naïve Bayes algorithm with many modifications as the correlation between features. The proposed FCNB strategy has been compared to recent competitive techniques. Experimental results have proven the effectiveness of the FCNB strategy in which it outperforms recent competitive techniques because it achieves the maximum (99%) detection accuracy.
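The citation statement above describes Naïve Bayes as an incremental learner built from low-order probability estimates and as less sensitive to missing data. The following is a minimal sketch of that idea for categorical features, not the method of any paper listed here. The class name `IncrementalNB`, the use of `None` to mark a missing value, and the Laplace smoothing parameter `alpha` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class IncrementalNB:
    """Categorical Naive Bayes kept as running counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing strength (assumed)
        self.class_counts = defaultdict(int)    # N(class)
        self.value_counts = defaultdict(int)    # N(feature j = v, class)
        self.feature_totals = defaultdict(int)  # observed values of feature j per class
        self.values_seen = defaultdict(set)     # distinct values seen per feature

    def update(self, row, label):
        """Fold one labelled example into the counts; None marks a missing value."""
        self.class_counts[label] += 1
        for j, v in enumerate(row):
            if v is None:
                continue  # a missing value contributes no count
            self.value_counts[(j, v, label)] += 1
            self.feature_totals[(j, label)] += 1
            self.values_seen[j].add(v)

    def predict(self, row):
        """Return the class with the highest log-posterior for `row`."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)  # log prior
            for j, v in enumerate(row):
                if v is None:
                    continue  # skip the likelihood factor of a missing feature
                k = len(self.values_seen[j] | {v})
                num = self.value_counts[(j, v, c)] + self.alpha
                den = self.feature_totals[(j, c)] + self.alpha * k
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy data, invented for this example only.
model = IncrementalNB()
for row, label in [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
                   (("rainy", "mild"), "yes"), (("rainy", None), "yes")]:
    model.update(row, label)

print(model.predict(("rainy", None)))
```

Because the model is nothing more than a set of counts, a new training example is absorbed by one call to `update` with no retraining, and prediction cost depends only on the number of classes and features.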


“…This algorithm is a statistical prediction model that predicts output variables using the input in a way that none of the input variables affect each other. On the other hand, no combinational input variables have strength for determining the probability of occurring the output variable (28,29). MLP: This algorithm consists of computational units known as neurons that exist in the input, hidden, and out layers of the artificial neural network (ANN).…”

Section: Model Development and Assessment (mentioning)

confidence: 99%

Comparing Data Mining Algorithms for Breast Cancer Diagnosis

et al. 2022


Background: Early screening and diagnosis of breast cancer (BC) is critical for improving the quality of care and reducing the mortality rate. Objectives: This study aimed to construct and compare the performance of several machine learning (ML) algorithms in predicting BC. Methods: This descriptive and applied study included 1,052 samples (442 BC and 710 non-BC) with 30 features related to positive and negative BC diagnoses. The data mining (DM) process was implemented using the selected algorithms, including J-48 and random forest (RF) decision tree (DT), multilayer perceptron (MLP), Naïve Bayes (NB), Adaboost (AB), and logistic regression (LR) classifiers. Then, we obtained the best algorithm by comparing their performances using the confusion matrix and area under the receiver operating characteristic (ROC) curve (AUC). Finally, we adopted the best model for BC prognosis. Results: The results of evaluating various DM algorithms revealed that the J-48 DT algorithm had the best performance (AUC = 0.922), followed by the AB, MLP, LR, and RF algorithms (AUC: 0.899, 0.819, 0.716, and 0.703, respectively). Also, the NB algorithm achieved the lowest performance in this regard (AUC = 0.669). Conclusions: ML presents a reasonable level of accuracy for early diagnosis and screening of breast malignancies. Also, the empirical results showed that the J-48 DT algorithm yielded higher performance than other classifiers.


“…In the evolutionary process, a number of the genes that make up a chromosome undergo crossover and mutation. The Genetic Algorithm uses probabilistic transitions to select the best chromosomes in order to obtain an optimal solution [23].…”

Section: Genetic Algorithm (unclassified)

Analisis Optimasi Algoritma Klasifikasi Naive Bayes menggunakan Genetic Algorithm dan Bagging

Nugroho, Religia 2021. RESTI


The increasing demand for credit applications to banks has motivated the banking world to switch to more sophisticated techniques for analyzing the level of credit risk. One technique for analyzing the level of credit risk is the data mining approach. Data mining provides a technique for finding meaningful information from large amounts of data by way of classification. However, bank marketing data is imbalanced, so classification results on it are less than optimal. Naïve Bayes is one classification algorithm that can be used for imbalanced data, and it performs well in terms of classification. However, optimization is needed in order to obtain better classification results. Optimization techniques for handling imbalanced data have been developed with several approaches. Bagging and Genetic Algorithms can be used to overcome imbalanced data. This study aims to compare the accuracy level of the naïve Bayes algorithm after optimization using bagging and a genetic algorithm. The results showed that the combination of bagging and a genetic algorithm could improve the performance of Naive Bayes by 4.57%.
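The genetic-algorithm steps quoted for this entry (crossover, mutation, and probabilistic selection of the best chromosomes) can be sketched as a search over 0/1 feature masks. This is an illustrative sketch under stated assumptions, not the procedure of any paper listed here. The function `genetic_feature_selection`, its default parameters, tournament selection, single-point crossover, elitism, and the toy fitness with its `INFORMATIVE` indices are all hypothetical choices. In a real wrapper setting, `fitness` would typically be something like the cross-validated accuracy of a Naïve Bayes classifier trained on the selected columns.

```python
import random


def genetic_feature_selection(fitness, n_features, pop_size=20, generations=30,
                              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask).

    Requires n_features >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation

    def tournament(scored):
        # Probabilistic selection: the fitter of two random chromosomes wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        best_score, best_mask = max(scored, key=lambda pair: pair[0])
        history.append(best_score)
        next_population = [best_mask[:]]  # elitism: carry the current best over
        while len(next_population) < pop_size:
            parent_a, parent_b = tournament(scored), tournament(scored)
            child = parent_a[:]
            if rng.random() < crossover_rate:
                # Single-point crossover: head of one parent, tail of the other.
                point = rng.randint(1, n_features - 1)
                child = parent_a[:point] + parent_b[point:]
            # Bit-flip mutation: each gene flips with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population

    scored = [(fitness(mask), mask) for mask in population]
    best_score, best_mask = max(scored, key=lambda pair: pair[0])
    history.append(best_score)
    return best_mask, best_score, history


# Hypothetical stand-in for a classifier-based fitness: reward three
# "informative" feature indices and penalise every other selected feature.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.25 * extras


best_mask, best_score, history = genetic_feature_selection(toy_fitness, n_features=8)
print(best_mask, best_score)
```

Elitism keeps the best mask of each generation, so the best fitness recorded in `history` never decreases. The fixed `seed` makes a run repeatable, which helps when comparing parameter settings.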
