Abstract: Breast cancer is the second leading cause of cancer death in women. Although cancer is preventable and curable in its early stages, a large number of patients are diagnosed very late. Conventional methods of detecting and diagnosing cancer depend mainly on skilled physicians who, with the help of medical imaging, detect certain symptoms that usually appear in the later stages of cancer [1]. The objective of this paper is to find the smallest subset of features that can ensure high…
“…Naïve Bayesian (NB) classifier relies on applying Bayes' theorem to estimate the most probable membership of a given event in one of a set of possible classes. It is described as being naïve, since it assumes independence among variables used in the classification process [15], [17], [18].…”
Section: Naïve Bayesian Classifier
mentioning
confidence: 99%
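The citation statement above describes how a Naïve Bayes classifier picks the most probable class while treating the features as independent. The following is a minimal sketch of that idea for categorical features, not the cited paper's implementation. The function names, the Laplace smoothing parameter `alpha`, and the toy records are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies (hypothetical helper)."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(j, label)][value] += 1
            values[j].add(value)
    return class_counts, feature_counts, values, alpha


def predict(model, row):
    """Return the class with the highest posterior score for one record."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_class in class_counts.items():
        # log P(class) + sum of log P(feature value | class):
        # the sum is where the "naive" independence assumption enters.
        score = math.log(n_class / total)
        for j, value in enumerate(row):
            count = feature_counts[(j, label)][value]
            # Laplace smoothing (an assumption) avoids log(0) for unseen values.
            score += math.log((count + alpha) / (n_class + alpha * len(values[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy records, invented for illustration: (glucose level, family history)
rows = [("high", "yes"), ("high", "yes"), ("high", "no"),
        ("normal", "no"), ("normal", "no"), ("normal", "yes")]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = train_naive_bayes(rows, labels)
prediction_a = predict(model, ("high", "yes"))
prediction_b = predict(model, ("normal", "no"))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once the number of features grows.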
“…Consider the dual problem of the soft-margin SVM for nearly linearly separable data: (18) where N is the number of data point pairs in the training set.…”
Diabetes is a chronic disease in which blood glucose is not properly metabolized in the body. Electronic health records (EHRs) for individuals or populations have become important for understanding developing trends in diseases. Machine/deep learning helps provide predictions that are more accurate than actual assessments. The main problem we address is applying machine/deep learning models to EHRs, combining the strength of a machine learning model with various features and hyper-parameter optimization (tuning). The hyper-parameter optimization uses random search, which minimizes a predefined loss function on given independent data. The comparative evaluation indicated that the machine/deep learning models (Logistic Regression, Artificial Neural Network, Naïve Bayesian Classifier, Support Vector Machine and XGBoost) improve on the majority of previous models, increasing the metrics (accuracy, recall, F1 and AUC score) on the same public dataset after reprocessing. The proposed XGBoost model, implemented in Amazon SageMaker (a cloud computing service), has the best performance evaluation results. This work is also a contribution to the global economic recovery in general, and to reducing the medical equipment supply needed for the care and treatment of diabetics in particular, during the Covid-19 pandemic.
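The abstract above mentions hyper-parameter tuning by random search over a predefined loss function. A minimal sketch of random search follows. The search space, the parameter names `learning_rate` and `subsample`, and the quadratic `toy_loss` are assumptions standing in for a real validation loss; none of them come from the cited paper.

```python
import random


def random_search(loss_fn, space, n_trials=50, seed=0):
    """Sample parameters uniformly from `space` and keep the lowest-loss draw."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Draw each hyper-parameter independently within its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_loss(params):
    """Hypothetical stand-in for a validation loss, minimal at (0.1, 0.8)."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2


# Assumed search ranges, chosen only for illustration.
space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best_params, best_loss = random_search(toy_loss, space, n_trials=200, seed=42)
```

In practice `loss_fn` would train a model with the sampled parameters and return its loss on held-out data, so each trial is far more expensive than in this toy version.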
“…Most of the research papers that have been published on predictive models for breast cancer have shown relatively high prediction accuracies [7], [8], [21]. However, a widespread problem in medical data is class imbalance, which none of these previous papers addressed.…”
A widespread global health concern among women is breast cancer, the second leading cause of fatality. Predicting the occurrence of breast cancer from risk factors paves the way to early diagnosis and efficient, timely treatment. Although many predictive models for breast cancer have been developed, most were generated from highly imbalanced data. Models trained on imbalanced data are usually biased towards the majority class, but in cancer diagnosis it is crucial to correctly identify the patients with cancer, who are often the minority class. This study applies three class balancing techniques, namely oversampling (Synthetic Minority Oversampling Technique (SMOTE)), undersampling (SpreadSubsample) and a hybrid method (SMOTE and SpreadSubsample), to the Breast Cancer Surveillance Consortium (BCSC) dataset before constructing the supervised learning models. The algorithms employed include Naïve Bayes, Bayesian Network, Random Forest and Decision Tree (C4.5). The balancing method that yielded the best performance across all four classifiers was tested on the validation data to determine the final predictive model. The performance of the classifiers was evaluated using the Receiver Operating Characteristic (ROC) curve, sensitivity, and specificity.
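The abstract above evaluates classifiers by sensitivity and specificity because accuracy alone can hide poor minority-class performance. The sketch below shows this on invented data: a classifier that always predicts the majority class looks accurate but detects no positive cases. The labels and the helper names are assumptions for illustration and do not come from the BCSC dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented imbalanced sample: 5 positive cases, 95 negative cases.
y_true = [1] * 5 + [0] * 95
# A degenerate classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

Here the accuracy is 0.95 while the sensitivity is 0.0, which is the failure mode that class balancing techniques such as SMOTE are meant to counter.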
“…A comparative study of different classifiers, namely Naïve Bayes, SVM, and ensemble classifiers, was implemented on the processed dataset. Naïve Bayes yields the optimum accuracy of 97.39% in classifying breast cancer, with a time complexity of 0.1020 milliseconds [11]. Further, studies have shown improvement using Sequential Minimal Optimization (SMO) to overcome the quadratic programming problem that arises during SVM training [12].…”
The survivability of patients suffering from breast cancer varies according to the stage. Early detection of breast cancer increases the longevity of patients. However, the number of risk factors involved in detection increases exponentially with the medical examinations. The need for automated data mining techniques that enable cost-effective and early prediction of cancer is rapidly becoming a trend in the healthcare industry. The optimal techniques for prediction and diagnosis differ significantly depending on the risk factors. This review article provides a holistic view of the types of data mining techniques used in the prediction of breast cancer. On the whole, the computer-aided automatic data mining techniques commonly employed in the diagnosis and prognosis of chronic diseases include Decision Tree, Naïve Bayes, Association Rule, Multilayer Perceptron (MLP), Random Forest, and Support Vector Machines (SVM), among others. The accuracy and overall performance of the classifiers differ for every dataset, and this article therefore attempts to provide a means of understanding the approaches involved in early prediction.