Predicting Breast Cancer via Supervised Machine Learning Methods on Class Imbalanced Data

Rajendran, Keerthana; Jayabalan, Manoj; Thiruchelvam, Vinesh

doi:10.14569/ijacsa.2020.0110808

Cited by 20 publications

(14 citation statements)

References 29 publications


“…For example, by Ayvaci MU et al [ 63 ], the analysis of demographic, mammography, and biopsy data using logistic regression resulted in an AUC of 0.84. Rajendran k et al [ 64 ] analyzed 2.4 million records of mammography screening and demographic risk factors associated with breast cancer to predict breast cancer using the Naïve Bayes, RF, and C4.5 techniques; the findings indicated the highest AUC (0.993) for Naïve Bayes.…”

Section: Discussion (mentioning)

confidence: 99%

Prediction of Breast Cancer using Machine Learning Approaches

Rabiei

2022

J Biomed Phys Eng


Background: Breast cancer is considered one of the most common cancers in women, caused by various clinical, lifestyle, social, and economic factors. Machine learning has the potential to predict breast cancer based on features hidden in data. Objective: This study aimed to predict breast cancer using different machine-learning approaches applied to demographic, laboratory, and mammographic data. Material and Methods: In this analytical study, a database of 5,178 independent records, 25% of which belonged to breast cancer patients, with 24 attributes in each record, was obtained from Motamed cancer institute (ACECR), Tehran, Iran. The random forest (RF), neural network (MLP), gradient boosting trees (GBT), and genetic algorithms (GA) were used in this study. Models were initially trained with demographic and laboratory features (20 features). The models were then trained with all demographic, laboratory, and mammographic features (24 features) to measure the effectiveness of mammography features in predicting breast cancer. Results: RF presented higher performance compared to other techniques (accuracy 80%, sensitivity 95%, specificity 80%, and the area under the curve (AUC) 0.56). Gradient boosting (AUC=0.59) showed a stronger performance compared to the neural network. Conclusion: Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. Collection, storage, and management of different data and intelligent systems based on multiple factors for predicting breast cancer are effective in disease management.
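
The accuracy, sensitivity, and specificity figures quoted in the abstract above are all derived from a binary confusion matrix. A minimal Python sketch of those definitions follows; the function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn count cancer cases predicted correctly/incorrectly;
    tn/fp count non-cancer cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all records classified correctly
        "sensitivity": tp / (tp + fn),      # share of true cancer cases detected
        "specificity": tn / (tn + fp),      # share of non-cancer cases correctly cleared
    }


# Hypothetical counts for illustration only (not from the study).
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

On imbalanced data, accuracy alone can look high even when few cancer cases are detected, which is why sensitivity and specificity are reported alongside it.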


“…C4.5 and EC4.5 are the two famous and most widely used DT algorithms [ 12 ]. DT is used extensively by following reference literature: [ 13 , 14 , 15 , 16 ].…”

Section: Basics and Background (mentioning)

confidence: 99%

Machine-Learning-Based Disease Diagnosis: A Comprehensive Review

2022


Globally, there is a substantial unmet need to diagnose various diseases effectively. The complexity of the different disease mechanisms and underlying symptoms of the patient population presents massive challenges in developing early diagnosis tools and effective treatments. Machine learning (ML), an area of artificial intelligence (AI), enables researchers, physicians, and patients to solve some of these issues. Based on relevant research, this review explains how ML is being used to help in the early identification of numerous diseases. Initially, a bibliometric analysis of the publications is carried out using data from the Scopus and Web of Science (WOS) databases. The bibliometric study of 1216 publications was undertaken to determine the most prolific authors, nations, organizations, and most cited articles. The review then summarizes the most recent trends and approaches in machine-learning-based disease diagnosis (MLBDD), considering the following factors: algorithm, disease types, data type, application, and evaluation metrics. Finally, in this paper, we highlight key results and provide insight into future trends and opportunities in the MLBDD area.


“…In the field of breast cancer predictions, some studies used the logistic regression approaches (Bernal et al, 2017;Oyewola et al, 2017;Westerdijk, 2018;Teja et al, 2020), while other studies used neural networks (Wang and Yoon, 2015;Kourou et al, 2015;Hou et al 2020). Other data mining algorithms were used like decision trees (Rajendran et al, 2020), Naïve Bayes methods (Rajendran et al, 2020;Shieh et al, 2016;Williams et al, 2016), Support Vector Machines (Westerdijk, 2018;Mochen and Sundararajan, 2018;Vard et al, 2018), Random Forests (RF) (Oyewola et al, 2017;Westerdijk, 2018;Hou et al, 2020;Rajendran et al, 2020), optimization algorithms (Vard et al, 2018), etc.…”

Section: Research Article (mentioning)

confidence: 99%

Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning

Khozama

Mayya

2021

Asian Pac J Cancer Prev


Objective: Early prediction of breast cancer is one of the most essential fields of medicine. Many studies have introduced prediction approaches to facilitate the early prediction and estimate the future occurrence based on mammography periodic tests. In the current research, we introduce a novel machine learning tool for the early prediction of breast cancer. Methods: Three basic resources are used to identify the most essential risk factors; including the BCSC (Breast Cancer Surveillance Consortium) dataset, a medical questionnaire, and multiple international breast cancer reports. The BCSC dataset has been normalized and balanced; consequently, the questionnaire and the medical reports are analyzed in order to define the degree of importance and a potential weight factor of each risk factor. These weights are used to scale risk factors and then the optimizable tree-based ML model is trained using the balanced weighted risk factors datasets. Results: Three balanced versions of the BCSC dataset are used; oversampled, down-sampled and mixed datasets. Each risk factor has a weight (1, 2 or 4) assigned based on a mathematical modelling of the questionnaire and the international breast cancer reports. The experiments are applied on the weighted and non-weighted versions of the database, and they indicate that the performance increases significantly by using the weighted version of the risk factors. The tests prove that the down-weighting of the non-essential risk factor increases the accuracy and reduces errors. The overall accuracy of the weighted balanced datasets reaches 100%, 95.8% and 95.9% for down-sampled, oversampled and mixed datasets respectively. Conclusion: Weighting the risk factors of the BCSC dataset improves the performance by increasing the accuracy and reducing the false rejection and false discovery rates for all versions of balanced datasets. The weighting approach can also be used to improve the estimation score of breast cancer by scaling the individual scores of risk factors.
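
The abstract above refers to oversampled and down-sampled versions of a class-imbalanced dataset. As a rough illustration of what random down-sampling and random oversampling could look like, here is a minimal standard-library Python sketch. It is not the authors' procedure: the helper names, the `label` key, and the toy records are all assumptions made for this example, and the abstract does not say how the "mixed" dataset was built, so it is not shown.

```python
import random


def _group_by_class(records, label_key):
    """Group records (dicts) into lists keyed by their class label."""
    groups = {}
    for record in records:
        groups.setdefault(record[label_key], []).append(record)
    return groups


def down_sample(records, label_key="label", seed=0):
    """Randomly drop records so every class matches the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(records, label_key)
    target = min(len(rows) for rows in groups.values())
    balanced = []
    for rows in groups.values():
        balanced.extend(rng.sample(rows, target))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


def over_sample(records, label_key="label", seed=0):
    """Randomly duplicate records so every class matches the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(records, label_key)
    target = max(len(rows) for rows in groups.values())
    balanced = []
    for rows in groups.values():
        balanced.extend(rows)
        # Draw the missing records with replacement from the same class.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced data (hypothetical): 90 negative and 10 positive records.
toy = [{"id": i, "label": 0} for i in range(90)]
toy += [{"id": 90 + i, "label": 1} for i in range(10)]

down = down_sample(toy)
over = over_sample(toy)
```

Down-sampling discards majority-class records, while oversampling repeats minority-class records; in practice either would normally be applied to the training split only, so that duplicated records do not leak into the evaluation data.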

