2023
DOI: 10.3390/cancers15030681

Breast Cancer Prediction Using Fine Needle Aspiration Features and Upsampling with Supervised Machine Learning

Abstract: Breast cancer is one of the most common invasive cancers in women, and it remains a worldwide medical problem because the number of cases has increased significantly over the past decade. Breast cancer is the second leading cause of death from cancer in women. Early detection of breast cancer can save lives, but the traditional approach to detecting it requires various laboratory tests involving medical experts. To reduce human error and speed up breast cancer detection, an automa…

Citations: cited by 21 publications (11 citation statements)
References: 40 publications (46 reference statements)
“…In this study, we tested several machine learning algorithms which have significant applications in different domains, such as the health care [ 40 ], Internet of Things (IoT) [ 41 ], machine vision [ 42 ], edge computing [ 43 ], education [ 44 , 45 ], and many others. In order to conduct a fair comparative evaluation of our proposed SSC model for the detection of thyroid disease, we chose the following machine learning classifiers: RF due to its effectiveness, interpretability, non-parametric nature, and high accuracy rate across a range of data types; GBM, which has various benefits including adaptability, robust tolerance to anomalous inputs, and high accuracy; AdaBoost, since it is less susceptible to overfitting; LR, because its training and implementation processes are simple; and Support Vector Classifier (SVC), that has advantages including efficiently handling high dimensional data [ 46 ]. For these algorithms to perform at their maximum, we optimized their hyperparameters.…”
Section: Methods (mentioning)
confidence: 99%
“…Medical image analysis domains, on the other hand, do not have access to such big datasets. Consequently, depending on the need to expand the amount of data, different augmentation techniques have been used in the existing literature [ 26 , 27 , 28 ]. In this study, the size of the training dataset was increased using these techniques.…”
Section: Methods (mentioning)
confidence: 99%
“…Proper encoding ensures that categorical variables are utilized appropriately by the model, leading to an enhancement in its performance as described in study [45]. The study [46] described the data preparation approaches like oversampling, under-sampling, and the development of synthetic samples that can successfully address class imbalance issues. Equilibrating the dataset improves the model's ability to learn from underrepresented classes and forecast accurately across all categories.…”
Section: Dataset Description (mentioning)
confidence: 99%
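
The oversampling mentioned in the last citation statement can be illustrated with a minimal sketch. It is not the method of the cited paper or of any citing paper: it shows plain random oversampling, in which minority-class rows are duplicated until all classes are the same size. The function name `random_oversample`, the toy rows, and the labels "B" and "M" are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data, then append the duplicates.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (hypothetical): four rows of class "B", one row of class "M".
rows = [[1.0], [1.1], [0.9], [1.2], [3.0]]
labels = ["B", "B", "B", "B", "M"]
new_rows, new_labels = random_oversample(rows, labels)
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.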