2022
DOI: 10.34172/jrhs.2022.90

Development of a Machine Learning-Based Screening Method for Thyroid Nodules Classification by Solving the Imbalance Challenge in Thyroid Nodules Data

Abstract: Background: This study aims to show how imbalanced data and typical evaluation methods affect the development of machine learning-based models for preoperative thyroid nodule screening and can lead to misleading assessments of those models. Study design: A retrospective study. Methods: Ultrasonography features of 431 thyroid nodule cases were extracted from the medical records of 313 patients in Babol, Iran. Because thyroid nodules are commonly benign, the relevant data are usually class-imbalanced. This can lead to the bias of …
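The abstract's point that conventional evaluation can mislead under class imbalance can be illustrated with a minimal sketch. The labels below are hypothetical (a 90:10 benign-to-malignant split chosen for easy arithmetic, not the study's data), and balanced accuracy is used only as one example of an imbalance-aware metric; the paper's own metric choices are not visible in this excerpt.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` (here: malignant = 1) as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return plain accuracy next to class-wise and balanced metrics."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical labels: 90 benign (0) and 10 malignant (1) nodules.
y_true = [0] * 90 + [1] * 10
# A trivial "model" that labels every nodule benign.
y_pred = [0] * len(y_true)

metrics = evaluate(y_true, y_pred)
print(metrics)
# {'accuracy': 0.9, 'sensitivity': 0.0, 'specificity': 1.0, 'balanced_accuracy': 0.5}
```

The trivial classifier reaches 90% accuracy while detecting no malignant nodule at all; sensitivity and balanced accuracy expose this immediately.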

Cited by 3 publications (3 citation statements)
References: 34 publications (36 reference statements)
“…All SPN patients were randomly divided into the training and test sets by 7:3 and the patients in the training set were balanced at a ratio of 1:1 by using the synthetic minority oversampling technique, so that the number of benign SPN patients and malignant SPN patients was consistent [35, 36]. The LASSO algorithm was used to screen features from patients’ clinical data and DLCT parameters, and 6 classical ML models were constructed in the training set based on the selected features, namely AdaBoost, GNB, LR, RF, SVM and XGBoost [37].…”
Section: Methods (mentioning)
confidence: 99%
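The quoted pipeline (a 7:3 split, then SMOTE applied to the training set only) can be sketched as follows. This is a minimal numpy sketch under stated assumptions: the split is stratified by class (the quote says only "randomly divided"), k = 5 nearest neighbours is assumed as a common default, and the feature matrix is synthetic. The helper names `stratified_split` and `smote` are hypothetical; in practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import numpy as np


def stratified_split(X, y, test_frac=0.3, seed=0):
    """Shuffle each class separately and hold out `test_frac` of it (7:3 by default)."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_parts.append(idx[:n_test])
        train_parts.append(idx[n_test:])
    train_idx = np.concatenate(train_parts)
    test_idx = np.concatenate(test_parts)
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def smote(X, y, k=5, seed=0):
    """Generate synthetic minority samples until every class matches the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = n_target - count
        if n_new == 0:
            continue
        X_min = X[y == cls]
        if len(X_min) < 2:
            raise ValueError("SMOTE needs at least two samples in each minority class")
        # Euclidean distances between all pairs of minority samples.
        d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        k_eff = min(k, len(X_min) - 1)
        neighbours = np.argsort(d, axis=1)[:, :k_eff]
        # Pick a random minority sample and one of its k nearest neighbours...
        base = rng.integers(0, len(X_min), size=n_new)
        nbr = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        # ...and place a new point at a random position on the segment between them.
        gap = rng.random((n_new, 1))
        X_parts.append(X_min[base] + gap * (X_min[nbr] - X_min[base]))
        y_parts.append(np.full(n_new, cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, hypothetical data: 80 majority (0) and 20 minority (1) samples, 4 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = stratified_split(X, y)
X_bal, y_bal = smote(X_tr, y_tr)  # oversample the training set only
print(np.bincount(y_tr), np.bincount(y_bal), np.bincount(y_te))
```

Oversampling after the split keeps synthetic points out of the test set, so test metrics still reflect the original class distribution rather than an artificially balanced one.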
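The LASSO feature-screening step mentioned in the same quote works by driving the coefficients of uninformative features to exactly zero. Below is a minimal coordinate-descent sketch, assuming the least-squares objective (1/2n)·‖y − Xb‖² + α·‖b‖₁ (the convention scikit-learn's `Lasso` uses) and a continuous outcome for simplicity; the citing study's exact LASSO variant, outcome type, and penalty value are not stated, and α = 0.3 here is arbitrary. The data are synthetic, with only features 0 and 2 carrying signal.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + alpha * ||b||_1 on standardized X."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    residual = y - y.mean()                     # centering y removes the intercept
    beta = np.zeros(p)
    col_sq = (Xs ** 2).sum(axis=0) / n          # equals 1 after standardization
    for _ in range(n_iter):
        for j in range(p):
            residual += Xs[:, j] * beta[j]      # add back feature j's current contribution
            rho = Xs[:, j] @ residual / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= Xs[:, j] * beta[j]      # subtract the updated contribution
    return beta


# Hypothetical data: 6 features, outcome driven by features 0 and 2 plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=200)

beta = lasso_cd(X, y, alpha=0.3)
selected = np.flatnonzero(beta != 0)  # indices of features kept by the screen
print(selected)
```

Features whose correlation with the current residual stays below α receive a coefficient of exactly zero, which is what makes LASSO usable as a screening step before fitting the downstream classifiers.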
“…However, imbalanced data are commonly observed in electronic nose studies, but methods to address imbalanced data have seldom been reported [35]. Imbalanced data can negatively affect the accuracy of a diagnostic test by leading to biased models that perform poorly on minority classes [9]. When a diagnostic test is trained on an imbalanced dataset, the model may overfit to the majority class and underfit to the minority class, leading to better performance on the former.…”
Section: Introduction (mentioning)
confidence: 99%
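Besides oversampling, another common way to counter the majority-class bias described in this quote is to reweight the training loss by inverse class frequency. The sketch below uses the n_samples / (n_classes × count) formula that scikit-learn applies for `class_weight="balanced"`; the helper names, the weighted cross-entropy, and the 90:10 labels are illustrative assumptions and are not taken from the cited studies.

```python
import math
from collections import Counter


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Binary cross-entropy in which each sample is scaled by the weight of its true class."""
    eps = 1e-12
    total = norm = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        w = weights[t]
        total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
        norm += w
    return total / norm


# Hypothetical labels: 90 majority (0) and 10 minority (1) samples.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
# Both classes now carry the same total weight (about 50 each).
```

With these weights, an error on a minority sample costs 9 times as much as an error on a majority sample (the inverse of the 90:10 ratio), so the optimizer can no longer minimize the loss by favouring the majority class alone.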
“…If the assumptions are violated, biased estimations and improper inferences can be obtained [8]. Machine learning techniques as applied statistical methods have been considerably utilized in data analysis. These techniques do not contain pre-defined relationships between study variables, and the prediction is available without needing to understand essential mechanisms.…”
Section: Introduction (mentioning)
confidence: 99%