2020
DOI: 10.1007/978-3-030-58811-3_5

Comparing Statistical and Machine Learning Imputation Techniques in Breast Cancer Classification

Abstract: Missing data imputation is an important task when dealing with crucial data that cannot be discarded, such as medical data. This study evaluates and compares the impact of two statistical and two machine learning imputation techniques on the classification of breast cancer patients, using several evaluation metrics. Mean, Expectation-Maximization (EM), Support Vector Regression (SVR) and K-Nearest Neighbor (KNN) were applied to impute 18% of missing data, missing completely at random, in the two Wisconsin datasets. There…
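To make the setup in the abstract concrete, the following is a minimal sketch of the simplest of the four techniques, mean imputation, applied to values removed completely at random at an 18% rate. It is not the authors' code: the toy matrix, the helper names `mask_mcar` and `impute_mean`, the fixed seed, and the 0.0 fallback for a fully empty column are all assumptions made for illustration, and the Wisconsin datasets are not used.

```python
import random


def mask_mcar(rows, rate, seed=0):
    """Return a copy of rows with a fraction `rate` of cells set to None.

    Cells are drawn uniformly at random, independent of their values,
    which is what "missing completely at random" means here.
    (Hypothetical helper, not from the paper.)
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(rate * len(cells))
    masked = [list(row) for row in rows]
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def impute_mean(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Toy numeric matrix (20 rows x 5 features), standing in for real data.
data = [[float(i + j) for j in range(5)] for i in range(20)]
masked = mask_mcar(data, rate=0.18)
imputed = impute_mean(masked)
```

EM, SVR and KNN imputation would replace `impute_mean` with a model fitted on the observed values; the masking step stays the same.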

Cited by 4 publications (2 citation statements)
References 51 publications (33 reference statements)
“…In this study, we utilized DT with the CART algorithm, which demonstrated high accuracy and stable performance due to the absence of marked non-homogeneous characteristics in the random simulation data. Relevant studies 11,12,26 also demonstrated that the above three imputation methods had excellent performance when applied to datasets with continuous and mixed variables. Our study supported these findings from the perspective of missing dichotomous variables.…”
Section: Discussion | Citation type: mentioning
confidence: 94%
“…Step 4: Model comparison using the SK test. The SK test [32] allows comparing several models in terms of performance in order to conclude the existence of a significant difference between them [31, 33–35]. The comparison was performed using the MCC criterion.…”
Section: Methods | Citation type: mentioning
confidence: 99%
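
The citing statement above compares models on the MCC criterion. Assuming MCC here denotes the Matthews correlation coefficient for binary labels, a minimal sketch of how it is computed from the confusion-matrix counts is given below. The function name `mcc`, the 0/1 label encoding, and the convention of returning 0.0 when the denominator is zero are assumptions for illustration; the SK test itself is not implemented.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels: one positive case is missed by the classifier.
score = mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

The score ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it usable as a single ranking criterion across classifiers.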