2018
DOI: 10.1155/2018/1817479

Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

Min-Wei Huang,
Wei-Chao Lin,
Chih-Fong Tsai

Abstract: Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a co…
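The concern raised in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a single predictor x, a target y with missing entries, a median/MAD rule for flagging outliers, and an ordinary least-squares line as the imputation model. The function names (`drop_outliers`, `fit_line`, `impute`) and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import median


def drop_outliers(rows, k=3.0):
    """Keep complete (x, y) rows whose y lies within k scaled MADs of the median.

    The median/MAD rule is an assumption made for this sketch.
    """
    ys = [y for _, y in rows]
    med = median(ys)
    mad = median([abs(y - med) for y in ys])
    if mad == 0:
        return list(rows)  # no spread to measure against, keep everything
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [(x, y) for x, y in rows if abs(y - med) <= k * scale]


def fit_line(rows):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute(records, remove_outliers=True):
    """Fill missing y values (None) from a line fitted on the observed rows.

    Only the rows used to fit the model are filtered; observed values in
    the returned list are left unchanged.
    """
    complete = [(x, y) for x, y in records if y is not None]
    if remove_outliers:
        complete = drop_outliers(complete)
    intercept, slope = fit_line(complete)
    return [(x, y if y is not None else intercept + slope * x) for x, y in records]


# Toy data: y = 2x for x = 1..8, one gross outlier at x = 9, one missing y at x = 10.
records = [(x, 2.0 * x) for x in range(1, 9)] + [(9, 100.0), (10, None)]

with_removal = impute(records, remove_outliers=True)[-1][1]
without_removal = impute(records, remove_outliers=False)[-1][1]
print(with_removal, without_removal)
```

On this toy data the single outlier pulls the fitted line upward, so the estimate for the missing value moves far from the trend of the clean rows unless the outlier is dropped before fitting. Whether this carries over to real medical datasets is the question the paper examines.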

Cited by 27 publications (19 citation statements)
References 26 publications
“…Meanwhile, other studies have emphasized the significance of detecting outliers in the observed dataset prior to imputation of missing values. [34].…”
Section: Discussion (mentioning)
confidence: 99%
“…The model-driven imputation algorithm requires that the observable data has no missing values in the dataset, so the characteristics of the observable data directly affect the results of the imputation [34]. Training data usually contains noisy data or outliers that will affect the final performance of the trained model [35,36].…”
Section: Introduction (mentioning)
confidence: 99%
“…As the latter does not apply to this dataset, methods that were applied were trimming and winsorization. It is often the case that clinical research (such as epidemiologic studies, multiple sclerosis, Parkinson’s disease, and AD studies, and reports having issues with incomplete datasets) increasingly employs various data preprocessing techniques, including winsorization, in resolving these issues [38, 39, 40, 41, 42, 43, 44, 45, 46], as it has been performed in the current study.…”
Section: Discussion (mentioning)
confidence: 99%
“…Guideline: Present Details about Whether and How "Bad Experiments" or "Bad Values" Were Removed from Graphs and Analyses. The removal of outliers can be legitimate or even necessary but can also lead to type I errors (false positive) and exaggerated results (Bakker and Wicherts, 2014;Huang et al, 2018).…”
Section: Explanation of the Guidelines (mentioning)
confidence: 99%