Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

Huang, Min-Wei; Lin, Wei-Chao; Tsai, Chih-Fong

doi:10.1155/2018/1817479

Cited by 27 publications

(19 citation statements)

References 26 publications

“…Meanwhile, other studies have emphasized the significance of detecting outliers in the observed dataset prior to imputation of missing values. [34].…”

Section: Discussion (mentioning)

confidence: 99%

Normalization and Outlier Removal in Class Center-Based Firefly Algorithm for Missing Value Imputation

Nugroho, Utama, Surendro

2021

Preprint

Missing data is one of the factors that often leaves research datasets incomplete. Data normalization and missing value handling are considered major problems in the data pre-processing stage, while classification algorithms are adopted to handle numerical features. Furthermore, when the observed data contain outliers, the estimates of the missing values are sometimes unreliable, or even differ greatly from the true values. This study proposes combining normalization and outlier removal before imputing missing values with several methods: mean, random value, regression, multiple imputation, KNN, and C3-FA. Experimental results on the sonar dataset show the effect of normalization and outlier removal on these imputation methods. With the proposed C3-FA method, this combination produced accuracy, F1-score, precision, and recall values of 0.906, 0.906, 0.908, and 0.906, respectively. Based on evaluation with a KNN classifier, these values outperformed the other five methods. Meanwhile, the RMSE, Dks, and r values obtained from combining normalization and outlier removal in the C3-FA method were 0.02, 0.04, and 0.935, respectively. This shows that the proposed method can reproduce the real values of the data (prediction accuracy) and maintain the distribution of the values (distribution accuracy).
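The effect this abstract describes, outliers in the observed data distorting imputed values, can be illustrated with a minimal sketch. It is not the C3-FA method or the authors' pipeline: the 1.5 × IQR fence, the plain mean imputation, the helper names (`iqr_bounds`, `mean_impute`, `rmse`), and the toy data are all assumptions chosen for illustration.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mean_impute(column, remove_outliers):
    """Fill None entries with the mean of the observed values.

    When remove_outliers is True, values outside the IQR fences are
    excluded from the mean; they are still left in the returned column.
    """
    observed = [v for v in column if v is not None]
    if remove_outliers:
        low, high = iqr_bounds(observed)
        observed = [v for v in observed if low <= v <= high]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in column]


def rmse(true_values, estimates):
    """Root-mean-square error between true and estimated values."""
    squared = [(t - e) ** 2 for t, e in zip(true_values, estimates)]
    return math.sqrt(sum(squared) / len(squared))


def imputed_only(original, filled):
    """Keep only the values at positions that were missing in original."""
    return [f for o, f in zip(original, filled) if o is None]


# Hypothetical toy feature: None marks a missing value, 50.0 is an outlier.
column = [4.8, 5.1, None, 5.0, 4.9, 50.0, None, 5.2, 5.0, 4.9, 5.1]
truth = [5.0, 5.1]  # assumed true values at the two missing positions

raw = imputed_only(column, mean_impute(column, remove_outliers=False))
clean = imputed_only(column, mean_impute(column, remove_outliers=True))

print("RMSE without outlier removal:", rmse(truth, raw))
print("RMSE with outlier removal:   ", rmse(truth, clean))
```

On this toy column the single extreme value pulls the imputed mean far from the bulk of the data, so excluding it before estimating the fill value lowers the RMSE. Real datasets need a normalization step and an outlier rule suited to their features.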

“…The model-driven imputation algorithm requires that the observable data has no missing values in the dataset, so the characteristics of the observable data directly affect the results of the imputation [34]. Training data usually contains noisy data or outliers that will affect the final performance of the trained model [35,36].…”

Section: Introduction (mentioning)

confidence: 99%

“…As the latter does not apply to this dataset, methods that were applied were trimming and winsorization. It is often the case that clinical research (such as epidemiologic studies, multiple sclerosis, Parkinson’s disease, and AD studies, and reports having issues with incomplete datasets) increasingly employs various data preprocessing techniques, including winsorization, in resolving these issues [38, 39, 40, 41, 42, 43, 44, 45, 46], as it has been performed in the current study.…”

Section: Discussion (mentioning)

confidence: 99%
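The citation statement above names trimming and winsorization without describing them. As a rough sketch of the difference: trimming discards values beyond chosen cut-offs, while winsorization keeps every observation but clamps extremes to those cut-offs. The 5th and 95th percentile cut-offs and the helper names below are assumptions for illustration, not the settings used in the cited study.

```python
import statistics


def percentile_bounds(values, lower_pct=5, upper_pct=95):
    """Return the (lower_pct, upper_pct) percentile cut-offs of values."""
    # quantiles(n=100) yields 99 cut points; index p - 1 is percentile p.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def trim(values, low, high):
    """Trimming: drop every value outside [low, high]."""
    return [v for v in values if low <= v <= high]


def winsorize(values, low, high):
    """Winsorization: clamp values outside [low, high] to the nearest bound."""
    return [min(max(v, low), high) for v in values]


# Hypothetical measurements with one extreme value.
measurements = list(range(1, 21)) + [500]
low, high = percentile_bounds(measurements)

trimmed = trim(measurements, low, high)
winsorized = winsorize(measurements, low, high)

print("original size:  ", len(measurements))
print("trimmed size:   ", len(trimmed))     # smaller: extremes are dropped
print("winsorized size:", len(winsorized))  # unchanged: extremes are clamped
```

Winsorization preserves the sample size, which can matter for small clinical cohorts, at the cost of altering the extreme observations.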

SOMAscan-based proteomic measurements of plasma brain natriuretic peptide are decreased in mild cognitive impairment and in Alzheimer's dementia patients

et al. 2019

Alzheimer's disease represents the most common age-related neurodegenerative disorder and a leading cause of progressive cognitive impairment. Predicting cognitive decline is challenging but would be invaluable in an increasingly aging population which also experiences a rising cardiovascular risk. In order to examine whether plasma measurements of one of the established biomarkers of heart failure, brain natriuretic peptide (BNP), reflect a decline in cognitive function associated with Alzheimer's disease neurodegeneration, BNP levels were analysed, using a novel assay called SOMAscan, in (1) cognitively healthy control subjects, (2) subjects with mild cognitive impairment, and (3) subjects with Alzheimer's disease. The results of our study show that the levels of the BNP were significantly different between the three types of diagnoses (p < 0.05), whereby subjects with mild cognitive impairment had the lowest mean BNP value, and healthy subjects had the highest BNP value. Importantly, our results show that the levels of the BNP are influenced by the presence of at least one APOE4 allele in the healthy (p < 0.05) and in the Alzheimer's disease groups of subjects (p < 0.1). As the levels of the BNP appear to be independent of the APOE4 genotype in subjects with mild cognitive impairment, the results of our study support inclusion of measurements of plasma levels of the BNP in the list of the core Alzheimer's disease biomarkers for identification of the mild cognitive impairment group of patients. In addition, the results of our study warrant further investigations into molecular links between Alzheimer's disease-type cognitive decline and cardiovascular disorders.

“…Guideline: Present Details about Whether and How "Bad Experiments" or "Bad Values" Were Removed from Graphs and Analyses. The removal of outliers can be legitimate or even necessary but can also lead to type I errors (false positive) and exaggerated results (Bakker and Wicherts, 2014; Huang et al., 2018).…”

Section: Explanation of the Guidelines (mentioning)

confidence: 99%

New Author Guidelines for Displaying Data and Reporting Data Analysis and Statistical Methods in Experimental Biology

Michel, Murphy, Motulsky

2019

J Pharmacol Exp Ther

The American Society for Pharmacology and Experimental Therapeutics has revised the Instructions to Authors for Drug Metabolism and Disposition, Journal of Pharmacology and Experimental Therapeutics, and Molecular Pharmacology. These revisions relate to data analysis (including statistical analysis) and reporting but do not tell investigators how to design and perform their experiments. Their overall focus is on greater granularity in the description of what has been done and found. Key recommendations include the need to differentiate between preplanned, hypothesis-testing, and exploratory experiments or studies; explanations of whether key elements of study design, such as sample size and choice of specific statistical tests, had been specified before any data were obtained or adapted thereafter; and explanation of whether any outliers (data points or entire experiments) were eliminated and when the rules for doing so had been defined. Variability should be described by S.D. or interquartile range, and precision should be described by confidence intervals; S.E. should not be used. P values should be used sparingly; in most cases, reporting differences or ratios (effect sizes) with their confidence intervals will be preferred. Depiction of data in figures should provide as much granularity as possible, e.g., by replacing bar graphs with scatter plots wherever feasible and violin or box-and-whisker plots when not. This editorial explains the revisions and the underlying scientific rationale. We believe that these revised guidelines will lead to a less biased and more transparent reporting of research findings.
