2019
DOI: 10.1080/08839514.2019.1637138

Comparison of Performance of Data Imputation Methods for Numeric Dataset

Citations: cited by 203 publications (117 citation statements)
References: 33 publications
“…The data analysis pipeline was implemented in Python (version 3.7), using the numpy (version 1.19), pandas (version 1.1) and scikit-learn (version 0.23) libraries. For imputation, the multivariate k-nearest neighbors algorithm was used [26], with k=5. For feature-selection, the recursive feature-elimination algorithm was used [27].…”
Section: Machine Learning Experimental Design (mentioning)
confidence: 99%
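The pipeline described in this statement (k-nearest neighbors imputation with k=5, followed by recursive feature elimination) can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the citing authors' code: the synthetic data, the logistic-regression estimator inside RFE, and the choice of three selected features are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic numeric dataset (assumption: stands in for the real study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Knock out roughly 10% of the entries at random to simulate missing values.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

pipeline = Pipeline([
    # Multivariate kNN imputation with k=5, as stated in the quoted passage.
    ("impute", KNNImputer(n_neighbors=5)),
    # Recursive feature elimination; the base estimator and the number of
    # features kept are illustrative assumptions, not taken from the source.
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)),
])

# Impute first, then keep the three features ranked best by RFE.
X_selected = pipeline.fit_transform(X_missing, y)
print(X_selected.shape)
```

Placing the imputer inside the pipeline means the neighbors are found using only the data passed to `fit`, which avoids leaking information from held-out samples if the pipeline is later cross-validated.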
“…The kNN algorithm is increasingly used to impute missing data in research with high volume data such as genetics and metabolomics studies [22,23]. In several recent reports the kNN algorithm was shown to produce the smallest imputation error compared to methods such as mean and median imputation, Bayesian linear regression, K-Means, K-Medoids clustering algorithms [24,25]. However, some studies reported that simpler methods such as mean or median replacement were as adequate as methods like kNN when imputation was followed by clustering of genetic data [26].…”
Section: Discussion (mentioning)
confidence: 99%
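The kind of comparison this statement refers to, imputation error for kNN versus mean and median replacement, can be sketched as below. This is a hedged illustration under stated assumptions: the correlated synthetic data, the 10% missing-completely-at-random mask, and RMSE as the error measure are choices made for the example and are not taken from the cited studies. Results on real data may differ, as the statement itself notes.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Synthetic correlated features (assumption): a shared latent factor plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))
X_true = latent + 0.3 * rng.normal(size=(200, 4))

# Hide about 10% of the entries completely at random.
mask = rng.random(X_true.shape) < 0.1
X_missing = np.where(mask, np.nan, X_true)


def imputation_rmse(imputer):
    """Fill the missing entries and return the RMSE on the hidden values only."""
    X_filled = imputer.fit_transform(X_missing)
    return float(np.sqrt(np.mean((X_filled[mask] - X_true[mask]) ** 2)))


errors = {
    "mean": imputation_rmse(SimpleImputer(strategy="mean")),
    "median": imputation_rmse(SimpleImputer(strategy="median")),
    "knn": imputation_rmse(KNNImputer(n_neighbors=5)),
}

for name, rmse in errors.items():
    print(f"{name}: RMSE = {rmse:.3f}")
```

Scoring only the masked entries keeps the comparison fair, since the observed values are returned unchanged by all three imputers.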
“…But for a small percentage of missingness, imputation using the k-nearest neighbour algorithm could be used, which are more accurate than using mean/median values [13]. With the introduction of newer medications, the model performance might be affected. This limitation needs to be assessed and necessary changes in covariates should be updated to ensure a good performance of the model.…”
Section: Advantages and Challenges (mentioning)
confidence: 99%