2020
DOI: 10.3390/app10072344

The Feature Selection Effect on Missing Value Imputation of Medical Datasets

Abstract: In practice, many medical-domain datasets are incomplete, containing a proportion of records with missing attribute values. Missing value imputation can be performed to solve this problem. To impute missing values, some of the observed data (i.e., complete data) are generally used as the reference or training set, and the relevant statistical and machine learning techniques are then employed to produce estimates to replace the missing values. Since the collected dataset usually …
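To make the scheme the abstract describes concrete, here is a minimal sketch, assuming a synthetic dataset and a linear model (both assumptions, not from the paper): the complete rows act as the training set, and a learner trained on them estimates the missing attribute for the incomplete rows.

```python
# Minimal sketch of imputation using complete rows as the training set.
# The synthetic data and the linear model are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[rng.random(100) < 0.2, 0] = np.nan   # column 0 has some missing values

complete = ~np.isnan(X[:, 0])          # observed (complete) rows
model = LinearRegression().fit(X[complete, 1:], X[complete, 0])
X[~complete, 0] = model.predict(X[~complete, 1:])  # replace the gaps
```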

Cited by 32 publications (18 citation statements)
References 41 publications

“…RF is an excellent choice; it is tolerant of data noise and trains to reasonable accuracy. The decision trees are independent of each other [20]. The tree rules can be generated using two techniques.…”
Section: Random Forest Algorithm
confidence: 99%
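A minimal random forest sketch is given below to illustrate the independence the quote refers to; the synthetic data and hyperparameters are assumptions, and scikit-learn is used rather than any citing paper's code. Each tree is grown independently on its own bootstrap sample, which is what makes the ensemble tolerant of noisy data.

```python
# Minimal random forest sketch (assumed setup; not from the cited papers).
# Each of the n_estimators trees is trained independently on a bootstrap
# sample, and their votes are aggregated at prediction time.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                      # synthetic features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # noisy labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
rf.fit(X_tr, y_tr)
print("held-out accuracy:", rf.score(X_te, y_te))
```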
“…The percentage of missing data was calculated and variables with more than 30% missing data were rejected from the analysis since data imputation for larger amounts of missingness may become very imprecise [52]. Using MATLAB version 2019b (The MathWorks, Inc., Natick, MA, USA), missing data for the remaining variables were imputed by the k-nearest neighbors technique using k = 3 [53]. Each ∆ variable was normalized with respect to its absolute maximum value.…”
Section: Statistical Analyses
confidence: 99%
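The quoted workflow used MATLAB; a rough Python/scikit-learn analogue is sketched below (the toy columns and values are assumptions): reject variables with more than 30% missingness, impute the rest with k-nearest neighbors (k = 3), then normalize each column by its absolute maximum.

```python
# Sketch of the quoted pipeline in Python/scikit-learn (the original used
# MATLAB); column names and values here are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "hr":     [72, np.nan, 80, 75, 77, 68],
    "bp":     [120, 118, np.nan, 135, 110, 125],
    "sparse": [np.nan, np.nan, np.nan, np.nan, np.nan, 1.0],
})

# Reject variables with more than 30% missing data.
df = df.loc[:, df.isna().mean() <= 0.30]   # drops "sparse" (83% missing)

# Impute the remaining gaps with k-nearest neighbors, k = 3.
X = KNNImputer(n_neighbors=3).fit_transform(df)

# Normalize each variable by its absolute maximum value.
X = X / np.abs(X).max(axis=0)
```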
“…Considering HDLSS data, reducing the number of features is crucial to perform non-overparameterized, classification-based analyses [54]. Feature selection techniques can be classified into filter, wrapper, and embedded methods [53]. Filter methods can be combined with any machine learning model and are much faster and less prone to overfitting than wrapper and embedded methods [55].…”
Section: Feature Selection
confidence: 99%
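As a sketch of the filter approach the quote favors, the snippet below scores each feature against the label independently of any downstream model and keeps the top k; the synthetic HDLSS-shaped data and the ANOVA F-score criterion are assumptions, not the cited papers' choices.

```python
# Minimal filter-method sketch (assumed example, not the cited papers' code):
# score each feature against the label independently of any downstream model,
# then keep the top k. Filter methods like this are model-agnostic and cheap.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 500))          # HDLSS-style: few samples, many features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)                               # (50, 10)
print(np.flatnonzero(selector.get_support())[:5])    # indices of kept features
```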
“…The simplest one is arguably to substitute the missing values with their ensemble average [7]. More sophisticated imputation strategies obtain better results by employing, e.g., multilayer perceptrons, extreme gradient boosting machines, and support vector machines [8][9][10]. For example, Vivar et al [11] improved the classification of individuals in the datasets of the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Parkinson's Progression Markers Initiative (PPMI) by employing a multiple recurrent graph convolutional network to impute the missing features (the brain volumes obtained from magnetic resonance imaging (MRI)).…”
Section: Introduction
confidence: 99%
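The two families mentioned in this passage can be sketched side by side; the snippet below uses scikit-learn's SimpleImputer for the ensemble-average substitution and IterativeImputer as a generic stand-in for the model-based strategies (it is not the MLP, boosting, SVM, or graph-network method of the cited works, and the data are synthetic).

```python
# Contrast of the two imputation families mentioned above (assumed data;
# IterativeImputer is a stand-in for the model-based approaches cited,
# not a reimplementation of any of them).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[rng.random(X.shape) < 0.1] = np.nan   # punch 10% holes in the data

# Simplest strategy: replace each gap with the column mean.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# Model-based strategy: predict each incomplete column from the others.
X_model = IterativeImputer(random_state=0).fit_transform(X)
```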