2019
DOI: 10.1007/s42452-019-0383-x

Missing data imputation with fuzzy feature selection for diabetes dataset

Abstract: Missing data in datasets remain a difficulty for data analysis in various research fields, especially the medical field, where they affect the treatment and diagnosis that the patient should receive. In this research, fuzzy c-means (FCM) is used to impute the missing data. However, like most data imputation methods, FCM does not consider the presence of irrelevant features. Irrelevant features can increase the computational time of the imputation process and decrease the accuracy of the prediction.…
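To make the idea in the abstract concrete, here is a minimal sketch of fuzzy c-means imputation in plain Python. It is not the authors' implementation: the function name `fcm_impute`, its parameters, the deterministic centroid initialisation, and the use of partial distances (distances computed over observed features only) are all assumptions made for illustration, and the fuzzy feature selection step that the paper places before imputation is omitted.

```python
def fcm_impute(rows, n_clusters=2, m=2.0, n_iter=50):
    """Fill None entries with membership-weighted centroid values (a sketch).

    Assumes every row has at least one observed value and that there are
    at least n_clusters complete rows to initialise the centroids from.
    """
    n_feat = len(rows[0])
    complete = [r for r in rows if None not in r]
    # Simplified, deterministic initialisation: evenly spaced complete rows.
    centers = [list(complete[k * len(complete) // n_clusters])
               for k in range(n_clusters)]

    def memberships(row):
        # Mean squared distance over the observed features only.
        dists = []
        for c in centers:
            obs = [(x - cj) ** 2 for x, cj in zip(row, c) if x is not None]
            dists.append(max(sum(obs) / len(obs), 1e-12))
        # FCM membership on squared distances: u_k is proportional to
        # d_k ** (-1 / (m - 1)), normalised to sum to 1.
        inv = [d ** (-1.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        return [v / total for v in inv]

    for _ in range(n_iter):
        u = [memberships(r) for r in rows]
        for k in range(n_clusters):
            for j in range(n_feat):
                num = den = 0.0
                for r, ur in zip(rows, u):
                    if r[j] is not None:  # skip missing entries
                        w = ur[k] ** m
                        num += w * r[j]
                        den += w
                if den > 0:
                    centers[k][j] = num / den

    imputed = []
    for r in rows:
        ur = memberships(r)
        imputed.append([
            x if x is not None
            else sum(uk * c[j] for uk, c in zip(ur, centers))
            for j, x in enumerate(r)
        ])
    return imputed


if __name__ == "__main__":
    data = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.1], [5.1, 4.9], [1.0, None]]
    print(fcm_impute(data))
```

Each missing entry is replaced by the sum of the cluster centroids' values for that feature, weighted by the row's memberships, so a row that clearly belongs to one cluster receives a value close to that cluster's centroid. Observed values are left unchanged.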

Cited by 40 publications (10 citation statements)
References 27 publications
“…This means that the reduction of Core and Reduct dimensions increases the results of Fuzzy C-Means clustering. This applies to all distance functions. Few of the aforementioned results are in line with the previous research [25][26]. R. Zhao, L. Gu, and X. Zhu also did research in the same field as this research.…”
supporting
confidence: 79%
“…Accuracy in high dimensional setting, generalization of the approach (Leke et al, 2017); Deep belief network: Performs well for larger missing ratio; Deep neural network: High approximation power; Generative adversarial nets: Effectively recover the data with a few parameters of the input data (Qu et al, 2018); Long-short-term memory + support vector regression: Performs well for time series block missing pattern with a high missing ratio (Li et al, 2019); Swarm intelligence: Impute missing data in a high-dimensional data set (Leke and Marwala, 2016); Transfer learning: Use evolutionary searches and neural networks applied in the context of transfer learning (Gupta et al, 2019); Dimensionality reduction; Principal component analysis (PCA): Better classification accuracy and faster computational time (Dzulkalnine and Sallehuddin, 2019); Suitable for high level of missingness (Lai and Kuok, 2019); k-nearest neighbors (kNN): Objective, data-driven and generic, and they can be easily applied for estimating missing precipitation (Pan et al, 2015); Accounts for MNAR (Jiang and Yang, 2015); Addresses the correlation between attributes (Lee and Styczynski, 2018); Attention to feature relevance (Liu et al, 2020); Focused on important features dealing with missing observations (Daberdaku et al, 2020); Improved performance on large data sets, cost effective, computation efficient and accurate (Keerin et al, 2012); Imputes missing data regardless of missing intervals (Teegavarapu, 2014); Local data clustering being incorporated for improved quality and efficiency (Kim et al, 2017); Missing data imputation of longitudinal clinical data (Sanjar et al, 2020); Application of...…”
Section: Cuckoo Search
mentioning
confidence: 99%
“…The deep learning-cuckoo search (DL-CS) imputation technique exhibited 87% accuracy with high-dimensional data sets and outperformed other similar deep learning imputation methods (Gupta et al, 2019). Fuzzy c-means imputation using significant features produced a much lower RMSE value of 0.049, compared to 4.930 obtained with the grey fuzzy neural network (GFNN) on the experimentation data set (Dzulkalnine and Sallehuddin, 2019). Even at a 60% missing rate, the semi-supervised RF imputation method showed an accuracy of 87% (Ishioka, 2013).…”
Section: Rq3: Evaluation Of Imputation
mentioning
confidence: 99%
“…Biessmann et al [14] use a deep learning model for truthful imputation of non-numeric values. Dzulkalnine et al [15] implement a hybrid feature selection model to impute the missing data, integrating fuzzy principal component analysis (FPCA), support vector machine, and fuzzy c-means (FCM) so that only the relevant features are used in the missing data treatment process. Sherif et al [16] offered a new approach using clustering and the local least squares imputation method, then selecting the smallest Euclidean distance to obtain the missing value from a cluster similar to it.…”
Section: Missing Data Handling
mentioning
confidence: 99%