Missing value imputation affects the performance of machine learning: A review and analysis of the literature (2010–2021)

Hasan, Md. Kamrul; Alam, Md. Ashraful; Roy, Shidhartho; Dutta, Aishwariya; Jawad, Md. Tasnim; Das, Sunanda

doi:10.1016/j.imu.2021.100799

Cited by 72 publications

(44 citation statements)

References 217 publications


“…Consequently, a number of literature [10]-[12] discusses recent machine learning-based imputation techniques in solving incomplete dataset problems. Nevertheless, with respect to MVI of nature-inspired metaheuristic techniques, the literature receives limited attention.…”

Section: Introduction (mentioning)

confidence: 99%

Missing Value Imputation Designs and Methods of Nature-Inspired Metaheuristic Techniques: A Systematic Review

et al. 2022


“…We use different metrics, such as recall, precision, F1-score, and accuracy, to evaluate our multi-tasking CVR-Net for COVID-19 recognition, which is mathematically defined [88] as follows:

where the TP, FN, FP, and TN respectively denote true positive (patient with coronavirus symptoms recognized as the positive patient), false negative (patient with coronavirus symptoms recognized as the negative patient), false positive (patient without coronavirus symptoms recognized as the positive patient), and true negative (patient without coronavirus symptoms recognized as the negative patient). The recall quantifies the type-II error (the patient, with the positive syndromes, inappropriately fails to be nullified), and precision quantifies the positive predictive values (percentage of truly positive recognition among all the positive recognition).…”
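The formulas referenced by the excerpt above are not reproduced on this page. As a minimal sketch, assuming the standard confusion-matrix definitions of these four metrics (the function name `classification_metrics` and the fallback to 0.0 on a zero denominator are illustrative choices, not taken from the cited paper), they can be computed as:

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute recall, precision, F1-score, and accuracy from confusion-matrix counts.

    Assumes the standard textbook definitions; a zero denominator yields 0.0.
    """
    # Recall: share of actual positives that were recognized as positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of positive recognitions that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1-score: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Accuracy: share of all cases that were recognized correctly.
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=8, fn=2, fp=2, tn=8))
```

Under these definitions, recall falls as false negatives rise and precision falls as false positives rise, which matches the roles the excerpt assigns to the two metrics.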

Section: Methods (mentioning)

confidence: 99%

Challenges of deep learning methods for COVID-19 detection using public datasets

Hasan, Alam, Dahal, et al. 2022

Informatics in Medicine Unlocked

Self Cite


“…The cause for that phenomenon is that the imputation results of FTLRI put forward in this paper only depend on the first five and the last three complete data points, which are highly relevant to the data point with missing values in terms of time and attributes. Instead of relying on all the other complete data points, like other imputation approaches, FTLRI selects the eight data points highly correlated with the missing data point to impute the missing value, which is beneficial for imputation performance [46,47]. Therefore, the increasing of the number of data points and the changing of missing rates will not affect the performance of FTLRI, that is, FTLRI can provide superior imputation results on datasets with different missing rates and different numbers of data points.…”
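The neighbor selection described in the excerpt above can be illustrated with a short sketch. It assumes that "the first five and the last three complete data points" means the five nearest complete points before the missing position and the three nearest complete points after it. That reading, the function name `select_neighbors`, and the use of `None` or NaN to mark missing values are all assumptions; the sketch covers only the selection step, not the logistic-regression imputation itself.

```python
import math


def _is_complete(value):
    """Return True when a value is present (neither None nor NaN)."""
    if value is None:
        return False
    return not (isinstance(value, float) and math.isnan(value))


def select_neighbors(values, idx, n_before=5, n_after=3):
    """Return indices of the nearest complete points around position idx.

    Picks up to n_before complete points preceding idx and up to n_after
    complete points following it, in chronological order.
    """
    # Walk backwards from idx, keeping the closest complete points.
    before = [i for i in range(idx - 1, -1, -1) if _is_complete(values[i])]
    before = sorted(before[:n_before])
    # Walk forwards from idx, keeping the closest complete points.
    after = [i for i in range(idx + 1, len(values)) if _is_complete(values[i])]
    after = after[:n_after]
    return before + after


if __name__ == "__main__":
    # Hypothetical hourly readings with gaps, for illustration only.
    series = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, None,
              9.0, float("nan"), 11.0, 12.0, 13.0]
    print(select_neighbors(series, idx=7))
```

Because the window has a fixed size, the number of points used per missing value does not grow with the dataset, which is consistent with the excerpt's claim that dataset size should not affect the method.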

Section: Ma E Comparison Of Real and Imputed Concentration Values Of ... (mentioning)

confidence: 99%

A Novel Missing Data Imputation Approach for Time Series Air Quality Data Based on Logistic Regression

et al. 2022


Missing values in air quality datasets bring trouble to exploration and decision making about the environment. Few imputation methods aim at time series air quality data so that they fail to handle the timeliness of the data. Moreover, most imputation methods prefer low-missing-rate datasets to relatively high-missing-rate datasets. This paper proposes a novel missing data imputation method, called FTLRI, for time series air quality data based on the traditional logistic regression and a presented “first Five & last Three” model, which can explain relationships between disparate attributes and extract data that are extremely relevant, both in terms of time and attributes, to the missing data, respectively. To investigate the performance of FTLRI, it is benchmarked with five classical baselines and a new dynamic imputation method using a neural network with average hourly concentration data of pollutants from three disparate stations in Lanzhou in 2019 under different missing rates. The results show that FTLRI has a significant advantage over the compared imputation approaches, both in the particular short-term and long-term time series air quality data. Furthermore, FTLRI has good performance on datasets with a relatively high missing rate, since it only selects the data extremely related to the missing values instead of relying on all the other data like other methods.
