Iterative Robust Semi-Supervised Missing Data Imputation

Fazakis, Nikos; Kostopoulos, Georgios; Kotsiantis, Sotiris; Mporas, Iosif

doi:10.1109/access.2020.2994033

Cited by 23 publications

(11 citation statements)

References 50 publications

(84 reference statements)


“…As future work, at first, it would be beneficial to apply different techniques for handling of missing values such as [82] and experiment with even more feature selection techniques. Moreover, it would be interesting to evaluate the impact of dimensionality reduction with techniques such as principal component analysis [83] in T2DM prediction.…”

Section: Discussion (mentioning)

confidence: 99%

Machine Learning Tools for Long-Term Type 2 Diabetes Risk Prediction

et al. 2021

Self Cite


ABSTRACT A steady rise has been observed in the percentage of elderly people who want and are still able to contribute to society. Therefore, early retirement or exit from the labour market, due to health-related issues, poses a significant problem. Nowadays, thanks to technological advances and various data from different populations, the risk factors investigation and health issues screening are moving towards automation. In the context of this work, a worker-centric, IoT enabled unobtrusive users health, wellbeing and functional ability monitoring framework, empowered with AI tools, is proposed. Diabetes is a high-prevalence chronic condition with harmful consequences for the quality of life and high mortality rate for people worldwide, in both developed and developing countries. Hence, its severe impact on humans' life, e.g., personal, social, working, can be considerably reduced if early detection is possible, but most research works in this field fail to provide a more personalized approach both in the modeling and prediction process. In this direction, our designed system concerns diabetes risk prediction in which specific components of the Knowledge Discovery in Database (KDD) process are applied, evaluated and incorporated. Specifically, dataset creation, features selection and classification, using different Supervised Machine Learning (ML) models are considered. The ensemble WeightedVotingLRRFs ML model is proposed to improve the prediction of diabetes, scoring an Area Under the ROC Curve (AUC) of 0.884. Concerning the weighted voting, the optimal weights are estimated by their corresponding Sensitivity and AUC of the ML model based on a bi-objective genetic algorithm. Also, a comparative study is presented among the Finnish Diabetes Risk Score (FINDRISC) and Leicester risk score systems and several ML models, using inductive and transductive learning. The experiments were conducted using data extracted from the English Longitudinal Study of Ageing (ELSA) database.

INDEX TERMS T2DM, long-term health risk prediction, machine learning, ensemble learning


“…Second, it is based on machine learning. Third, it is based on deep learning [6]. Statistical analysis methods include mean imputation, regression imputation, hot deck imputation, multiple imputation, and multiple imputation by chained equations (MICE).…”

Section: Background Theory (mentioning)

confidence: 99%

Semi-GAN: An Improved GAN-Based Missing Data Imputation Method for the Semiconductor Industry

Lee, Connerton, Lee et al. 2022

IEEE Access


Complete data are required for the operation, maintenance, and detection of faults in semiconductor equipment. Missing data occur frequently because of defects such as sensor, data storage, and communication faults, leading to reductions in yield, quality, and productivity. Although many attempts have been made to solve this problem in other fields, few studies have specifically addressed data imputation in the semiconductor industry. In this study, an improved generative adversarial network (GAN)-based missing data imputation for the semiconductor industry called Semi-GAN is proposed. This study introduces a machine learning approach for dealing with data imputation in the semiconductor industry. The proposed method was applied to real data and evaluated using traditional techniques. In particular, the proposed method showed excellent results compared to traditional attribution methods when all missing data ratios in the experiments were less than 20%. It was also observed to be superior when simple and repetitive patterns were omitted rather than repetitive but not simple patterns.
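Mean imputation, the first of the statistical methods named in the citation statement above, can be illustrated with a minimal sketch. It is not taken from any of the cited papers; the function name `mean_impute` and the toy data are hypothetical, and missing values are assumed to be encoded as NaN.

```python
import math
from statistics import fmean


def mean_impute(rows):
    """Replace NaN entries in each column with that column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # A column with no observed values cannot be imputed; leave it as NaN.
        means.append(fmean(observed) if observed else math.nan)
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]


# Hypothetical toy data: two numerical attributes with one gap each.
data = [
    [1.0, 10.0],
    [math.nan, 20.0],
    [3.0, math.nan],
]
imputed = mean_impute(data)
print(imputed)
```

Each gap is filled with the mean of the values observed in the same column, so every missing entry in a column receives the same value. Regression-based and chained-equation methods instead predict each gap from the other attributes.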


“…Considering that the quality of data is essential for building effective and robust ML models [27], a preprocess analysis was performed for cleaning and preparing the data before applying a ML algorithm. For this purpose, the missing values of the numerical attributes were imputed employing the mean imputation method.…”

Section: Data Description (mentioning)

confidence: 99%