Classification of datasets with imputed missing values: does imputation quality matter?

Roberts, Michael S.; Stanczuk, Jan; Gilbey, Julian; Teare, Philip; Dittmer, Sören; Thorpe, Matthew; Viñas, Ramón; Sala, Evis; Liò, Pietro; Patel, Mishal; AIX-COVNET Collaboration; Rudd, James H.F.; Mirtti, Tuomas; Rannikko, Antti; Aston, John A. D.; Tang, Jing; Schönlieb, Carola‐Bibiane

doi:10.48550/arxiv.2206.08478

Cited by 2 publications

(1 citation statement)

References 37 publications

“…Most important of all, the data management or completeness of data for automated decision making plays an essential role when it comes to statistics and machine learning. The integration between statistics and machine learning can be used to train automated models with imputed missing values in the data, which would improve the generalizability and robustness of models [100].…”

Section: Discussion (mentioning)

confidence: 99%

Theory and Practice of Integrating Machine Learning and Conventional Statistics in Medical Data Analysis

et al. 2022

The practice of medical decision making is changing rapidly with the development of innovative computing technologies. Growing interest in data analysis, together with improvements in big-data processing methods, raises the question of whether machine learning can be integrated with conventional statistics in health research. To help address this knowledge gap, this paper presents a review of the conceptual integration between conventional statistics and machine learning, focusing on health research. The similarities and differences between the two are compared using mathematical concepts and algorithms. The comparison indicates that conventional statistics is the fundamental basis of machine learning: the black-box algorithms are derived from basic mathematics but are more advanced in terms of automated analysis, handling big data and providing interactive visualizations. While the two methods differ in nature, they are conceptually similar. Based on our review, we conclude that conventional statistics and machine learning are best integrated to develop automated data analysis tools. We also strongly believe that health researchers could explore machine learning to enhance conventional statistics in decision making and to provide additional reliable validation measures.

A pipeline to further enhance quality, integrity and reusability of the NCCID clinical data

et al. 2023

The National COVID-19 Chest Imaging Database (NCCID) is a centralized UK database of thoracic imaging and corresponding clinical data. It is made available by the National Health Service Artificial Intelligence (NHS AI) Lab to support the development of machine learning tools focused on Coronavirus Disease 2019 (COVID-19). A bespoke cleaning pipeline for NCCID, developed by NHSx, was introduced in 2021. We present an extension to the original cleaning pipeline for the clinical data of the database. It has been adjusted to correct additional systematic inconsistencies in the raw data, such as in patient sex, oxygen levels and date values. The most important changes are discussed in this paper, whilst the code and further explanations are made publicly available on GitLab. The suggested cleaning will allow users worldwide to work with more consistent data when developing machine learning tools, without requiring expert knowledge. In addition, the paper highlights some of the challenges of working with clinical multi-center data and includes recommendations for similar future initiatives.
