Machine learning for health must be reproducible to ensure reliable clinical use. We evaluated 511 scientific papers across several machine learning subfields and found that machine learning for health compared poorly with other areas on reproducibility metrics such as dataset and code accessibility. We propose recommendations to address this problem.
Machine Intelligence (MI) is rapidly becoming an important approach across biomedical discovery, clinical research, medical diagnostics/devices, and precision medicine. Such tools can uncover new possibilities for researchers, physicians, and patients, allowing them to make more informed decisions and achieve better outcomes. When deployed in healthcare settings, these approaches have the potential to enhance the efficiency and effectiveness of the health research and care ecosystem, and ultimately to improve the quality of patient care. In response to the increased use of MI in healthcare, and to the issues that arise when such approaches are applied in clinical care settings, the National Institutes of Health (NIH) and National Center for Advancing Translational Sciences (NCATS) co-hosted a Machine Intelligence in Healthcare workshop with the National Cancer Institute (NCI) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) on 12 July 2019. Speakers and attendees included researchers, clinicians, and patients/patient advocates, with representation from industry, academia, and federal agencies. The issues addressed included data quality and quantity; access to and use of electronic health records (EHRs); transparency and explainability of the system in contrast to the entire clinical workflow; and the impact of bias on system outputs, among other topics. This whitepaper reports on key issues associated with MI in healthcare applications, identifies areas of improvement for MI systems in the context of healthcare, and proposes avenues and solutions for these issues, with the aim of surfacing key areas that, if appropriately addressed, could accelerate progress in the field effectively, transparently, and ethically.
npj Digital Medicine (2020) 3:47; https://doi.
Characterizing COVID-19 and Influenza Illnesses in the Real World via Person-Generated Health Data
Highlights
• We use data from smartphones and wearables from ~7,000 people to compare flu and COVID-19
• While symptoms have some overlap, patients report longer COVID-19 illnesses than flu
• Elevated resting heart rate measures are more frequent around illness symptom onset
• It is important to consider flu as a confounder in COVID-19 real-world studies
Commercial wearable devices are emerging as an appealing mechanism for detecting COVID-19, and potentially other public health threats, because of their widespread use. To assess the validity of wearable devices as population health screening tools, it is essential to evaluate predictive methodologies based on wearable devices in a way that mimics their real-world deployment. Several points must be addressed to move from statistically significant differences between infected and uninfected cohorts to COVID-19 inferences about individuals. We demonstrate the strengths and shortcomings of existing approaches on a cohort of 32,198 individuals who experienced influenza-like illness (ILI), 204 of whom reported testing positive for COVID-19. We show that commonly made design mistakes result in overestimated performance, but that, when properly designed, wearables can be used effectively as part of the detection pipeline. For example, giving the model access to the week of the year, combined with naive randomised test set generation, leads to a substantially overestimated COVID-19 classification performance of 0.73 AUROC. However, an average AUROC of only 0.55 ± 0.02 would be attainable in a simulation of real-world deployment, due to the shifting prevalence of COVID-19 and non-COVID-19 ILI when the model is used to trigger further testing. In this work we show how to train a machine learning model to differentiate ILI days from healthy days, followed by a survey to differentiate COVID-19 from influenza and unspecified ILI based on symptoms. For a forthcoming week, such models can be expected to achieve a sensitivity of 0.50 (0-0.74, 95% CI) while using the wearable device to reduce the burden of surveys by 35%. The corresponding false positive rate is 0.22 (0.02-0.47, 95% CI). In the future, serious consideration must be given to the design, evaluation, and reporting of wearable device interventions if they are to be relied upon as part of frequent testing infrastructures for COVID-19 or other public health threats.
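A toy simulation can illustrate why a randomised split overstates performance when prevalence drifts over time. This is a minimal sketch, not the authors' pipeline. It assumes hypothetical synthetic ILI cases whose COVID-19 share rises linearly week by week, and a hypothetical "model" (`fit_week_model`) that sees only the week of the year. A random split lets the model memorise per-week prevalence that also appears in the test set. A forward-in-time split (train on past weeks, test on the next one) removes that shortcut.

```python
import random
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def simulate_ili_cases(n_weeks=20, cases_per_week=200, seed=0):
    """Hypothetical ILI cases as (week, is_covid) pairs.

    The COVID-19 share is an assumed linear ramp from 5% to 45%. No
    physiological signal is simulated, so any skill a model shows can
    only come from the week feature.
    """
    rng = random.Random(seed)
    cases = []
    for week in range(n_weeks):
        covid_share = 0.05 + 0.4 * week / (n_weeks - 1)
        for _ in range(cases_per_week):
            cases.append((week, 1 if rng.random() < covid_share else 0))
    return cases


def fit_week_model(train):
    """Hypothetical model: score a case by its week's COVID-19 rate in training."""
    counts = defaultdict(lambda: [0, 0])  # week -> [covid cases, all cases]
    for week, y in train:
        counts[week][0] += y
        counts[week][1] += 1
    overall = sum(y for _, y in train) / len(train)

    def predict(week):
        covid, total = counts.get(week, (0, 0))
        return covid / total if total else overall  # unseen week: overall rate
    return predict


def random_split_auroc(cases, test_frac=0.2, seed=1):
    """Naive evaluation: shuffle all cases, so train and test share weeks."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    train, test = shuffled[:cut], shuffled[cut:]
    predict = fit_week_model(train)
    return auroc([y for _, y in test], [predict(w) for w, _ in test])


def forward_week_auroc(cases, test_week):
    """Deployment-style evaluation: train on earlier weeks, test on one future week."""
    train = [c for c in cases if c[0] < test_week]
    test = [c for c in cases if c[0] == test_week]
    predict = fit_week_model(train)
    return auroc([y for _, y in test], [predict(w) for w, _ in test])


if __name__ == "__main__":
    cases = simulate_ili_cases()
    print(f"random split AUROC:    {random_split_auroc(cases):.2f}")
    print(f"forward-in-time AUROC: {forward_week_auroc(cases, test_week=19):.2f}")
```

Because every case in a single unseen week receives the same score, the forward-in-time AUROC here is exactly 0.5, while the random split reports a better-than-chance value even though no physiological signal was simulated. The simulation parameters (20 weeks, 200 cases per week, a linear prevalence ramp) are arbitrary choices for illustration only.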