Using Machine Learning to Identify Health Outcomes from Electronic Health Record Data

Wong, Jenna; Horwitz, Mara E. Murray; Zhou, Li; Toh, Sengwee

doi:10.1007/s40471-018-0165-9

Cited by 64 publications

(32 citation statements)

References 64 publications

“…Performance in the development of such algorithms demands a strong collaboration between clinicians and data scientists, from the precise formulation of problems to a common and balanced interpretation of results. Moreover, the data‐driven processes inherent to machine learning applications need multiple internal and external validations before dissemination to multisite application and generalization. An important limitation associated with such Big Data analyses is related to data complexity and heterogeneity in quality, with significant rates of anomalies that have been identified when assessing the quality of metadata records.…”

Section: The Challenges of Big Data

mentioning

confidence: 99%
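The quote distinguishes internal from external validation. A common form of internal validation is k-fold cross-validation; the source does not name a scheme, so this choice is ours. As a minimal sketch, the hypothetical helper `k_fold_indices` below splits n record indices into k disjoint validation folds, pairing each fold with the remaining indices for training:

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the cited work.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


if __name__ == "__main__":
    for fold, (train, validation) in enumerate(k_fold_indices(10, 3)):
        print(fold, len(train), len(validation))
```

With 10 records and k = 3, the validation folds hold 4, 3 and 3 records, and every record is validated exactly once. External validation is a separate step: the trained model is frozen and evaluated on records from another site or period, which no resampling of the development data can substitute for.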

Big Data in sleep apnoea: Opportunities and challenges

2019

Sleep apnoea is now regarded as a highly prevalent, systemic, multimorbid, chronic disease requiring a combination of long-term home-based treatments. Optimization of personalized treatment strategies requires accurate patient phenotyping. Data describing the broad variety of phenotypes can come from electronic health records, health insurance claims, socio-economic administrative databases, environmental monitoring, social media, etc. Connected devices in and outside homes collect vast amounts of data that are amassed in databases. All this contributes to 'Big Data' that, if used appropriately, has great potential for the benefit of health, well-being and therapeutics. Sleep apnoea is particularly well placed with regard to Big Data because the primary treatment is positive airway pressure (PAP). PAP devices, used every night over long periods by millions of patients across the world, generate an enormous amount of data. In this review, we discuss how different types of Big Data have been, and could be, used to improve our understanding of sleep-disordered breathing, to identify undiagnosed sleep apnoea, to personalize treatment, and to adapt health policies and better allocate resources. We discuss some of the challenges of Big Data, including the need for appropriate data management, compilation and analysis techniques employing innovative statistical approaches alongside machine learning/artificial intelligence; closer collaboration between data scientists and physicians; and respect for the ethical and regulatory constraints of collecting and using Big Data. Lastly, we consider how Big Data can be used to overcome the limitations of randomized clinical trials and advance real-life evidence-based medicine for sleep apnoea.

Respirology (2020) 25, 486-494

“…With significant evolution in recent years [10], machine learning methods are powerful tools in supporting medical diagnoses. Studies [11,9,12] have shown that these methods are capable of predicting and identifying diseases based on laboratory tests and clinical data with similar accuracy to a human specialist. Other studies [13,14,15] have also been able to assist in the diagnosis of diabetes by making use of machine learning techniques.…”

Section: Introduction

mentioning

confidence: 99%

Prediction of Glycated Haemoglobin Based on Routine Blood Count Tests to Support the Diagnosis of Diabetes Mellitus

Cardozo, Andreis, Cossul, et al.

2020

Preprint

Abstract. Background: Currently, 8.8% of the world's population aged 20 to 79 have diabetes mellitus (DM); of these, an estimated 50% have not been diagnosed and do not know they have the disease. The most common laboratory tests used for diagnosis are fasting plasma glucose (FPG) and glycated haemoglobin (HbA1c). The HbA1c test has advantages over FPG and is therefore recommended for the diagnosis of DM. Early diagnosis is essential to prevent complications caused by DM; however, symptoms in the initial stage are present in only 40% of carriers, and symptomless carriers rarely seek DM testing. Over a lifetime, patients undergo a series of laboratory exams whose results are stored as laboratory data, and computational approaches offer enormous potential for analysing these health data and discovering relevant results overlooked by physicians. The aim was to apply a machine learning approach to stored results of routine blood count laboratory tests to predict the HbA1c-based diagnosis. Method: Using stored HbA1c laboratory results, six data groups were formed, composed of: healthy and pre-diabetic (HP); healthy and diabetic (HD); pre-diabetic and diabetic (PD); healthy and non-healthy (HN); non-diabetic and diabetic (ND); and healthy, pre-diabetic and diabetic (HPD) patients. For each data group, K nearest neighbours (KNN), support vector machine (SVM), random forest (RF), naive Bayes (NB) and artificial neural network (ANN) models were tested. Model performance was assessed using sensitivity, specificity, precision and negative predictive value. Results: The KNN model applied to the ND group had the best performance in diagnosing diabetes, with a sensitivity of 53.6% and an accuracy of 90.1%. Classification after regression with the neural network model (ANNr) on the ND group gave a more general result, with a sensitivity of 74.3% and an accuracy of 77.2%. Considering the regression alone, the neural network model achieved a mean squared error of 0.36 on the final test set, with a correlation of 0.85. Conclusions: We conclude that machine learning-based computational models can predict HbA1c values from other routine laboratory tests. Thus, they can assist in the detection of diabetes and act as a warning for undiagnosed cases.
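The reported pairing of 53.6% sensitivity with 90.1% accuracy shows why several metrics are needed. All of the metrics named in the abstract derive from the four cells of a binary confusion matrix. A minimal sketch follows; the names `Confusion` and `metrics` are hypothetical, and the counts in the example are made up for illustration, not the study's data:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int  # diabetic and predicted diabetic
    fp: int  # non-diabetic but predicted diabetic
    tn: int  # non-diabetic and predicted non-diabetic
    fn: int  # diabetic but predicted non-diabetic


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")


def metrics(c: Confusion) -> Dict[str, float]:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall on positives
        "specificity": _ratio(c.tn, c.tn + c.fp),  # recall on negatives
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical cohort: 100 diabetic and 900 non-diabetic records.
    m = metrics(Confusion(tp=50, fp=40, tn=860, fn=50))
    print({name: round(value, 3) for name, value in m.items()})
```

In this made-up cohort, missing half of the 100 positives still yields a sensitivity of 0.5 alongside an accuracy of 0.91, because the 860 true negatives dominate accuracy. Whether class imbalance explains the study's own figures cannot be determined from the abstract.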

“…Because inconsistencies and inaccuracies are a reality when using EHR to gather data, depending solely on one indication, like a particular diagnostic code or test result, for even well-defined health outcomes (e.g. hypertension) does not always lead to accurate classifications (Wong, Horwitz, et al., 2018). Two proposed solutions to allow incomplete EHR data to train more predictive models are to include more features in the training data and to use multiple data types to identify a single target (Wong, Horwitz, et al., 2018).…”

mentioning

confidence: 99%
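The quote argues that a single indicator, such as one diagnostic code, can misclassify even a well-defined outcome like hypertension, and that combining data types helps. The rule below is a minimal, non-ML sketch of that idea: it counts how many independent data types point to hypertension and requires at least two to agree. The record layout, drug-class labels, the 140 mmHg cut-off, the two-reading requirement and the two-source threshold are all illustrative assumptions, not criteria from Wong et al.; a trained model would instead learn how to weight such signals from labelled data.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative assumptions, not criteria taken from the source.
HTN_CODES = {"I10"}  # ICD-10-CM: essential (primary) hypertension
ANTIHYPERTENSIVE_CLASSES = {"ace_inhibitor", "arb", "thiazide", "calcium_channel_blocker"}
SBP_THRESHOLD_MMHG = 140
MIN_HIGH_READINGS = 2


@dataclass
class PatientRecord:
    # Hypothetical, simplified extract of one patient's EHR.
    dx_codes: List[str] = field(default_factory=list)
    systolic_bp: List[float] = field(default_factory=list)  # mmHg
    medications: List[str] = field(default_factory=list)    # drug-class labels


def hypertension_evidence(r: PatientRecord) -> int:
    """Count how many independent data types point toward hypertension (0 to 3)."""
    has_code = any(code in HTN_CODES for code in r.dx_codes)
    high_readings = sum(1 for s in r.systolic_bp if s >= SBP_THRESHOLD_MMHG)
    has_high_bp = high_readings >= MIN_HIGH_READINGS
    on_treatment = any(m in ANTIHYPERTENSIVE_CLASSES for m in r.medications)
    return int(has_code) + int(has_high_bp) + int(on_treatment)


def classify_hypertension(r: PatientRecord, min_sources: int = 2) -> bool:
    # Require agreement between data types instead of trusting any single one.
    return hypertension_evidence(r) >= min_sources
```

Under this rule a record carrying only the code I10 is not classified as hypertensive, because a lone code is treated as insufficient evidence. The evidence count could also serve as one input feature to a trained classifier, in line with the suggestion to include more features.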

“…hypertension) does not always lead to accurate classifications (Wong, Horwitz, et al., 2018). Two proposed solutions to allow incomplete EHR data to train more predictive models are to include more features in the training data and to use multiple data types to identify a single target (Wong, Horwitz, et al., 2018). Including more features is particularly useful when the relationship between the target and the variables is intricate.…”

mentioning

confidence: 99%

“…Some of these data types, like unstructured data, are difficult to extract in a form useful for training algorithms. Machine learning is particularly valuable in this data pre-processing step. Unstructured data like images and free text were estimated to make up 80% of the patient data in EHR systems, and these formats are not easily queried in an automated way (Wong, Horwitz, et al., 2018; Murdoch & Detsky, 2013). This unstructured data requires manual human work to find and arrange it in a way that can be used in the algorithm training process (Ford, Carroll, et al., 2016; Araujo et al., 2017).…”

mentioning

confidence: 99%
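The quote notes that free text is hard to query automatically. To show why, here is a deliberately naive rule-based matcher (not a machine learning method) that flags a single-word condition in a note unless a negation cue appears shortly before it in the same sentence. The cue list, the five-word window and the sentence splitting are simplifying assumptions; published approaches such as the NegEx algorithm use much richer rules, and learned models go further still.

```python
import re

# Hypothetical, minimal negation cues; real clinical rule sets are far larger.
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def mentions_condition(note: str, term: str, window: int = 5) -> bool:
    """Return True if single-word `term` occurs in `note` with no negation cue
    among the `window` words before it in the same sentence."""
    for sentence in re.split(r"[.!?]\s*", note):
        words = sentence.split()
        lowered = [w.lower().strip(",;:") for w in words]
        for i, word in enumerate(lowered):
            if word == term.lower():
                preceding = " ".join(words[max(0, i - window):i])
                if not NEGATION_CUES.search(preceding):
                    return True  # affirmed mention found
    return False


if __name__ == "__main__":
    print(mentions_condition("Pt has hypertension, on lisinopril.", "hypertension"))
    print(mentions_condition("Patient denies hypertension.", "hypertension"))
```

The second call returns False because "denies" precedes the term, where a plain keyword search would have counted it. A note such as "Family history of hypertension" would still be flagged, which illustrates why this step has traditionally needed manual review or trained models.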

Ethical Issues Arising Due to Bias in Training A.I. Algorithms in Healthcare and Data Sharing as a Potential Solution

Gaonkar, Kim, Macyszyn

2020

AIEJ

Machine learning algorithms have been shown to be capable of diagnosing cancer and Alzheimer's disease and even of selecting treatment options. However, most machine learning systems implemented in the healthcare setting are based on the supervised machine learning paradigm. These systems rely on previously collected data, annotated by medical personnel, from specific populations. This leads to 'learnt' machine learning models that lack generalizability: the machine's predictions are less accurate for certain populations and can disagree with the recommendations of medical experts who did not annotate the data used to train these models. Each human-decided aspect of building a supervised machine learning model introduces human bias into the machine's decision-making, and this human bias is the source of numerous ethical concerns. In this article, we describe and discuss three challenges to generalizability that affect real-world deployment of machine learning systems in clinical practice. First, there is bias arising from the characteristics of the population from which the data were collected. Second, there is bias arising from the prejudice of the expert annotator involved. Third, there is bias arising from the timing of when A.I. processes start training themselves. We also discuss the future implications of these biases. More importantly, we describe how responsible data sharing can help mitigate the effects of these biases and allow for the development of novel algorithms that may be able to train in an unbiased manner. We discuss environmental and regulatory hurdles that hinder the sharing of data in medicine, and possible updates to current regulations that may enable ethical data sharing for machine learning. With these updates in mind, we also discuss emerging algorithmic frameworks being used to create medical machine learning systems that can eventually learn to be free from population- and expert-induced bias.
These models can then truly be deployed to clinics worldwide, making medicine both cheaper and more accessible for the world at large.
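The first source of bias named in the abstract, the characteristics of the population from which training data were collected, typically shows up as uneven performance across groups. One simple way to surface it, offered here as an illustrative sketch rather than the authors' method, is to stratify an evaluation metric such as sensitivity by subgroup. The function name, group labels and tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sensitivity_by_group(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """rows: (group, y_true, y_pred), where 1 means the condition is present.

    Returns sensitivity (recall on true positives) per group. Groups with no
    positive cases are omitted, since sensitivity is undefined for them.
    """
    true_pos: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in rows:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / positives[g] for g in positives}


if __name__ == "__main__":
    # Hypothetical predictions for two sites.
    rows = [("site_A", 1, 1), ("site_A", 1, 1), ("site_A", 1, 0),
            ("site_B", 1, 0), ("site_B", 1, 1), ("site_B", 0, 1)]
    print(sensitivity_by_group(rows))
```

A gap between groups does not by itself identify its cause, but it flags where a model trained on one population may not generalize to another.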
