Today, despite decades of developments in medicine and the growing interest in precision healthcare, the vast majority of diagnoses happen only once patients begin to show noticeable signs of illness. Early indication and detection of diseases, however, can provide patients and carers with the chance of early intervention, better disease management, and efficient allocation of healthcare resources. The latest developments in machine learning (more specifically, deep learning) provide a great opportunity to address this unmet need. In this study, we introduce BEHRT: a deep neural sequence transduction model for electronic health records (EHR), capable of multitask prediction and disease trajectory mapping. When trained and evaluated on the data from nearly 1.6 million individuals, BEHRT shows a striking absolute improvement of 8.0-10.8% in average precision score over the existing state-of-the-art deep EHR models when predicting the onset of 301 conditions. In addition to its superior prediction power, BEHRT provides a personalised view of disease trajectories through its attention mechanism; its flexible architecture enables it to incorporate multiple heterogeneous concepts (e.g., diagnosis, medication, measurements, and more) to improve the accuracy of its predictions; and its (pre-)training results in disease and patient representations that can help us get a step closer to interpretable predictions.
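For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an average precision score can be computed for a single condition. The function name and the toy labels and scores are illustrative assumptions, not taken from the BEHRT study, and the abstract does not say how scores were aggregated across the 301 conditions.

```python
def average_precision(y_true, y_score):
    """Average precision for one binary label.

    Ranks patients by predicted score (highest first) and averages the
    precision observed at the rank of each true positive. Ties are broken
    by input order, which is a simplification made for this sketch.
    """
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    if hits == 0:
        raise ValueError("average precision is undefined without positives")
    return precision_sum / hits


# Toy example (hypothetical data): 1 = condition later diagnosed.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
```

Because the metric rewards ranking true cases near the top, it is often preferred over accuracy when most patients never develop the condition in question.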
Predicting the incidence of complex chronic conditions such as heart failure is challenging. Deep learning models applied to rich electronic health records may improve prediction but remain unexplainable, hampering their wider use in medical practice. We aimed to develop a deep-learning framework for accurate and yet explainable prediction of 6-month incident heart failure (HF). Using 100,071 patients from longitudinal linked electronic health records across the UK, we applied a novel Transformer-based risk model using all community and hospital diagnoses and medications contextualized within the age and calendar year for each patient's clinical encounter. Feature importance was investigated with an ablation analysis, comparing model performance when features were removed in turn, and by comparing the variability of temporal representations. A post-hoc perturbation technique was used to propagate changes in the input to the outcome for feature contribution analyses. Our model achieved 0.93 area under the receiver operating characteristic curve and 0.69 area under the precision-recall curve on internal 5-fold cross-validation and outperformed existing deep learning models. Ablation analysis indicated that medication is important for predicting HF risk and that calendar year is more important than chronological age, a finding further reinforced by temporal variability analysis. Contribution analyses identified risk factors that are closely related to HF. Many of them were consistent with existing knowledge from clinical and epidemiological research, but several new associations were revealed that had not been considered in expert-driven risk prediction models. In conclusion, the results highlight that our deep learning model, in addition to high predictive performance, can inform data-driven risk factor identification.
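The perturbation idea mentioned above can be illustrated with a simple occlusion-style sketch: replace one input at a time with a baseline value and record how much the predicted risk changes. This is not the authors' technique, which the abstract does not describe in detail; `toy_risk_model`, the feature names, and the weights are all hypothetical.

```python
def perturbation_contributions(predict, features, baseline=0.0):
    """Occlusion-style contributions for a single patient.

    `predict` is any callable mapping a feature dict to a risk score.
    Each feature is replaced by `baseline` in turn; the drop in predicted
    risk is reported as that feature's contribution.
    """
    reference = predict(features)
    contributions = {}
    for name in features:
        perturbed = dict(features)      # copy so the input is not mutated
        perturbed[name] = baseline
        contributions[name] = reference - predict(perturbed)
    return contributions


def toy_risk_model(features):
    """Hypothetical linear risk score, used only to exercise the sketch."""
    return 0.1 + 0.3 * features["loop_diuretic"] + 0.2 * features["hypertension"]


patient = {"loop_diuretic": 1.0, "hypertension": 1.0}
contrib = perturbation_contributions(toy_risk_model, patient)
```

A real sequence model would need a more careful choice of baseline (for example, masking an encounter rather than zeroing a value), so the numbers from a sketch like this are only indicative.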
Background: Myocardial infarction (MI), stroke and diabetes share underlying risk factors and commonalities in clinical management. We examined whether their combined impact on mortality is proportional to, greater than, or less than the risk expected from each disease separately, and whether the excess risk is explained by their associated comorbidities. Methods: Using large-scale electronic health records, we identified 2,007,731 eligible patients (51% women) registered with general practices in the UK and extracted clinical information including diagnoses of MI, stroke, diabetes and 53 other long-term conditions before 2005 (study baseline). We used Cox regression to determine the risk of all-cause mortality with age as the underlying time variable and tested for excess risk due to interaction between cardiometabolic conditions. Results: At baseline, the mean age was 51 years, and 7% (N = 145,910) had a cardiometabolic condition. Over a mean follow-up of 7 years, 146,994 patients died. The sex-adjusted hazard ratios (HR) (95% confidence interval [CI]) of all-cause mortality by baseline disease status, compared to those without cardiometabolic disease, were MI = 1.51 (1.49–1.52), diabetes = 1.52 (1.51–1.53), stroke = 1.84 (1.82–1.86), MI and diabetes = 2.14 (2.11–2.17), MI and stroke = 2.35 (2.30–2.39), diabetes and stroke = 2.53 (2.50–2.57) and all three = 3.22 (3.15–3.30). Adjusting for other concurrent comorbidities attenuated these estimates, including the risk associated with having all three conditions (HR = 1.81 [95% CI 1.74–1.89]). Excess risks due to interaction between cardiometabolic conditions, particularly when all three conditions were present, were not significantly greater than expected from the individual disease effects. Conclusion: Myocardial infarction, stroke and diabetes were associated with excess mortality, without evidence of any amplification of risk in people with all three diseases. The presence of other comorbidities substantially contributed to the excess mortality risks associated with cardiometabolic disease multimorbidity.
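As a rough illustration of what "excess risk due to interaction" means, the sketch below compares the reported sex-adjusted hazard ratios for pairs of conditions against what would be expected if the individual effects combined additively or multiplicatively. The abstract does not state which scale the authors tested, and the sketch uses point estimates only, ignoring confidence intervals; the function names are assumptions made here.

```python
def additive_excess(hr_joint, hr_a, hr_b):
    """Relative excess risk due to interaction (RERI) on the additive scale.

    Zero means the joint effect equals the sum of the separate excess risks;
    treating hazard ratios as relative risks is an approximation.
    """
    return hr_joint - (hr_a + hr_b - 1.0)


def multiplicative_ratio(hr_joint, hr_a, hr_b):
    """Ratio of the joint HR to the product of the separate HRs.

    Values below 1 mean the joint effect is smaller than multiplicative.
    """
    return hr_joint / (hr_a * hr_b)


# Sex-adjusted hazard ratios quoted in the abstract above.
hr = {"mi": 1.51, "diabetes": 1.52, "stroke": 1.84}
joint = {
    ("mi", "diabetes"): 2.14,
    ("mi", "stroke"): 2.35,
    ("diabetes", "stroke"): 2.53,
}

summary = {
    pair: (
        additive_excess(value, hr[pair[0]], hr[pair[1]]),
        multiplicative_ratio(value, hr[pair[0]], hr[pair[1]]),
    )
    for pair, value in joint.items()
}
```

On these point estimates, every pairwise joint hazard ratio is below the product of the two separate hazard ratios and close to their additive combination, which is in line with the abstract's statement that no amplification of risk was observed.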
Advances in public health and medical care have enabled better pregnancy and birth outcomes. The rates of perinatal health indicators such as maternal mortality and morbidity; fetal, neonatal, and infant mortality; low birthweight; and preterm birth have declined over time. However, they are still a public health concern, and considerable disparities exist within and between countries. For perinatal researchers who are engaged in unraveling the tangled web of causation for maternal and child health outcomes and for clinicians involved in the care of pregnant women and infants, artificial intelligence offers novel approaches to prediction modeling, diagnosis, early detection, and monitoring in perinatal health. Machine learning, a commonly used artificial intelligence method, has been used to predict preterm birth, birthweight, preeclampsia, mortality, hypertensive disorders, and postpartum depression. Real-time electronic health recording and predictive modeling using artificial intelligence have found early success in fetal monitoring and in monitoring of women with gestational diabetes, especially in low-resource settings. Artificial intelligence–based methodologies also have the potential to improve prenatal diagnosis of birth defects and outcomes in assisted reproductive technology. In this context, we envision artificial intelligence for perinatal research to be based on three goals: (1) availability of population-representative, routine clinical data (rich multimodal data of large sample size) for perinatal research; (2) modification and application of current state-of-the-art artificial intelligence for prediction and classification in health care research to the field of perinatal health; and (3) development of methods for explaining the decision-making processes of artificial intelligence models for perinatal health indicators. Achieving these three goals via a multidisciplinary approach to the development of artificial intelligence tools will enable trust in these tools and advance research, clinical practice, and policies to ensure optimal perinatal health.
Observational causal inference is useful for decision-making in medicine when randomized clinical trials (RCTs) are infeasible or nongeneralizable. However, traditional approaches do not always deliver unconfounded causal conclusions in practice. The rise of "doubly robust" nonparametric tools, coupled with the growth of deep learning for capturing rich representations of multimodal data, offers a unique opportunity to develop and test such models for causal inference on comprehensive electronic health records (EHRs). In this article, we investigate causal modeling of an RCT-established causal association: the effect of antihypertensive drug classes on incident cancer risk. We develop a transformer-based model, the targeted bidirectional EHR transformer (T-BEHRT), coupled with doubly robust estimation to estimate the average risk ratio (RR). We compare our model to benchmark statistical and deep learning models for causal inference in multiple experiments on semi-synthetic derivations of our dataset with various types and intensities of confounding. To further test the reliability of our approach, we evaluate our model in limited-data settings. We find that our model provides more accurate estimates of relative risk [least sum absolute error (SAE) from ground truth] compared with benchmark estimations. Finally, our model provides an estimate of class-wise antihypertensive effect on cancer risk that is consistent with results derived from RCTs.
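To make "doubly robust estimation" concrete, here is a minimal sketch of an augmented inverse probability weighting (AIPW) estimate of a risk ratio. AIPW is used here as a simple, well-known doubly robust estimator; it is not claimed to be the estimator used with T-BEHRT. The function name and arguments are assumptions, and the propensity and outcome predictions are assumed to come from models fitted elsewhere.

```python
def aipw_risk_ratio(y, t, propensity, mu1_hat, mu0_hat):
    """Doubly robust (AIPW) estimate of the risk ratio E[Y(1)] / E[Y(0)].

    y          : observed binary outcomes (0/1)
    t          : treatment indicators (1 = treated, 0 = control)
    propensity : estimated P(T = 1 | X), strictly between 0 and 1
    mu1_hat    : predicted risk for each patient if treated
    mu0_hat    : predicted risk for each patient if untreated
    """
    if any(e <= 0.0 or e >= 1.0 for e in propensity):
        raise ValueError("propensity scores must lie strictly in (0, 1)")
    n = len(y)
    risk_treated = 0.0
    risk_control = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, propensity, mu1_hat, mu0_hat):
        # Outcome-model prediction plus an inverse-probability-weighted
        # residual correction for patients observed in that arm.
        risk_treated += m1 + ti * (yi - m1) / ei
        risk_control += m0 + (1 - ti) * (yi - m0) / (1.0 - ei)
    return (risk_treated / n) / (risk_control / n)


# Toy inputs (hypothetical, not from the study).
rr = aipw_risk_ratio(
    y=[1, 0, 0, 1],
    t=[1, 1, 0, 0],
    propensity=[0.5, 0.5, 0.5, 0.5],
    mu1_hat=[1.0, 0.0, 0.5, 0.5],
    mu0_hat=[0.25, 0.25, 0.0, 1.0],
)
```

The estimate remains consistent if either the propensity model or the outcome model is correctly specified, which is the property the term "doubly robust" refers to.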