IMPORTANCE Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, and its early detection could lead to significant improvements in outcomes through the appropriate prescription of anticoagulation medication. Although a variety of AF screening methods exist, a targeted approach, which requires an efficient method for identifying patients at risk, would be preferred. OBJECTIVE To examine machine learning approaches applied to electronic health record data that have been harmonized to the Observational Medical Outcomes Partnership Common Data Model for identifying risk of AF. DESIGN, SETTING, AND PARTICIPANTS This diagnostic study used data from 2 252 219 individuals cared for in the UCHealth hospital system, which comprises 3 large hospitals in Colorado, from January 1, 2011, to October 1, 2018. Initial analysis was performed in December 2018; follow-up analysis was performed in July 2019. EXPOSURES All Observational Medical Outcomes Partnership Common Data Model-harmonized electronic health record features, including diagnoses, procedures, medications, age, and sex. MAIN OUTCOMES AND MEASURES Classification of incident AF in designated 6-month intervals, adjudicated retrospectively, based on area under the receiver operating characteristic curve and F1 statistic. RESULTS Of 2 252 219 individuals (1 225 533 [54.4%] women; mean [SD] age, 42.9 [22.3] years), 28 036 (1.2%) developed incident AF during a designated 6-month interval. The machine learning model that combined the 200 most common electronic health record features (including age and sex), random oversampling, and a single-layer, fully connected neural network provided the optimal prediction of 6-month incident AF, with an area under the receiver operating characteristic curve of 0.800 and an F1 score of 0.110. 
This model performed only slightly better than a more basic logistic regression model composed of known clinical risk factors for AF, which had an area under the receiver operating characteristic curve of 0.794 and an F1 score of 0.079. CONCLUSIONS AND RELEVANCE Machine learning approaches to electronic health record data offer a promising method for improving risk prediction for incident AF, but more work is needed to show improvement beyond standard risk factors.
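The AF abstract reports random oversampling to cope with a 1.2% event rate and compares models by F1 score. As an illustration only, not the authors' pipeline, the sketch below shows plain random oversampling (duplicating rows of the smaller class, with replacement, until the classes are balanced) and an F1 calculation from confusion counts. The function names and toy numbers are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class (sampling with
    replacement) until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 1,000 patients, 12 of whom (1.2%) have the event.
y_true = [1] * 12 + [0] * 988
# A hypothetical classifier that flags 100 patients and catches 10 of the 12 cases.
y_pred = [1] * 10 + [0] * 2 + [1] * 90 + [0] * 898
print(round(f1_score(y_true, y_pred), 3))  # 0.179

X_bal, y_bal = random_oversample([[0], [1], [2], [3]], [1, 0, 0, 0])
print(sorted(Counter(y_bal).items()))  # [(0, 3), (1, 3)]
```

The toy classifier has a recall of about 0.83 yet an F1 of only 0.179, which illustrates why F1 tends to be low when the event is rare: false positives dominate precision. Oversampling is normally applied to the training split only, so that duplicated rows never leak into the evaluation data.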
Background With cardiovascular disease increasing, substantial research has focused on the development of prediction tools. We compare deep learning and machine learning models with a baseline logistic regression using only ‘known’ risk factors in predicting incident myocardial infarction (MI) from harmonized electronic health record (EHR) data. Methods This large-scale case-control study, with an outcome of 6-month incident MI, was performed on 2.27 million patients in the UCHealth system using the top 800 features from an initial 52 k procedures, diagnoses, and medications, harmonized to the Observational Medical Outcomes Partnership common data model. We compared several oversampling and undersampling techniques to address the imbalance in the dataset. We compared regularized logistic regression, random forests, gradient boosting machines, and shallow and deep neural networks. The baseline model for comparison was a logistic regression using a limited set of ‘known’ risk factors for MI. Hyperparameters were identified using 10-fold cross-validation. Results A total of 20,591 patients were diagnosed with MI, compared with 2.25 million who were not. A deep neural network with random undersampling provided superior classification compared with other methods. However, its benefit over a logistic regression model using only ‘known’ risk factors was only moderate: the deep neural network had an F1 score of 0.092 and an AUC of 0.835. Calibration was poor for all models despite adequate discrimination, owing to overfitting from the low frequency of the event of interest. Conclusions Our study suggests that deep neural networks (DNNs) trained on harmonized data may not offer substantial benefit over traditional methods using established risk factors for MI.
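The MI abstract reports that random undersampling worked best but that calibration was poor. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it shows random undersampling of the majority class and a standard prior correction that maps probabilities from a model trained on the balanced sample back to the original base rate. The abstract attributes its poor calibration to overfitting; the base-rate shift shown here is a separate, general effect of undersampling. The function names are hypothetical.

```python
import random


def random_undersample(X, y, majority=0, seed=0):
    """Keep every minority-class row plus an equal-sized random subset of
    majority-class rows (sampled without replacement).

    Also returns beta, the fraction of majority rows that were kept.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(y) if lab != majority]
    majority_idx = [i for i, lab in enumerate(y) if lab == majority]
    kept = rng.sample(majority_idx, len(minority_idx))
    idx = minority_idx + kept
    rng.shuffle(idx)
    beta = len(kept) / len(majority_idx)
    return [X[i] for i in idx], [y[i] for i in idx], beta


def correct_for_undersampling(p_s, beta):
    """Map a probability p_s from a model trained on undersampled data back to
    the original base rate: p = beta * p_s / (beta * p_s - p_s + 1)."""
    return beta * p_s / (beta * p_s - p_s + 1)


# Approximate counts from the abstract: 20,591 cases vs about 2.25 million non-cases.
beta = 20_591 / 2_250_000
# A score of 0.5 on the balanced sample maps back to roughly the event rate.
print(round(correct_for_undersampling(0.5, beta), 4))  # 0.0091
```

The correction follows from the odds: removing all but a fraction beta of the negatives multiplies the odds of a positive by 1/beta, so multiplying the balanced-sample odds by beta recovers the original odds.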
Background: Drug-induced QT prolongation is a potentially preventable cause of morbidity and mortality; however, no clinical tools are in widespread use to predict which individuals are at greatest risk. Machine learning (ML) algorithms may provide a method for identifying these individuals and could be automated to alert providers directly in real time. Objective: This study applies ML techniques to electronic health record (EHR) data to identify an integrated risk-prediction model that can be deployed to predict risk of drug-induced QT prolongation. Methods: We examined harmonized data from the UCHealth EHR and identified inpatients who had received a medication known to prolong the QT interval. Using a binary outcome (development of a QTc interval >500 ms within 24 hours of medication initiation vs no ECG with a QTc interval >500 ms), we compared multiple machine learning methods by classification accuracy and performed calibration and rescaling of the final model. Results: We identified 35,639 inpatients who received a known QT-prolonging medication and had an ECG performed within 24 hours of administration. Of those, 4,558 patients developed a QTc >500 ms and 31,081 patients did not. A deep neural network with random oversampling of controls was found to provide superior classification accuracy (F1 score 0.404; AUC 0.71) for the development of a long QT interval compared with other methods. The optimal cutpoint for prediction was determined and was reasonably accurate (sensitivity 71%; specificity 73%). Conclusions: We found that deep neural networks applied to EHR data provide reasonable prediction of which individuals are most susceptible to drug-induced QT prolongation. Future studies are needed to validate this model in other EHRs and within the physician order entry system to assess its ability to improve patient safety.
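The QT abstract does not say how the optimal cutpoint was chosen. One common criterion, assumed here purely for illustration, is Youden's J statistic (sensitivity + specificity - 1). The sketch scans every observed score as a candidate threshold; the function name and toy scores are hypothetical.

```python
def youden_cutpoint(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's
    J = sensitivity + specificity - 1, where a score >= threshold is
    predicted positive. Ties keep the lowest threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        tn = sum(1 for s, lab in zip(scores, labels) if s < t and lab == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]


# Toy risk scores and outcomes (1 = QTc > 500 ms); values are invented.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(youden_cutpoint(scores, labels))  # (0.4, 1.0, 0.75)
```

This brute-force scan is quadratic in the number of patients, which is fine for a sketch; a production version would sort once and sweep the thresholds while updating the counts.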
Data mining plays a major role in healthcare services because disease recognition and investigation involve vast amounts of data. This creates several problems in managing the data and processing it efficiently. Healthcare datasets are ill-defined and influential, and they are extremely tedious to manage and process. To overcome these problems, numerous studies have presented various ML algorithms for the examination and prediction of different diseases. Disease identification and prediction are tasks of classification and forecasting. In this paper, diabetes is predicted from its major characteristics, and the relationships among contradictory characteristics are also categorized. Significant features were selected using recursive feature elimination with a random forest. Using the Apriori algorithm, our system extracted a strong association of diabetes with body mass index (BMI) and glucose level. XGBoost was evaluated for the prediction of diabetes; it achieved a higher accuracy (78.91%) than the artificial neural network (ANN) approach and might help support medical professionals in making treatment decisions.
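The diabetes abstract reports association rules mined with the Apriori algorithm. The following is a minimal, self-contained sketch of Apriori's level-wise search (count support, keep frequent itemsets, join them into larger candidates, prune candidates with an infrequent subset) together with rule confidence. The item labels, their binarization into "high_BMI" and "high_glucose", and the transactions are invented toy data, not the study's dataset or thresholds.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori search. Returns {frozenset(itemset): support} for every
    itemset contained in at least a min_support fraction of transactions."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        # Count step: support = fraction of transactions containing the itemset.
        support = {c: sum(c <= t for t in transactions) / n for c in candidates}
        level = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A and B together) / support(A).
    Raises KeyError if either itemset was not frequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Invented toy records; each set lists binarized findings for one patient.
records = [
    {"high_BMI", "high_glucose", "diabetes"},
    {"high_BMI", "high_glucose", "diabetes"},
    {"high_BMI", "diabetes"},
    {"high_glucose"},
    {"high_BMI"},
]
freq = apriori(records, min_support=0.4)
print(round(confidence(freq, {"high_BMI"}, {"diabetes"}), 2))  # 0.75
print(round(confidence(freq, {"high_BMI", "high_glucose"}, {"diabetes"}), 2))  # 1.0
```

The prune step is what makes Apriori efficient: any itemset with an infrequent subset cannot itself be frequent, so it is discarded before its support is ever counted.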
Introduction/background: Patients with heart failure with reduced ejection fraction (HFrEF) are consistently underprescribed guideline-directed medications. Although many barriers to prescribing are known, identification of these barriers has relied on traditional a priori hypotheses or qualitative methods. Machine learning can overcome many limitations of traditional methods to capture complex relationships in data and lead to a more comprehensive understanding of the underpinnings driving underprescribing. Here, we used machine learning methods and routinely available electronic health record data to identify predictors of prescribing. Methods: We evaluated the predictive performance of machine learning algorithms to predict prescription of four types of medications for adults with HFrEF: angiotensin-converting enzyme inhibitor/angiotensin receptor blocker (ACE/ARB), angiotensin receptor-neprilysin inhibitor (ARNI), evidence-based beta blocker (BB), or mineralocorticoid receptor antagonist (MRA). The models with the best predictive performance were used to identify the top 20 characteristics associated with prescribing each medication type. Shapley values were used to provide insight into the importance and direction of the predictor relationships with medication prescribing. Results: For 3,832 patients meeting the inclusion criteria, 70% were prescribed an ACE/ARB, 8% an ARNI, 75% a BB, and 40% an MRA. The best-predicting model for each medication type was a random forest (area under the curve: 0.788–0.821; Brier score: 0.063–0.185). Across all medications, top predictors of prescribing included prescription of other evidence-based medications and younger age. 
Unique to prescribing an ARNI, the top predictors included lack of diagnoses of chronic kidney disease, chronic obstructive pulmonary disease, or hypotension, as well as being in a relationship, nontobacco use, and alcohol use. Discussion/conclusions: We identified multiple predictors of prescribing for HFrEF medications that are being used to strategically design interventions to address barriers to prescribing and to inform further investigations. The machine learning approach used in this study to identify predictors of suboptimal prescribing can also be used by other health systems to identify and address locally relevant gaps and solutions to prescribing.
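The HFrEF abstract reports Brier scores of 0.063 to 0.185. The Brier score is the mean squared difference between predicted probabilities and observed 0/1 outcomes (lower is better). A constant prediction equal to the prevalence p scores p(1 - p), so Brier scores are easiest to read against that reference. The sketch below computes both the score and a Brier skill score relative to the prevalence-only reference; the function names and toy outcomes are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, where BS_ref is the Brier score of always predicting the
    observed prevalence (equal to prevalence * (1 - prevalence))."""
    prevalence = sum(outcomes) / len(outcomes)
    reference = brier_score([prevalence] * len(outcomes), outcomes)
    return 1 - brier_score(probs, outcomes) / reference


# Toy outcomes with 8% prevalence (e.g., 8 of 100 patients prescribed an ARNI).
outcomes = [1] * 8 + [0] * 92
print(round(brier_score([0.08] * 100, outcomes), 4))  # 0.0736
# A perfect predictor has a Brier score of 0 and therefore a skill score of 1.
print(brier_skill_score([1.0] * 8 + [0.0] * 92, outcomes))  # 1.0
```

At the 8% ARNI prescribing rate, for example, the prevalence-only reference is 0.08 × 0.92 = 0.0736, whereas at a 40% rate it is 0.24; the same Brier score can therefore represent very different amounts of skill across medication types.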