Abstract: Labeling clinical data from electronic health records (EHR) in health systems requires extensive knowledge from human experts and painstaking review by clinicians. Furthermore, existing phenotyping algorithms are not uniformly applied across large datasets and can suffer from inconsistencies in case definitions across different algorithms. We describe here quantitative disease risk scores based on almost unsupervised methods that require minimal input from clinicians, can be applied to large datasets, and allevi…
“…The potential interaction between the genetic feature and clinical risk factors beyond age and sex was not recognized and implemented in the genotype simulation, which may prevent the conclusion made from the MyCode sample could fully be extended to the larger nonMyCode sample, leading to more uncertainty of the discriminative power in the model with the simulated genetic feature included. The importance of the genetic feature was ranked lowest in the Logistic Regression-based model, suggesting Logistic Regression may underestimate the contribution of the genetic variant for the prediction of CDI, highlighting the importance of capturing multi-way interactions when assessing the value of common genetic variants with a small effect size in prediction models 25 .…”
With the expansion of electronic health records (EHR)-linked genomic data comes the development of machine learning-enabled models. There is a pressing need to develop robust pipelines to evaluate the performance of integrated models and minimize systemic bias. We developed a prediction model of symptomatic Clostridioides difficile infection (CDI) by integrating common EHR-based and genetic risk factors (rs2227306/IL8). Our pipeline includes (1) leveraging a phenotyping algorithm to minimize temporal bias, (2) performing simulation studies to determine the predictive power in samples without genetic information, (3) propensity score matching to control for confounding, (4) selecting machine learning algorithms to capture complex feature interactions, (5) performing oversampling to address data imbalance, and (6) optimizing models and ensuring a proper bias-variance trade-off. We evaluate the performance of prediction models of CDI when including common clinical risk factors and the benefit of incorporating genetic feature(s) into the models. We emphasize the importance of building a robust integrated pipeline to avoid systemic bias and of thoroughly evaluating genetic features when they are integrated into prediction models in the general population and subgroups.
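As an illustration of step (5) only, the sketch below shows plain random oversampling of the minority class. The abstract does not say which oversampling technique the authors used, so this choice is an assumption; the function name `random_oversample` and the toy data are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class.

    Assumption: simple random oversampling; the source does not
    name the technique actually used.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original (unbalanced) data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up): 8 controls (label 0) and 2 cases (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.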
“…Electronic health records (EHRs) contain a large amount of information including historical patients’ demographics, medical examination results, tumor states, adopted treatments, and treatment outcomes (Y. Li, Fan, et al., 2020; Xu et al., 2021). It would be a valuable attempt to estimate the related data in Figure 1 by mining information from EHRs.…”
Determining a treatment plan for a target patient with a tumor is difficult because of heterogeneity in patients’ responses, incomplete information about tumor states, and asymmetric knowledge between doctors and patients, among other factors. In this paper, a method for quantitative risk analysis of treatment plans for patients with tumors is proposed. To reduce the impact of heterogeneity in patients’ responses on the analysis results, the method conducts risk analysis by mining similar historical patients from Electronic Health Records (EHRs) in multiple hospitals using federated learning (FL). For this, Recursive Feature Elimination based on the Support Vector Machine (SVM) and Deep Learning Important FeaTures (DeepLIFT) are extended into the FL framework to select key features and determine key feature weights for identifying similar historical patients. Then, in the database of each collaborative hospital, the similarities between the target patient and all historical patients are calculated, and the similar historical patients are determined. From the statistics of tumor states and treatment outcomes of similar historical patients in all collaborative hospitals, the related data (including the probabilities of different tumor states and possible outcomes of different treatment plans) for risk analysis of the alternative treatment plans can be obtained, which can eliminate the asymmetric knowledge between doctors and patients. The related data are valuable for the doctor and patient in making their decisions. Experimental studies have been conducted to verify the feasibility and effectiveness of the proposed method.
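The similarity and aggregation steps described above can be sketched roughly as follows. The abstract does not specify the similarity measure, so the weighted Euclidean form used here is an assumption, as are all function names, feature weights, records, and outcome labels; the feature selection and weighting via SVM-based Recursive Feature Elimination and DeepLIFT are not reproduced.

```python
import math
from collections import Counter


def weighted_similarity(a, b, weights):
    """Similarity in (0, 1]: 1 / (1 + weighted Euclidean distance).

    Assumption: the source does not state which similarity measure
    is used; this is one common choice.
    """
    dist = math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
    return 1.0 / (1.0 + dist)


def similar_patients(target, records, weights, k):
    """Return the k records most similar to the target in one hospital."""
    ranked = sorted(
        records,
        key=lambda r: weighted_similarity(target, r["features"], weights),
        reverse=True,
    )
    return ranked[:k]


def outcome_probabilities(target, hospitals, weights, k):
    """Pool outcome counts of similar patients across hospitals
    and normalize them into empirical probabilities."""
    counts = Counter()
    for records in hospitals:
        # Only per-hospital outcome counts are combined here.
        for r in similar_patients(target, records, weights, k):
            counts[r["outcome"]] += 1
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Hypothetical key-feature weights and toy records (all values made up).
weights = [0.7, 0.3]
hospital_a = [
    {"features": [0.9, 0.1], "outcome": "remission"},
    {"features": [0.1, 0.9], "outcome": "progression"},
    {"features": [0.8, 0.2], "outcome": "remission"},
]
hospital_b = [
    {"features": [0.85, 0.15], "outcome": "progression"},
    {"features": [0.2, 0.7], "outcome": "progression"},
]
target = [0.9, 0.2]

probs = outcome_probabilities(target, [hospital_a, hospital_b], weights, k=2)
print(probs)
```

Each hospital ranks its own records locally and contributes only outcome counts to the pooled estimate, which loosely mirrors the idea that patient-level data stays within each collaborating hospital.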
“…When EHRs were initially developed, adopted, deployed and used in healthcare, topics of change management related to moving from paper-based charting to EHRs, EHR adoption, barriers and facilitators of the use of EHRs, best practices for EHR implementation, and healthcare provider receptivity to use of EHRs dominated early research studies [1][2][3][4][5][6][7]. More recently, now that healthcare data are routinely captured in electronic form, there has been an increase in studies related to mining the data in EHRs for use in research and quality improvement, studies of descriptive and predictive data analytic methods to analyze and use the data, and more interest in, and public policy actions requiring, information exchange among healthcare organizations [8][9][10]. Health information exchange as a verb (the process of accessing and sharing a patient's clinical and health information electronically) and as a noun (the organization that is responsible for the oversight of the exchange of information and that provides technology and services to share data) have grown over time.…”
Objectives: To summarize the recent literature and research and present a selection of the best papers published in 2021 related to health information exchange (HIE).
Methods: A systematic review of the literature was performed by the two section editors with the help of a medical librarian. We searched bibliographic databases for HIE-related papers using both MeSH headings and keywords in titles and abstracts. A shortlist of 15 candidate best papers was first selected by the section editors before being peer-reviewed by independent external reviewers.
Results: Major themes of the set of 15 articles included the issues to be addressed in building and maintaining HIEs, HIE implementation barriers and facilitators, and the outcomes of using HIEs. The outcomes of using HIE encompassed the impact on patient care and the ability of HIEs to provide a repository of data for further research.
Conclusions: The growth of HIE has followed a course very similar to the growth of electronic health records (EHRs). Initial foci of research included technical issues in deployment, followed by research on barriers to use. Now that EHRs are more widely implemented and used, the newer research involves the use of the electronic data contained in them. Although HIEs are currently at an earlier stage of maturity and development than EHRs and most of the articles in this review focused on implementation barriers, we have seen the beginning of research on the large amount of longitudinal and diverse data that HIEs can make available. As HIEs become more widely implemented and used, we can expect research about HIE, and about leveraging HIEs and the data they collect, to continue to increase.