In this proof-of-concept study, we demonstrated the application of PheWAS in large EHR-linked biobanks to inform drug effects. The association of the IL6R SNP with a reduced risk of aortic aneurysm corresponds with the newest indication for IL6R blockade, giant cell arteritis, a major complication of which is aortic aneurysm.
Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck is the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated, high-throughput phenotyping method that integrates International Classification of Diseases (ICD) codes with narrative data extracted using natural language processing (NLP). Materials and Methods: We developed a mapping method that leverages the Unified Medical Language System to automatically identify the ICD and NLP concepts relevant to a specific phenotype. Aggregated ICD and NLP counts, together with health care utilization, were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of the phenotype for each patient and a threshold for classifying each participant as having the phenotype or not. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and was further tested in phenome-wide association studies (PheWAS) in an independent cohort for 2 single nucleotide polymorphisms with known associations. Results: The MAP algorithm achieved higher or similar AUC and F-scores compared with ICD codes across all 16 phenotypes. Features assembled via the automated approach had accuracy comparable to those assembled via manual curation (AUC 0.943 for MAP vs 0.941 for manual curation). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power than the standard ICD code-based PheWAS method. Conclusion: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, facilitating its use in studies that require large-scale phenotyping, such as PheWAS.
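To illustrate the core idea of unsupervised phenotyping from count data, the sketch below fits a two-component Gaussian mixture by expectation-maximization (EM) to log(1 + count) for a single feature and reads the posterior of the higher-mean component as the probability of "phenotype yes". This is a deliberately simplified stand-in, not the MAP algorithm: the abstract describes an ensemble of latent mixture models fitted jointly to ICD counts, NLP counts, and health care utilization, whereas this sketch uses one feature, ignores utilization, and assumes a fixed 0.5 posterior cut-off. The function names and the toy counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def _log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_two_component_mixture(
    counts: List[int], n_iter: int = 200, min_var: float = 1e-3
) -> Tuple[List[float], Dict[str, List[float]]]:
    """Fit a two-component Gaussian mixture to log(1 + count) by EM.

    Hypothetical helper: returns each patient's posterior probability of
    belonging to the higher-mean component (read as "phenotype yes")
    together with the fitted parameters.
    """
    x = [math.log1p(c) for c in counts]
    n = len(x)
    mean_all = sum(x) / n
    var_all = max(sum((v - mean_all) ** 2 for v in x) / n, min_var)
    mu = [min(x), max(x)]  # start the two components at the extremes
    var = [var_all, var_all]
    w1 = 0.5  # mixing weight of the high component
    post = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior of the high component, computed in log space.
        for i, v in enumerate(x):
            l0 = math.log(1.0 - w1) + _log_normal_pdf(v, mu[0], var[0])
            l1 = math.log(w1) + _log_normal_pdf(v, mu[1], var[1])
            post[i] = _sigmoid(l1 - l0)

        # M-step: re-estimate weight, means, and variances from the posteriors.
        r1 = max(sum(post), 1e-9)
        r0 = max(n - sum(post), 1e-9)
        w1 = min(max(r1 / n, 1e-6), 1.0 - 1e-6)
        mu[0] = sum((1.0 - p) * v for p, v in zip(post, x)) / r0
        mu[1] = sum(p * v for p, v in zip(post, x)) / r1
        var[0] = max(sum((1.0 - p) * (v - mu[0]) ** 2 for p, v in zip(post, x)) / r0, min_var)
        var[1] = max(sum(p * (v - mu[1]) ** 2 for p, v in zip(post, x)) / r1, min_var)

    return post, {"mean": mu, "var": var, "weight_high": [w1]}


def classify(posteriors: List[float], threshold: float = 0.5) -> List[bool]:
    """Label a patient phenotype-positive when the posterior meets the threshold."""
    return [p >= threshold for p in posteriors]


if __name__ == "__main__":
    # Toy per-patient counts of phenotype-related codes/mentions (made up).
    toy_counts = [0, 0, 1, 0, 2, 1, 0, 1, 30, 45, 60, 25, 50, 40]
    posteriors, params = fit_two_component_mixture(toy_counts)
    print([round(p, 3) for p in posteriors])
    print(classify(posteriors))
```

The log transform and log-space E-step keep the fit stable when counts span several orders of magnitude, which is typical of EHR code and mention counts.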
In diabetes, overexpression of aldose reductase (AR) and the consequent glucose-induced impairment of antioxidant defense systems may predispose to oxidative stress and the development of diabetic complications, but the mechanisms are poorly understood. Taurine (2-aminoethanesulfonic acid) functions as an antioxidant, osmolyte, and calcium modulator, so its intracellular depletion could promote cytotoxicity in diabetes. We therefore explored the relationships of oxidative stress and basal AR gene expression to Na+-taurine cotransporter (TT) gene expression, protein abundance, and activity in low AR-expressing human retinal pigment epithelial (RPE) 47 cells and in RPE 47 cells stably transformed to overexpress AR (RPE 75). Changes in TT gene expression were determined using a 4.6-kb TT promoter-luciferase fusion gene. In high AR-expressing RPE 75 cells, TT promoter activity was 46% lower than in RPE 47 cells, and this decrease was prevented by an AR inhibitor. Prooxidant exposure increased TT promoter activity up to 900%, together with increased TT peptide abundance and taurine transport. However, induction of TT promoter activity by oxidative stress was attenuated in high AR-expressing cells and partially corrected by the AR inhibitor. Finally, exposure of RPE 75 cells to high glucose increased oxidative stress but down-regulated TT expression. These studies demonstrate for the first time that TT is regulated by oxidative stress and that AR overexpression and high glucose impair this response. Abnormal expression of AR may therefore impair antioxidant defense, which may determine tissue susceptibility to chronic diabetic complications.
The exposome represents the array of dietary, lifestyle, and demographic factors to which an individual is exposed. Individual components of the exposome, or groups of components, are recognized as influencing many aspects of human physiology, including cardiometabolic health. However, the influence of the whole exposome on health outcomes is poorly understood and may differ substantially from the sum of its individual components. As such, studies of the complete exposome are more biologically representative than fragmented models based on subsets of factors. This study aimed to model the system of relationships through which the diet, lifestyle, and demographic components of the overall exposome shape the cardiometabolic risk profile. The study included 36,496 US Veterans enrolled in the VA Million Veteran Program (MVP) who had complete assessments of diet, lifestyle, demographics, and markers of cardiometabolic health, including serum lipids, blood pressure, and glycemic control. The cohort was randomly divided into training and validation datasets. In the training dataset, we conducted two separate exploratory factor analyses (EFA) to identify common factors among the exposures (diet, demographics, and physical activity) and among the laboratory measures (lipids, blood pressure, and glycemic control). In the validation dataset, we used multiple normal regression to examine the combined effects of the exposure factors on the clinical factors representing cardiometabolic health. The mean ± SD age of participants was 62.4 ± 13.4 years in both the training and validation datasets. The EFA revealed 19 Exposure Common Factors and 5 Physiology Common Factors that explained the observed (measured) data. Multivariate regression in the validation dataset revealed the structure of associations between the Exposure Common Factors and the Physiology Common Factors.
For example, the factor for fruit consumption was inversely associated with the factor summarizing total cholesterol and low-density lipoprotein cholesterol (LDLC; p = 0.008), and the latent construct describing light physical activity was inversely associated with the blood pressure latent construct (p < 0.0001). We also found that a factor capturing the pattern in which participants who frequently consume whole milk are less likely to frequently consume skim milk was positively associated with the latent constructs representing total cholesterol and LDLC and systolic and diastolic blood pressure (p = 0.0006 and p < 0.0001, respectively). Multiple multivariable-adjusted regression analyses of exposome factors allowed us to model the influence of the exposome as a whole. In this metadata-rich, prospective cohort of US Veterans, there was evidence of structural relationships between diet, lifestyle, and demographic exposures and subsequent markers of cardiometabolic health. This methodology, which can accommodate continuous, ordinal, and binary data derived from questionnaires, could be applied to a variety of research questions about human health exposures that use electronic health record data. Further work exploring the potential utility of including genetic risk scores and time-varying covariates is warranted.
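The two-stage design described above (a random training/validation split, with associations between factors estimated in the held-out data) can be sketched as follows. This is a simplified illustration under stated assumptions: it presumes factor scores have already been derived from the training-set EFA and computed for each validation participant (the factor analysis itself is not shown), it regresses a single physiology factor score on several exposure factor scores with ordinary least squares rather than the multivariate model used in the study, and the 50/50 split fraction, function names, and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    records: Sequence[T], frac_train: float = 0.5, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly partition records; the 50/50 default is an assumption."""
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    cut = int(round(frac_train * len(idx)))
    return [records[i] for i in idx[:cut]], [records[i] for i in idx[cut:]]


def ols_fit(X: List[List[float]], y: List[float]) -> List[float]:
    """OLS with an intercept via the normal equations (X'X) b = X'y.

    X: one row of exposure factor scores per participant (hypothetical).
    y: one physiology factor score per participant (hypothetical).
    Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]

    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


if __name__ == "__main__":
    # Toy validation data: two exposure factor scores per participant.
    X_val = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [1.5, -1.0]]
    y_val = [1.0 + 2.0 * a - 3.0 * b for a, b in X_val]
    print(ols_fit(X_val, y_val))  # coefficients close to [1, 2, -3]
    train, val = train_validation_split(list(range(10)), 0.5, seed=1)
    print(len(train), len(val))
```

Holding out the validation set means the regression is not fitted on the same participants whose data defined the factors, which guards against overfitting the factor structure.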