Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
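The false discovery rate threshold mentioned above can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR procedure the authors used, so this is an assumption chosen because it is the most common one; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return sorted indices of hypotheses rejected at FDR level q.

    Benjamini-Hochberg step-up: find the largest rank k such that
    p_(k) <= k * q / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Upper bound on the number of SNP-phenotype pairs in the scan described
# above (assumes every SNP was tested against every phenotype).
n_possible_tests = 3144 * 1358

# Toy p-values (hypothetical, not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(n_possible_tests, benjamini_hochberg(toy_p, q=0.05))
```

Because the procedure is step-up, a p-value that fails its own rank threshold is still rejected when a larger p-value passes at a higher rank. This is why an FDR-based cutoff is usually less strict than a Bonferroni correction over the same number of tests.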
Many modern human genomes retain DNA inherited from interbreeding with archaic hominins, such as Neanderthals, yet the influence of this admixture on human traits is largely unknown. We analyzed the contribution of common Neanderthal variants to over 1,000 electronic health record (EHR)-derived phenotypes in ~28,000 adults of European ancestry. We discovered and replicated associations of Neanderthal alleles with neurological, psychiatric, immunological, and dermatological phenotypes. Neanderthal alleles together explain a significant fraction of the variation in risk for depression and skin lesions resulting from sun exposure (actinic keratosis), and individual Neanderthal alleles are significantly associated with specific human phenotypes, including hypercoagulation and tobacco use. Our results establish that archaic admixture influences disease risk in modern humans, provide hypotheses about the effects of hundreds of Neanderthal haplotypes and demonstrate the utility of EHR data in evolutionary analyses.
We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10⁻⁹). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10⁻⁶). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10⁻⁵), nodular (OR = 0.76, p = 3.1 × 10⁻⁵) and multinodular (OR = 0.69, p = 3.9 × 10⁻⁵) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10⁻³), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10⁻¹³), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
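To make the odds ratios above concrete, here is a minimal sketch of how an unadjusted allelic odds ratio and a Woolf (log-based) 95% confidence interval can be computed from a 2×2 table. The published estimates were most likely produced by adjusted regression models, which the abstract does not detail, so this is only an illustration; the function name and the allele counts are hypothetical.

```python
import math


def odds_ratio_with_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table.

    case_alt / case_ref: counts of the tested and reference allele in cases
    ctrl_alt / ctrl_ref: the same counts in controls
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not taken from the study.
or_, lo, hi = odds_ratio_with_ci(case_alt=10, case_ref=20, ctrl_alt=30, ctrl_ref=40)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An odds ratio below 1, as reported for rs7850258, means the tested allele is less common in cases than in controls.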
Objective
To report the design and implementation of the Right Drug, Right Dose, Right Time: Using Genomic Data to Individualize Treatment Protocol, which was developed to test the concept that prescribers can deliver genome-guided therapy at the point of care by using preemptive pharmacogenomics (PGx) data and clinical decision support (CDS) integrated in the electronic medical record (EMR).
Patients and Methods
We used a multivariable prediction model to identify patients with a high risk of initiating statin therapy within 3 years. The model was used to target a study cohort most likely to benefit from preemptive PGx testing among Mayo Clinic Biobank participants, with a recruitment goal of 1000 patients. A Cox proportional hazards model was fit using the variables selected through the Lasso shrinkage method. An operational CDS model was adapted to implement PGx rules within the EMR.
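As an illustration of how a fitted Cox model yields a 3-year risk, the sketch below applies the standard relation risk(t) = 1 − S₀(t)^exp(linear predictor). The baseline survival, the coefficients, and the covariate names are all hypothetical placeholders, not values from the study; a coefficient of zero stands in for a variable that Lasso shrinkage dropped.

```python
import math


def cox_risk(baseline_survival, coefficients, covariates):
    """Predicted event probability by the time point of baseline_survival.

    baseline_survival: S0(t), survival at time t for a reference patient
    coefficients: fitted log hazard ratios keyed by covariate name
    covariates: one patient's covariate values (missing names count as 0)
    """
    linear_predictor = sum(
        beta * covariates.get(name, 0.0) for name, beta in coefficients.items()
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients; 0.0 mimics a variable removed by the Lasso.
betas = {"age_per_decade": 0.30, "diabetes": 0.80, "hypertension": 0.45, "dropped_var": 0.0}
patient = {"age_per_decade": 1.0, "diabetes": 1.0, "hypertension": 0.0, "dropped_var": 1.0}

risk_3y = cox_risk(baseline_survival=0.90, coefficients=betas, covariates=patient)
print(f"Predicted 3-year risk: {risk_3y:.3f}")
```

Patients could then be ranked by this predicted risk, and those above a chosen cutoff invited for preemptive testing; the cutoff itself is a design decision the abstract does not specify.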
Results
The prediction model included age, sex, race, and 6 chronic diseases categorized by the Clinical Classifications Software for ICD-9 codes (dyslipidemia, diabetes, peripheral atherosclerosis, disease of the blood-forming organs, coronary atherosclerosis and other heart diseases, and hypertension). Of the 2000 Biobank participants invited, 50% provided blood samples, 13% refused, 28% did not respond, and 9% consented but did not provide a blood sample within the recruitment window (October 4, 2012 – March 20, 2013). Preemptive PGx testing included CYP2D6 genotyping and targeted sequencing of 84 PGx genes. Synchronous real-time CDS integrated in the EMR flags potential patient-specific drug-gene interactions and provides therapeutic guidance.
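The CDS behavior described above, flagging a drug-gene interaction when an order is placed, can be sketched as a simple rule lookup. This is not the rule set or architecture used in the study: the rule table, the function name, and the alert wording are hypothetical, and the codeine/CYP2D6 pairing is used only as a well-known textbook example of such an interaction.

```python
# Hypothetical rule table: (drug, gene, phenotype) -> guidance text.
PGX_RULES = {
    ("codeine", "CYP2D6", "poor metabolizer"):
        "Reduced conversion to active metabolite expected; consider an alternative analgesic.",
    ("codeine", "CYP2D6", "ultrarapid metabolizer"):
        "Increased conversion to active metabolite expected; consider an alternative analgesic.",
}


def check_order(drug, patient_phenotypes):
    """Return alert messages for a drug order given stored PGx phenotypes.

    patient_phenotypes maps gene symbol -> phenotype string, as it might
    be stored in the EMR after preemptive testing (illustrative format).
    """
    alerts = []
    for (rule_drug, gene, phenotype), guidance in PGX_RULES.items():
        if rule_drug == drug.lower() and patient_phenotypes.get(gene) == phenotype:
            alerts.append(f"{gene} {phenotype}: {guidance}")
    return alerts


print(check_order("Codeine", {"CYP2D6": "poor metabolizer"}))
print(check_order("Codeine", {"CYP2D6": "normal metabolizer"}))
```

Keeping the rules in a data table rather than in code is one way to let clinical content be updated without changing the alerting logic.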
Conclusion
These interventions will improve understanding and implementation of genomic data in clinical practice.
By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
Electronic Medical Records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73–98% and negative predictive values of 98–100%. A majority of EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
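The predictive values reported above are typically estimated by manually reviewing a sample of algorithm-labeled charts. A minimal sketch of that calculation follows; the function name and the review counts are hypothetical and are not taken from the eMERGE validation.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Positive and negative predictive value from chart-review counts.

    PPV = fraction of algorithm-labeled cases confirmed as true cases.
    NPV = fraction of algorithm-labeled controls confirmed as true controls.
    """
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical review of 100 algorithm-labeled cases and 100 controls.
ppv, npv = predictive_values(true_pos=92, false_pos=8, true_neg=99, false_neg=1)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Unlike sensitivity and specificity, both values depend on how common the phenotype is in the reviewed population, so they may not transfer directly between sites with different case mixes.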
We describe here the design and initial implementation of the eMERGE-PGx project. eMERGE-PGx, a partnership of the eMERGE and PGRN consortia, has three objectives: 1) Deploy PGRNseq, a next-generation sequencing platform assessing sequence variation in 84 proposed pharmacogenes, in nearly 9,000 patients likely to be prescribed drugs of interest in a 1–3 year timeframe across several clinical sites; 2) Integrate well-established, clinically validated pharmacogenetic genotypes into the electronic health record with associated clinical decision support and assess process and clinical outcomes of implementation; and 3) Develop a repository of pharmacogenetic variants of unknown significance linked to a repository of EHR-based clinical phenotype data for ongoing pharmacogenomics discovery. We describe site-specific project implementation and anticipated products, including genetic variant and phenotype data repositories, novel variant association studies, clinical decision support modules, clinical and process outcomes, approaches to manage incidental findings, and patient and clinician education methods.