Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identified 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identified 10 independent genome-wide-significant SNPs and estimated a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generated polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
Introduction: The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions formed to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also places special emphasis on the ethical, legal and social issues related to these endeavors.
Organization: The five sites are supported by an Administrative Coordinating Center. Setting of network goals is initiated by working groups: (1) Genomics, (2) Informatics, (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE-funded sites, and (4) the Return of Results Oversight Committee. The Steering Committee, composed of site PIs and representatives and NHGRI staff, meets three times per year, once per year with the External Scientific Panel.
Current progress: The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. Funded by ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically derived data, development of approaches for large-scale meta-analysis of GWAS data across the five sites, and a community consultation and consent initiative at each site.
Future activities: Plans are underway to expand the network's population diversity and to incorporate GWAS findings into clinical care.
Summary: By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
BACKGROUND: Genetic variants of the enzyme that metabolizes warfarin, cytochrome P-450 2C9 (CYP2C9), and of a key pharmacologic target of warfarin, vitamin K epoxide reductase (VKORC1), contribute to differences in patients' responses to various warfarin doses, but the role of these variants during initial anticoagulation is not clear.
Multifactor dimensionality reduction (MDR) was developed as a method for detecting statistical patterns of epistasis. The overall goal of MDR is to change the representation space of the data to make interactions easier to detect. It is well known that machine learning methods may not provide robust models when the class variable (e.g. case-control status) is imbalanced and accuracy is used as the fitness measure. This is because most methods learn patterns that are relevant for the larger of the two classes. The goal of this study was to evaluate three different strategies for improving the power of MDR to detect epistasis in imbalanced datasets. The methods evaluated were: (1) over-sampling, which resamples the smaller class with replacement until the data are balanced, (2) under-sampling, which randomly removes subjects from the larger class until the data are balanced, and (3) balanced accuracy [(sensitivity+specificity)/2] as the fitness function, with and without an adjusted threshold. These three methods were compared using simulated data with two-locus epistatic interactions of varying heritability (0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4) and minor allele frequency (0.2, 0.4) that were embedded in 100 replicate datasets of varying sample sizes (400, 800, 1600). Each dataset was generated with different ratios of cases to controls (1:1, 1:2, 1:4). We found that the balanced accuracy function with an adjusted threshold significantly outperformed both over-sampling and under-sampling and fully recovered the power. These results suggest that balanced accuracy should be used instead of accuracy for the MDR analysis of epistasis in imbalanced datasets.
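The three strategies compared in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy genotype cells, and the counts are all hypothetical, and the "adjusted threshold" is assumed here to be the overall case-to-control ratio of the dataset, which the abstract does not spell out. Over-sampling is read as keeping the original smaller class and adding draws with replacement, which is one plausible interpretation.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of subjects classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """(sensitivity + specificity) / 2 for labels 1 = case, 0 = control."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_cases = sum(t == 1 for t in y_true)
    n_controls = len(y_true) - n_cases
    return (tp / n_cases + tn / n_controls) / 2


def mdr_predict(cells, y, threshold=1.0):
    """MDR-style pooling: a multilocus genotype cell is called high risk
    when its case:control ratio exceeds `threshold` (assumed rule)."""
    cases = Counter(c for c, t in zip(cells, y) if t == 1)
    controls = Counter(c for c, t in zip(cells, y) if t == 0)
    high_risk = {c for c in set(cells) if cases[c] > threshold * controls[c]}
    return [1 if c in high_risk else 0 for c in cells]


def resample_balanced(cells, y, mode, rng):
    """Balance the classes by over-sampling the smaller class with
    replacement ("over") or randomly removing subjects from the larger
    class ("under")."""
    groups = {0: [c for c, t in zip(cells, y) if t == 0],
              1: [c for c, t in zip(cells, y) if t == 1]}
    small, large = sorted(groups, key=lambda k: len(groups[k]))
    if mode == "over":
        extra = len(groups[large]) - len(groups[small])
        groups[small] = groups[small] + rng.choices(groups[small], k=extra)
    elif mode == "under":
        groups[large] = rng.sample(groups[large], len(groups[small]))
    else:
        raise ValueError("mode must be 'over' or 'under'")
    return groups[0] + groups[1], [0] * len(groups[0]) + [1] * len(groups[1])


# Hypothetical 1:4 dataset with two genotype cells:
# "Aa/Bb" holds 15 cases and 20 controls, "AA/BB" holds 5 cases and 60 controls.
cells = ["Aa/Bb"] * 15 + ["AA/BB"] * 5 + ["Aa/Bb"] * 20 + ["AA/BB"] * 60
y = [1] * 20 + [0] * 80

naive = mdr_predict(cells, y, threshold=1.0)         # every cell is low risk
adjusted = mdr_predict(cells, y, threshold=20 / 80)  # overall case:control ratio

print(accuracy(y, naive), balanced_accuracy(y, naive))        # 0.8 0.5
print(accuracy(y, adjusted), balanced_accuracy(y, adjusted))  # 0.75 0.75
```

On this toy data plain accuracy prefers the model that calls everyone a control (0.8 versus 0.75), while balanced accuracy prefers the model that actually separates the two cells (0.75 versus 0.5). `resample_balanced(cells, y, "over", random.Random(0))` and the `"under"` variant return datasets with 80 and 20 subjects per class respectively, which could then be passed to `mdr_predict` with the default threshold.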
Warfarin dosing is correlated with polymorphisms in the vitamin K epoxide reductase complex 1 (VKORC1) and cytochrome P450 2C9 (CYP2C9) genes. Recently, the FDA revised warfarin labeling to raise physician awareness about these genetic effects. Randomized clinical trials are underway to test genetically based dosing algorithms. It is thus important to determine whether common single nucleotide polymorphisms (SNPs) in other gene(s) have a large effect on warfarin dosing. A retrospective genome-wide association study was designed to identify polymorphisms that could explain a large fraction of the dose variance. White patients from an index warfarin population (n = 181) and 2 independent replication patient populations (n = 374) were studied. From the approximately 550 000 polymorphisms tested, the most significant independent effect was associated with VKORC1 polymorphisms (P = 6.2 × 10⁻¹³) in the index patients. CYP2C9 (rs1057910 [CYP2C9*3] and rs4917639) was associated with dose at moderate significance levels (P ~ 10⁻⁴). Replication polymorphisms (355 SNPs) from the index study did not show any significant effects in the replication patient sets. We conclude that common SNPs with large effects on warfarin dose are unlikely to be discovered outside of the CYP2C9 and VKORC1 genes. Randomized clinical trials that account for these 2 genes should therefore produce results that are definitive and broadly applicable.
Introduction: The determination of safe yet effective doses of warfarin for individual patients is one of the most promising clinical applications of pharmacogenetics. [1-3] There is large variation in warfarin dose from patient to patient, and doses that produce insufficient or excessive pharmacologic effects have significant clinical consequences. Thus, reducing uncertainty in establishing the therapeutic dose in individual patients could improve quality of care as well as expand the range of patients who could be treated. [4] In white patients, genetic factors are more strongly correlated with stabilized warfarin dose than all other known patient-related factors. Warfarin pharmacokinetics are affected by functional polymorphisms (*2, Arg144Cys; *3, Ile359Leu) in cytochrome P450 2C9 (CYP2C9). [5,6] In addition, warfarin's effects are modulated by polymorphisms (eg, −1639, rs9923231) in the vitamin K epoxide reductase complex 1 (VKORC1) enzyme, a critical component of the vitamin K cycle discovered in part because of its contribution to bleeding disorders and warfarin resistance. [7,8] Both VKORC1 and CYP2C9 polymorphisms independently correlate with warfarin dose [9,10] and other clinical outcomes such as time to stabilized dose, bleeding events, and time within the target therapeutic range. [11-13] Combined polymorphisms in VKORC1 and CYP2C9 explain approximately 30% (20%-25% for VKORC1; 5%-10% for CYP2C9) of the variance in the stabilized warfarin dose distribution. [10,14,15] The importance of these strong genetic effects was recognized by recent relabeling of warfarin by the FDA to raise awar...
Both VKORC1 and CYP2C9 polymorphisms contribute to inter-population differences in warfarin doses among the three populations, but their contribution to intra-population variability may differ within each population.
Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are 'the norm' and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics.
While hypertension is a complex disease with a well-documented genetic component, genetic studies often fail to replicate findings. One possibility for such inconsistency is that the underlying genetics of hypertension is not based on single genes of major effect, but on interactions among genes. To test this hypothesis, we studied both single locus and multilocus effects, using a case-control design of subjects from Ghana. Thirteen polymorphisms in eight candidate genes were studied. Each candidate gene has been shown to play a physiological role in blood pressure regulation and affects one of four pathways that modulate blood pressure: vasoconstriction (angiotensinogen, angiotensin converting enzyme – ACE, angiotensin II receptor), nitric oxide (NO) dependent and NO independent vasodilation pathways and sodium balance (G protein-coupled receptor kinase, GRK4). We evaluated single site allelic and genotypic associations, multilocus genotype equilibrium and multilocus genotype associations, using multifactor dimensionality reduction (MDR). For MDR, we performed systematic reanalysis of the data to address the role of various physiological pathways. We found no significant single site associations, but the hypertensive class deviated significantly from genotype equilibrium in more than 25% of all multilocus comparisons (2,162 of 8,178), whereas the normotensive class rarely did (11 of 8,178). The MDR analysis identified a two-locus model including ACE and GRK4 that successfully predicted blood pressure phenotype 70.5% of the time. Thus, our data indicate epistatic interactions play a major role in hypertension susceptibility. Our data also support a model where multiple pathways need to be affected in order to predispose to hypertension.