SummaryEducational attainment (EA) is strongly influenced by social and other environmental factors, but genetic factors are also estimated to account for at least 20% of the variation across individuals1. We report the results of a genome-wide association study (GWAS) for EA that extends our earlier discovery sample1,2 of 101,069 individuals to 293,723 individuals, and a replication in an independent sample of 111,349 individuals from the UK Biobank. We now identify 74 genome-wide significant loci associated with number of years of schooling completed. Single-nucleotide polymorphisms (SNPs) associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioral phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because EA is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric disease.
We conducted genome-wide association studies of three phenotypes: subjective well-being (N = 298,420), depressive symptoms (N = 161,460), and neuroticism (N = 170,910). We identified three variants associated with subjective well-being, two with depressive symptoms, and eleven with neuroticism, including two inversion polymorphisms. The two depressive symptoms loci replicate in an independent depression sample. Joint analyses that exploit the high genetic correlations between the phenotypes (|ρ̂| ≈ 0.8) strengthen the overall credibility of the findings, and allow us to identify additional variants. Across our phenotypes, loci regulating expression in central nervous system and adrenal/pancreas tissues are strongly enriched for association.
A genome-wide association study of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent SNPs are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effects sizes are small (R2 ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈ 2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics.
Significance
We identify several common genetic variants associated with cognitive performance using a two-stage approach: we conduct a genome-wide association study of educational attainment to generate a set of candidates, and then we estimate the association of these variants with cognitive performance. In older Americans, we find that these variants are jointly associated with cognitive health. Bioinformatics analyses implicate a set of genes that is associated with a particular neurotransmitter pathway involved in synaptic plasticity, the main cellular mechanism for learning and memory. In addition to the substantive contribution, this work also serves to show a proxy-phenotype approach to discovering common genetic variants that is likely to be useful for many phenotypes of interest to social scientists (such as personality traits).
Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
Intelligence in childhood, as measured by psychometric cognitive tests, is a strong predictor of many important life outcomes, including educational attainment, income, health and lifespan. Results from twin, family and adoption studies are consistent with general intelligence being highly heritable and genetically stable throughout the life course. No robustly associated genetic loci or variants for childhood intelligence have been reported. Here, we report the first genome-wide association study (GWAS) on childhood intelligence (age range 6–18 years) from 17 989 individuals in six discovery and three replication samples. Although no individual single-nucleotide polymorphisms (SNPs) were detected with genome-wide significance, we show that the aggregate effects of common SNPs explain 22–46% of phenotypic variation in childhood intelligence in the three largest cohorts (P = 3.9 × 10−15, 0.014 and 0.028). FNBP1L, previously reported to be the most significantly associated gene for adult intelligence, was also significantly associated with childhood intelligence (P = 0.003). Polygenic prediction analyses resulted in a significant correlation between predictor and outcome in all replication cohorts. The proportion of childhood intelligence explained by the predictor reached 1.2% (P = 6 × 10−5), 3.5% (P = 10−3) and 0.5% (P = 6 × 10−5) in three independent validation cohorts. Given the sample sizes, these genetic prediction results are consistent with expectations if the genetic architecture of childhood intelligence is like that of body mass index or height. Our study provides molecular support for the heritability and polygenic nature of childhood intelligence. Larger sample sizes will be required to detect individual variants with genome-wide significance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.