Polygenic scores are a popular tool for prediction of complex traits. However, prediction estimates in samples of unrelated participants can include effects of population stratification, assortative mating, and environmentally mediated parental genetic effects, a form of genotype-environment correlation (rGE). Comparing genome-wide polygenic score (GPS) predictions in unrelated individuals with predictions between siblings in a within-family design is a powerful approach to identify these different sources of prediction. Here, we compared within- to between-family GPS predictions of eight outcomes (anthropometric, cognitive, personality, and health) for eight corresponding GPSs. The outcomes were assessed in up to 2,366 dizygotic (DZ) twin pairs from the Twins Early Development Study from age 12 to age 21. To account for family clustering, we used mixed-effects modeling, simultaneously estimating within- and between-family effects for target- and cross-trait GPS prediction of the outcomes. There were three main findings: (1) DZ twin GPS differences predicted DZ differences in height, BMI, intelligence, educational achievement, and ADHD symptoms; (2) target and cross-trait analyses indicated that GPS prediction estimates for cognitive traits (intelligence and educational achievement) were on average 60% greater between families than within families, but this was not the case for non-cognitive traits; and (3) much of this within- and between-family difference for cognitive traits disappeared after controlling for family socioeconomic status (SES), suggesting that SES is a major source of between-family prediction through rGE mechanisms. These results provide insights into the patterns by which rGE contributes to GPS prediction, while ruling out confounding due to population stratification and assortative mating.
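The within- and between-family comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract reports mixed-effects modeling, whereas this toy uses ordinary least squares as a stand-in, and the function names, the simulated twin-pair data, and the effect sizes 0.5 and 0.3 are all assumptions made for illustration. The idea shown is only the decomposition itself: each twin's GPS is split into the family mean (between-family part) and the deviation from that mean (within-family part), and both enter the regression together.

```python
import numpy as np


def within_between_decompose(gps):
    """Split GPS values of shape (n_pairs, 2) into between and within parts.

    between: the family (pair) mean, repeated for both twins.
    within:  each twin's deviation from the family mean.
    """
    gps = np.asarray(gps, dtype=float)
    family_mean = gps.mean(axis=1, keepdims=True)
    between = np.broadcast_to(family_mean, gps.shape).copy()
    within = gps - family_mean
    return between, within


def fit_within_between(gps, outcome):
    """OLS stand-in (assumption) for the mixed-effects model in the abstract."""
    between, within = within_between_decompose(gps)
    design = np.column_stack(
        [np.ones(between.size), between.ravel(), within.ravel()]
    )
    y = np.asarray(outcome, dtype=float).ravel()
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"intercept": coef[0], "between": coef[1], "within": coef[2]}


# Hypothetical noise-free toy data: 500 twin pairs, effects chosen arbitrarily.
rng = np.random.default_rng(0)
toy_gps = rng.normal(size=(500, 2))
toy_between, toy_within = within_between_decompose(toy_gps)
toy_outcome = 0.5 * toy_between + 0.3 * toy_within

estimates = fit_within_between(toy_gps, toy_outcome)
print(estimates)
```

A between-family coefficient larger than the within-family one, as in this toy, is the pattern the abstract reports for cognitive traits. A real analysis would also need a random family intercept to handle clustering, which this sketch omits.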
A genome-wide polygenic score (GPS), derived from a 2013 genome-wide association study (N=127,000), explained 2% of the variance in total years of education (EduYears). In a follow-up study (N=329,000), a new EduYears GPS explains up to 4%. Here, we tested the association between this latest EduYears GPS and educational achievement scores at ages 7, 12 and 16 in an independent sample of 5825 UK individuals. We found that EduYears GPS explained greater amounts of variance in educational achievement over time, up to 9% at age 16, accounting for 15% of the heritable variance. This is the strongest GPS prediction to date for quantitative behavioral traits. Individuals in the highest and lowest GPS septiles differed by a whole school grade at age 16. Furthermore, EduYears GPS was associated with general cognitive ability (~3.5%) and family socioeconomic status (~7%). There was no evidence of an interaction between EduYears GPS and family socioeconomic status on educational achievement or on general cognitive ability. These results are a harbinger of future widespread use of GPS to predict genetic risk and resilience in the social and behavioral sciences.
The Twins Early Development Study (TEDS) is a longitudinal twin study that recruited over 16,000 twin-pairs born between 1994 and 1996 in England and Wales through national birth records. More than 10,000 of these families are still engaged in the study. TEDS was and still is a representative sample of the population in England and Wales. Rich cognitive and emotional/behavioral data have been collected from the twins from infancy to emerging adulthood, with data collection at first contact and at ages 2, 3, 4, 7, 8, 9, 10, 12, 14, 16, 18 and 21, enabling longitudinal genetically sensitive analyses. Data have been collected from the twins themselves, from their parents and teachers, and from the UK National Pupil Database. Genotyped DNA data are available for 10,346 individuals (who are unrelated except for 3320 dizygotic co-twins). TEDS data have contributed to over 400 scientific papers involving more than 140 researchers in 50 research institutions. TEDS offers an outstanding resource for investigating cognitive and behavioral development across childhood and early adulthood and actively fosters scientific collaborations.
Background Diverse behaviour problems in childhood correlate phenotypically, suggesting a general dimension of psychopathology that has been called the p factor. The shared genetic architecture between childhood psychopathology traits also supports a genetic p. This study systematically investigates the manifestation of this common dimension across self‐, parent‐ and teacher‐rated measures in childhood and adolescence. Methods The sample included 7,026 twin pairs from the Twins Early Development Study (TEDS). First, we employed multivariate twin models to estimate common genetic and environmental influences on p based on diverse measures of behaviour problems rated by children, parents and teachers at ages 7, 9, 12 and 16 (depressive traits, emotional problems, peer problems, autism traits, hyperactivity, antisocial behaviour, conduct problems and psychopathic tendencies). Second, to assess the stability of genetic and environmental influences on p across time, we conducted longitudinal twin modelling of the first phenotypic principal components of childhood psychopathological measures across each of the four ages. Third, we created a genetic p factor in 7,026 unrelated genotyped individuals based on eight polygenic scores for psychiatric disorders to estimate how a general polygenic predisposition to mostly adult psychiatric disorders relates to childhood p. Results Behaviour problems were consistently correlated phenotypically and genetically across ages and raters. The p factor is substantially heritable (50%–60%) and manifests consistently across diverse ages and raters. However, residual variation in the common factor models indicates unique contributions as well. Genetic correlations of p components across childhood and adolescence suggest stability over time (49%–78%). A polygenic general psychopathology factor derived from studies of psychiatric disorders consistently predicted a general phenotypic p factor across development (0.3%–0.9%). 
Conclusions Diverse forms of psychopathology generally load on a common p factor, which is highly heritable. There are substantial genetic influences on the stability of p across childhood. Our analyses indicate genetic overlap between general risk for psychiatric disorders in adulthood and p in childhood, even as young as age 7. The p factor has far‐reaching implications for genomic research and, eventually, for diagnosis and treatment of behaviour problems.
It has recently been proposed that a single dimension, called the p factor, can capture a person’s liability to mental disorder. Relevant to the p hypothesis, recent genetic research has found surprisingly high genetic correlations between pairs of psychiatric disorders. Here, for the first time, we compare genetic correlations from different methods and examine their support for a genetic p factor. We tested the hypothesis of a genetic p factor by applying principal component analysis to matrices of genetic correlations between major psychiatric disorders estimated by three methods (family study, genome-wide complex trait analysis, and linkage-disequilibrium score regression) and to a matrix of polygenic score correlations constructed for each individual in a UK-representative sample of 7,026 unrelated individuals. All disorders loaded positively on a first unrotated principal component, which accounted for 57%, 43%, 35%, and 22% of the variance, respectively, for the four methods. Our results showed that all four methods provided strong support for a genetic p factor that represents the pinnacle of the hierarchical genetic architecture of psychopathology.
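As a rough illustration of the analysis step named in this abstract, the sketch below extracts the first unrotated principal component from a correlation matrix and reports the share of variance it accounts for (the largest eigenvalue divided by the trace). The 3 x 3 matrix is a made-up toy, not data from the study, and the function name is an assumption.

```python
import numpy as np


def first_principal_component(corr):
    """Return (loadings, proportion of variance) for the first unrotated PC."""
    corr = np.asarray(corr, dtype=float)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    top_val = eigvals[-1]
    top_vec = eigvecs[:, -1]
    # Eigenvector sign is arbitrary; orient it so the loadings sum is positive.
    if top_vec.sum() < 0:
        top_vec = -top_vec
    loadings = top_vec * np.sqrt(top_val)
    proportion = top_val / np.trace(corr)
    return loadings, proportion


# Hypothetical toy matrix: three traits, all pairwise correlations 0.5.
toy_corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
loadings, proportion = first_principal_component(toy_corr)
print(loadings, proportion)
```

In this toy every trait loads positively on the first component, which is the qualitative pattern the abstract describes; the percentages reported there come from the study's own matrices, not from anything computed here.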
Recent advances in genomics are producing powerful DNA predictors of complex traits, especially cognitive abilities. Here, we leveraged summary statistics from the most recent genome-wide association studies of intelligence and educational attainment, with highly genetically correlated traits, to build prediction models of general cognitive ability and educational achievement. To this end, we compared the performances of multi-trait genomic and polygenic scoring methods. In a representative UK sample of 7,026 children at ages 12 and 16, we show that we can now predict up to 11 percent of the variance in intelligence and 16 percent in educational achievement. We also show that predictive power increases from age 12 to age 16 and that genomic predictions do not differ for girls and boys. We found that multi-trait genomic methods were effective in boosting predictive power. Prediction accuracy varied across polygenic score approaches; however, results were similar for different multi-trait and polygenic score methods. We discuss general caveats of multi-trait methods and polygenic score prediction, and conclude that polygenic scores for educational attainment and intelligence are currently the most powerful predictors in the behavioural sciences.
Background - There is considerable interest in whether genetic data can be used to improve standard cardiovascular disease risk calculators, as the latter are routinely used in clinical practice to manage preventative treatment. Methods - Using the UK Biobank (UKB) resource, we developed our own polygenic risk score (PRS) for coronary artery disease (CAD). We used an additional 60,000 UKB individuals to develop an integrated risk tool (IRT) that combined our PRS with established risk tools (either the American Heart Association/American College of Cardiology's Pooled Cohort Equations (PCE) or UK's QRISK3), and we tested our IRT in an additional, independent set of 186,451 UKB individuals. Results - The novel CAD PRS shows superior predictive power for CAD events compared to other published PRSs, and is largely uncorrelated with PCE and QRISK3. When combined with PCE into an integrated risk tool, it has superior predictive accuracy. Overall, 10.4% of incident CAD cases were misclassified as low risk by PCE and correctly classified as high risk by the IRT, compared to 4.4% misclassified by the IRT and correctly classified by PCE. The overall net reclassification improvement for the IRT was 5.9% (95% CI 4.7-7.0). When individuals were stratified into age-by-sex subgroups the improvement was larger for all subgroups (range 8.3%-15.4%), with best performance in men aged 40-54 (15.4%, 95% CI 11.6-19.3). Comparable results were found using a different risk tool (QRISK3), and also a broader definition of cardiovascular disease. Use of the IRT is estimated to avoid up to 12,000 deaths in the USA over a 5-year period. Conclusions - An integrated risk tool that includes polygenic risk outperforms current risk stratification tools and offers greater opportunity for early interventions. Given the plummeting costs of genetic tests, future iterations of CAD risk tools would be enhanced with the addition of a person's polygenic risk.
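The net reclassification improvement (NRI) quoted in this abstract follows a standard two-part definition, sketched below under stated assumptions: a single high/low risk threshold, a hypothetical function name, and invented toy data. Among people who had the event, NRI counts upward moves (low to high) minus downward moves; among people who did not, it counts downward moves minus upward moves; the overall NRI is the sum of the two. On the abstract's own figures, the event component alone would be 10.4% minus 4.4%, that is 6.0 percentage points; how the reported overall 5.9% splits between cases and non-cases is not stated there.

```python
def net_reclassification_improvement(old_high, new_high, event):
    """Two-category NRI.

    old_high, new_high: booleans, True if classified high risk by the old/new tool.
    event: booleans, True if the person went on to have the event.
    Returns (event_nri, nonevent_nri, total_nri) as proportions.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, had_event in zip(old_high, new_high, event):
        moved_up = (not old) and new
        moved_down = old and (not new)
        if had_event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down
    event_nri = (up_e - down_e) / n_e
    nonevent_nri = (down_n - up_n) / n_n
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical toy data: 10 people with the event, 10 without.
T, F = True, False
old_cls = [F, F, F, T] + [T] * 6 + [T, T, F] + [F] * 7
new_cls = [T, T, T, F] + [T] * 6 + [F, F, T] + [F] * 7
events = [T] * 10 + [F] * 10

event_nri, nonevent_nri, total_nri = net_reclassification_improvement(
    old_cls, new_cls, events
)
print(event_nri, nonevent_nri, total_nri)
```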
The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16–18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
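The simplest method in the comparison above, p-value thresholding, can be sketched as a weighted sum of allele dosages over the variants whose association p-value falls below a chosen threshold. This is a minimal sketch only: the clumping step of pT+clump is deliberately omitted, and the function name, dosages, effect sizes, and p-values are all invented for illustration rather than taken from the study or from any software package.

```python
import numpy as np


def threshold_polygenic_score(dosages, betas, pvalues, p_threshold):
    """Score = sum over retained variants of (allele dosage * effect size).

    dosages: (n_individuals, n_variants) array of 0/1/2 effect-allele counts.
    betas, pvalues: per-variant GWAS effect sizes and p-values.
    Variants with p-value >= p_threshold get zero weight. No clumping is done.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) < p_threshold
    return dosages @ (betas * keep)


# Hypothetical toy inputs: 2 individuals, 3 variants.
toy_dosages = [[0, 1, 2], [2, 0, 1]]
toy_betas = [0.1, -0.2, 0.3]
toy_pvalues = [0.01, 0.5, 0.001]

scores = threshold_polygenic_score(toy_dosages, toy_betas, toy_pvalues, 0.05)
print(scores)
```

Choosing `p_threshold` is the tuning problem the abstract addresses: in practice several thresholds would be scored and compared in held-out data, for example by cross-validation.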