Polygenic scores are a popular tool for prediction of complex traits. However, prediction estimates in samples of unrelated participants can include effects of population stratification, assortative mating, and environmentally mediated parental genetic effects, a form of genotype-environment correlation (rGE). Comparing genome-wide polygenic score (GPS) predictions in unrelated individuals with predictions between siblings in a within-family design is a powerful approach to identify these different sources of prediction. Here, we compared within- to between-family GPS predictions of eight outcomes (anthropometric, cognitive, personality, and health) for eight corresponding GPSs. The outcomes were assessed in up to 2,366 dizygotic (DZ) twin pairs from the Twins Early Development Study from age 12 to age 21. To account for family clustering, we used mixed-effects modeling, simultaneously estimating within- and between-family effects for target- and cross-trait GPS prediction of the outcomes. There were three main findings: (1) DZ twin GPS differences predicted DZ differences in height, BMI, intelligence, educational achievement, and ADHD symptoms; (2) target and cross-trait analyses indicated that GPS prediction estimates for cognitive traits (intelligence and educational achievement) were on average 60% greater between families than within families, but this was not the case for non-cognitive traits; and (3) much of this within- and between-family difference for cognitive traits disappeared after controlling for family socioeconomic status (SES), suggesting that SES is a major source of between-family prediction through rGE mechanisms. These results provide insights into the patterns by which rGE contributes to GPS prediction, while ruling out confounding due to population stratification and assortative mating.
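The within- versus between-family comparison in this abstract can be illustrated with a minimal sketch. It assumes one GPS and one outcome per twin, splits each twin's GPS into a family mean (the between-family part) and a deviation from that mean (the within-family part), and fits a plain least-squares slope to each part. This is a simplified stand-in for the mixed-effects models the study reports, not a reproduction of them, and all function names are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def within_between_slopes(pairs):
    """Return (within_slope, between_slope) for twin-pair data.

    pairs: list of ((gps_1, outcome_1), (gps_2, outcome_2)), one per pair.
    """
    fam_mean_gps, fam_mean_y = [], []
    dev_gps, dev_y = [], []
    for (g1, y1), (g2, y2) in pairs:
        gm, ym = (g1 + g2) / 2, (y1 + y2) / 2
        # Between-family part: one family mean per pair.
        fam_mean_gps.append(gm)
        fam_mean_y.append(ym)
        # Within-family part: each twin's deviation from the family mean.
        for g, y in ((g1, y1), (g2, y2)):
            dev_gps.append(g - gm)
            dev_y.append(y - ym)
    within = ols_slope(dev_gps, dev_y)
    between = ols_slope(fam_mean_gps, fam_mean_y)
    return within, between
```

A between-family slope that exceeds the within-family slope is the pattern the abstract reports for cognitive traits. With exactly two twins per family the two GPS components are uncorrelated, which is why the two slopes can be fitted separately in this sketch.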
Background: Diverse behaviour problems in childhood correlate phenotypically, suggesting a general dimension of psychopathology that has been called the p factor. The shared genetic architecture between childhood psychopathology traits also supports a genetic p. This study systematically investigates the manifestation of this common dimension across self‐, parent‐ and teacher‐rated measures in childhood and adolescence. Methods: The sample included 7,026 twin pairs from the Twins Early Development Study (TEDS). First, we employed multivariate twin models to estimate common genetic and environmental influences on p based on diverse measures of behaviour problems rated by children, parents and teachers at ages 7, 9, 12 and 16 (depressive traits, emotional problems, peer problems, autism traits, hyperactivity, antisocial behaviour, conduct problems and psychopathic tendencies). Second, to assess the stability of genetic and environmental influences on p across time, we conducted longitudinal twin modelling of the first phenotypic principal components of childhood psychopathological measures across each of the four ages. Third, we created a genetic p factor in 7,026 unrelated genotyped individuals based on eight polygenic scores for psychiatric disorders to estimate how a general polygenic predisposition to mostly adult psychiatric disorders relates to childhood p. Results: Behaviour problems were consistently correlated phenotypically and genetically across ages and raters. The p factor is substantially heritable (50%–60%) and manifests consistently across diverse ages and raters. However, residual variation in the common factor models indicates unique contributions as well. Genetic correlations of p components across childhood and adolescence suggest stability over time (49%–78%). A polygenic general psychopathology factor derived from studies of psychiatric disorders consistently predicted a general phenotypic p factor across development (0.3%–0.9%).
Conclusions: Diverse forms of psychopathology generally load on a common p factor, which is highly heritable. There are substantial genetic influences on the stability of p across childhood. Our analyses indicate genetic overlap between general risk for psychiatric disorders in adulthood and p in childhood, even as young as age 7. The p factor has far‐reaching implications for genomic research and, eventually, for diagnosis and treatment of behaviour problems.
Estimates from genome-wide association studies (GWAS) represent a combination of the effect of inherited genetic variation (direct effects), demography (population stratification, assortative mating) and genetic nurture from relatives (indirect genetic effects). GWAS using family-based designs can control for demography and indirect genetic effects, but large-scale family datasets have been lacking. We combined data on 159,701 siblings from 17 cohorts to generate population (between-family) and within-sibship (within-family) estimates of genome-wide genetic associations for 25 phenotypes. We demonstrate that existing GWAS associations for height, educational attainment, smoking, depressive symptoms, age at first birth and cognitive ability overestimate direct effects. We show that estimates of SNP-heritability, genetic correlations and Mendelian randomization involving these phenotypes substantially differ when calculated using within-sibship estimates. For example, genetic correlations between educational attainment and height largely disappear. In contrast, analyses of most clinical phenotypes (e.g. LDL-cholesterol) were generally consistent between population and within-sibship models. We also report compelling evidence of polygenic adaptation on taller human height using within-sibship data. Large-scale family datasets provide new opportunities to quantify direct effects of genetic variation on human traits and diseases.
While volunteer-based studies such as the UK Biobank have become the cornerstone of genetic epidemiology, the participating individuals are rarely representative of their target population. To evaluate the impact of selective participation, here we derived UK Biobank participation probabilities on the basis of 14 variables harmonized across the UK Biobank and a representative sample. We then conducted weighted genome-wide association analyses on 19 traits. Comparing the output from weighted genome-wide association analyses (n_effective = 94,643 to 102,215) with that from standard genome-wide association analyses (n = 263,464 to 283,749), we found that increasing representativeness led to changes in SNP effect sizes and identified novel SNP associations for 12 traits. While heritability estimates were less impacted by weighting (maximum change in h², 5%), we found substantial discrepancies for genetic correlations (maximum change in r_g, 0.31) and Mendelian randomization estimates (maximum change in β_STD, 0.15) for socio-behavioural traits. We urge the field to increase representativeness in biobank samples, especially when studying genetic correlates of behaviour, lifestyles and social outcomes.
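The drop from the standard sample size to the much smaller effective sample size under weighting can be illustrated with a short sketch. It assumes inverse-probability weights (one over each person's participation probability) and Kish's approximation for the effective sample size. The abstract does not state which formula the authors used, so both choices are assumptions made for illustration, and the function names are hypothetical.

```python
def inverse_probability_weights(participation_probs):
    """Weight each participant by 1 / P(participation); assumed weighting scheme."""
    return [1.0 / p for p in participation_probs]


def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighted_mean(values, weights):
    """Weighted mean of a trait, up-weighting under-represented participants."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Equal weights leave the effective sample size equal to the number of participants. The more unequal the weights, the smaller the effective sample size, which is the price paid for correcting selective participation.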
Background: Identifying causal risk factors for self-harm is essential to inform preventive interventions. Epidemiological studies have identified risk factors associated with self-harm, but these associations can be subject to confounding. By implementing genetically informed methods to better account for confounding, this study aimed to better identify plausible causal risk factors for self-harm. Methods and findings: Using summary statistics from 24 genome-wide association studies (GWASs) comprising 16,067 to 322,154 individuals, polygenic scores (PSs) were generated to index 24 possible individual risk factors for self-harm (i.e., mental health vulnerabilities, substance use, cognitive traits, personality traits, and physical traits) among a subset of UK Biobank participants (N = 125,925, 56.2% female) who completed an online mental health questionnaire in the period from 13 July 2016 to 27 July 2017. In total, 5,520 (4.4%) of these participants reported having self-harmed in their lifetime. In binomial regression models, PSs indexing 6 risk factors (major depressive disorder [MDD], attention deficit/hyperactivity disorder [ADHD], bipolar disorder, schizophrenia, alcohol dependence disorder, and lifetime cannabis use) predicted self-harm, with effect sizes ranging from odds ratio (OR) = 1.05 (95% CI 1.02 to 1.07, q = 0.008) for lifetime cannabis use to OR = 1.20 (95% CI 1.16 to 1.23, q = 1.33 × 10⁻³⁵) for MDD. No systematic differences emerged between suicidal and non-suicidal self-harm. To further probe causal relationships, two-sample Mendelian randomisation (MR) analyses were conducted, with MDD, ADHD, and schizophrenia emerging as the most plausible causal risk factors for self-harm. The genetic liabilities for MDD and schizophrenia were associated with self-harm independently of diagnosis and medication. Main limitations include the lack of representativeness of the UK Biobank sample, that self-harm was self-… [PLOS Medicine]
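The odds ratios and 95% confidence intervals quoted for the polygenic scores follow the usual logistic-regression convention, which the sketch below illustrates. It assumes a log-odds coefficient and its standard error are already available and uses a normal-approximation (Wald) interval. Whether the authors computed their intervals this way is not stated in the abstract, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper).

    Assumes a Wald interval; z=1.96 corresponds to a 95% confidence level.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, and an interval that excludes 1 corresponds to a coefficient that differs from zero at the chosen level.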
Analysis of secondary data sources (such as cohort studies, survey data, and administrative records) has the potential to provide answers to science and society’s most pressing questions. However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases, it presents challenges for secondary data analysis. In this article, we describe these challenges and propose novel solutions and alternative approaches. Proposed solutions include approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) help ensure that pre-registered analyses will be appropriate for the data, and (4) address difficulties arising from reduced analytic flexibility in pre-registration. For each solution, we provide guidance on implementation for researchers and data guardians. The adoption of these practices can help to protect against researcher bias in secondary data analysis, to improve the robustness of research based on existing data.
Polygenic scores are increasingly powerful predictors of educational achievement. It is unclear, however, how sets of polygenic scores, which partly capture environmental effects, perform jointly with sets of environmental measures, which are themselves heritable, in prediction models of educational achievement. Here, for the first time, we systematically investigate gene-environment correlation (rGE) and interaction (GxE) in the joint analysis of multiple genome-wide polygenic scores (GPS) and multiple environmental measures as they predict tested educational achievement (EA). We predict EA in a representative sample of 7,026 16-year-olds, with 20 GPS for psychiatric, cognitive and anthropometric traits, and 13 environments (including life events, home environment, and SES) measured earlier in life. Environmental and GPS predictors were modelled, separately and jointly, in penalized regression models with out-of-sample comparisons of prediction accuracy, considering the implications that their interplay had on model performance. Jointly modelling multiple GPS and environmental factors significantly improved prediction of EA, with cognitive-related GPS adding unique independent information beyond SES, home environment and life events. We found evidence for rGE underlying variation in EA (rGE = .36; 95% CIs = .29, .43). We estimated that 38% (95% CIs = 29%, 49%) of the GPS effects on EA were mediated by environmental effects, and in turn that 18% (95% CIs = 12%, 25%) of environmental effects were accounted for by the GPS model. Lastly, we did not find evidence that GxE effects collectively contributed to multivariable prediction. Our multivariable polygenic and environmental prediction model suggests widespread rGE and unsystematic GxE contributions to EA in adolescence.
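The out-of-sample comparison of prediction accuracy between separate and joint models can be sketched as follows. The example assumes that predictions for a held-out sample have already been produced by some fitted models and that accuracy is summarised as R² relative to the held-out mean. The abstract does not specify the metric or the validation scheme, so both are assumptions, and the function names are hypothetical.

```python
def out_of_sample_r2(y_true, y_pred):
    """R^2 on held-out data: 1 - SS_residual / SS_total (total around the held-out mean)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y_true, pred_base, pred_joint):
    """Gain in held-out R^2 when moving from a base model to a joint model."""
    return out_of_sample_r2(y_true, pred_joint) - out_of_sample_r2(y_true, pred_base)
```

Here `pred_base` might come from an environment-only model and `pred_joint` from a model that also includes the GPS. A positive increment on held-out data is the kind of evidence the abstract describes as the GPS adding independent information.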