Estimates from genome-wide association studies (GWAS) of unrelated individuals capture effects of inherited variation (direct effects), demography (population stratification, assortative mating) and relatives (indirect genetic effects). Family-based GWAS designs can control for demographic and indirect genetic effects, but large-scale family datasets have been lacking. We combined data from 178,086 siblings from 19 cohorts to generate population (between-family) and within-sibship (within-family) GWAS estimates for 25 phenotypes. Within-sibship GWAS estimates were smaller than population estimates for height, educational attainment, age at first birth, number of children, cognitive ability, depressive symptoms and smoking. Some differences were observed in downstream SNP heritability, genetic correlations and Mendelian randomization analyses. For example, the within-sibship genetic correlation between educational attainment and body mass index attenuated towards zero. In contrast, analyses of most molecular phenotypes (for example, low-density lipoprotein-cholesterol) were generally consistent. We also found within-sibship evidence of polygenic adaptation on taller height. Here, we illustrate the importance of family-based GWAS data for phenotypes influenced by demographic and indirect genetic effects.
Estimates from genome-wide association studies (GWAS) represent a combination of the effect of inherited genetic variation (direct effects), demography (population stratification, assortative mating) and genetic nurture from relatives (indirect genetic effects). GWAS using family-based designs can control for demography and indirect genetic effects, but large-scale family datasets have been lacking. We combined data on 159,701 siblings from 17 cohorts to generate population (between-family) and within-sibship (within-family) estimates of genome-wide genetic associations for 25 phenotypes. We demonstrate that existing GWAS associations for height, educational attainment, smoking, depressive symptoms, age at first birth and cognitive ability overestimate direct effects. We show that estimates of SNP-heritability, genetic correlations and Mendelian randomization involving these phenotypes substantially differ when calculated using within-sibship estimates. For example, genetic correlations between educational attainment and height largely disappear. In contrast, analyses of most clinical phenotypes (e.g. LDL-cholesterol) were generally consistent between population and within-sibship models. We also report compelling evidence of polygenic adaptation on taller human height using within-sibship data. Large-scale family datasets provide new opportunities to quantify direct effects of genetic variation on human traits and diseases.
Offspring resemble their parents for both genetic and environmental reasons. Understanding the relative magnitude of these alternatives has long been a core interest in behavioral genetics research, but traditional designs, which compare phenotypic covariances to make inferences about unmeasured genetic and environmental factors, have struggled to disentangle them. Recently, Kong et al. (2018) showed that by correlating offspring phenotypic values with the measured polygenic score of parents’ nontransmitted alleles, one can estimate the effect of “genetic nurture”—a type of passive gene–environment covariation that arises when heritable parental traits directly influence offspring traits. Here, we instantiate this basic idea in a set of causal models that provide novel insights into the estimation of parental influences on offspring. Most importantly, we show how jointly modeling the parental polygenic scores and the offspring phenotypes can provide an unbiased estimate of the variation attributable to the environmental influence of parents on offspring, even when the polygenic score accounts for a small fraction of trait heritability. This model can be further extended to (a) account for the influence of different types of assortative mating, (b) estimate the total variation due to additive genetic effects and their covariance with the familial environment (i.e., the full genetic nurture effect), and (c) model situations where a parental trait influences a different offspring trait. By utilizing structural equation modeling techniques developed for extended twin family designs, our approach provides a general framework for modeling polygenic scores in family studies and allows for various model extensions that can be used to answer old questions about familial influences in new ways.
In a companion paper Balbona et al. (Behav Genet, in press), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arises from assortative mating and vertical transmission respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software, and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met, but as expected, they are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample sizes and with increasing $$r^2$$ r 2 values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation ($$r^2=.05$$ r 2 = . 05 ), standard errors of these standardized estimates are reasonable ($$< .05$$ < . 05 ) for $$n=16K$$ n = 16 K trios, and can even be reasonable for smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., $$r^2>.025)$$ r 2 > . 025 ) as well as datasets with genomic and phenotypic information on parents and offspring. The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWAS’s on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring.
Offspring resemble their parents for both genetic and environmental reasons. Understanding the relative magnitude of these alternatives has long been a core interest in behavioral genetics research, but traditional designs, which compare phenotypic covariances to make inferences about unmeasured genetic and environmental factors, have struggled to disentangle them. Recently, Kong et al. (2018) showed that by correlating offspring phenotypic values with the measured polygenic score of parents’ nontransmitted alleles, one can estimate the effect of “genetic nurture”— a type of passive gene-environment covariation that arises when heritable parental traits directly influence offspring traits. Here, we instantiate this basic idea in a set of causal models that provide novel insights into the estimation of parental influences on offspring. Most importantly, we show how jointly modeling the parental polygenic scores and the offspring phenotypes can provide an unbiased estimate of the variation attributable to the environmental influence of parents on offspring, even when the polygenic score accounts for a small fraction of trait heritability. This model can be further extended to a) account for the influence of assortative mating at both equilibrium and disequilibrium (after a single generation of assortment), and b) include measured parental phenotypes, allowing for the estimation of the total variation due to additive genetic effects and their covariance with the familial environment. By utilizing path analysis techniques developed for extended twin family designs, our approach provides a general framework for modeling polygenic scores in family studies and allows for various model extensions that can be used to answer old questions about familial influences in new ways.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.