Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.
Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
We introduce a class of M × M sample covariance matrices Q which subsumes and generalizes several previous models. The associated population covariance matrix Σ = 𝔼Q is assumed to differ from the identity by a matrix of bounded rank. All quantities except the rank of Σ − I_M may depend on M in an arbitrary fashion. We investigate the principal components, i.e. the top eigenvalues and eigenvectors, of Q. We derive precise large deviation estimates on the generalized components ⟨w, ξ_i⟩ of the outlier and non-outlier eigenvectors ξ_i. Our results also hold near the so-called BBP transition, where outliers are created or annihilated, and for degenerate or near-degenerate outliers. We believe the obtained rates of convergence to be optimal. In addition, we derive the asymptotic distribution of the generalized components of the non-outlier eigenvectors. A novel observation arising from our results is that, unlike the eigenvalues, the eigenvectors of the principal components contain information about the subcritical spikes of Σ. The proofs use several results on the eigenvalues and eigenvectors of the uncorrelated matrix Q, satisfying 𝔼Q = I_M, as input: the isotropic local Marchenko-Pastur law established in [10], level repulsion, and quantum unique ergodicity of the eigenvectors. The latter is a special case of a new universality result for the joint eigenvalue-eigenvector distribution.
Given a large, high-dimensional sample from a spiked population, the top sample covariance eigenvalue is known to exhibit a phase transition. We show that the largest eigenvalues have asymptotic distributions near the phase transition in the rank-one spiked real Wishart setting and its general beta analogue, proving a conjecture of Baik, Ben Arous and Péché (2005). We also treat shifted mean Gaussian orthogonal and beta ensembles. Such results are entirely new in the real case; in the complex case we strengthen existing results by providing optimal scaling assumptions. One obtains the known limiting random Schrödinger operator on the half-line, but the boundary condition now depends on the perturbation. We derive several characterizations of the limit laws in which beta appears as a parameter, including a simple linear boundary value problem. This PDE description recovers known explicit formulas at beta=2,4, yielding in particular a new and simple proof of the Painlevé representations for these Tracy-Widom distributions.
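The phase transition mentioned above can be illustrated at first order with a small sketch. It assumes the standard rank-one spiked model (population covariance equal to the identity except for one eigenvalue `spike`) and the well-known almost-sure limit of the top sample eigenvalue as the dimension-to-sample-size ratio tends to `gamma`. The function name and parameters are illustrative and do not come from the abstract, which concerns the finer fluctuations near the critical point rather than this limit.

```python
import math


def top_eigenvalue_limit(spike: float, gamma: float) -> float:
    """First-order limit of the largest sample covariance eigenvalue.

    Assumptions (not taken from the abstract): rank-one spiked model with
    population eigenvalues (spike, 1, ..., 1) and dimension / sample size
    converging to gamma > 0.
    """
    if spike <= 0 or gamma <= 0:
        raise ValueError("spike and gamma must be positive")
    threshold = 1.0 + math.sqrt(gamma)  # critical spike size
    if spike <= threshold:
        # Subcritical: the top eigenvalue sticks to the bulk edge.
        return threshold ** 2
    # Supercritical: an outlier separates from the bulk.
    return spike * (1.0 + gamma / (spike - 1.0))


if __name__ == "__main__":
    for s in (1.2, 1.5, 3.0):
        print(s, top_eigenvalue_limit(s, gamma=0.25))
```

The two branches agree at the critical spike size, which is the regime where the abstract's limiting distributions apply.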
Disruptive and damaging ultra-rare variants (URVs) in highly constrained (HC) genes are enriched in individuals with neurodevelopmental disorders. In the general population, this class of variants was associated with a decrease in years of education (YOE; −3.1 months; P-value = 3.3×10⁻⁸). This effect was stronger among high brain-expressed genes and explained more YOE variance than pathogenic copy number variation, but less than common variants. Disruptive and damaging URVs in HC genes influence the determinants of YOE in the general population.
The top eigenvalues of rank r spiked real Wishart matrices and additively perturbed Gaussian orthogonal ensembles are known to exhibit a phase transition in the large size limit. We show that they have limiting distributions for near-critical perturbations, fully resolving the conjecture of Baik, Ben Arous and Péché [Duke Math. J. (2006) 133 205-235]. The starting point is a new (2r + 1)-diagonal form that is algebraically natural to the problem; for both models it converges to a certain random Schrödinger operator on the half-line with r × r matrix-valued potential. The perturbation determines the boundary condition and the low-lying eigenvalues describe the limit, jointly as the perturbation varies in a fixed subspace. We treat the real, complex and quaternion (β = 1, 2, 4) cases simultaneously. We further characterize the limit laws in terms of a diffusion related to Dyson's Brownian motion, or alternatively a linear parabolic PDE; here β appears simply as a parameter. At β = 2, the PDE appears to reconcile with known Painlevé formulas for these r-parameter deformations of the GUE Tracy-Widom law.
There are established associations between advanced paternal age and offspring risk for psychiatric and developmental disorders. These are commonly attributed to genetic mutations, especially de novo single nucleotide variants (dnSNVs), that accumulate with increasing paternal age. However, the actual magnitude of risk from such mutations in the male germline is unknown. Quantifying this risk would clarify the clinical significance of delayed paternity. Using parent-child trio whole-exome-sequencing data, we estimate the relationship between paternal-age-related dnSNVs and risk for five disorders: autism spectrum disorder (ASD), congenital heart disease, neurodevelopmental disorders with epilepsy, intellectual disability and schizophrenia (SCZ). Using Danish registry data, we investigate whether epidemiologic associations between each disorder and older fatherhood are consistent with the estimated role of dnSNVs. We find that paternal-age-related dnSNVs confer a small amount of risk for these disorders. For ASD and SCZ, epidemiologic associations with delayed paternity reflect factors that may not increase with age.
bioRxiv preprint first posted online Jun. 25, 2018; doi: http://dx.doi.org/10.1101/355057.
All of these studies have been based on SNP associations, in most cases with effect sizes discovered by the GIANT Consortium, which most recently combined 79 individual GWAS through meta-analysis, encompassing a total of 253,288 individuals [13,14]. Here, we show that the selection effects described in these studies are severely attenuated and in some cases no longer significant when using summary statistics derived from the UK Biobank, an independent and larger single study that includes 336,474 genetically unrelated individuals who derive their ancestry almost entirely from the British Isles (identified as "white British ancestry" by the UK Biobank) (Supplementary Table S1). The UK Biobank analysis is based on a single cohort drawn from a relatively homogeneous population, enabling excellent control of potential population stratification. Our analysis of the UK Biobank data confirms that almost all genome-wide significant loci discovered by the GIANT [...]
We began by estimating "polygenic height scores" (sums of allele frequencies at independent SNPs from GIANT weighted by their effect sizes) to study population-level differences among ancient and present-day European samples. We used a set of different significance thresholds and strategies to correct for linkage disequilibrium as employed by previous studies, and replicated their signals for significant differences in genetic height across populations [4-11] (Figure 1a, Supplementary Figure S2). We then repeated the analysis using summary statistics from a GWAS for height in the UK Biobank restricting to individuals of British Isles ancestry and correcting for population stratification based on the first ten principal components (UKB) [15]. This analysis resulted in a dramatic attenuation of differences in polygenic height scores (Figure 1a, Supplementary Figures S2-S4). The differences between ancient European populations also greatly attenuated (Figure 1a, Supplementary Figure S5). Strikingly, the ordering of the scores for populations also changed depending on which GWAS was used to estimate genetic height both within Europe (Figure 1a, Supplementary Figures S2-S5) and globally (Supplementary Figure S6), consistent with reports from a recent simulation study [16]. The height scores were qualitatively similar only when we restricted to independent genome-wide significant SNPs in GIANT and the UK Biobank (P < 5×10⁻⁸) (Supplementary Figure S2b). This replicates the originally reported significant north-south difference in the allele frequency of the height-increasing allele [4] Figure S2b), and that confounding due to stratificat...
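The polygenic height score described in this excerpt, a sum of allele frequencies weighted by GWAS effect sizes, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the population labels, and every number are hypothetical, chosen only to show how the ordering of two populations can change when the same allele frequencies are scored with two different sets of effect-size estimates.

```python
def polygenic_score(allele_freqs, effect_sizes):
    """Sum of effect-allele frequencies weighted by per-SNP effect sizes."""
    if len(allele_freqs) != len(effect_sizes):
        raise ValueError("need one effect size per SNP")
    return sum(f * b for f, b in zip(allele_freqs, effect_sizes))


# Hypothetical effect-allele frequencies at three independent SNPs.
pop_north = [0.50, 0.60, 0.30]
pop_south = [0.45, 0.40, 0.10]

# Two hypothetical sets of effect-size estimates for the same SNPs,
# standing in for estimates from two different GWAS.
effects_gwas_a = [0.03, 0.02, -0.01]
effects_gwas_b = [0.03, 0.005, -0.02]

if __name__ == "__main__":
    for label, effects in (("GWAS A", effects_gwas_a), ("GWAS B", effects_gwas_b)):
        north = polygenic_score(pop_north, effects)
        south = polygenic_score(pop_south, effects)
        print(label, "north:", round(north, 4), "south:", round(south, 4))
```

With these made-up inputs the northern population scores higher under the first set of effect sizes and lower under the second, which mirrors, in toy form, the change in ordering reported in the excerpt.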
Regulatory relationships between transcription factors (TFs) and their target genes lie at the heart of cellular identity and function; however, uncovering these relationships is often labor-intensive and requires perturbations. Here, we propose a principled framework to systematically infer gene regulation for all TFs simultaneously in cells at steady state by leveraging the intrinsic variation in the transcriptional abundance across single cells. Through modeling and simulations, we characterize how transcriptional bursts of a TF gene are propagated to its target genes, including the expected ranges of time delay and magnitude of maximum covariation. We distinguish these temporal trends from the time-invariant covariation arising from cell states, and we delineate the experimental and technical requirements for leveraging these small but meaningful cofluctuations in the presence of measurement noise. While current technology does not yet allow adequate power for definitively detecting regulatory relationships for all TFs simultaneously in cells at steady state, we investigate a small-scale dataset to inform future experimental design. This study supports the potential value of mapping regulatory connections through stochastic variation, and it motivates further technological development to achieve its full potential.