Polygenic scores have recently been used to summarise genetic effects among an ensemble of markers that do not individually achieve significance in a large-scale association study. Markers are selected using an initial training sample and used to construct a score in an independent replication sample by forming the weighted sum of associated alleles within each subject. Association between a trait and this composite score implies that a genetic signal is present among the selected markers, and the score can then be used for prediction of individual trait values. This approach has been used to obtain evidence of a genetic effect when no single markers are significant, to establish a common genetic basis for related disorders, and to construct risk prediction models. In some cases, however, the desired association or prediction has not been achieved. Here, the power and predictive accuracy of a polygenic score are derived from a quantitative genetics model as a function of the sizes of the two samples, explained genetic variance, selection thresholds for including a marker in the score, and methods for weighting effect sizes in the score. Expressions are derived for quantitative and discrete traits, the latter allowing for case/control sampling. A novel approach to estimating the variance explained by a marker panel is also proposed. It is shown that published studies with significant association of polygenic scores have been well powered, whereas those with negative results can be explained by low sample size. It is also shown that useful levels of prediction may only be approached when predictors are estimated from very large samples, up to an order of magnitude greater than currently available. Therefore, polygenic scores currently have more utility for association testing than predicting complex traits, but prediction will become more feasible as sample sizes continue to grow.
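The score construction described in this abstract (select markers in a training sample, then form the weighted sum of associated alleles within each subject of the replication sample) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name `polygenic_score`, the toy genotypes, effect sizes and p-values, and the threshold of 0.05 are all assumptions made for the example.

```python
import numpy as np

def polygenic_score(genotypes, effect_sizes, p_values, p_threshold=0.05):
    """Weighted sum of allele counts over markers passing a p-value threshold.

    genotypes    : (n_subjects, n_markers) allele counts (0, 1 or 2) in the
                   replication sample
    effect_sizes : per-marker weights estimated in the training sample
    p_values     : per-marker association p-values from the training sample
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    selected = p_values < p_threshold  # marker selection step
    return genotypes[:, selected] @ effect_sizes[selected]

# Hypothetical toy data: two subjects, three markers.
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1]])
effect_sizes = np.array([0.10, -0.20, 0.05])
p_values = np.array([0.01, 0.30, 0.04])

scores = polygenic_score(genotypes, effect_sizes, p_values)
print(scores)  # one score per subject
```

Only the first and third markers pass the threshold here, so the second marker's weight does not contribute to either subject's score.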
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
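To illustrate what a posterior mean effect size looks like in the simplest setting, the sketch below shrinks marginal estimates under a Gaussian (infinitesimal) prior while ignoring LD entirely. This is not LDpred itself, which uses LD from a reference panel; the uniform shrinkage factor h² / (h² + M/N) for unlinked, standardized markers is stated here as an assumption of the example, and the function name and input values are hypothetical.

```python
def infinitesimal_shrinkage(beta_hat, h2, n_markers, n_samples):
    """Shrink marginal effect estimates toward zero, assuming no LD.

    Assumes standardized genotypes and phenotype, a Gaussian prior with
    variance h2 / n_markers on each effect, and sampling variance of
    roughly 1 / n_samples for each marginal estimate.
    """
    factor = h2 / (h2 + n_markers / n_samples)
    return [factor * b for b in beta_hat]

# Hypothetical inputs: heritability 0.5, 100,000 markers, 100,000 training subjects.
shrunk = infinitesimal_shrinkage([0.03, -0.015], h2=0.5,
                                 n_markers=100_000, n_samples=100_000)
print(shrunk)
```

Under these assumptions the shrinkage factor approaches 1 as the training sample grows relative to the number of markers, which is consistent with the abstract's point that accuracy improves with sample size.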
Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn’s disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.
We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects. The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new (1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32-q24.33) and two of which have been previously implicated (6p21.32-p22.1 and 18q21.2). The strongest new finding (P = 1.6 × 10⁻¹¹) was with rs1625579 within an intron of a putative primary transcript for MIR137 (microRNA 137), a known regulator of neuronal development. Four other schizophrenia loci achieving genome-wide significance contain predicted targets of MIR137, suggesting MIR137-mediated dysregulation as a previously unknown etiologic mechanism in schizophrenia. In a joint analysis with a bipolar disorder sample (16,374 affected individuals and 14,044 controls), three loci reached genome-wide significance: CACNA1C (rs4765905, P = 7.0 × 10⁻⁹), ANK3 (rs10994359, P = 2.5 × 10⁻⁸) and the ITIH3-ITIH4 region (rs2239547, P = 7.8 × 10⁻⁹).
Schizophrenia, a devastating psychiatric disorder, has a prevalence of 0.5–1%, with high heritability (80–85%) and complex transmission [1]. Recent studies implicate rare, large, high-penetrance copy number variants (CNVs) in some cases [2], but it is not known what genes or biological mechanisms underlie susceptibility. Here we show that schizophrenia is significantly associated with single nucleotide polymorphisms (SNPs) in the extended Major Histocompatibility Complex (MHC) region on chromosome 6. We carried out a genome-wide association study (GWAS) of common SNPs in the Molecular Genetics of Schizophrenia (MGS) case-control sample, and then a meta-analysis of data from the MGS, International Schizophrenia Consortium (ISC) and SGENE datasets. No MGS finding achieved genome-wide statistical significance. In the meta-analysis of European-ancestry subjects (8,008 cases, 19,077 controls), significant association with schizophrenia was observed in a region of linkage disequilibrium on chromosome 6p22.1 (P = 9.54 × 10⁻⁹). This region includes a histone gene cluster and several immunity-related genes, possibly implicating etiologic mechanisms involving chromatin modification, transcriptional regulation, auto-immunity and/or infection. These results demonstrate that common schizophrenia susceptibility alleles can be detected. The characterization of these signals will suggest important directions for research on susceptibility mechanisms.
Copy number variants (CNVs) have been strongly implicated in the genetic etiology of schizophrenia (SCZ). However, genome-wide investigation of the contribution of CNV to risk has been hampered by limited sample sizes. We sought to address this obstacle by applying a centralized analysis pipeline to a SCZ cohort of 21,094 cases and 20,227 controls. A global enrichment of CNV burden was observed in cases (OR = 1.11, P = 5.7 × 10⁻¹⁵), which persisted after excluding loci implicated in previous studies (OR = 1.07, P = 1.7 × 10⁻⁶). CNV burden was enriched for genes associated with synaptic function (OR = 1.68, P = 2.8 × 10⁻¹¹) and neurobehavioral phenotypes in mouse (OR = 1.18, P = 7.3 × 10⁻⁵). Genome-wide significant evidence was obtained for eight loci, including 1q21.1, 2p16.3 (NRXN1), 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2 and 22q11.2. Suggestive support was found for eight additional candidate susceptibility and protective loci, which consisted predominantly of CNVs mediated by non-allelic homologous recombination.
We conducted a combined genome-wide association (GWAS) analysis of 7,481 individuals affected with bipolar disorder and 9,250 control individuals within the Psychiatric Genomewide Association Study Consortium Bipolar Disorder group (PGC-BD). We performed a replication study in which we tested 34 independent SNPs in 4,493 independent bipolar disorder cases and 42,542 independent controls and found strong evidence for replication. In the replication sample, 18 of 34 SNPs had P value < 0.05, and 31 of 34 SNPs had signals with the same direction of effect (P = 3.8 × 10⁻⁷). In the combined analysis of all 63,766 subjects (11,974 cases and 51,792 controls), genome-wide significant evidence for association was confirmed for CACNA1C and found for a novel gene ODZ4. In a combined analysis of non-overlapping schizophrenia and bipolar GWAS samples we observed strong evidence for association with SNPs in CACNA1C and in the region of NEK4/ITIH1,3,4. Pathway analysis identified a pathway comprised of subunits of calcium channels enriched in the bipolar disorder association intervals. The strength of the replication data implies that increasing sample sizes in bipolar disorder will confirm many additional loci.
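The direction-of-effect result in this abstract (31 of 34 SNPs with the same sign) is the kind of figure a one-sided binomial sign test produces. The sketch below assumes that test, with each SNP given a 0.5 chance of matching direction under the null; the abstract does not name the test used, and the function name `sign_test_p` is hypothetical.

```python
from math import comb

def sign_test_p(successes, trials):
    """One-sided tail P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials

p = sign_test_p(31, 34)
print(f"{p:.2e}")  # prints 3.83e-07
```

Under this assumption the tail probability is 6580 / 2^34, which agrees with the reported value of about 3.8 × 10^-7.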
Association tests of multilocus haplotypes are of interest both in linkage disequilibrium mapping and in candidate gene studies. For case-parent trios, I discuss the extension of existing multilocus methods to include ambiguous haplotypes in tests of models which distinguish between the cis and trans phase. A likelihood-ratio test is proposed, using the expectation-maximization (E-M) algorithm to account for haplotype ambiguities. Assumptions about the population structure are required, but realistic situations, including population stratification, which violate the assumptions lead to conservative tests. I describe a permutation procedure for the null hypothesis of interest, which controls for violation of the assumptions. For general pedigrees, I describe extensions of the pedigree disequilibrium test to include uncertain haplotypes. The summary statistics are replaced by their expected values over prior distributions of haplotype frequencies. If prior distributions are not available, a valid test is possible by using the E-M algorithm to estimate the null distribution of haplotype frequencies. Similar methods are available for quantitative traits. Exact permutation tests are difficult to construct in small samples, but an approximate procedure is appropriate in large samples, and can be used to account for dependencies between tests of multiple haplotypes and loci.