Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association (GWA) meta-analysis based on 135,458 cases and 344,901 controls and identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression, and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signal. We found important relations of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine and define the basis of major depression and imply that a continuous measure of risk underlies the clinical phenotype.
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
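The contrast between thresholding and posterior-mean shrinkage can be illustrated with a minimal Python sketch. It is not the LDpred implementation: it assumes markers are unlinked (no LD), genotypes and effects are on a standardized scale, and the prior is infinitesimal (Gaussian), in which case the posterior mean reduces to a uniform shrinkage factor h²/(h² + M/N). All function and parameter names are hypothetical.

```python
from typing import Sequence


def threshold_score(genotypes: Sequence[float], betas: Sequence[float],
                    pvalues: Sequence[float], p_threshold: float) -> float:
    """Thresholding-style score: sum genotype * marginal effect over markers
    whose p value passes the threshold (markers assumed already LD-pruned)."""
    return sum(g * b for g, b, p in zip(genotypes, betas, pvalues)
               if p < p_threshold)


def infinitesimal_shrinkage(h2: float, n_markers: int, n_samples: int) -> float:
    """Shrinkage factor for the posterior mean effect under an infinitesimal
    prior with unlinked markers (assumption): h2 / (h2 + M / N)."""
    return h2 / (h2 + n_markers / n_samples)


def shrunk_score(genotypes: Sequence[float], betas: Sequence[float],
                 h2: float, n_samples: int) -> float:
    """Score that keeps every marker but shrinks each marginal effect
    toward zero instead of discarding markers by p value."""
    shrink = infinitesimal_shrinkage(h2, len(betas), n_samples)
    return sum(g * shrink * b for g, b in zip(genotypes, betas))
```

Under these assumptions the shrinkage factor approaches 1 as the training sample size N grows relative to the number of markers M, which is consistent with the abstract's point that the benefit of modelling effects rather than thresholding them depends on sample size.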
Copy number variants (CNVs) have been strongly implicated in the genetic etiology of schizophrenia (SCZ). However, genome-wide investigation of the contribution of CNV to risk has been hampered by limited sample sizes. We sought to address this obstacle by applying a centralized analysis pipeline to a SCZ cohort of 21,094 cases and 20,227 controls. A global enrichment of CNV burden was observed in cases (OR = 1.11, P = 5.7 × 10⁻¹⁵), which persisted after excluding loci implicated in previous studies (OR = 1.07, P = 1.7 × 10⁻⁶). CNV burden was enriched for genes associated with synaptic function (OR = 1.68, P = 2.8 × 10⁻¹¹) and neurobehavioral phenotypes in mouse (OR = 1.18, P = 7.3 × 10⁻⁵). Genome-wide significant evidence was obtained for eight loci, including 1q21.1, 2p16.3 (NRXN1), 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2 and 22q11.2. Suggestive support was found for eight additional candidate susceptibility and protective loci, which consisted predominantly of CNVs mediated by non-allelic homologous recombination.
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (one every 17 bases) and geographically localized, such that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. Overall we conclude that, due to rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
Liability to alcohol dependence (AD) is heritable, but little is known about its complex polygenic architecture or its genetic relationship with other disorders. To discover loci associated with AD and characterize the relationship between AD and other psychiatric and behavioral outcomes, we carried out the largest GWAS to date of DSM-IV diagnosed AD. Genome-wide data on 14,904 individuals with AD and 37,944 controls from 28 case/control and family-based studies were meta-analyzed, stratified by genetic ancestry (European, N = 46,568; African, N = 6,280). Independent, genome-wide significant effects of different ADH1B variants were identified in European (rs1229984; p = 9.8E-13) and African ancestries (rs2066702; p = 2.2E-9). Significant genetic correlations were observed with 17 phenotypes, including schizophrenia, ADHD, depression, and use of cigarettes and cannabis. The genetic underpinnings of AD only partially overlap with those for alcohol consumption, underscoring the genetic distinction between pathological and non-pathological drinking behaviors.
Major depressive disorder (MDD), one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide [1], poses a major challenge to genetic analysis. To date no robustly replicated genetic loci have been identified [2], despite analysis of more than 9,000 cases [3]. Using low coverage genome sequence of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD, we identified and replicated two genome-wide significant loci contributing to risk of MDD on chromosome 10: one near the SIRT1 gene (P = 2.53 × 10⁻¹⁰) and the other in an intron of the LHPP gene (P = 6.45 × 10⁻¹²). Analysis of 4,509 cases with a severe subtype of MDD, melancholia, yielded an increased genetic signal at the SIRT1 locus. We attribute our success to the recruitment of relatively homogeneous cases with severe illness.
Although association analysis is a useful tool for uncovering the genetic underpinnings of complex traits, its utility is diminished by population substructure, which can produce spurious association between phenotype and genotype within population-based samples. Because family-based designs are robust against substructure, they have risen to the fore of association analysis. Yet, if population substructure could be ignored, this robustness can come at the price of power. Unfortunately it is rarely evident when population substructure can be ignored. Devlin and Roeder recently have proposed a method, termed "genomic control" (GC), which has the robustness of family-based designs even though it uses population-based data. GC uses the genome itself to determine appropriate corrections for population-based association tests. Using the GC method, we contrast the power of two study designs, family trios (i.e., father, mother, and affected progeny) versus case-control. For analysis of trios, we use the TDT test. When population substructure is absent, we find GC is always more powerful than TDT; furthermore, contrary to previous results, we show that as a disease becomes more prevalent the discrepancy in power becomes more extreme. When population substructure is present, however, the results are more complex: TDT is more powerful when population substructure is substantial, and GC is more powerful otherwise. We also explore general issues of power and implementation of GC within the case-control setting and find that, economically, GC is at least comparable to and often less expensive than family-based methods. Therefore, GC methods should prove a useful complement to family-based methods for the genetic analysis of complex traits.
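The two tests contrasted above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: it uses the commonly cited median-based estimate of the GC inflation factor (median of 1-df chi-square statistics at presumed-null markers divided by the median of a 1-df chi-square distribution, about 0.455), floors the factor at 1 by convention, and uses the standard allelic TDT statistic (b − c)²/(b + c). Function names are hypothetical.

```python
from statistics import median
from typing import Sequence

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(null_chi2: Sequence[float]) -> float:
    """Estimate the inflation factor from 1-df chi-square statistics at
    markers presumed unassociated; floored at 1 (a common convention)."""
    return max(median(null_chi2) / CHI2_1DF_MEDIAN, 1.0)


def gc_adjust(chi2: float, lam: float) -> float:
    """Deflate a case-control test statistic by the inflation factor."""
    return chi2 / lam


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """Allelic TDT: counts of the allele transmitted vs. not transmitted
    from heterozygous parents to affected offspring, (b - c)^2 / (b + c)."""
    total = transmitted + untransmitted
    if total == 0:
        return 0.0
    return (transmitted - untransmitted) ** 2 / total
```

Both statistics are referred to a 1-df chi-square distribution; the difference is that GC borrows information from the rest of the genome to correct a population-based test, while the TDT obtains its robustness from within-family transmissions.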
Scanning the genome for association between markers and complex diseases typically requires testing hundreds of thousands of genetic polymorphisms. Testing such a large number of hypotheses exacerbates the trade-off between power to detect meaningful associations and the chance of making false discoveries. Even before the full genome is scanned, investigators often favor certain regions on the basis of the results of prior investigations, such as previous linkage scans. The remaining regions of the genome are investigated simultaneously because genotyping is relatively inexpensive compared with the cost of recruiting participants for a genetic study and because prior evidence is rarely sufficient to rule out these regions as harboring genes with variation conferring liability (liability genes). However, the multiple testing inherent in broad genomic searches diminishes power to detect association, even for genes falling in regions of the genome favored a priori. Multiple testing problems of this nature are well suited for application of the false-discovery rate (FDR) principle, which can improve power. To enhance power further, a new FDR approach is proposed that involves weighting the hypotheses on the basis of prior data. We present a method for using linkage data to weight the association P values. Our investigations reveal that if the linkage study is informative, the procedure improves power considerably. Remarkably, the loss in power is small, even when the linkage study is uninformative. For a class of genetic models, we calculate the sample size required to obtain useful prior information from a linkage study. This inquiry reveals that, among genetic models that are seemingly equal in genetic information, some are much more promising than others for this mode of analysis.
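A weighted FDR procedure of this general kind can be sketched as follows. The sketch assumes the usual weighted Benjamini-Hochberg construction: weights are rescaled to average 1, each P value is divided by its weight, and the ordinary step-up rule is applied to the weighted values. How the weights would be derived from linkage data is not shown; the function name and inputs are hypothetical.

```python
from typing import List, Sequence


def weighted_bh(pvalues: Sequence[float], weights: Sequence[float],
                alpha: float = 0.05) -> List[int]:
    """Return indices of hypotheses rejected by a weighted
    Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvalues)
    mean_w = sum(weights) / m
    # Rescale weights so they average 1, preserving the overall error budget.
    norm = [w / mean_w for w in weights]
    # Weighted P values: up-weighted hypotheses get smaller values.
    q = [min(p / w, 1.0) if w > 0 else 1.0 for p, w in zip(pvalues, norm)]
    order = sorted(range(m), key=lambda i: q[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank meeting its threshold.
        if q[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# With equal weights this is ordinary BH; up-weighting the third marker
# (for example, because it lies under a linkage peak) changes which
# hypotheses are rejected.
pvals = [0.001, 0.02, 0.04, 0.5]
print(weighted_bh(pvals, [1, 1, 1, 1]))          # [0, 1]
print(weighted_bh(pvals, [0.5, 0.5, 2.5, 0.5]))  # [0, 2]
```

Because the weights average 1, up-weighting favored regions necessarily down-weights the rest, which is why an uninformative prior costs some power and an informative one gains it.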