The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
We aggregated genome-wide genotyping data from 32 European-descent GWAS (74,124 T2D cases, 824,006 controls) imputed to high-density reference panels of >30,000 sequenced haplotypes. Analysis of ˜27M variants (˜21M with minor allele frequency [MAF]<5%), identified 243 genome-wide significant loci (p<5x10-8; MAF 0.02%-50%; odds ratio [OR] 1.04-8.05), 135 not previously-implicated in T2D-predisposition. Conditional analyses revealed 160 additional distinct association signals (p<10-5) within the identified loci. The combined set of 403 T2D-risk signals includes 56 low-frequency (0.5%≤MAF<5%) and 24 rare (MAF<0.5%) index SNPs at 60 loci, including 14 with estimated allelic OR>2. Forty-one of the signals displayed effect-size heterogeneity between BMI-unadjusted and adjusted analyses. Increased sample size and improved imputation led to substantially more precise localisation of causal variants than previously attained: at 51 signals, the lead variant after fine-mapping accounted for >80% posterior probability of association (PPA) and at 18 of these, PPA exceeded 99%. Integration with islet regulatory annotations enriched for T2D association further reduced median credible set size (from 42 variants to 32) and extended the number of index variants with PPA>80% to 73. Although most signals mapped to regulatory sequence, we identified 18 genes as human validated therapeutic targets through coding variants that are causal for disease. Genome wide chip heritability accounted for 18% of T2D-risk, and individuals in the 2.5% extremes of a polygenic risk score generated from the GWAS data differed >9-fold in risk. Our observations highlight how increases in sample size and variant diversity deliver enhanced discovery and single-variant resolution of causal T2D-risk alleles, and the consequent impact on mechanistic insights and clinical translation.
Summary paragraphThe Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.
Reduced glomerular filtration rate defines chronic kidney disease and is associated with cardiovascular and all-cause mortality. We conducted a meta-analysis of genome-wide association studies for estimated glomerular filtration rate (eGFR), combining data across 133,413 individuals with replication in up to 42,166 individuals. We identify 24 new and confirm 29 previously identified loci. Of these 53 loci, nineteen associate with eGFR among individuals with diabetes. Using bioinformatics, we show that identified genes at eGFR loci are enriched for expression in kidney tissues and in pathways relevant for kidney development and transmembrane transporter activity, kidney structure, and regulation of glucose metabolism. Chromatin state mapping and DNase I hypersensitivity analyses across adult tissues demonstrate preferential mapping of associated variants to regulatory regions in kidney but not extra-renal tissues. These findings suggest that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biologic pathways.
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)1. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (p<2.2×10−7): of these, 16 map outside known risk loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent “false leads” with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets: however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.
Thyroid dysfunction is an important public health problem, which affects 10% of the general population and increases the risk of cardiovascular morbidity and mortality. Many aspects of thyroid hormone regulation have only partly been elucidated, including its transport, metabolism, and genetic determinants. Here we report a large meta-analysis of genome-wide association studies for thyroid function and dysfunction, testing 8 million genetic variants in up to 72,167 individuals. One-hundred-and-nine independent genetic variants are associated with these traits. A genetic risk score, calculated to assess their combined effects on clinical end points, shows significant associations with increased risk of both overt (Graves’ disease) and subclinical thyroid disease, as well as clinical complications. By functional follow-up on selected signals, we identify a novel thyroid hormone transporter (SLC17A4) and a metabolizing enzyme (AADAT). Together, these results provide new knowledge about thyroid hormone physiology and disease, opening new possibilities for therapeutic targets.
Chronic kidney disease (CKD) is an important public health problem with a genetic component. We performed genome-wide association studies in up to 130,600 European ancestry participants overall, and stratified for key CKD risk factors. We uncovered 6 new loci in association with estimated glomerular filtration rate (eGFR), the primary clinical measure of CKD, in or near MPPED2, DDX1, SLC47A1, CDK12, CASP9, and INO80. Morpholino knockdown of mpped2 and casp9 in zebrafish embryos revealed podocyte and tubular abnormalities with altered dextran clearance, suggesting a role for these genes in renal function. By providing new insights into genes that regulate renal function, these results could further our understanding of the pathogenesis of CKD.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.