The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Summary: METAL provides a computationally efficient tool for meta-analysis of genome-wide association scans, which is a commonly used approach for improving power complex traits gene mapping studies. METAL provides a rich scripting interface and implements efficient memory management to allow analyses of very large data sets and to support a variety of input file formats.Availability and implementation: METAL, including source code, documentation, examples, and executables, is available at http://www.sph.umich.edu/csg/abecasis/metal/Contact: goncalo@umich.edu
Sequencing studies are increasingly being conducted to identify rare variants associated with complex traits. The limited power of classical single-marker association analysis for rare variants poses a central challenge in such studies. We propose the sequence kernel association test (SKAT), a supervised, flexible, computationally efficient regression method to test for association between genetic variants (common and rare) in a region and a continuous or dichotomous trait while easily adjusting for covariates. As a score-based variance-component test, SKAT can quickly calculate p values analytically by fitting the null model containing only the covariates, and so can easily be applied to genome-wide data. Using SKAT to analyze a genome-wide sequencing study of 1000 individuals, by segmenting the whole genome into 30 kb regions, requires only 7 hr on a laptop. Through analysis of simulated data across a wide range of practical scenarios and triglyceride data from the Dallas Heart Study, we show that SKAT can substantially outperform several alternative rare-variant association tests. We also provide analytic power and sample-size calculations to help design candidate-gene, whole-exome, and whole-genome sequence association studies.
Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association (GWA) meta-analysis based in 135,458 cases and 344,901 control, We identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression, and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signal. We found important relations of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine and define the basis of major depression and imply a continuous measure of risk underlies the clinical phenotype.
We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
Circulating glucose levels are tightly regulated. To identify novel glycemic loci, we performed meta-analyses of 21 genome-wide associations studies informative for fasting glucose (FG), fasting insulin (FI) and indices of β-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 non-diabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with FG/HOMA-B and two associated with FI/HOMA-IR. These include nine new FG loci (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and FAM148B) and one influencing FI/HOMA-IR (near IGF1). We also demonstrated association of ADCY5, PROX1, GCK, GCKR and DGKB/TMEM195 with type 2 diabetes (T2D). Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose-sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify T2D risk loci, as well as loci that elevate FG modestly, but do not cause overt diabetes.
It was unknown whether plasma protein levels can be estimated based on DNA methylation (DNAm) levels, and if so, how the resulting surrogates can be consolidated into a powerful predictor of lifespan. We present here, seven DNAm-based estimators of plasma proteins including those of plasminogen activator inhibitor 1 (PAI-1) and growth differentiation factor 15. The resulting predictor of lifespan, DNAm GrimAge (in units of years), is a composite biomarker based on the seven DNAm surrogates and a DNAm-based estimator of smoking pack-years. Adjusting DNAm GrimAge for chronological age generated novel measure of epigenetic age acceleration, AgeAccelGrim.Using large scale validation data from thousands of individuals, we demonstrate that DNAm GrimAge stands out among existing epigenetic clocks in terms of its predictive ability for time-to-death (Cox regression P=2.0E-75), time-to-coronary heart disease (Cox P=6.2E-24), time-to-cancer (P= 1.3E-12), its strong relationship with computed tomography data for fatty liver/excess visceral fat, and age-at-menopause (P=1.6E-12). AgeAccelGrim is strongly associated with a host of age-related conditions including comorbidity count (P=3.45E-17). Similarly, age-adjusted DNAm PAI-1 levels are associated with lifespan (P=5.4E-28), comorbidity count (P= 7.3E-56) and type 2 diabetes (P=2.0E-26). These DNAm-based biomarkers show the expected relationship with lifestyle factors including healthy diet and educational attainment.Overall, these epigenetic biomarkers are expected to find many applications including human anti-aging studies.
Identifying reliable biomarkers of aging is a major goal in geroscience. While the first generation of epigenetic biomarkers of aging were developed using chronological age as a surrogate for biological age, we hypothesized that incorporation of composite clinical measures of phenotypic age that capture differences in lifespan and healthspan may identify novel CpGs and facilitate the development of a more powerful epigenetic biomarker of aging. Using an innovative two-step process, we develop a new epigenetic biomarker of aging, DNAm PhenoAge, that strongly outperforms previous measures in regards to predictions for a variety of aging outcomes, including all-cause mortality, cancers, healthspan, physical functioning, and Alzheimer's disease. While this biomarker was developed using data from whole blood, it correlates strongly with age in every tissue and cell tested. Based on an in-depth transcriptional analysis in sorted cells, we find that increased epigenetic, relative to chronological age, is associated with increased activation of pro-inflammatory and interferon pathways, and decreased activation of transcriptional/translational machinery, DNA damage response, and mitochondrial signatures. Overall, this single epigenetic biomarker of aging is able to capture risks for an array of diverse outcomes across multiple tissues and cells, and provide insight into important pathways in aging.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.