Summary: Height is a highly heritable, classic polygenic trait with ∼700 common associated variants identified so far through genome-wide association studies. Here, we report 83 height-associated coding variants with lower minor allele frequencies (range of 0.1-4.8%) and effects of up to 2 cm/allele (e.g. in IHH, STC2, AR and CRISPLD2), >10 times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (+1-2 cm/allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes mutated in monogenic growth disorders and highlight new biological candidates (e.g. ADAMTS3, IL11RA, NOX4) and pathways (e.g. proteoglycan/glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate to large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.
Genome-wide association studies (GWAS) have identified >250 loci for body mass index (BMI), implicating pathways related to neuronal biology. Most GWAS loci represent clusters of common, non-coding variants from which pinpointing causal genes remains challenging. Here, we combined data from 718,734 individuals to discover rare and low-frequency (MAF<5%) coding variants associated with BMI. We identified 14 coding variants in 13 genes, of which eight are in genes (ZBTB7B, ACHE, RAPGEF3, RAB21, ZFHX3, ENTPD6, ZFR2, ZNF169) newly implicated in human obesity, two are in genes (MC4R, KSR2) previously observed in extreme obesity, and two are in GIPR. Effect sizes of rare variants are ~10 times larger than those of common variants, with the largest effect observed in carriers of an MC4R stop-codon variant (p.Tyr35Ter, MAF=0.01%), who weigh ~7 kg more than non-carriers. Pathway analyses confirmed enrichment of neuronal genes and provided new evidence for adipocyte and energy expenditure biology, widening the potential of genetically supported therapeutic targets to treat obesity.
Heritability, the proportion of phenotypic variance explained by genetic factors, can be estimated from pedigree data [1], but such estimates are uninformative with respect to the underlying genetic architecture. Analyses of data from genome-wide association studies (GWAS) on unrelated individuals have shown that for human traits and disease, approximately one-third to two-thirds of heritability is captured by common SNPs [2-5]. It is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular if the causal variants are rare, or to other reasons such as overestimation of heritability from pedigree data. Here we show that pedigree heritability for height and body mass index (BMI) appears to be fully recovered from whole-genome sequence (WGS) data on 21,620 unrelated individuals of European ancestry. We assigned 47.1 million genetic variants to groups based upon their minor allele frequencies (MAF) and linkage disequilibrium (LD) with variants nearby, and estimated and partitioned variation accordingly. The estimated heritability was 0.79 (SE 0.09) for height and 0.40 (SE 0.09) for BMI, consistent with pedigree estimates. Low-MAF variants in low LD with neighbouring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection thereon. Cumulatively, variants in the MAF range of 0.0001 to 0.1 explained 0.54 (SE 0.05) and 0.51 (SE 0.11) of heritability for height and BMI, respectively. Our results imply that the still missing heritability of complex traits and disease is accounted for by rare variants, in particular those in regions of low LD.
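The grouping step described above can be illustrated with a minimal sketch. The bin edges, the LD threshold, and the function names below are hypothetical choices for illustration only; the abstract states that variants were grouped by MAF and by LD with nearby variants, but not how the bins were defined.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical MAF bin edges (assumption, not taken from the study).
MAF_EDGES = [0.0001, 0.001, 0.01, 0.1, 0.5]


def maf(alt_freq):
    """Minor allele frequency from an alternate-allele frequency."""
    return min(alt_freq, 1.0 - alt_freq)


def maf_bin(alt_freq):
    """Index of the MAF bin (lower edge inclusive).

    Returns None when the variant is rarer than the first edge.
    A MAF of exactly 0.5 is placed in the last bin.
    """
    i = bisect_right(MAF_EDGES, maf(alt_freq)) - 1
    if i < 0:
        return None
    return min(i, len(MAF_EDGES) - 2)


def assign_groups(variants, ld_threshold):
    """Count variants per (MAF bin, LD class).

    variants: iterable of (alt_freq, ld_score) pairs.
    ld_threshold: hypothetical cut-off separating 'low' from 'high' LD.
    """
    counts = Counter()
    for alt_freq, ld_score in variants:
        b = maf_bin(alt_freq)
        if b is None:
            continue  # too rare for the first bin in this sketch
        ld_class = "low" if ld_score < ld_threshold else "high"
        counts[(b, ld_class)] += 1
    return counts


if __name__ == "__main__":
    toy = [(0.0005, 1.0), (0.05, 10.0), (0.3, 2.0)]
    print(assign_groups(toy, ld_threshold=5.0))
```

In an analysis of this kind, each (MAF bin, LD class) group would then receive its own variance component, so that heritability can be reported per group and summed across groups.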
Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and analyze associations of genotypes with four quantitative traits: plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci except for a few variants previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes, but rare non-coding variant association detects no signals. A high 2M-SNP LDL-C polygenic score (top 5th percentile) confers a similar effect size to a monogenic mutation (~30 mg/dl higher for each); however, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited, but WGS permits simultaneous assessment of monogenic and polygenic models of severe hypercholesterolemia.
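A polygenic score of the kind mentioned above is commonly computed as a weighted sum of allele dosages. The sketch below assumes that per-variant weights are already available and flags the top 5% of a cohort; the function names, the toy weights, and the exact percentile rule are illustrative assumptions, not the study's pipeline.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def high_score_flags(scores, top_fraction=0.05):
    """Flag individuals whose score falls in the top fraction of the cohort.

    The cut-off is the score of the last individual inside the top
    fraction, so ties at the cut-off are all flagged (an assumption).
    """
    if not scores:
        return []
    n_top = max(1, round(len(scores) * top_fraction))
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    return [s >= cutoff for s in scores]


if __name__ == "__main__":
    # Toy cohort: three variants with hypothetical effect weights.
    weights = [0.5, 0.25, 1.0]
    cohort = [[2, 1, 0], [0, 0, 2], [1, 1, 1], [0, 2, 0]]
    scores = [polygenic_score(d, weights) for d in cohort]
    print(scores, high_score_flags(scores, top_fraction=0.25))
```

A score built from about two million SNPs follows the same formula; only the length of the dosage and weight vectors changes.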
Mitochondria (MT), the major site of cellular energy production, are under dual genetic control by 37 mitochondrial DNA (mtDNA) genes and numerous nuclear genes (MT-nDNA). In the CHARGEmtDNA+ Consortium, we studied genetic associations of mtDNA and MT-nDNA with body mass index (BMI), waist-hip-ratio (WHR), glucose, insulin, HOMA-B, HOMA-IR, and HbA1c. This 45-cohort collaboration comprised 70,775 (insulin) to 170,202 (BMI) pan-ancestry individuals. Validation and imputation of mtDNA variants were followed by single-variant and gene-based association testing. We report two significant common variants, one in MT-ATP6 associated (p ≤ 5E−04) with WHR and one in the D-loop with glucose. Five rare variants in MT-ATP6, MT-ND5, and MT-ND6 associated with BMI, WHR, or insulin. Gene-based meta-analysis identified MT-ND3 associated with BMI (p ≤ 1E−03). We considered 2,282 MT-nDNA candidate gene associations compiled from online summary results for our traits (20 unique studies with 31 dataset consortia's genome-wide associations [GWASs]). Of these, 109 genes associated (p ≤ 1E−06) with at least 1 of our 7 traits. We assessed regulatory features of variants in the 109 genes and cis- and trans-gene expression regulation, and performed enrichment and protein-protein interaction analyses. Of the identified mtDNA and MT-nDNA genes, 79 associated with adipose measures, 49 with glucose/insulin, 13 with risk for type 2 diabetes, and 18 with cardiovascular disease, indicating pleiotropic effects with health implications. Additionally, 21 genes related to cholesterol, suggesting additional important roles for the genes identified. Our results suggest that the mtDNA and MT-nDNA genes and variants reported make important contributions to glucose and insulin metabolism, adipocyte regulation, diabetes, and cardiovascular disease.
Deep-coverage whole genome sequencing at the population level is now feasible and offers potential advantages for locus discovery, particularly in the analysis of rare mutations in non-coding regions. Here, we performed whole genome sequencing in 16,324 participants from four ancestries at mean depth >29X and analyzed correlations of genotypes with four quantitative traits: plasma levels of total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. We conducted a discovery analysis including common or rare variants in coding as well as non-coding regions and developed a framework to interpret genome sequence for dyslipidemia risk. Common variant association yielded loci previously described, with the exception of a few variants not captured earlier by arrays or imputation. In coding sequence, rare variant association yielded known Mendelian dyslipidemia genes and, in non-coding sequence, we detected no rare variant association signals after application of four approaches to aggregate variants in non-coding regions. We developed a new, genome-wide polygenic score for LDL-C and observed that a high polygenic score conferred a similar effect size to a monogenic mutation (~30 mg/dl higher LDL-C for each); however, among those with extremely high LDL-C, a high polygenic score was considerably more prevalent than a monogenic mutation (23% versus 2% of participants, respectively).
De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow-sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.
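As a rough worked example, the counts reported above (93,325 single-nucleotide DNMs across 1,465 trios) give about 63.7 DNMs per trio. The sketch below also converts that mean into a per-base, per-generation rate; the callable genome size used for the conversion is an assumed placeholder, not a figure from the study, and the function names are hypothetical.

```python
def mean_dnms_per_trio(total_dnms, n_trios):
    """Average number of DNMs called per trio."""
    if n_trios <= 0:
        raise ValueError("n_trios must be positive")
    return total_dnms / n_trios


def per_base_rate(mean_dnms, callable_haploid_bp):
    """Per-base, per-generation rate.

    Each child carries two haploid genome copies, so the mean DNM
    count is divided by twice the callable haploid length.
    """
    return mean_dnms / (2 * callable_haploid_bp)


if __name__ == "__main__":
    mean = mean_dnms_per_trio(93_325, 1_465)  # counts from the abstract
    # ASSUMPTION: 2.7e9 callable bases is a placeholder for illustration.
    rate = per_base_rate(mean, callable_haploid_bp=2.7e9)
    print(f"{mean:.1f} DNMs per trio, about {rate:.2e} per bp per generation")
```

The resulting rate scales inversely with the assumed callable genome size, so it should be read as an order-of-magnitude illustration only.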