The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^-12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
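To make the AUC comparison in the abstract above concrete, here is a minimal sketch of a rank-based AUC: the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen neutral one. The function name and the toy scores are hypothetical, chosen only for illustration; this is not the evaluation code used for REVEL.

```python
def auc(pathogenic_scores, neutral_scores):
    """Rank-based AUC: fraction of (pathogenic, neutral) pairs in which the
    pathogenic variant has the higher score; ties count as half a win."""
    wins = 0.0
    for p in pathogenic_scores:
        for n in neutral_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(neutral_scores))


# Hypothetical predictor scores (higher = more likely pathogenic).
toy_pathogenic = [0.92, 0.81, 0.40]
toy_neutral = [0.10, 0.35, 0.55, 0.05]
print(auc(toy_pathogenic, toy_neutral))
```

The pairwise loop is quadratic in the number of variants, which is fine for a toy example; a test set of the size described above would normally use a sort-based implementation.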
BACKGROUND: Family history is a significant risk factor for prostate cancer, although the molecular basis for this association is poorly understood. Linkage studies have implicated chromosome 17q21-22 as a possible location of a prostate-cancer susceptibility gene. METHODS: We screened more than 200 genes in the 17q21-22 region by sequencing germline DNA from 94 unrelated patients with prostate cancer from families selected for linkage to the candidate region. We tested family members, additional case subjects, and control subjects to characterize the frequency of the identified mutations. RESULTS: Probands from four families were discovered to have a rare but recurrent mutation (G84E) in HOXB13 (rs138213197), a homeobox transcription factor gene that is important in prostate development. All 18 men with prostate cancer and available DNA in these four families carried the mutation. The carrier rate of the G84E mutation was increased by a factor of approximately 20 in 5083 unrelated subjects of European descent who had prostate cancer, with the mutation found in 72 subjects (1.4%), as compared with 1 in 1401 control subjects (0.1%) (P = 8.5×10^-7). The mutation was significantly more common in men with early-onset, familial prostate cancer (3.1%) than in those with late-onset, nonfamilial prostate cancer (0.6%) (P = 2.0×10^-6). CONCLUSIONS: The novel HOXB13 G84E variant is associated with a significantly increased risk of hereditary prostate cancer. Although the variant accounts for a small fraction of all prostate cancers, this finding has implications for prostate-cancer risk assessment and may provide new mechanistic insights into this common cancer. (Funded by the National Institutes of Health and others.)
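The "factor of approximately 20" can be checked from the counts quoted in the abstract above. The sketch below computes the carrier frequencies and a crude, unadjusted odds ratio from those counts only; the variable names are illustrative, and the published estimate may come from a different or adjusted analysis.

```python
# Counts quoted in the abstract: G84E carriers among cases and controls.
case_carriers, case_total = 72, 5083
control_carriers, control_total = 1, 1401

# Carrier frequencies in each group.
case_freq = case_carriers / case_total
control_freq = control_carriers / control_total

# Unadjusted odds ratio: odds of carrying G84E in cases vs. controls.
case_odds = case_carriers / (case_total - case_carriers)
control_odds = control_carriers / (control_total - control_carriers)
odds_ratio = case_odds / control_odds

print(f"cases: {case_freq:.1%}, controls: {control_freq:.2%}, OR = {odds_ratio:.1f}")
```

With a single carrier among the controls, this point estimate is very imprecise, so it is best read as a consistency check on the reported figures rather than as an effect-size estimate.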
Height is a highly heritable, classic polygenic trait with ∼700 common associated variants identified so far through genome-wide association studies. Here, we report 83 height-associated coding variants with lower minor allele frequencies (range of 0.1-4.8%) and effects of up to 2 cm/allele (e.g. in IHH, STC2, AR and CRISPLD2), >10 times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (+1-2 cm/allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes mutated in monogenic growth disorders and highlight new biological candidates (e.g. ADAMTS3, IL11RA, NOX4) and pathways (e.g. proteoglycan/glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate to large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.
Prostate cancer is the most frequently diagnosed cancer in males in developed countries. To identify common prostate cancer susceptibility alleles, we genotyped 211,155 SNPs on a custom Illumina array (iCOGS) in blood DNA from 25,074 prostate cancer cases and 24,272 controls from the international PRACTICAL Consortium. Twenty-three new prostate cancer susceptibility loci were identified at genome-wide significance (P < 5 × 10^-8). More than 70 prostate cancer susceptibility loci, explaining ~30% of the familial risk for this disease, have now been identified. On the basis of combined risks conferred by the new and previously known risk loci, the top 1% of the risk distribution has a 4.7-fold higher risk than the average of the population being profiled. These results will facilitate population risk stratification for clinical studies.
Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance [1,2]. When MI occurs early in life, the role of inheritance is substantially greater [1]. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families [3–8] whereas common variants at more than 45 loci have been associated with MI risk in the population [9–15]. Here, we evaluate the contribution of rare mutations to MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes where rare coding-sequence mutations were more frequent in cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare, damaging mutations (3.1% of cases versus 1.3% of controls) were at 2.4-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). This sequence-based estimate of the proportion of early MI cases due to LDLR mutations is remarkably similar to an estimate made more than 40 years ago using total cholesterol [16]. At apolipoprotein A-V (APOA5), carriers of rare nonsynonymous mutations (1.4% of cases versus 0.6% of controls) were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase [15,17] and apolipoprotein C3 [18,19]. When combined, these observations suggest that, beyond LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
Age is the dominant risk factor for most chronic human diseases; yet the mechanisms by which aging confers this risk are largely unknown [1]. Recently, the age-related acquisition of somatic mutations in regenerating hematopoietic stem cell populations leading to clonal expansion was associated with both hematologic cancer [2–4] and coronary heart disease [5], a phenomenon termed ‘Clonal Hematopoiesis of Indeterminate Potential’ (CHIP) [6]. Simultaneous germline and somatic whole genome sequence analysis now provides the opportunity to identify root causes of CHIP. Here, we analyze high-coverage whole genome sequences from 97,691 participants of diverse ancestries in the NHLBI TOPMed program and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid, and inflammatory traits specific to different CHIP genes. Association of a genome-wide set of germline genetic variants identified three genetic loci associated with CHIP status, including one locus at TET2 that was African ancestry specific. In silico-informed in vitro evaluation of the TET2 germline locus identified a causal variant that disrupts a TET2 distal enhancer resulting in increased hematopoietic stem cell self-renewal. Overall, we observe that germline genetic variation shapes hematopoietic stem cell function leading to CHIP through mechanisms that are both specific to clonal hematopoiesis and shared mechanisms leading to somatic mutations across tissues.
Prostate cancer (PrCa) is the most frequently diagnosed male cancer in developed countries. To identify common PrCa susceptibility alleles, we have previously conducted a genome-wide association study in which 541,129 SNPs were genotyped in 1,854 PrCa cases with clinically detected disease and 1,894 controls. We have now evaluated promising associations in a second stage, in which we genotyped 43,671 SNPs in 3,650 PrCa cases and 3,940 controls, and a third stage, involving an additional 16,229 cases and 14,821 controls from 21 studies. In addition to previously identified loci, we identified a further seven new prostate cancer susceptibility loci on chromosomes 2, 4, 8, 11, and 22 (P = 1.6×10^-8 to P = 2.7×10^-33).
Genome-wide association studies (GWAS) provide a powerful approach to identify common disease alleles. We previously conducted a GWAS [1], based on genotyping of 541,129 SNPs in 1,854 clinically detected PrCa cases and 1,894 controls (see Figure 1, stage 1). Follow-up genotyping of SNPs exhibiting strong evidence of association (P < 10^-6), in a further 3,268 cases and 3,366 controls, allowed us to identify SNPs at 7 susceptibility loci associated with the disease at genome-wide levels of significance [1]. Other studies have identified an additional 8 loci [2–9]. These loci, however, explain only a small fraction of the familial risk of PrCa. Moreover, the strength of the associations that have been detected is generally small (per-allele odds ratios, OR, 1.1-1.2), and the power of the existing studies to detect many of the susceptibility alleles has been limited. It is highly likely, therefore, that other PrCa predisposition loci exist, and that such loci should be detectable by studies with larger sample sizes.
In an attempt to identify further susceptibility loci, we conducted a more extensive follow-up of SNPs showing evidence of association in stage 1 of our GWAS.
We designed a panel of 47,120 SNPs, aiming to include all SNPs with a significant association in stage 1 at P-trend (1 df) < 0.05 or P (2 df) < 0.01 (see Online Methods). These SNPs were genotyped using the Illumina iSELECT platform in 3,894 PrCa cases and 4,055 controls from the United Kingdom (UK) and Australia (Figure 1, stage 2). After quality control (QC) exclusions (as described in Online Methods), we utilised data from 43,671 SNPs in 3,650 PrCa cases and 3,940 controls.
Genotype frequencies in cases and controls were compared using a 1 degree of freedom (df) Cochran-Armitage trend test (for QQ plots see Supplementary Figure 1). There was little evidence of inflation in the test statistics in the UK samples (estimated inflation factor λ=1.08), but there was more marked inflation in those from Australia (λ=1.23; λ=1.19 for stage 2 overall), suggestive of some population substructure. The Australian samples were selected from three studies (MCCS, RFPCS and EOPCS; see Supplementary Note for cohort descriptions), and further analysis revealed that ...
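The two statistics named in this excerpt, the 1-df Cochran-Armitage trend test and the inflation factor λ, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, additive genotype weights (0, 1, 2) are assumed, and λ is taken as the median observed chi-square divided by the median of a 1-df chi-square distribution, which is one common estimator and may differ from the one used in the study.

```python
import math
import statistics


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for one SNP.

    cases, controls: genotype counts ordered (AA, AB, BB).
    Returns (chi-square statistic with 1 df, two-sided p-value).
    Assumes the SNP is polymorphic, otherwise the denominator is zero.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sum_tr = sum(t * r for t, r in zip(weights, cases))        # weighted case counts
    sum_tn = sum(t * c for t, c in zip(weights, totals))       # weighted column totals
    sum_t2n = sum(t * t * c for t, c in zip(weights, totals))  # squared-weight totals
    numerator = n * (n * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df


def inflation_factor(chi2_values):
    """Genomic inflation factor: median observed chi-square over its null median."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN


# Hypothetical genotype counts for a single SNP.
chi2, p = trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
print(chi2, p)
```

In practice `inflation_factor` would be applied to the trend statistics of all SNPs that pass QC; values near 1 indicate little inflation, while larger values such as those quoted for the Australian samples suggest population substructure.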
Chronic periodontitis (CP) is a common oral disease that confers substantial systemic inflammatory and microbial burden and is a major cause of tooth loss. Here, we present the results of a genome-wide association study of CP that was carried out in a cohort of 4504 European Americans (EA) participating in the Atherosclerosis Risk in Communities (ARIC) Study (mean age: 62 years; moderate CP: 43%; severe CP: 17%). We detected no genome-wide significant association signals for CP; however, we found suggestive evidence of association (P < 5 × 10^-6) for six loci, including NIN, NPY, WNT5A for severe CP and NCR2, EMR1, 10p15 for moderate CP. Three of these loci had concordant effect size and direction in an independent sample of 656 adult EA participants of the Health, Aging, and Body Composition (Health ABC) Study. Meta-analysis pooled estimates were as follows. Severe CP (n = 958 versus health, n = 1909): NPY, rs2521634 [G]: odds ratio (OR) = 1.49 (95% confidence interval (CI) = 1.28–1.73, P = 3.5 × 10^-7). Moderate CP (n = 2293): NCR2, rs7762544 [G]: OR = 1.40 (95% CI = 1.24–1.59, P = 7.5 × 10^-8); EMR1, rs3826782 [A]: OR = 2.01 (95% CI = 1.52–2.65, P = 8.2 × 10^-7). Canonical pathway analysis indicated significant enrichment of nervous system signaling, cellular immune response and cytokine signaling pathways. A significant interaction of NUAK1 (rs11112872, interaction P = 2.9 × 10^-9) with smoking in ARIC was not replicated in Health ABC, although estimates of heritable variance in severe CP explained by all single nucleotide polymorphisms increased from 18 to 52% with the inclusion of a genome-wide interaction term with smoking. These genome-wide association results provide information on multiple candidate regions and pathways for interrogation in future genetic studies of CP.