More reliable and faster prediction methods are needed to interpret the enormous amounts of data generated by sequencing and genome projects. We have developed a new computational tool, PON-P2, for classification of amino acid substitutions in human proteins. The method is a machine learning-based classifier and groups the variants into pathogenic, neutral and unknown classes, on the basis of a random forest probability score. PON-P2 is trained using pathogenic and neutral variants obtained from VariBench, a database for benchmark variation datasets. PON-P2 utilizes information about evolutionary conservation of sequences, physical and biochemical properties of amino acids, GO annotations and, if available, functional annotations of variation sites. Extensive feature selection was performed to identify 8 informative features among altogether 622 features. PON-P2 consistently showed superior performance in comparison to existing state-of-the-art tools. In the 10-fold cross-validation test, its accuracy and MCC are 0.90 and 0.80, respectively, and in the independent test, they are 0.86 and 0.71, respectively. The coverage of PON-P2 is 61.7% in the 10-fold cross-validation and 62.1% in the test dataset. PON-P2 is a powerful tool for screening harmful variants and for ranking and prioritizing experimental characterization. It is very fast, making it capable of analyzing large variant datasets. PON-P2 is freely available at http://structure.bmc.lu.se/PON-P2/.
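The accuracy, MCC and coverage figures quoted above can be illustrated with a minimal sketch. It assumes a hypothetical fixed-margin rule for assigning the unknown class and computes the metrics only over the variants that receive a pathogenic or neutral label. The function names, the `margin` parameter and the thresholding rule are illustrative assumptions; the abstract states only that classification is based on a random forest probability score, so this is not PON-P2's actual procedure.

```python
from math import sqrt


def classify(prob_pathogenic, margin=0.2):
    """Map a probability to one of three labels (hypothetical margin rule)."""
    if prob_pathogenic >= 0.5 + margin:
        return "pathogenic"
    if prob_pathogenic <= 0.5 - margin:
        return "neutral"
    return "unknown"


def evaluate(true_labels, probs, margin=0.2):
    """Return coverage, accuracy and MCC for three-class predictions."""
    preds = [classify(p, margin) for p in probs]
    # Only variants with a definite prediction count toward accuracy and MCC.
    covered = [(t, p) for t, p in zip(true_labels, preds) if p != "unknown"]
    tp = sum(t == "pathogenic" and p == "pathogenic" for t, p in covered)
    tn = sum(t == "neutral" and p == "neutral" for t, p in covered)
    fp = sum(t == "neutral" and p == "pathogenic" for t, p in covered)
    fn = sum(t == "pathogenic" and p == "neutral" for t, p in covered)
    coverage = len(covered) / len(true_labels) if true_labels else 0.0
    accuracy = (tp + tn) / len(covered) if covered else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal count is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"coverage": coverage, "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Toy data invented for illustration only.
    truth = ["pathogenic", "pathogenic", "neutral", "neutral", "pathogenic"]
    scores = [0.9, 0.8, 0.1, 0.75, 0.5]
    print(evaluate(truth, scores))
```

Under this reading, coverage is the fraction of variants that are not placed in the unknown class, which is why accuracy and coverage are reported separately.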
Next-generation sequencing methods have revolutionized the speed of generating variation information. Sequence data have a plethora of applications and will increasingly be used for disease diagnosis. Interpretation of the identified variants is usually not possible with experimental methods. This has caused a bottleneck that many computational methods aim at addressing. Fast and efficient methods for explaining the significance and mechanisms of detected variants are required for efficient precision/personalized medicine. Computational prediction methods have been developed in three areas to address the issue. There are generic tolerance (pathogenicity) predictors for filtering harmful variants. Gene/protein/disease-specific tools are available for some applications. Mechanism and effect-specific computer programs aim at explaining the consequences of variations. Here, we discuss the different types of predictors and their applications. We review available variation databases and prediction methods useful for variation interpretation. We discuss how the performance of methods is assessed and summarize existing assessment studies. A brief introduction is provided to the principles of the methods developed for variation interpretation as well as guidelines for how to choose the optimal tools and where the field is heading in the future.
Chronic obstructive pulmonary disease (COPD) is associated with age and smoking, but other determinants of the disease are incompletely understood. Clonal hematopoiesis of indeterminate potential (CHIP) is a common, age-related state in which somatic mutations in clonal blood populations induce aberrant inflammatory responses. Patients with CHIP have an elevated risk for cardiovascular disease, but the association with COPD remains unclear. We analyzed whole-genome and exome sequencing data to detect CHIP in 48,835 subjects, of whom 8,444 had moderate-to-very-severe COPD, from four separate cohorts with COPD phenotyping and smoking history. We measured emphysema in murine models in which Tet2 was deleted in hematopoietic cells. In COPDGene, individuals with CHIP had a risk of moderate-to-severe and severe or very severe COPD 1.6 and 2.2 times greater than non-carriers, respectively (adjusted 95% confidence intervals [CI], 1.1 to 2.2 and 1.5 to 3.2). These findings were consistently observed in three additional cohorts and meta-analyses of all subjects. CHIP was also associated with decreased FEV1% predicted in COPDGene (mean between-group difference -5.7%; adjusted 95% CI, -8.8 to -2.6), a finding replicated in additional cohorts. Smoke exposure was associated with a small but significant increased risk of having CHIP (OR 1.03 per ten pack-years, 95% CI 1.01-1.05) in the meta-analysis of all subjects. Inactivation of Tet2 in mouse hematopoietic cells exacerbated emphysema development and inflammation in cigarette smoke exposure models. Somatic mutations in blood cells are associated with the development and severity of COPD, independent of age and cumulative smoke exposure.
Background: Premature menopause is an independent risk factor for cardiovascular disease in women, but mechanisms underlying this association remain unclear. Clonal hematopoiesis of indeterminate potential (CHIP), the age-related expansion of hematopoietic cells with leukemogenic mutations without detectable malignancy, is associated with accelerated atherosclerosis. Whether premature menopause is associated with CHIP is unknown. Methods: We included postmenopausal women from the UK Biobank (N=11,495) aged 40-70 years with whole exome sequences and from the Women's Health Initiative (WHI, N=8,111) aged 50-79 years with whole genome sequences. Premature menopause was defined as natural or surgical menopause occurring before age 40 years. Co-primary outcomes were the presence of (1) any CHIP and (2) CHIP with variant allele frequency (VAF) >0.1. Logistic regression tested the association of premature menopause with CHIP, adjusted for age, race, the first 10 principal components of ancestry, smoking, diabetes mellitus, and hormone therapy use. Secondary analyses considered natural vs. surgical premature menopause and gene-specific CHIP subtypes. Multivariable-adjusted Cox models tested the association between CHIP and incident coronary artery disease (CAD). Results: The sample included 19,606 women, including 418 (2.1%) with natural premature menopause and 887 (4.5%) with surgical premature menopause. Across cohorts, CHIP prevalence in postmenopausal women with vs. without a history of premature menopause was 8.8% vs. 5.5% (P<0.001), respectively. After multivariable adjustment, premature menopause was independently associated with CHIP (all CHIP: OR 1.36, 95% CI 1.10-1.68, P=0.004; CHIP with VAF >0.1: OR 1.40, 95% CI 1.10-1.79, P=0.007). Associations were larger for natural premature menopause (all CHIP: OR 1.73, 95% CI 1.23-2.44, P=0.001; CHIP with VAF >0.1: OR 1.91, 95% CI 1.30-2.80, P<0.001) but smaller and non-significant for surgical premature menopause.
In gene-specific analyses, only DNMT3A CHIP was significantly associated with premature menopause. Among postmenopausal middle-aged women, CHIP was independently associated with incident coronary artery disease (HR associated with all CHIP: 1.36, 95% CI 1.07-1.73, P=0.012; HR associated with CHIP with VAF >0.1: 1.48, 95% CI 1.13-1.94, P=0.005). Conclusions: Premature menopause, especially natural premature menopause, is independently associated with CHIP among postmenopausal women. Natural premature menopause may serve as a risk signal for predilection to develop CHIP and CHIP-associated cardiovascular disease.
Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation for precision medicine and should be based on systematic performance assessment. The performance of the methods varies widely in different performance assessments, for example due to the contents and sizes of test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, the capability to detect benign variants, for 10 variant interpretation tools. In addition to overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the best performance (95.5%) followed by FATHMM (86.4%) and VEST (83.5%). While these tools had excellent performance, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow choosing reliable methods for benign variant interpretation, for both research and clinical purposes, as well as provide a benchmark for method developers.
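Because the test set described above contains only benign variants, specificity reduces to the fraction of variants a tool predicts to be benign. The sketch below illustrates that calculation overall and per population. The label strings, the function names and the toy records are assumptions made for illustration, and the sketch assumes every variant receives a prediction; it does not reproduce the study's pipeline.

```python
from collections import defaultdict


def specificity(predictions, benign_label="benign"):
    """Fraction of known-benign variants that were predicted benign."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    # With an all-benign test set, TN + FP equals the number of predictions.
    true_negatives = sum(p == benign_label for p in predictions)
    return true_negatives / len(predictions)


def specificity_by_population(records, benign_label="benign"):
    """Compute specificity per group from (population, prediction) pairs."""
    grouped = defaultdict(list)
    for population, prediction in records:
        grouped[population].append(prediction)
    return {pop: specificity(preds, benign_label) for pop, preds in grouped.items()}


if __name__ == "__main__":
    # Toy records invented for illustration; population codes are hypothetical.
    records = [
        ("POP_A", "benign"),
        ("POP_A", "pathogenic"),
        ("POP_B", "benign"),
        ("POP_B", "benign"),
    ]
    print(specificity([pred for _, pred in records]))
    print(specificity_by_population(records))
```

Tools that leave some variants unclassified would need an explicit rule for those cases, for example excluding them and reporting coverage separately.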