Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Various computational strategies have been proposed to predict the effects of missense variants across the human proteome, using many different predictive signals. Here, we present a robust learning framework for leveraging functional assay data to construct computational predictors of disease variant effects. We train cross-protein transfer (CPT) models using deep mutational scanning data from only five proteins and achieve state-of-the-art performance on unseen proteins across the human proteome. On human disease variants annotated in ClinVar, our model CPT-1 achieves 64% specificity at 95% sensitivity, compared with 31% for ESM-1v and 50% for EVE. Our framework combines general protein sequence models with vertebrate sequence alignments and AlphaFold2 structures, and it is adaptable to the future inclusion of other sources of information. We release predictions for all missense variants in 90% of human genes. Our results establish the utility of functional assay data for learning general properties of variants that can transfer to unseen proteins.
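The headline metric above, specificity at a fixed sensitivity, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `specificity_at_sensitivity`, the label convention (1 = pathogenic, 0 = benign), and the assumption that higher scores mean "more likely pathogenic" are all choices made for illustration.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Fraction of benign variants correctly rejected at the loosest
    threshold that still calls `target_sensitivity` of pathogenic variants.

    Assumed conventions (illustrative): labels are 1 = pathogenic,
    0 = benign, and a higher score means "more likely pathogenic".
    """
    pathogenic = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")

    # Number of pathogenic variants that must be called to reach the target.
    # The small tolerance guards against floating-point overshoot in ceil.
    k = max(1, math.ceil(target_sensitivity * len(pathogenic) - 1e-9))

    # A variant is called pathogenic when its score >= threshold.
    threshold = pathogenic[k - 1]

    true_negatives = sum(1 for s in benign if s < threshold)
    return true_negatives / len(benign)


# Toy data, invented for this example only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))
print(specificity_at_sensitivity(toy_scores, toy_labels, 1.0))
```

Fixing sensitivity first and then reading off specificity mirrors how a clinical filter is typically tuned: the threshold is set so that few pathogenic variants are missed, and predictors are compared by how many benign variants they still manage to exclude.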
Stopgain substitutions are the third-largest class of monogenic human disease mutations and are often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, perform poorly at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases. In patient exomes, X-CAP prioritizes causal stopgains better than existing methods do, further illustrating its clinical utility. X-CAP is available at https://github.com/bejerano-lab/X-CAP.
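For readers unfamiliar with AUROC, the quantity reported above, a minimal sketch follows. It uses the standard pairwise-ranking definition (the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one, with ties counted as one half). It is not taken from the X-CAP repository; the function name `auroc` and the label convention (1 = pathogenic, 0 = benign) are assumptions made for illustration.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Assumed conventions (illustrative): labels are 1 = pathogenic,
    0 = benign, and a higher score means "more likely pathogenic".
    Quadratic in the number of variants, so suitable for small examples only.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")

    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0   # pathogenic variant ranked above benign one
            elif p == b:
                wins += 0.5   # ties count as half a win
    return wins / (len(pathogenic) * len(benign))


# Toy data, invented for this example only.
print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

AUROC summarizes ranking quality across all thresholds, which is why it is often reported alongside a false-positive rate measured at a single, clinically relevant operating point.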