Accurate methods to assess the pathogenicity of mutations are needed to fully leverage the possibilities of genome sequencing in diagnosis. Current data-driven and bioinformatics approaches are, however, limited by the large number of new variations found in each newly sequenced genome, and often do not provide direct mechanistic insight. Here we demonstrate, for the first time, that saturation mutagenesis, biophysical modeling and co-variation analysis, performed in silico, can predict the abundance, metabolic stability, and function of proteins inside living cells. As a model system, we selected the human mismatch repair protein MSH2, in which missense variants are known to cause the hereditary cancer predisposition disease known as Lynch syndrome. We show that the majority of disease-causing MSH2 mutations give rise to folding defects and proteasome-dependent degradation rather than inherent loss of function. Accordingly, our in silico modeling data accurately identify disease-causing mutations and outperform the traditionally used genetic disease predictors. Thus, in silico biophysical modeling should be considered for making genotype-phenotype predictions and for diagnosis of Lynch syndrome, and perhaps other hereditary diseases.
Predicting the thermodynamic stability of proteins is a widely used step in protein engineering and in elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ≈8.8 million stability changes for nearly all single amino acid changes in 1,381 human proteins, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available and enables large-scale analyses of stability in experimental and predicted protein structures.
Predicting the thermodynamic stability of proteins is a widely used step in protein engineering and in elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ∼300 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.
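To give a sense of what "nearly all single amino acid changes" means in practice, the following is a minimal sketch of how a saturation mutagenesis variant list can be enumerated. It assumes the 20 standard amino acids, so each residue has 19 possible substitutions. The function name `saturation_mutagenesis` and the toy sequence are hypothetical illustrations; this is not RaSP's code and it performs no stability prediction.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_mutagenesis(sequence):
    """Yield every single amino acid substitution as (wild_type, position, mutant).

    Positions are 1-indexed, following the usual variant notation (e.g. M1A).
    """
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield wild_type, position, mutant


# Hypothetical 6-residue peptide, used only for illustration.
toy_sequence = "MAVQPK"
variants = list(saturation_mutagenesis(toy_sequence))

# 6 residues x 19 substitutions each.
print(len(variants))

# First variant in conventional notation: wild type, position, mutant.
wild_type, position, mutant = variants[0]
print(f"{wild_type}{position}{mutant}")
```

Because the number of variants grows as 19 times the sequence length, a per-residue prediction time of under a second is what makes proteome-scale scans of this kind tractable.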
Strategies for investigating and optimizing the expression and folding of proteins for biotechnological and pharmaceutical purposes are in high demand. Here, we describe a dual-reporter biosensor system that simultaneously assesses in vivo protein translation and protein folding, thereby enabling rapid screening of mutant libraries. We have validated the dual-reporter system on five different proteins and find an excellent correlation between reporter signals and the levels of protein expression and solubility. We further demonstrate the applicability of the dual-reporter system as a screening assay for deep mutational scanning experiments. The system enables high-throughput selection of protein variants with high expression levels and altered protein stability. Next-generation sequencing analysis of the resulting libraries of protein variants shows a good correlation between computationally predicted and experimentally determined protein stabilities. We furthermore show that the mutational data obtained using this system may be useful for protein structure calculations.
Bactofilins constitute a recently discovered class of bacterial proteins that form cytoskeletal filaments. They share a highly conserved domain (DUF583) of which the structure remains unknown, in part due to the large size and noncrystalline nature of the filaments. Here, we describe the atomic structure of a bactofilin domain from Caulobacter crescentus. To determine the structure, we developed an approach that combines a biophysical model for proteins with recently obtained solid-state NMR spectroscopy data and amino acid contacts predicted from a detailed analysis of the evolutionary history of bactofilins. Our structure reveals a triangular β-helical (solenoid) conformation with conserved residues forming the tightly packed core and polar residues lining the surface. The repetitive structure explains the presence of internal repeats as well as strongly conserved positions, and is reminiscent of other fibrillar proteins. Our work provides a structural basis for future studies of bactofilin biology and for designing molecules that target them, as well as a starting point for determining the organization of the entire bactofilin filament. Finally, our approach presents new avenues for determining structures that are difficult to obtain by traditional means.