Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in convolutional neural networks, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D convolutional neural networks can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
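The idea of treating a structure as a multi-channel 3D image can be illustrated with a minimal sketch. The code below is not ThermoNet's implementation: the function name `voxelize`, the 16 Å box, the 1 Å resolution, and the simple per-voxel counting scheme are all assumptions chosen for illustration. It bins atoms into a cubic grid centered on the mutation site, with one channel per biophysical property.

```python
import numpy as np

def voxelize(coords, channels, box_size=16.0, resolution=1.0):
    """Bin atoms into a (n_channels, n, n, n) grid centered at the origin.

    coords:   (n_atoms, 3) positions in angstroms, assumed to be already
              translated so the mutation site sits at the origin.
    channels: (n_atoms, n_channels) 0/1 flags marking which property
              channels each atom belongs to (hypothetical encoding).
    """
    coords = np.asarray(coords, dtype=float)
    channels = np.asarray(channels, dtype=float)
    n = int(round(box_size / resolution))
    grid = np.zeros((channels.shape[1], n, n, n), dtype=np.float32)

    # The box spans [-box_size/2, box_size/2) along each axis; shift and
    # scale coordinates to integer voxel indices.
    idx = np.floor((coords + box_size / 2.0) / resolution).astype(int)

    # Discard atoms that fall outside the box.
    inside = np.all((idx >= 0) & (idx < n), axis=1)

    # Each atom adds its channel flags to the voxel it lands in.
    for (i, j, k), flags in zip(idx[inside], channels[inside]):
        grid[:, i, j, k] += flags
    return grid

# Three atoms, two hypothetical channels (e.g. "hydrophobic", "polar").
coords = [[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [-7.5, 0.2, 3.9]]
channels = [[1, 0], [1, 1], [0, 1]]
grid = voxelize(coords, channels)
```

The resulting array has the channels-first layout that 3D convolution layers commonly expect. A smoother encoding, such as spreading each atom over neighboring voxels with a distance-dependent weight, would be a natural variation on this sketch.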
Membrane proteins are prone to misfolding and degradation within the cell, yet the nature of the conformational defects involved in this process remains poorly understood. The earliest stages of membrane protein folding are mediated by the Sec61 translocon, a molecular machine that facilitates the lateral partitioning of the polypeptide into the membrane. Proper membrane integration is an essential prerequisite for folding of the nascent chain. However, the marginal energetic drivers of this reaction suggest the translocon may operate with modest fidelity. In this work, we employed biophysical modeling in conjunction with quantitative biochemical measurements to evaluate the extent to which cotranslational folding defects influence membrane protein homeostasis. Protein engineering was employed to selectively perturb the topological energetics of human rhodopsin, and the expression and cellular trafficking of engineered variants were quantitatively compared. Our results reveal clear relationships between topological energetics and the efficiency of rhodopsin biogenesis, which appears to be limited by the propensity of a polar transmembrane domain to achieve its correct topological orientation. Though the polarity of this segment is functionally constrained, we find that its topology can be stabilized in a manner that enhances biogenesis without compromising the functional properties of rhodopsin. Furthermore, sequence alignments reveal that this topological instability has been conserved throughout the course of evolution. These results suggest that topological defects contribute significantly to the inefficiency of membrane protein folding in the cell. Additionally, our findings suggest that the marginal stability of rhodopsin may represent an evolved trait.
Background: An emerging standard-of-care for long QT syndrome (LQTS) employs clinical genetic testing to identify genetic variants of the KCNQ1 potassium channel. However, interpreting results from genetic testing is confounded by the presence of “variants of unknown significance” (VUS) for which there is inadequate evidence of pathogenicity. Methods and Results: In this study, we curated from the literature a “high-quality” set of 107 functionally characterized KCNQ1 variants. Based on this dataset, we completed a detailed quantitative analysis of the sequence conservation patterns of subdomains of KCNQ1 and the distribution of pathogenic variants therein. We found that conserved subdomains generally are critical for channel function and are enriched with dysfunctional variants. Using this experimentally validated dataset, we trained a neural network, designated Q1VarPred, specifically for predicting the functional impact of KCNQ1 VUS. The estimated predictive performance of Q1VarPred was 0.581 in terms of the Matthews correlation coefficient and 0.884 in terms of the area under the receiver operating characteristic curve, superior to the performance of eight previous methods tested in parallel. Q1VarPred is publicly available as a web server at http://meilerlab.org/q1varpred. Conclusions: Although a plethora of tools are available for making pathogenicity predictions on a genome-wide scale, previous tools fail to perform in a robust manner when applied to KCNQ1. The contrasting and favorable results for Q1VarPred suggest a promising approach, in which a machine learning algorithm is tailored to a specific protein target and trained with a functionally validated dataset to calibrate informatics tools.
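For readers unfamiliar with the Matthews correlation coefficient reported above, the following minimal sketch computes it from binary labels using the standard confusion-matrix formula. The function name and the toy labels are illustrative assumptions and are not data from the study.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Toy example: 1 = dysfunctional variant, 0 = normal function (made-up labels).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
```

The coefficient ranges from -1 to 1, with 0 corresponding to chance-level prediction, and it remains informative when the two classes are of unequal size.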