2023
DOI: 10.1101/2022.12.31.522396
Preprint

New mega dataset combined with deep neural network makes a progress in predicting impact of mutation on protein stability

Abstract: Prediction of protein stability change due to a single mutation is important for biotechnology, medicine, and our understanding of the physics underlying protein folding. Despite the recent tremendous success in 3D protein structure prediction, the apparently simpler problem of predicting the effect of mutations on protein stability has been hampered by the scarcity of experimental data. With the recent high-throughput measurements of mutational effects in a 'mega' experiment for ~850,000 mutations [Tsuboyama et al.…

Cited by 5 publications (6 citation statements) · References 28 publications (54 reference statements)
“…First, the dynamic range of the proteolysis assay is limited to ~5 kcal/mol (19), while experimental stability datasets such as our Fireprot dataset may include mutations with up to ±10 kcal/mol ΔΔG°. This means models trained on Megascale have limited capability to predict large changes in stability, a property that we also observe in other recently published models utilizing the Megascale dataset (16,26). Second, we found that surface mutations to cysteine were often observed to be highly stabilizing in the Megascale dataset, such that ThermoMPNN would heavily favor surface cysteine mutations unless omitted from the permitted residue options (Supplementary Fig.…”
Section: Discussion
confidence: 56%
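The saturation effect described in this statement can be illustrated with a small sketch (the ΔΔG° values below are made up for illustration; the real Megascale and FireProt distributions differ):

```python
# Illustration of assay dynamic-range saturation (hypothetical numbers;
# not taken from the Megascale or FireProt datasets).

def saturate(ddg, limit=5.0):
    """Clamp a true ΔΔG° (kcal/mol) to the assay's ~±limit dynamic range."""
    return max(-limit, min(limit, ddg))

# A strongly destabilizing mutation (+10 kcal/mol) and a strongly
# stabilizing one (-9.2 kcal/mol) both collapse onto the assay's
# ceiling/floor, so a model trained on such labels never sees
# targets beyond ~±5 kcal/mol.
true_ddg = [-9.2, -3.0, 0.5, 4.8, 10.0]
assay_ddg = [saturate(x) for x in true_ddg]  # [-5.0, -3.0, 0.5, 4.8, 5.0]
```

Any regression model fit to the clamped labels inherits this ceiling, which is one reading of why Megascale-trained models underpredict large stability changes.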
“…Recent achievements using large language models (LLMs) for protein structure prediction have inspired models using pre-learned sequence embeddings to train models for various protein design tasks via transfer learning (15), including for sequence-based stability prediction (16,17). At the same time, Dauparas et al. released ProteinMPNN, a message-passing neural network (MPNN) trained on 19,700 protein clusters comprising the entire Protein Data Bank (PDB) (after quality filtering) to recover native-like sequences from a given protein backbone (18).…”
Section: Introduction
confidence: 99%
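As a rough illustration of the message-passing idea behind an MPNN (a toy sketch with sum aggregation — not ProteinMPNN's actual architecture, which uses learned transformations over k-nearest-neighbour backbone graphs):

```python
# Toy sum-aggregation message-passing step on a residue graph.
# Hypothetical simplification for illustration only.

def mpnn_step(node_feats, edges):
    """One round of message passing.
    node_feats: {node_id: [float, ...]}; edges: undirected (u, v) pairs."""
    dim = len(next(iter(node_feats.values())))
    messages = {n: [0.0] * dim for n in node_feats}
    for u, v in edges:  # each node receives its neighbour's features
        for d in range(dim):
            messages[u][d] += node_feats[v][d]
            messages[v][d] += node_feats[u][d]
    # Update rule: average own features with the aggregated messages
    return {n: [(f + m) / 2.0 for f, m in zip(node_feats[n], messages[n])]
            for n in node_feats}

# Three residues in a chain 0-1-2, one scalar feature each
feats = {0: [1.0], 1: [2.0], 2: [3.0]}
updated = mpnn_step(feats, [(0, 1), (1, 2)])  # {0: [1.5], 1: [3.0], 2: [2.5]}
```

Stacking several such rounds lets each residue's representation absorb information from progressively larger structural neighbourhoods, which is the property the cited works exploit.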
“…Our comprehensive assessments demonstrate that EpHod outperforms numerous computational methods in pHopt prediction, further reinforcing the growing consensus that semi-supervised strategies employing protein language model embeddings facilitate state-of-the-art performance across various tasks (44,67,84,85). Nevertheless, our analyses reveal that the performance of language models for pHopt prediction varies considerably despite extensive hyperparameter optimization (Figure 2A, 3B-C). Moreover, the relative performance of these language models for our pHopt task differs from other tasks.…”
Section: Discussion
confidence: 91%
“…This allows ThermoMPNN to reweight the input vector using contextual information via self-attention. Light attention has recently been shown to improve sequence-based protein localization (15) and ΔΔG° prediction (16) from LLM sequence embeddings, but this work utilizes light attention for refinement of structural embeddings. The adjusted embedding is then passed through a small multilayer perceptron (MLP) with two hidden layers (Fig.…”
Section: Results
confidence: 99%
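The attention-reweighting-then-MLP pipeline described in this statement can be sketched in miniature (a dependency-free toy with made-up dimensions and weights; ThermoMPNN's real light-attention and MLP layers are learned modules, not these hand-picked values):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def light_attention(embeddings, w_score):
    """Score each per-residue embedding with a (hypothetical) learned
    vector w_score, softmax the scores over the sequence, and return the
    attention-weighted pooled embedding -- a simplified sketch of the
    reweighting idea, omitting the convolutions of real light attention."""
    scores = [sum(w * e for w, e in zip(w_score, emb)) for emb in embeddings]
    alphas = softmax(scores)
    dim = len(embeddings[0])
    return [sum(a * emb[d] for a, emb in zip(alphas, embeddings))
            for d in range(dim)]

def mlp(x, layers):
    """Small MLP: layers is a list of (weights, biases) pairs; ReLU between
    hidden layers, linear output (as for a scalar ΔΔG° regression head)."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi
             for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

# Two toy residue embeddings (dim 2) -> pooled context -> 2-hidden-layer MLP
ctx = light_attention([[1.0, 0.0], [0.0, 1.0]], w_score=[1.0, 0.0])
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # hidden layer 1
          ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # hidden layer 2
          ([[0.5, 0.5]], [0.0])]                    # scalar output head
ddg_pred = mlp(ctx, layers)  # one predicted ΔΔG° value
```

The design point the quoted passage makes is that the same reweighting mechanism works whether the per-residue vectors come from an LLM (sequence embeddings) or, as in ThermoMPNN, from a structure encoder.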