Deep learning of protein sequence design of protein–protein interactions

Syrlybaeva, Raulia; Strauch, Eva-Maria

doi:10.1093/bioinformatics/btac733

Cited by 10 publications

(3 citation statements)

References 48 publications

(50 reference statements)


“…[25][26][27][28] While new and improved models are continuing to be developed, they do not invariably produce designs that retain tertiary structure and activity at high temperatures. [29][30][31] Strategies which are based on substructures, energy functions, or patterns learned from protein structures represented in the PDB, are limited considering that the majority of proteins are non-thermophilic: only 5% of proteins from the top 25 most populous source organisms are thermophilic. [32][33][34][35] The temperature-dependent nature of enthalpic and entropic forces in the protein means that stability at ambient temperature does not necessarily translate to high-temperature stability.…”

Section: Background and Summary (mentioning)

confidence: 99%

Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe

Komp, Alanzi, Francis et al. 2023

Preprint


Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding and methods for designing high-temperature stability into target proteins are inadequate. To drive innovation in this space, we have curated a large dataset, learn2thermDB, of protein-temperature examples, totalling 24 million instances, and paired proteins across temperatures based on homology, yielding 69 million protein pairs - orders of magnitude larger than the current largest. This important step of pairing allows for study of high-temperature stability in a sequence-dependent manner in the big data era. The data pipeline is parameterized and open, allowing it to be tuned by downstream users. We further show that the data contains signal for deep learning. This data offers a new doorway towards thermal stability design models.



“…[9][10][11][12][13][14][15] a property of interest, including thermal stability, [19,20] zero-shot predictors of the same, [21,22] structure prediction models, [23,24] and sequence design models. [25,26] Existing supervised strategies help rank proteins among a pool of variants after training on a specific thermal stability target, but require time and resource intensive labeled data from the specific protein of interest to be accurate. [27][28][29] Zero-shot predictors remove the need for labeled data by either learning from evolutionary scale sequence datasets or by conditioning on homologs of the protein of interest and impressively achieve some predictive performance on observable properties.…”

Section: Introduction (mentioning)

confidence: 99%

A learnable transition from low temperature to high temperature proteins with neural machine translation

Komp, Phillips, Alanzi et al. 2024

Preprint


This work presents Neural Optimization for Melting-temperature Enabled by Leveraging Translation (NOMELT), a novel approach for designing and ranking high-temperature stable proteins using neural machine translation. The model, trained on over 4 million protein homologous pairs from organisms adapted to different temperatures, demonstrates promising capability in targeting thermal stability. A designed variant of the Drosophila melanogaster Engrailed Homeodomain shows increased stability at high temperatures, as validated by estimators and molecular dynamics simulations. Furthermore, NOMELT achieves zero-shot predictive capabilities in ranking experimental melting and half-activation temperatures across two protein families. It achieves this without requiring extensive homology data or massive training datasets as do existing zero-shot predictors by specifically learning thermophilicity, as opposed to all natural variation. These findings underscore the potential of leveraging organismal growth temperatures in context-dependent design of proteins for enhanced thermal stability.


“…As a cheminformatics model, ML combines chemistry, computer science, and information technology to aid in drug discovery through tasks like virtual screening, library design, and high-throughput screening analysis [10][11][12]. Machine learning algorithms leverage large chemical datasets for predictive modeling and pattern recognition, including the prediction of the properties and activities of peptides based on their sidechains [13][14][15][16]. This integration has accelerated the discovery and design of novel peptides with desired biological activities, opening new avenues for peptide-based drug development.…”

Section: Introduction (mentioning)

confidence: 99%

Prospection of Peptide Inhibitors of Thrombin from Diverse Origins Using a Machine Learning Pipeline

Balakrishnan, Katkar, Pham et al. 2023

Bioengineering


Thrombin is a key enzyme involved in the development and progression of many cardiovascular diseases. Direct thrombin inhibitors (DTIs), with their minimum off-target effects and immediacy of action, have greatly improved the treatment of these diseases. However, the risk of bleeding, pharmacokinetic issues, and thrombotic complications remain major concerns. In an effort to increase the effectiveness of the DTI discovery pipeline, we developed a two-stage machine learning pipeline to identify and rank peptide sequences based on their effective thrombin inhibitory potential. The positive dataset for our model consisted of thrombin inhibitor peptides and their binding affinities (KI) curated from published literature, and the negative dataset consisted of peptides with no known thrombin inhibitory or related activity. The first stage of the model identified thrombin inhibitory sequences with a Matthews correlation coefficient (MCC) of 83.6%. The second stage of the model, which covers a range of eight orders of magnitude in KI values, predicted the binding affinity of new sequences with a log root mean square error (RMSE) of 1.114. These models also revealed physicochemical and structural characteristics that are hidden but unique to thrombin inhibitor peptides. Using the model, we classified more than 10 million peptides from diverse sources and identified unique short peptide sequences (<15 aa) of interest, based on their predicted KI. Based on the binding energies of the interaction of the peptide with thrombin, we identified a promising set of putative DTI candidates. The prediction pipeline is available on a web server.
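For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how a Matthews correlation coefficient and a log-scale RMSE are commonly computed. It is not the authors' pipeline. The function names, the toy labels, and the choice of base-10 logarithms are assumptions made for illustration; the abstract does not state which base was used.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = inhibitor, 0 = non-inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def log_rmse(k_true, k_pred):
    """RMSE of log10-transformed values (assumed base; values must be > 0)."""
    sq_errors = [
        (math.log10(p) - math.log10(t)) ** 2 for t, p in zip(k_true, k_pred)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    labels = [1, 1, 0, 0]
    predictions = [1, 0, 0, 0]
    print(f"MCC: {matthews_corrcoef(labels, predictions):.3f}")

    ki_measured = [1e-9, 1e-6]   # hypothetical KI values in molar units
    ki_predicted = [1e-8, 1e-6]
    print(f"log RMSE: {log_rmse(ki_measured, ki_predicted):.3f}")
```

Under the base-10 assumption, a log RMSE near 1 would correspond to predictions that are typically off by about one order of magnitude, which gives some context for an error of 1.114 over a range of eight orders of magnitude.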

