Large-scale design and refinement of stable proteins using sequence-only models

Singer, Jedediah M.; Novotney, Scott; Strickland, Devin; Haddox, Hugh K.; Leiby, Nicholas; Rocklin, Gabriel J.; Chow, Cameron M.; Bera, Asim K.; Motta, Francis C.; Cao, Longxing; Strauch, Eva-Maria; Chidyausiku, Tamuka M.; Ford, Alex T.; Ho, Ethan; Mackenzie, Craig O.; Eramian, Hamed; DiMaio, Frank; Grigoryan, Gevorg; Vaughn, Matthew; Stewart, Lance; Baker, David C.; Klavins, Eric

doi:10.1101/2021.03.12.435185

Cited by 10 publications

(16 citation statements)

References 61 publications


“…We also further the work of [12] and evaluate models’ predictions on single-site mutations to test their ability to predict small changes in the stability landscape. Models tested include a convolutional neural network (CNN) model from [16] and three simplified variants of this model, as well as gradient boosting models built using three recently-published protein embeddings inspired by natural language processing: (i) the Evolutionary Scale Model (ESM, [14]), (ii) ProtBERT [6], and (iii) UniRep [1]. We use XGBoost [4] to construct these gradient boosting models and test both default and optimized parameters.…”

Section: Introduction (mentioning)

confidence: 99%
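As a rough illustration of what the single-site mutation task in the statement above covers, the sketch below enumerates every single-residue substitution of a sequence. The function name, the mutation label format (wild type, 1-based position, new residue), and the toy sequence are assumptions made for this example; they are not taken from the cited paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_site_mutants(sequence):
    """Yield (label, mutant_sequence) for every single-residue substitution.

    Hypothetical helper for illustration only; labels look like "K2A"
    (wild-type residue, 1-based position, substituted residue).
    """
    for pos, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                label = f"{wild_type}{pos + 1}{residue}"
                yield label, sequence[:pos] + residue + sequence[pos + 1:]


# Toy three-residue sequence, not a real design.
mutants = dict(single_site_mutants("MKV"))
```

A sequence of length n has 19n such mutants, so a stability model evaluated on this task scores each of them against the measured stability of the corresponding variant.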

“…Despite recent progress, however, statistical and physics-based methods for stability prediction remain computationally expensive and struggle to model and predict the quantities and interactions governing protein stability [8,10]. Meanwhile, advancements in data-driven methods such as high throughput stability measurement, deep sequencing, and machine learning are beginning to mitigate issues of computational cost and prediction accuracy [1,7,12,16,18].…”

Section: Introduction (mentioning)

confidence: 99%

“…Here we present a rigorous comparison and evaluation of several data-driven stability prediction models using 112,455 synthetic proteins from a recently published dataset [16] of experimental stability data. We evaluate performance on held-out protein classes to test models' ability to extrapolate new kinds of protein designs.…”

Section: Introduction (mentioning)

confidence: 99%
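Evaluation on held-out protein classes, as described in the statement above, can be sketched as a leave-one-class-out split: each fold trains on every class but one and tests on the class left out. The record layout `(sequence, design_class, stability)`, the class names, and the numbers below are placeholders assumed for this example, not data or code from the cited work.

```python
from collections import defaultdict


def leave_one_class_out(records):
    """Yield (held_out_class, train, test), holding out one class per fold.

    `records` is an iterable of (sequence, design_class, stability) tuples;
    this tuple layout is a hypothetical choice for the illustration.
    """
    by_class = defaultdict(list)
    for record in records:
        by_class[record[1]].append(record)
    for held_out in sorted(by_class):
        train = [r for cls, rows in by_class.items() if cls != held_out for r in rows]
        yield held_out, train, by_class[held_out]


# Toy records with made-up sequences and stability values.
toy = [
    ("AAA", "HHH", 1.2),
    ("CCC", "HHH", 0.4),
    ("DDD", "EHEE", 0.9),
    ("EEE", "HEEH", 0.1),
]
folds = list(leave_one_class_out(toy))
```

Because no sequence from the held-out class appears in training, the test score reflects extrapolation to a new kind of design and not interpolation within a familiar one.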


Comparison and evaluation of data-driven protein stability prediction models

Csicsery-Ronay, Zaitzeff, Singer

2022

Preprint

Self Cite


Predicting protein stability is important to protein engineering yet poses unsolved challenges. Computational costs associated with physics-based models, and the limited amount of data available to support data-driven models, have left stability prediction behind the prediction of structure. New data and advancements in modeling approaches now afford greater opportunities to solve this challenge. We evaluate a set of data-driven prediction models using a large, newly published dataset of various synthetic proteins and their experimental stability data. We test the models in two separate tasks, exercising extrapolation to new protein classes and prediction of the effects on stability of small mutations. Results: Small convolutional neural networks trained from scratch on stability data and large protein embedding models passed through simple downstream models trained on stability data are both able to predict stability comparably well. The largest of the embedding models yields the best performance in all tasks and metrics. We also explored the marginal performance gains seen with two ensemble models.


“…Recently deep learning techniques have impacted both protein structure prediction and protein design (Pearce and Zhang, 2021; Wittmann et al, 2021). Design tasks tackled with deep learning include fixed backbone design (O'Connell et al, 2018; Ingraham et al, 2019; Qi and Zhang, 2020; Norn et al, 2021), antibody design (Wang et al, 2018; Saka et al, 2021; Shin et al, 2021), de novo design Moffat and Jones, 2021), and the prediction of whether a sequence has a stable structure from sequence alone (Singer et al, 2021). A variety of neural network architectures have been used including variational autoencoders (Greener et al, 2018; Hawkins-Hooker et al, 2021), deep exploration networks (Linder et al, 2020), graph neural networks (Strokach et al, 2020), recurrent neural networks (Alley et al, 2019) and autoregressive models (Shin et al, 2021; Trinquier et al, 2021).…”

Section: Introduction (mentioning)

confidence: 99%

Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design

Moffat, Greener, Jones

2021

Preprint


The prediction of protein structure and the design of novel protein sequences and structures have long been intertwined. The recently released AlphaFold has heralded a new generation of accurate protein structure prediction, but the extent to which this affects protein design stands yet unexplored. Here we develop a rapid and effective approach for fixed backbone computational protein design, leveraging the predictive power of AlphaFold. For several designs we demonstrate that not only are the AlphaFold predicted structures in agreement with the desired backbones, but they are also supported by the structure predictions of other supervised methods as well as ab initio folding. These results suggest that AlphaFold, and methods like it, are able to facilitate the development of a new range of novel and accurate protein design methodologies.


“…In our approach, we designed thousands of de novo proteins and measured their folding stabilities using a yeast display-based proteolysis assay coupled to next-generation sequencing (1). Several new studies have applied our methodology (16-19) as it has several advantages. First, measuring folding stability for thousands of proteins makes it possible to statistically quantify biophysical features that contribute to stability.…”

Section: Introduction (mentioning)

confidence: 99%

Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation

Kim, Tsuboyama, Houliston et al.

2021

Preprint

Self Cite


Designing entirely new protein structures remains challenging because we do not fully understand the biophysical determinants of folding stability. Yet some protein folds are easier to design than others. Previous work identified the 43-residue αββα fold as especially challenging: the best designs had only a 2% success rate, compared to 39-87% success for other simple folds (1). This suggested the αββα fold would be a useful model system for gaining a deeper understanding of folding stability determinants and for testing new protein design methods. Here, we designed over ten thousand new αββα proteins and found over three thousand of them to fold into stable structures using a high-throughput protease-based assay. Nuclear magnetic resonance, hydrogen-deuterium exchange, circular dichroism, deep mutational scanning, and scrambled sequence control experiments indicated that our stable designs fold into their designed αββα structures with exceptional stability for their small size. Our large dataset enabled us to quantify the influence of universal stability determinants including nonpolar burial, helix capping, and buried unsatisfied polar atoms, as well as stability determinants unique to the αββα topology. Our work demonstrates how large-scale design and test cycles can solve challenging design problems while illuminating the biophysical determinants of folding.

