2021
DOI: 10.1101/2021.03.12.435185
Preprint

Large-scale design and refinement of stable proteins using sequence-only models

Abstract: Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we report a neural network model that predicts protein stability based only on sequences of amino acids, and demonstrate its performance …

Citations: cited by 10 publications (16 citation statements)
References: 61 publications
“…We also further the work of [12] and evaluate models’ predictions on single-site mutations to test their ability to predict small changes in the stability landscape. Models tested include a convolutional neural network (CNN) model from [16] and three simplified variants of this model, as well as gradient boosting models built using three recently-published protein embeddings inspired by natural language processing: (i) the Evolutionary Scale Model (ESM, [14]), (ii) ProtBERT [6], and (iii) UniRep [1]. We use XGBoost [4] to construct these gradient boosting models and test both default and optimized parameters.…”
Section: Introduction (mentioning)
confidence: 99%
“…Despite recent progress, however, statistical and physics-based methods for stability prediction remain computationally expensive and struggle to model and predict the quantities and interactions governing protein stability [8,10]. Meanwhile, advancements in data-driven methods such as high throughput stability measurement, deep sequencing, and machine learning are beginning to mitigate issues of computational cost and prediction accuracy [1,7,12,16,18].…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently deep learning techniques have impacted both protein structure prediction and protein design (Pearce and Zhang, 2021; Wittmann et al, 2021). Design tasks tackled with deep learning include fixed backbone design (O'Connell et al, 2018; Ingraham et al, 2019; Qi and Zhang, 2020; Norn et al, 2021), antibody design (Wang et al, 2018; Saka et al, 2021; Shin et al, 2021), de novo design (Moffat and Jones, 2021), and the prediction of whether a sequence has a stable structure from sequence alone (Singer et al, 2021). A variety of neural network architectures have been used including variational autoencoders (Greener et al, 2018; Hawkins-Hooker et al, 2021), deep exploration networks (Linder et al, 2020), graph neural networks (Strokach et al, 2020), recurrent neural networks (Alley et al, 2019) and autoregressive models (Shin et al, 2021; Trinquier et al, 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…In our approach, we designed thousands of de novo proteins and measured their folding stabilities using a yeast display-based proteolysis assay coupled to next-generation sequencing (1). Several new studies have applied our methodology (16-19) as it has several advantages. First, measuring folding stability for thousands of proteins makes it possible to statistically quantify biophysical features that contribute to stability.…”
Section: Introduction (mentioning)
confidence: 99%