Improving Protein Expression, Stability, and Function with ProteinMPNN

Sumida, Kiera H.; Núñez-Franco, Reyes; Kalvet, Indrek; Pellock, Samuel J.; Wicky, Basile I. M.; Milles, Lukas F.; Dauparas, Justas; Wang, Jue; Kipnis, Yakov; Jameson, Noel; Kang, Alex; De La Cruz, Joshmyn; Sankaran, Banumathi; Bera, Asim K.; Jiménez-Osés, Gonzalo; Baker, David

doi:10.1021/jacs.3c10941

Cited by 42 publications (27 citation statements)

References 39 publications (65 reference statements)

“…There, a more conservative cutoff of 7 Å for fixing the ligand-proximal amino acids during reengineering was chosen, but given the larger size of myoglobin, resulting designs had 41%-55% sequence identity with the most similar protein in the UniRef100 database (Sumida et al, 2024). Several examples where protein- or peptide-binding proteins (TEV protease, ubiquitin, ghrelin receptor) were reengineered using ProteinMPNN similarly display high success rates (de Haas et al, 2023; Goverde et al, 2023; Sumida et al, 2024). Finally, new methods called LigandMPNN and CARBonAra were recently described that explicitly model nonprotein components, but their codes are not yet readily available (Dauparas et al, 2023; Krapp et al, 2023; Krishna et al, 2023).…”

Section: Discussion (mentioning)

confidence: 99%

“…In a paper submitted after this one, Sumida and colleagues demonstrate that ProteinMPNN can be used to reengineer another ligand-binding colored protein, human myoglobin, with a comparable success rate (Sumida et al, 2024). There, a more conservative cutoff of 7 Å for fixing the ligand-proximal amino acids during reengineering was chosen, but given the larger size of myoglobin, resulting designs had 41%-55% sequence identity with the most similar protein in the UniRef100 database (Sumida et al, 2024). Several examples where protein- or peptide-binding proteins (TEV protease, ubiquitin, ghrelin receptor) were reengineered using ProteinMPNN similarly display high success rates (de Haas et al, 2023; Goverde et al, 2023; Sumida et al, 2024).…”

Section: Discussion (mentioning)

confidence: 99%
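
The 7 Å cutoff quoted above can be illustrated with a short sketch. It picks the residues that would be held fixed because they lie close to a ligand. The function name, the toy coordinates, and the "any residue atom to any ligand atom" distance criterion are assumptions made for illustration. The quoted statements do not say which atoms the distance was measured between, and this is not ProteinMPNN's own interface.

```python
import math

def residues_near_ligand(residue_atoms, ligand_atoms, cutoff=7.0):
    """Return sorted residue ids with any atom within `cutoff` angstroms of any ligand atom.

    residue_atoms: dict mapping residue id -> list of (x, y, z) tuples
    ligand_atoms:  list of (x, y, z) tuples
    Assumption: an any-atom-to-any-atom distance criterion.
    """
    fixed = []
    for res_id, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            fixed.append(res_id)
    return sorted(fixed)

# Hypothetical toy coordinates in angstroms, not taken from any real structure.
residue_atoms = {
    10: [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    11: [(5.0, 0.0, 0.0)],
    12: [(20.0, 0.0, 0.0)],
}
ligand_atoms = [(3.0, 4.0, 0.0)]

fixed_positions = residues_near_ligand(residue_atoms, ligand_atoms)
print(fixed_positions)  # [10, 11]
```

Positions returned this way would keep their native identity, and only the remaining positions would be handed to the sequence design step. A tighter cutoff fixes fewer residues and leaves more of the sequence free to change.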

Reengineering of a flavin‐binding fluorescent protein using ProteinMPNN

Nikolaev, Kuzmin, Markeeva et al. 2024

Protein Science

Recent advances in machine learning techniques have led to development of a number of protein design and engineering approaches. One of them, ProteinMPNN, predicts an amino acid sequence that would fold and match user‐defined backbone structure. Its performance was previously tested for proteins composed of standard amino acids, as well as for peptide‐ and protein‐binding proteins. In this short report, we test whether ProteinMPNN can be used to reengineer a non‐proteinaceous ligand‐binding protein, flavin‐based fluorescent protein CagFbFP. We fixed the native backbone conformation and the identity of 20 amino acids interacting with the chromophore (flavin mononucleotide, FMN) while letting ProteinMPNN predict the rest of the sequence. The software package suggested replacing 36–48 out of the remaining 86 amino acids so that the resulting sequences are 55%–66% identical to the original one. The three designs that we tested experimentally displayed different expression levels, yet all were able to bind FMN and displayed fluorescence, thermal stability, and other properties similar to those of CagFbFP. Our results demonstrate that ProteinMPNN can be used to generate diverging unnatural variants of fluorescent proteins, and, more generally, to reengineer proteins without losing their ligand‐binding capabilities.
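
The figures in this abstract can be cross-checked with a little arithmetic. The sketch below assumes the protein length is the sum of the 20 fixed and 86 designable positions (106 residues) and that identity is computed over the full length without gaps. Neither assumption is stated in the abstract, and the function names are illustrative.

```python
def identity_after_redesign(n_fixed, n_designable, n_substituted):
    """Percent identity to the parent sequence after substituting some designable positions.

    Assumption: total length = n_fixed + n_designable, no insertions or deletions.
    """
    total = n_fixed + n_designable
    return 100.0 * (total - n_substituted) / total

def percent_identity(seq_a, seq_b):
    """Percent identity between two equal-length, already aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# 36 to 48 substitutions among the 86 designable positions, 20 positions fixed.
print(f"{identity_after_redesign(20, 86, 36):.1f}")  # 66.0
print(f"{identity_after_redesign(20, 86, 48):.1f}")  # 54.7
```

Under these assumptions, 36 and 48 substitutions give about 66% and 55% identity, which agrees with the 55%–66% range reported in the abstract.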

“…The performance of our zero-shot library contributes to a growing body of evidence showing that samples from models fit on natural sequences can be used to generate libraries that are not only enriched for functional variants [82,97,98] but also contain variants with improved fitness [42,59,99–105], even though the zero-shot sampling process did not take the target phenotype of the engineering campaign into account. We encourage further exploration of these techniques for initial library design [63,106], particularly in lower-throughput settings where improving hit rates can increase the chance of finding at least one satisfactory variant (Figure 5b).…”

Section: Discussion (mentioning)

confidence: 99%

“…Importantly, our ML campaign outperformed two directed evolution approaches that used the same platform: one that was run independently and using standard in-vitro techniques for hit selection and diversification and one that was designed in-silico and pooled with the ML-designed libraries for screening. Finally, the performance of our zero-shot library contributes to a growing body of evidence showing that samples from models fit on natural sequences can be used to generate libraries that are not only enriched for functional variants [76–78] but also contain variants with improved fitness [41,59,79–85].…”

Section: Discussion (mentioning)

confidence: 99%

Engineering of highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening

Thomas, Belanger, et al. 2024

Preprint

Designing enzymes to function in novel chemical environments is a central goal of synthetic biology with broad applications. Guiding protein design with machine learning (ML) has the potential to accelerate the discovery of high-performance enzymes by precisely navigating a rugged fitness landscape. In this work, we describe an ML-guided campaign to engineer the nuclease NucB, an enzyme with applications in the treatment of chronic wounds due to its ability to degrade biofilms. In a multi-round enzyme evolution campaign, we combined ultra-high-throughput functional screening with ML and compared to parallel in-vitro directed evolution (DE) and in-silico hit recombination (HR) strategies that used the same microfluidic screening platform. The ML-guided campaign discovered hundreds of highly-active variants with up to 19-fold nuclease activity improvement, while the best variant found by DE had 12-fold improvement. Further, the ML-designed hits were up to 15 mutations away from the NucB wildtype, far outperforming the HR approach in both hit rate and diversity. We also show that models trained on evolutionary data alone, without access to any experimental data, can design functional variants at a significantly higher rate than a traditional approach to initial library generation. To drive future progress in ML-guided design, we curate a dataset of 55K diverse variants, one of the most extensive genotype-phenotype enzyme activity landscapes to date. Data and code is available at: https://github.com/google-deepmind/nuclease_design.

“…It consists of 3 encoder layers that encode the backbone coordinates of the input protein, followed by 3 decoder layers that predict a sequence in seconds and in an autoregressive manner [14]. It is a powerful tool that has been successfully applied to many protein design problems, including the de novo design of new folds [16], protein binders [17] and enzymes [18], and the redesign of native proteins [19]. However, DL design tools have met limited success for protein folds mainly composed of antiparallel β-sheets [20].…”

Section: Introduction (mentioning)

confidence: 99%
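
The phrase "in an autoregressive manner" in the statement above can be made concrete with a toy sketch. Each position is sampled in turn, conditioned on the residues already chosen. Everything here is hypothetical: `toy_weights` is a stand-in for a trained decoder and has nothing to do with the real network, and the left-to-right order is a simplifying assumption, since the statement does not say in what order positions are decoded.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_weights(position, prefix):
    """Hypothetical stand-in for a decoder: sampling weights for the next residue.

    It only discourages repeating the previous residue; a real model would
    condition on the encoded backbone as well as on the prefix.
    """
    previous = prefix[-1] if prefix else None
    return [0.1 if aa == previous else 1.0 for aa in AMINO_ACIDS]

def decode_autoregressively(length, weight_fn, fixed=None, seed=0):
    """Sample a sequence one position at a time, left to right (assumed order).

    fixed: optional dict {position: residue} of positions kept unchanged.
    """
    rng = random.Random(seed)
    fixed = fixed or {}
    sequence = []
    for i in range(length):
        if i in fixed:
            # Fixed positions are copied through and still condition later choices.
            sequence.append(fixed[i])
            continue
        weights = weight_fn(i, sequence)
        sequence.append(rng.choices(AMINO_ACIDS, weights=weights, k=1)[0])
    return "".join(sequence)

design = decode_autoregressively(12, toy_weights, fixed={0: "M", 5: "W"}, seed=1)
print(design)
```

The point of the loop is that each choice depends on the choices made before it, which is what separates autoregressive decoding from predicting every position independently.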

ProteinMPNN Recovers Complex Sequence Properties of Transmembrane β-barrels

Dolorfino, Samanta, Vorobieva 2024

Preprint

Recent deep-learning (DL) protein design methods have been successfully applied to a range of protein design problems including the de novo design of novel folds, protein binders, and enzymes. However, DL methods have yet to meet the challenge of de novo membrane protein (MP) and β-barrel design tasks. We performed a comprehensive benchmark of one DL protein sequence design method, ProteinMPNN, on MP and β-barrel design tasks, and compared the performance of ProteinMPNN to the state-of-the-art Franklin2023 Rosetta MP energy function. We characterized the ability of ProteinMPNN to capture global sequence properties of transmembrane β-barrels (TMBs), generate diverse sequences for novel folds, and generate sequences likely to fold in vitro. We also tested the effect of input backbone refinement on ProteinMPNN design success. We found that given refined and well-defined inputs, ProteinMPNN more accurately captures global sequence properties and generates TMB sequences with higher sequence diversity of pore-facing residues than Franklin2023. In addition, ProteinMPNN was able to design TMB sequences likely to fold in vitro, suggesting that it could be used in de novo design tasks of diverse nanopores for single-molecule sensing and sequencing. Lastly, the improvement of ProteinMPNN with input refinement indicates that the difficulty of ProteinMPNN in designing sequences for challenging protein folds, such as TMBs, stems from input definition rather than software limitations.
