Preprint (2024)
DOI: 10.1101/2024.01.12.575432
TransMEP: Transfer learning on large protein language models to predict mutation effects of proteins from a small known dataset

Tilman Hoffbauer, Birgit Strodel

Abstract: Machine learning-guided optimization has become a driving force for recent improvements in protein engineering. In addition, new protein language models are learning the grammar of evolutionarily occurring sequences at large scales. This work combines both approaches to make predictions about mutational effects that support protein engineering. To this end, an easy-to-use software tool called TransMEP is developed using transfer learning by feature extraction with Gaussian process regression. A large collectio…
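The abstract names the core recipe: freeze a pretrained protein language model, use it only as a feature extractor, and fit a Gaussian process regressor on the resulting sequence embeddings so that a small mutant dataset suffices. Below is a minimal sketch of that recipe, assuming the fair-esm package for ESM-2 embeddings and scikit-learn's GaussianProcessRegressor; the backbone size, mean-pooling, kernel choice, and the toy sequences are illustrative assumptions, not TransMEP's actual configuration.

```python
# Sketch: transfer learning by feature extraction + Gaussian process regression.
# Assumes the fair-esm package (pip install fair-esm) and scikit-learn.
import numpy as np
import torch
import esm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Small ESM-2 model (6 layers) keeps the example light; TransMEP's actual
# backbone and representation layer may differ.
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

def embed(sequences, layer=6):
    """Mean-pooled per-sequence embeddings from the frozen protein LM."""
    data = [(f"seq{i}", s) for i, s in enumerate(sequences)]
    _, _, tokens = batch_converter(data)
    with torch.no_grad():
        reps = model(tokens, repr_layers=[layer])["representations"][layer]
    # Average over residue positions, skipping BOS (pos 0) and EOS/padding.
    return np.stack([reps[i, 1:len(s) + 1].mean(0).numpy()
                     for i, s in enumerate(sequences)])

# Hypothetical low-N dataset: mutant sequences with measured fitness values.
train_seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "MKTAYIGKQR"]
train_y = np.array([0.8, 1.1, 0.3])

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(embed(train_seqs), train_y)

# The GP returns both a predicted effect and an uncertainty for a new mutant,
# which is what makes it useful for guiding the next round of engineering.
mean, std = gp.predict(embed(["MKTAYIAKQW"]), return_std=True)
print(mean, std)
```

Because the language model is never fine-tuned, only the GP (a handful of hyperparameters) is fit to the small labeled dataset, which is the point of the feature-extraction approach in the low-N regime.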

Cited by 2 publications (1 citation statement)
References 27 publications (58 reference statements)
“…PLMs are an important component in many existing methods for low-N protein engineering. They have been used to extract protein sequence representations [3, 74–76], for finetuning on the low-N function data [76–78], and to generate auxiliary training data in more complex models [78–80]. Other computational strategies for addressing the low-N problem include Gaussian processes [75, 81, 82], augmenting regression models with sequence-based [15, 83] or structure-based [84] scores, custom protein representations that can produce pretraining data [85], representations of proteins' 3D shape [86], meta learning [87], and contrastive finetuning [88].…”
Section: Discussion
Confidence: 99%