2023
DOI: 10.1101/2023.06.18.545472
Preprint

Assessing the performance of protein regression models

Abstract: Optimizing proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: How much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically diff…

Citations: cited by 4 publications (7 citation statements)
References: 60 publications (89 reference statements)
“…Regime extrapolation tests a model’s ability to predict how mutations combine by training on single amino acid substitutions and predicting the effects of multiple substitutions [18, 19, 21, 22] (Figs. 3c and S4).…”
Section: Results (mentioning)
confidence: 99%
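
The regime extrapolation split described in the statement above (train on single substitutions, evaluate on multiple substitutions) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' code: the comma-separated variant format (e.g. `A23G,T45S`), the function names, and the toy fitness values are all assumptions made for the example.

```python
def count_substitutions(variant: str) -> int:
    """Count substitutions in a comma-separated variant string (assumed format)."""
    return len([m for m in variant.split(",") if m.strip()])


def regime_split(fitness_by_variant: dict) -> tuple:
    """Split variants into single-substitution (train) and multi-substitution (test) sets."""
    train = {v: y for v, y in fitness_by_variant.items() if count_substitutions(v) == 1}
    test = {v: y for v, y in fitness_by_variant.items() if count_substitutions(v) > 1}
    return train, test


# Toy data (hypothetical values); the empty string stands for the wild type,
# which has zero substitutions and therefore lands in neither set.
data = {"A23G": 0.9, "T45S": 1.1, "A23G,T45S": 1.3, "": 1.0}
train, test = regime_split(data)
```

A regressor would then be fitted on `train` and scored on `test`, for example with a rank correlation between predicted and measured fitness.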
“…We found METL-Local and ESM-2 displayed the strongest performance for mutation extrapolation, achieving an average Spearman correlation of 0.77 across datasets. Position extrapolation evaluates a model's ability to generalize across sequence positions and make predictions for amino acid substitutions at sites that do not vary in the training data [17][18][19] (Fig. 3b).…”
Section: Generalization Abilities Of Biophysics-based Protein Languag... (mentioning)
confidence: 99%
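
Position extrapolation, as described in the statement above, holds out sequence positions so that test variants mutate only sites that never vary in the training data. The sketch below shows one way such a split could be built. The variant format (`A23G,T45S`), the function names, and the choice to drop variants that mix seen and held-out positions are assumptions for illustration, not details taken from the cited work.

```python
def mutated_positions(variant: str) -> set:
    """Extract the mutated positions from a variant such as 'A23G,T45S' (assumed format)."""
    return {int(m.strip()[1:-1]) for m in variant.split(",") if m.strip()}


def position_split(variants: list, held_out_positions: set) -> tuple:
    """Train on variants that avoid held-out positions; test on variants confined to them."""
    held_out = set(held_out_positions)
    train, test = [], []
    for variant in variants:
        positions = mutated_positions(variant)
        if not positions:
            continue  # wild type: no mutated positions
        if positions.isdisjoint(held_out):
            train.append(variant)
        elif positions <= held_out:
            test.append(variant)
        # Variants touching both seen and held-out positions are dropped here
        # (a design choice in this sketch).
    return train, test


train_variants, test_variants = position_split(
    ["A23G", "T45S", "A23G,T45S", "L7P"], held_out_positions={45}
)
```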
“…As an alternative, learned embeddings can be extracted from PLMs, such as those mentioned above. While these representations can offer performance boosts for certain tasks, they have not yet offered significant performance boosts compared to simple sequence encodings for supervised fitness prediction in MLDE or relevant protein engineering benchmarks such as predicting multimutant fitness from the fitness effects of single mutations. Fine-tuning and semisupervised learning are other strategies to augment model performance when only a small amount of labeled data is available; this has shown initial promise but should be explored further. Additional benchmarks are needed to evaluate whether learned embeddings are more effective for ML-assisted protein fitness prediction.…”
Section: Navigating Protein Fitness Landscapes Using Machine Learning (mentioning)
confidence: 99%
“…Although OHE offers no protein information aside from the amino acid identities, it is used extensively as a fast and effective method for converting biological information into numerical vectors (Elabd et al., 2020; Goldman et al., 2022; Greenhalgh et al., 2021; Hsu et al., 2022; Michael et al., 2023; Raimondi et al., 2019; M. Yang et al., 2018).…”
Section: Fixed Sequence Representations (mentioning)
confidence: 99%
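
One-hot encoding (OHE), as mentioned in the statement above, turns each residue into an indicator vector over the amino acid alphabet. A minimal sketch follows, assuming the 20 standard amino acids in an arbitrary fixed order and a flattened output. The function name and alphabet ordering are illustrative choices, not taken from any of the cited works.

```python
# The 20 standard amino acids in an arbitrary but fixed order (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list:
    """Return a flat 0/1 vector of length len(sequence) * 20."""
    n = len(AMINO_ACIDS)
    vector = [0] * (len(sequence) * n)
    for pos, aa in enumerate(sequence):
        # Raises KeyError for residues outside the standard alphabet.
        vector[pos * n + AA_INDEX[aa]] = 1
    return vector


encoded = one_hot_encode("AC")
```

Each position contributes exactly one nonzero entry, so the encoding carries residue identity and position but no physicochemical or evolutionary information.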