Directed evolution is a powerful tool for protein engineering that maps the sequence‐function relationship of proteins using high‐throughput cell screening methods. Given the vast number of possible combinations of protein mutations, directed evolution is limited by the size of the mutant library and by the precision and speed of the high‐throughput screening methods; even the largest libraries of protein variants and the best screening methods cannot cover the full mutational space. Furthermore, rounds of directed evolution can be extremely labor‐ and time‐intensive. Thus, an algorithm‐based approach that narrows this selection would save time and resources while providing more efficient routes to relate protein sequence and function.
Recent advances in machine learning, in particular in language modeling, bring this goal closer to reality. Targeting metalloproteinases (MPs) has great potential for developing novel therapeutics, as these proteases are implicated in several diseases, including cancer and neurodegenerative and cardiovascular diseases. Directed evolution has previously been used to engineer the binding affinity and selectivity of tissue inhibitors of metalloproteinases (TIMPs), the natural inhibitors of MPs. Here, we demonstrate a machine learning approach applied to a library of matrix metalloproteinase (MMP) inhibitor variants to predict their function from amino acid sequence. We used several pre‐trained protein language models to extract features that represent biological properties of each amino acid sequence in our MMP library. In this feature space, strong MMP inhibitors form a cluster that is well separated from weak inhibitors, indicating that pre‐trained protein language models are effective for screening MMP inhibitor variants. A downstream classification model trained on these features to predict MMP inhibitor binding achieved a cross‐validation accuracy of over 80%. This study sheds light on the protein sequence‐function relationship by combining directed evolution with deep mutational scanning and machine learning.
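The workflow described above (featurize each variant, then train and cross‐validate a downstream classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a hypothetical stand‐in that uses amino acid composition instead of embeddings from a pre‐trained protein language model, the nearest‐centroid classifier is swapped in for whatever downstream model was actually used, and `toy_library` generates synthetic sequences purely for demonstration.

```python
import random
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq):
    """Hypothetical featurizer: 20-dim amino acid composition.

    Assumption: a real pipeline would replace this with (for example)
    mean-pooled embeddings from a pre-trained protein language model.
    """
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict(x, centroids):
    """Return the label whose class centroid is closest to x."""
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))


def cross_val_accuracy(features, labels, k=5, seed=0):
    """Mean held-out accuracy of a nearest-centroid classifier over k folds."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit: one centroid per class, using training variants only.
        centroids = {
            lab: centroid([features[i] for i in train if labels[i] == lab])
            for lab in {labels[i] for i in train}
        }
        correct = sum(predict(features[i], centroids) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return mean(accuracies)


def toy_library(n_per_class=20, length=30, seed=1):
    """Synthetic sequences only: each class is enriched in arbitrary residues."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for label, favored in (("strong", "WYF"), ("weak", "GAS")):
        for _ in range(n_per_class):
            seq = "".join(
                rng.choice(favored if rng.random() < 0.5 else AMINO_ACIDS)
                for _ in range(length)
            )
            seqs.append(seq)
            labels.append(label)
    return seqs, labels


if __name__ == "__main__":
    seqs, labels = toy_library()
    feats = [embed(s) for s in seqs]
    print("cross-validation accuracy:", cross_val_accuracy(feats, labels))
```

The key design point is that class centroids are fitted on the training folds only, so each accuracy estimate comes from variants the classifier has not seen. The accuracy obtained on the synthetic data says nothing about the reported result of over 80%.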