Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. To be useful for designing proteins with desired properties, machine learning models must capture the protein sequence-function relationship, often termed the fitness landscape. Existing benchmarks like CASP and CAFA assess structure and function predictions of proteins, respectively, yet they do not target metrics relevant for protein engineering. In this work, we introduce Fitness Landscape Inference for Proteins (FLIP), a benchmark for function prediction to encourage rapid scoring of representation learning for protein engineering. Our curated tasks, baselines, and metrics probe model generalization in settings relevant for protein engineering, e.g. low-resource and extrapolative. Currently, FLIP encompasses experimental data across adeno-associated virus stability for gene therapy, protein domain B1 stability and immunoglobulin binding, and thermostability from multiple protein families. To enable ease of use and future expansion to new tasks, all data are presented in a standard format. FLIP scripts and data are freely accessible at https://benchmark.protein.properties.
Signal peptides are critical for the efficient expression and routing of extracellular and secreted proteins. Most protein production and screening technologies rely on a relatively small set of signal peptides. Despite their central role in biotechnology, few studies have comprehensively examined the interplay between signal peptides and expressed protein sequences. Here, we describe a high-throughput method to screen novel signal peptides that maintain a high degree of surface expression across a range of protein scaffolds with highly variable N-termini. We find that the canonical signal peptide used in yeast surface display, derived from Aga2p, fails to achieve high surface expression for 42.5% of constructs containing diverse N-termini. To circumvent this, we have identified two novel signal peptides derived from endogenous yeast proteins, SRL1 and KISH, which are highly tolerant to diverse N-terminal sequences. This pipeline can be used to expand our understanding of signal peptide function, identify improved signal peptides for protein expression, and refine the computational tools used for signal peptide prediction.