The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that capturing nonlinear interactions and sharing parameters across sequence positions are both important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We apply the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
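As a rough illustration of what supervised learning on sequence-function data involves, the sketch below one-hot encodes variants and fits a ridge regression. This is a linear baseline, not one of the neural network architectures described in the abstract, and it cannot capture the nonlinear interactions the abstract identifies as important. The function names (`one_hot`, `fit_ridge`, `predict`), the 20-letter alphabet ordering, and the `alpha` parameter are assumptions made for this example.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as a flat vector of length len(seq) * 20."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def _add_bias(X):
    """Append a constant column so the model can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: solve (X^T X + alpha I) w = X^T y."""
    Xb = _add_bias(np.asarray(X, dtype=float))
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ np.asarray(y, dtype=float))


def predict(w, X):
    """Predict functional scores for encoded variants."""
    return _add_bias(np.asarray(X, dtype=float)) @ w
```

In practice, `X` would be built by stacking `one_hot(seq)` for every variant in a deep mutational scanning dataset and `y` would hold the measured functional scores; a nonlinear model would replace `fit_ridge` to move toward what the abstract describes.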
Alcohol-forming fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for microbial production of fatty alcohols. Many metabolic engineering strategies utilize FARs to produce fatty alcohols from intracellular acyl-CoA and acyl-ACP pools; however, enzyme activity, especially on acyl-ACPs, remains a significant bottleneck to high-flux production. Here, we engineer FARs with enhanced activity on acyl-ACP substrates by implementing a machine learning (ML)-driven approach to iteratively search the protein fitness landscape. Over the course of ten design-test-learn rounds, we engineer enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We characterize the top sequence and show that it has an enhanced catalytic rate on palmitoyl-ACP. Finally, we analyze the sequence-function data to identify features, like the net charge near the substrate-binding site, that correlate with in vivo activity. This work demonstrates the power of ML to navigate the fitness landscape of traditionally difficult-to-engineer proteins.
Fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for the microbial production of fatty alcohols. Many existing metabolic engineering strategies utilize these reductases to produce fatty alcohols from intracellular acyl-CoA pools; however, acting on acyl-ACPs from fatty acid biosynthesis has a lower energetic cost and could enable more efficient production of fatty alcohols. Here we engineer FARs to preferentially act on acyl-ACP substrates and produce fatty alcohols directly from the fatty acid biosynthesis pathway. We implemented a machine learning-driven approach to iteratively search the protein fitness landscape for enzymes that produce high titers of fatty alcohols in vivo. After ten design-test-learn rounds, our approach converged on engineered enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We further characterized the top identified sequence and found that its improved alcohol production resulted from an enhanced catalytic rate on acyl-ACP substrates, rather than from enzyme expression or K_M effects. Finally, we analyzed the sequence-function data generated during the enzyme engineering to identify sequence and structure features that influence fatty alcohol production. We found that an enzyme's net charge near the substrate-binding site was strongly correlated with in vivo activity on acyl-ACP substrates. These findings suggest future rational design strategies to engineer highly active enzymes for fatty alcohol production.
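The charge-activity analysis mentioned above can be sketched roughly as follows. The abstract does not say how net charge was computed or which residues count as near the substrate-binding site, so the charge table (Lys and Arg as +1, Asp and Glu as -1, histidine ignored), the `positions` argument, and the function names are all assumptions made for this example.

```python
import numpy as np

# Assumed formal charges at roughly neutral pH; histidine is treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq, positions=None):
    """Sum formal charges over the given 0-based positions.

    `positions` stands in for a hypothetical list of residue indices near
    the substrate-binding site; if omitted, the whole sequence is used.
    """
    if positions is None:
        positions = range(len(seq))
    return sum(CHARGE.get(seq[i], 0) for i in positions)


def pearson_r(x, y):
    """Pearson correlation between a feature and measured activity."""
    return float(np.corrcoef(x, y)[0, 1])
```

Given a list of variant sequences and their measured fatty alcohol titers, one would compute `net_charge` for each variant over the chosen positions and pass the two lists to `pearson_r`.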