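Sarah A Fahlberg scite author profile

Neural networks to learn protein sequence–function relationships from deep mutational scanning data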

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
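As a rough illustration of the supervised setup described in this abstract, the sketch below turns variants from a mutational scan into one-hot feature vectors and scores them with a purely additive (linear) model. It is a minimal baseline under stated assumptions, not the networks from the paper: the function names, the mutation notation ("T2A" meaning T at position 2 replaced by A), and the four-residue toy sequence are all hypothetical. An additive model like this cannot capture the nonlinear interactions that the abstract reports as important.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def apply_mutations(wild_type, mutations):
    """Return the variant sequence for mutations such as "T2A" (1-indexed)."""
    seq = list(wild_type)
    for mut in mutations:
        ref, pos, new = mut[0], int(mut[1:-1]), mut[-1]
        if wild_type[pos - 1] != ref:
            raise ValueError(f"{mut}: wild type has {wild_type[pos - 1]} at {pos}")
        seq[pos - 1] = new
    return "".join(seq)


def one_hot(sequence):
    """Flatten a sequence into a vector with one block of 20 entries per position."""
    vec = [0.0] * (len(sequence) * len(AMINO_ACIDS))
    for pos, aa in enumerate(sequence):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def additive_score(features, weights, bias=0.0):
    """Linear model: each (position, residue) pair contributes independently."""
    return bias + sum(f * w for f, w in zip(features, weights))


# Toy usage with a hypothetical four-residue wild type (not a real protein).
wild_type = "MTYK"
variant = apply_mutations(wild_type, ["T2A", "K4R"])
features = one_hot(variant)

# Hypothetical weights: only "A at position 2" has a nonzero effect.
weights = [0.0] * len(features)
weights[1 * len(AMINO_ACIDS) + AA_INDEX["A"]] = 0.5
score = additive_score(features, weights)
```

In practice the weights would be fit to measured scores, and the linear scorer replaced by a nonlinear model.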


Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production

Greenhalgh, Fahlberg, Pfleger, et al. 2021. Nat Commun

Alcohol-forming fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for microbial production of fatty alcohols. Many metabolic engineering strategies utilize FARs to produce fatty alcohols from intracellular acyl-CoA and acyl-ACP pools; however, enzyme activity, especially on acyl-ACPs, remains a significant bottleneck to high-flux production. Here, we engineer FARs with enhanced activity on acyl-ACP substrates by implementing a machine learning (ML)-driven approach to iteratively search the protein fitness landscape. Over the course of ten design-test-learn rounds, we engineer enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We characterize the top sequence and show that it has an enhanced catalytic rate on palmitoyl-ACP. Finally, we analyze the sequence-function data to identify features, like the net charge near the substrate-binding site, that correlate with in vivo activity. This work demonstrates the power of ML to navigate the fitness landscape of traditionally difficult-to-engineer proteins.
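The iterative design-test-learn search mentioned in this abstract can be sketched as a simple loop: fit a surrogate model to the sequences measured so far, pick the untested candidates it ranks highest, measure them, and repeat. Everything below is an illustrative assumption rather than the method used in the paper: the per-site averaging surrogate, the greedy batch selection, the function names, and the toy `assay` (a stand-in for a laboratory measurement) were chosen only for brevity.

```python
from itertools import product


def fit_site_means(tested):
    """Learn: mean measured activity for each (position, residue) pair."""
    totals = {}
    for seq, activity in tested.items():
        for key in enumerate(seq):
            total, count = totals.get(key, (0.0, 0))
            totals[key] = (total + activity, count + 1)
    return {key: total / count for key, (total, count) in totals.items()}


def predict(seq, site_means, default):
    """Average the learned site means; unseen pairs fall back to `default`."""
    values = [site_means.get(key, default) for key in enumerate(seq)]
    return sum(values) / len(values)


def design_test_learn(candidates, assay, initial, rounds, batch_size):
    """Run a greedy design-test-learn loop and return all measurements."""
    tested = {seq: assay(seq) for seq in initial}
    for _ in range(rounds):
        site_means = fit_site_means(tested)
        default = sum(tested.values()) / len(tested)
        untested = sorted(c for c in candidates if c not in tested)
        if not untested:
            break
        # Design: rank untested candidates by predicted activity.
        untested.sort(key=lambda s: predict(s, site_means, default), reverse=True)
        # Test: measure the top batch and add it to the training data.
        for seq in untested[:batch_size]:
            tested[seq] = assay(seq)
    return tested


# Toy usage: sequences over a two-letter alphabet, with a hypothetical assay
# that simply counts the number of "B" residues.
candidates = ["".join(p) for p in product("AB", repeat=3)]
results = design_test_learn(
    candidates, assay=lambda s: s.count("B"),
    initial=["AAA", "ABA"], rounds=2, batch_size=2,
)
best = max(results, key=results.get)
```

A purely greedy selection like this one exploits the current model; real campaigns typically balance it against exploring uncertain regions of sequence space.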


Machine learning to navigate fitness landscapes for protein engineering

Freschlin, Fahlberg, Romero. 2022. Current Opinion in Biotechnology

Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production

Greenhalgh, Fahlberg, Pfleger, et al. 2021. Preprint

Fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for the microbial production of fatty alcohols. Many existing metabolic engineering strategies utilize these reductases to produce fatty alcohols from intracellular acyl-CoA pools; however, acting on acyl-ACPs from fatty acid biosynthesis has a lower energetic cost and could enable more efficient production of fatty alcohols. Here we engineer FARs to preferentially act on acyl-ACP substrates and produce fatty alcohols directly from the fatty acid biosynthesis pathway. We implemented a machine learning-driven approach to iteratively search the protein fitness landscape for enzymes that produce high titers of fatty alcohols in vivo. After ten design-test-learn rounds, our approach converged on engineered enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We further characterized the top identified sequence and found its improved alcohol production was a result of an enhanced catalytic rate on acyl-ACP substrates, rather than enzyme expression or K_M effects. Finally, we analyzed the sequence-function data generated during the enzyme engineering to identify sequence and structure features that influence fatty alcohol production. We found an enzyme's net charge near the substrate-binding site was strongly correlated with in vivo activity on acyl-ACP substrates. These findings suggest future rational design strategies to engineer highly active enzymes for fatty alcohol production.


Sarah A Fahlberg

Neural networks to learn protein sequence–function relationships from deep mutational scanning data