Improving enzymes by directed evolution requires the navigation of very large search spaces; we survey how to do this intelligently.
The microbial production of fine chemicals provides a promising biosustainable manufacturing solution that has led to the successful production of a growing catalog of natural products and high-value chemicals. However, development to industrial scale has been hindered by the large resource investments required. Here we present an integrated Design–Build–Test–Learn (DBTL) pipeline for the discovery and optimization of biosynthetic pathways, which is designed to be compound agnostic and automated throughout. We initially applied the pipeline to the production of the flavonoid (2S)-pinocembrin in Escherichia coli, to demonstrate rapid iterative DBTL cycling with automation at every stage. In this case, two DBTL cycles established a production pathway improved 500-fold, with competitive titers of up to 88 mg L−1. The further application of the pipeline to optimize an alkaloid pathway demonstrates how it could facilitate the rapid optimization of microbial strains for the production of any chemical compound of interest.
The field of synthetic biology aims to make the design of biological systems predictable, shrinking the huge design space to practical numbers for testing. When designing microbial cell factories, most optimization efforts have focused on enzyme and strain selection/engineering, pathway regulation, and process development. In silico tools for the predictive design of bacterial ribosome binding sites (RBSs) and RBS libraries now allow translational tuning of biochemical pathways; however, methods for predicting optimal RBS combinations in multigene pathways are still needed. Here we present the implementation of machine learning algorithms to model the RBS sequence–phenotype relationship from representative subsets of large combinatorial RBS libraries, allowing accurate prediction of optimal high producers. Applied to a recombinant monoterpenoid production pathway in Escherichia coli, our approach boosted production titers by over 60% while screening under 3% of a library. To facilitate library screening, a multiwell plate fermentation procedure was developed, increasing screening throughput with sufficient resolution to discriminate between high and low producers. High producers from one library did not retain their performance on scale-up, but the reduced screening requirements allowed rapid rescreening at the larger scale. This methodology is potentially compatible with any biochemical pathway and provides a powerful tool toward the predictive design of bacterial production chassis.
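The screen-a-subset-then-predict strategy above can be illustrated with a deliberately minimal sketch. It is not the authors' machine-learning pipeline: it assumes each RBS contributes additively to titer, estimates each option's effect from a randomly screened ~3% subset of the library, and ranks the untested constructs by predicted titer. All names (`fit_additive_model`, `predict`, `screen_and_rank`), the 3-gene × 8-RBS library, and the toy measurement function are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict
from statistics import mean


def fit_additive_model(samples):
    """Fit a toy additive model from (combo, titer) pairs.

    A combo is a tuple of RBS indices, one per gene (hypothetical encoding).
    Each (gene, rbs) option gets an effect equal to the mean titer of the
    screened constructs containing it, minus the overall mean titer.
    """
    baseline = mean(titer for _, titer in samples)
    titers_by_option = defaultdict(list)
    for combo, titer in samples:
        for gene, rbs in enumerate(combo):
            titers_by_option[(gene, rbs)].append(titer)
    effects = {opt: mean(ts) - baseline for opt, ts in titers_by_option.items()}
    return baseline, effects


def predict(model, combo):
    """Predict a titer; RBS options never seen in the screen contribute 0."""
    baseline, effects = model
    return baseline + sum(effects.get((g, r), 0.0) for g, r in enumerate(combo))


def screen_and_rank(library, measure, fraction=0.03, seed=0):
    """Screen a random fraction of the library, then rank the untested rest."""
    rng = random.Random(seed)
    n_screen = max(1, math.ceil(fraction * len(library)))
    screened = rng.sample(library, n_screen)
    model = fit_additive_model([(c, measure(c)) for c in screened])
    screened_set = set(screened)
    untested = [c for c in library if c not in screened_set]
    ranked = sorted(untested, key=lambda c: predict(model, c), reverse=True)
    return screened, ranked


def toy_measure(combo):
    """Hypothetical stand-in for a plate-based titer measurement."""
    return float(sum(combo))


if __name__ == "__main__":
    # Hypothetical 3-gene pathway with 8 RBS variants per gene (512 constructs).
    library = list(itertools.product(range(8), repeat=3))
    screened, ranked = screen_and_rank(library, toy_measure)
    print(len(screened), "screened; top predicted:", ranked[:3])
```

An additive model of this kind cannot capture interactions between RBSs on different genes; those interactions are what a more expressive learned model would be expected to pick up, so this sketch is best read as a baseline rather than a substitute.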
Monoterpenes (C10 isoprenoids) are a structurally diverse group of natural compounds that are attractive to industry as flavours and fragrances. Monoterpenes are produced from a single linear substrate, geranyl diphosphate, by a group of enzymes called the monoterpene cyclases/synthases (mTC/Ss) that catalyse high-energy cyclisation reactions involving unstable carbocation intermediates. Efforts towards producing monoterpenes via biocatalysis or metabolic engineering often result in the formation of multiple products because of the highly branched reaction mechanism of mTC/Ss. Rational engineering of mTC/Ss is hampered by the lack of correlation between active-site sequence and cyclisation type. We used available mutagenesis data to show that amino acids involved in product outcome are clustered and spatially conserved within the mTC/S family. Consensus sequences for three such plasticity regions were introduced into different mTC/Ss with increasingly complex cyclisation cascades, including the model enzyme limonene synthase (LimS). In all three mTC/Ss studied, mutations in the first two regions mostly gave rise to products that result from premature quenching of the linalyl or α-terpinyl cations, suggesting that both plasticity regions are involved in the formation and stabilisation of cations early in the reaction cascade. A LimS variant with mutations in the second region (S454G, C457V, M458I) produced mainly more complex bicyclic products. QM/MM MD simulations reveal that the second cyclisation is not due to compression of the C2-C7 distance in the α-terpinyl cation, but results from an increased distance between C8 of the α-terpinyl cation and two putative bases (W324, H579) located on the other side of the active site, which prevents early termination by deprotonation. Such insights into the impact of mutations can only be obtained using integrated experimental and computational approaches, and will aid the design of altered mTC/S activities towards clean monoterpenoid products.
Natural plant-based flavonoids have drawn significant attention as dietary supplements due to their potential health benefits, including anti-cancer, anti-oxidant, and anti-asthmatic activities. Naringenin, pinocembrin, eriodictyol and homoeriodictyol are classified as (2S)-flavanones, an important sub-group of naturally occurring flavonoids with wide-reaching applications in human health and nutrition. These four compounds occupy a central position as branch-point intermediates towards a broad spectrum of naturally occurring flavonoids. Here, we report the development of E. coli production chassis for each of these key gatekeeper flavonoids. Selection of key enzymes, genetic construct design, and optimization of process conditions resulted in the highest reported titers for naringenin (484 mg/L), improved production of pinocembrin (198 mg/L) and eriodictyol (55 mg/L from caffeic acid), and the first example of in vivo production of homoeriodictyol directly from glycerol (17 mg/L). This work provides a springboard for the future production of diverse downstream natural and non-natural flavonoid targets.
The de novo synthesis of genes is becoming increasingly common in synthetic biology studies. However, the inherent error rate, arising from errors incurred during oligonucleotide synthesis, limits its use in synthesising protein libraries to short genes. Here we introduce SpeedyGenes, a PCR-based method for the synthesis of diverse protein libraries that includes an error-correction procedure, enabling the efficient synthesis of large genes for use directly in functional screening. First, we demonstrate an accurate gene synthesis method by synthesising and directly screening (without pre-selection) a 747 bp gene for green fluorescent protein (yielding 85% fluorescent colonies) and a larger 1518 bp gene encoding a monoamine oxidase (MAO-N), for which 76% of colonies showed full catalytic activity, a 4-fold improvement over previous methods. Second, we show that SpeedyGenes can accommodate multiple and combinatorial variant sequences while maintaining efficient enzymatic error correction, which is particularly crucial for larger genes. In its first application to directed evolution, we demonstrate the use of SpeedyGenes in the synthesis and screening of large libraries of MAO-N variants. Using this method, libraries are synthesised, transformed and screened within 3 days. Importantly, as each mutation we introduce is controlled by the oligonucleotide sequence, SpeedyGenes enables the synthesis of large, diverse, yet controlled variant sequences for the purposes of directed evolution.
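PCR-based gene synthesis of this kind starts from overlapping oligonucleotides that alternate between the top and bottom strands, so that neighbouring oligos anneal over their shared overlap and can be extended and amplified into the full gene. The sketch below shows that generic design step and the size of a combinatorial library when each oligo position is drawn independently from a set of variant oligos. It assumes hypothetical oligo lengths and overlaps (the abstract does not give SpeedyGenes' parameters), and it does not model the enzymatic error-correction step.

```python
import math

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def design_oligos(gene: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split a gene into overlapping oligos on alternating strands.

    Oligo i covers gene[i*step : i*step + oligo_len] with step = oligo_len - overlap;
    odd-numbered oligos are written as reverse complements (bottom strand).
    The default lengths are hypothetical, not SpeedyGenes' published values.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    gene = gene.upper()
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        fragment = gene[start:start + oligo_len]
        oligos.append(fragment if len(oligos) % 2 == 0 else reverse_complement(fragment))
        if start + oligo_len >= len(gene):
            break
        start += step
    return oligos


def combinatorial_library_size(variants_per_oligo: list[int]) -> int:
    """Distinct genes when each oligo position is chosen independently."""
    return math.prod(variants_per_oligo)
```

For example, if two of four oligo positions were each supplied as a mixture of variant oligos (say 4 and 3 versions), `combinatorial_library_size([1, 4, 1, 3])` gives 12 distinct genes, each mutation being fixed by the oligo sequence rather than introduced at random.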
Synthetic biology utilises the Design-Build-Test-Learn pipeline for the engineering of biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which in turn create heavy demand for DNA sequencing to verify correctly assembled constructs. Next-generation sequencing is ideally positioned to perform this task; however, high hardware costs and bespoke data-analysis requirements mean that few laboratories use this technology in-house. Here a workflow for highly multiplexed sequencing is presented, capable of fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel PCR-based sample barcoding system is introduced, and sequencing data are analysed with a bespoke analysis algorithm. Crucially, this algorithm overcomes the problem of high-error-rate nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96-well plates) were processed in a single workflow in 72 hours (from E. coli colonies to analysed data). Given its low hardware costs and high degree of multiplexing, the procedure provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, SNP analysis and gene synthesis.
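The strand-bias idea can be illustrated with a simple rule. This is not the authors' statistical analysis: it is a threshold sketch that assumes a genuine single-nucleotide variant in a clonal construct should be supported by reads from both strands, whereas systematic nanopore basecalling errors tend to appear predominantly on one strand. The `StrandCounts` structure, the labels, and the `min_fraction`/`min_depth` thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandCounts:
    """Reference/alternative base counts at one position, split by read strand."""
    fwd_ref: int
    fwd_alt: int
    rev_ref: int
    rev_alt: int


def alt_fraction(ref: int, alt: int) -> float:
    total = ref + alt
    return alt / total if total else 0.0


def classify_site(c: StrandCounts, min_fraction: float = 0.6, min_depth: int = 10) -> str:
    """Label a site using a hypothetical two-strand support rule."""
    if c.fwd_ref + c.fwd_alt < min_depth or c.rev_ref + c.rev_alt < min_depth:
        return "low_coverage"
    fwd = alt_fraction(c.fwd_ref, c.fwd_alt)
    rev = alt_fraction(c.rev_ref, c.rev_alt)
    if fwd >= min_fraction and rev >= min_fraction:
        return "variant"                # supported on both strands
    if fwd >= min_fraction or rev >= min_fraction:
        return "strand_biased_error"    # supported on one strand only
    return "reference"
```

A site with 45/50 alternative bases on the forward strand but only 2/50 on the reverse strand is labelled `strand_biased_error` rather than `variant`, which is the kind of false call a single pooled allele frequency (47/100) would leave ambiguous.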