Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In E. coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange (RMCE) system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10,898 σ70 promoter variants consisting of all combinations of a set of eight -35 elements, eight -10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the -35 and -10 sequence elements can explain approximately 74% of the variance in promoter strength within our dataset using a simple log-linear statistical model. Neural network models can explain greater than 95% of the variance in our dataset, and show the increased power is due to nonlinear interactions of other elements such as the spacer, background, and UP elements.
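As a rough illustration of what a log-linear model over -35 and -10 element identity could look like, the sketch below fits log promoter strength as an additive function of two categorical factors and reports the fraction of variance explained. It is not the authors' code: the function names, the one-hot encoding, and the synthetic data are all assumptions made for illustration, and the simulated effect sizes are not taken from the study.

```python
import numpy as np

def one_hot(labels, n_levels):
    """Encode integer labels 0..n_levels-1 as one-hot rows."""
    out = np.zeros((len(labels), n_levels))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def fit_log_linear(minus35, minus10, strength, n_levels=8):
    """Least-squares fit of log(strength) ~ intercept + (-35 id) + (-10 id).

    Hypothetical helper: the first level of each factor is dropped so the
    design matrix is not collinear with the intercept.
    """
    X = np.hstack([
        np.ones((len(strength), 1)),
        one_hot(minus35, n_levels)[:, 1:],
        one_hot(minus10, n_levels)[:, 1:],
    ])
    y = np.log(strength)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid.var() / y.var()  # fraction of variance explained
    return coef, r2

# Synthetic data only (assumed effect sizes and noise level).
rng = np.random.default_rng(0)
n = 2000
m35 = rng.integers(0, 8, n)          # which of eight -35 elements
m10 = rng.integers(0, 8, n)          # which of eight -10 elements
eff35 = rng.normal(0.0, 1.0, 8)      # made-up per-element log effects
eff10 = rng.normal(0.0, 1.0, 8)
log_strength = eff35[m35] + eff10[m10] + rng.normal(0.0, 0.5, n)

coef, r2 = fit_log_linear(m35, m10, np.exp(log_strength))
print(f"variance explained by -35 and -10 identity: {r2:.2f}")
```

In this kind of model each element contributes multiplicatively to strength (additively in log space), so any variance left unexplained points to other elements or to interactions that an additive model cannot capture.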
Despite decades of intense genetic, biochemical, and evolutionary characterizations of bacterial promoters, we still lack the basic ability to identify or predict transcriptional activities of promoters using primary sequence. Even in simple, well-characterized organisms such as E. coli there is little agreement on the number, location, and strength of promoters. Here, we use a genomically-encoded massively parallel reporter assay to perform the first full characterization of autonomous promoter activity across the E. coli genome. We measure promoter activity of >300,000 sequences spanning the entire genome and precisely map 2,228 promoters active in rich media. We show that antisense promoters have a profound effect on global transcription and that codon usage has adapted to encode intragenic promoters. Furthermore, we perform a scanning mutagenesis of 2,057 promoters to uncover regulatory sequences responsible for regulating promoter activity. Finally, we show that despite these large datasets and modern machine learning algorithms, the task of predicting promoter activity from primary sequence is still challenging.
Summary: Biomolecular condensates, membraneless organelles found throughout the cell, play critical roles in many aspects of cellular function. Ribonucleoprotein granules (RNPs) are a type of biomolecular condensate found in neurons; they are necessary for local protein synthesis and are involved in long-term potentiation (LTP). Several RNA-binding proteins present in RNPs are necessary for the synaptic plasticity involved in LTP and long-term memory. Most of these proteins possess low complexity motifs, allowing for increased promiscuity. We explore the role the low complexity motif plays for the RNA-binding protein cytoplasmic polyadenylation element binding protein 3 (CPEB3), a protein necessary for long-term memory persistence. We found that RNA binding and SUMOylation are necessary for CPEB3 localization to the P body, thereby having functional implications on translation. Here, we investigate the role of the low complexity motif of CPEB3 and find that it is necessary for P body localization and downstream targeting for local protein synthesis.
What enables strains of the same species to coexist in a microbiome? Here, we investigate if host anatomy can explain strain co-residence of Cutibacterium acnes, the most abundant species on human skin. We reconstruct on-person evolution and migration using 947 C. acnes colony genomes acquired from 16 subjects, including from individual skin pores, and find that pores maintain diversity by limiting competition. Although strains with substantial fitness differences coexist within centimeter-scale regions, each pore is dominated by a single strain. Moreover, colonies from a pore typically have identical genomes. An absence of adaptive signatures suggests a genotype-independent source of low within-pore diversity. We therefore propose that pore anatomy imposes random single-cell bottlenecks during migration into pores and subsequently blocks new migrants; the resulting population fragmentation reduces competition and promotes coexistence. Our findings imply that therapeutic interventions involving pore-dwelling species should focus on removing resident populations over optimizing probiotic fitness.