Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In E. coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange (RMCE) system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10,898 σ70 promoter variants consisting of all combinations of a set of eight -35 elements, eight -10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the -35 and -10 sequence elements can explain approximately 74% of the variance in promoter strength within our dataset using a simple log-linear statistical model. Neural network models can explain greater than 95% of the variance in our dataset, and show that the increased power is due to nonlinear interactions with other elements such as the spacer, background, and UP elements.
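The log-linear idea mentioned above can be illustrated with a minimal sketch: promoter strength is modeled in log space as a baseline plus one additive term for the -35 element and one for the -10 element, and the fraction of variance explained is then computed. This is not the authors' code or data. The function names, the synthetic 8 x 8 grid, and the noise level are all assumptions made for illustration, and the closed-form fit used here is only valid for a balanced design (the same number of readings for every -35/-10 pair).

```python
import math
import random


def fit_additive_log_model(strengths):
    """Fit log(strength) = mu + a[m35] + b[m10] by least squares.

    `strengths` maps (minus35_id, minus10_id) -> list of positive readings.
    Assumes a balanced design (same number of readings in every cell), for
    which the least-squares effects are simple mean deviations.
    """
    logs = [(key, math.log(v)) for key, vals in strengths.items() for v in vals]
    mu = sum(y for _, y in logs) / len(logs)

    def effect(position, element):
        ys = [y for key, y in logs if key[position] == element]
        return sum(ys) / len(ys) - mu

    a = {m35: effect(0, m35) for m35 in {key[0] for key in strengths}}
    b = {m10: effect(1, m10) for m10 in {key[1] for key in strengths}}
    return mu, a, b


def variance_explained(strengths, mu, a, b):
    """R^2 of the additive model, computed in log space."""
    logs = [(key, math.log(v)) for key, vals in strengths.items() for v in vals]
    mean = sum(y for _, y in logs) / len(logs)
    ss_tot = sum((y - mean) ** 2 for _, y in logs)
    ss_res = sum((y - (mu + a[k[0]] + b[k[1]])) ** 2 for k, y in logs)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (hypothetical): 8 x 8 element grid, 10 readings per
# cell. The extra Gaussian term mimics variation from elements the model ignores.
rng = random.Random(0)
true_a = [rng.gauss(0, 1) for _ in range(8)]
true_b = [rng.gauss(0, 1) for _ in range(8)]
demo = {
    (i, j): [math.exp(true_a[i] + true_b[j] + rng.gauss(0, 0.8)) for _ in range(10)]
    for i in range(8)
    for j in range(8)
}
mu, a, b = fit_additive_log_model(demo)
print(f"variance explained: {variance_explained(demo, mu, a, b):.2f}")
```

In this toy setting, whatever variance the additive terms leave unexplained comes from the injected noise; in a real library it would also include interactions with the spacer, background, and UP elements, which an additive model cannot capture.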
SUMMARY
Mutations that lead to splicing defects can have severe consequences on
gene function and cause disease. Here, we explore how human genetic variation
affects exon recognition by developing a multiplexed functional assay of
splicing using Sort-seq (MFASS). We assayed 27,733 variants in the Exome
Aggregation Consortium (ExAC) within or adjacent to 2,198 human exons in the
MFASS minigene reporter and found that 3.8% (1,050) of variants, most of them
extremely rare, are large-effect splice-disrupting variants (SDVs).
Importantly, we find that 83% of SDVs are located outside of canonical splice
sites, are distributed evenly across distinct exonic and intronic regions, and
are difficult to predict a priori. Our results indicate extant,
rare genetic variants can have large functional effects on splicing at
appreciable rates, even outside the context of disease, and MFASS enables their
empirical assessment at scale.
Despite decades of intense genetic, biochemical, and evolutionary characterizations of bacterial promoters, we still lack the basic ability to identify or predict transcriptional activities of promoters using primary sequence. Even in simple, well-characterized organisms such as E. coli there is little agreement on the number, location, and strength of promoters. Here, we use a genomically-encoded massively parallel reporter assay to perform the first full characterization of autonomous promoter activity across the E. coli genome. We measure promoter activity of >300,000 sequences spanning the entire genome and precisely map 2,228 promoters active in rich media. We show that antisense promoters have a profound effect on global transcription and that codon usage has adapted to encode intragenic promoters. Furthermore, we perform a scanning mutagenesis of 2,057 promoters to uncover the sequences that regulate promoter activity. Finally, we show that despite these large datasets and modern machine learning algorithms, the task of predicting promoter activity from primary sequence is still challenging.
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8,269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to the measured expression; the model accurately captures gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
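To make the statistical mechanics framing concrete, here is a minimal sketch of the textbook "simple repression" thermodynamic model, in which RNA polymerase and a repressor compete for the promoter and expression is taken to be proportional to polymerase occupancy. It is not the model fitted in the study above, whose exact form is not given here. The copy numbers, the binding energies, and the treatment of full induction as "no active repressor" are all illustrative assumptions.

```python
import math

# Approximate number of nonspecific binding sites, taken as the E. coli
# genome size in base pairs (about 4.6 million).
N_NS = 4.6e6


def p_bound(eps_rnap, eps_laci, polymerases=1000, repressors=10):
    """Probability that RNAP occupies the promoter under simple repression.

    Energies are in units of kT relative to nonspecific DNA (more negative
    means a stronger site). Copy numbers are illustrative placeholders.
    """
    w_rnap = polymerases / N_NS * math.exp(-eps_rnap)
    w_laci = repressors / N_NS * math.exp(-eps_laci)
    # Three promoter states: empty (weight 1), RNAP-bound, repressor-bound.
    return w_rnap / (1.0 + w_rnap + w_laci)


def fold_induction(eps_rnap, eps_laci, **copy_numbers):
    """Ratio of fully induced (no active repressor) to uninduced occupancy."""
    induced = p_bound(eps_rnap, eps_laci, **{**copy_numbers, "repressors": 0})
    uninduced = p_bound(eps_rnap, eps_laci, **copy_numbers)
    return induced / uninduced


# Scan hypothetical repressor site strengths for one hypothetical RNAP site.
for eps_laci in (-11.0, -13.0, -15.0, -17.0):
    print(eps_laci, round(fold_induction(-5.0, eps_laci), 2))
```

In this simplified form, a stronger repressor site lowers uninduced expression and so raises fold induction, while the polymerase site strength sets the induced ceiling. That is the kind of trade-off between binding site strengths that such a model makes explicit.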
Highlights
• Systematic analysis of TFBS architecture by using the c-AMP response element (CRE)
• Assay CRE affinity, number, placement, spacing, and surrounding sequence content
• Similar expression trends between an episomal and single-copy, genomic MPRA
Abstract: Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In E. coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange (RMCE) system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10,898 σ70 promoter variants consisting of all combinations of a set of eight -35 elements, eight -10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the -35 and -10 sequence elements can explain approximately 74% of the variance in promoter strength within our dataset using a simple log-linear statistical model. Simple neural network models explain greater than 95% of the variance in our dataset by capturing nonlinear interactions with the spacer, background, and UP elements.
In eukaryotes, transcription factors orchestrate gene expression by binding to TF-Binding Sites (TFBSs) and localizing transcriptional co-regulators and RNA Polymerase II to cis-regulatory elements. The strength and regulation of transcription can be modulated by a variety of factors including TFBS composition, TFBS affinity and number, distance between TFBSs, distance of TFBSs to transcription start sites, and epigenetic modifications. We still lack a basic comprehension of how such variables shaping cis-regulatory architecture culminate in quantitative transcriptional responses. Here we explored how such factors determine the transcriptional activity of a model transcription factor, the c-AMP Response Element (CRE) binding protein. We measured expression driven by 4,602 synthetic regulatory elements in a massively parallel reporter assay (MPRA) exploring the impact of CRE number, affinity, distance to the promoter, and spacing between multiple CREs. We found that the number and affinity of CREs within regulatory elements largely determine overall expression, and this relationship is shaped by the proximity of each CRE to the downstream promoter. In addition, while we observed expression periodicity as the CRE distance to the promoter varied, the spacing between multiple CREs altered this periodicity. Finally, we compare library expression between an episomal MPRA and a new, genomically-integrated MPRA in which a single synthetic regulatory element is present per cell at a defined locus. We observe that the two assays largely recapitulate each other, although weaker, non-canonical CREs exhibited greater activity in the genomic context.
next-generation sequencing, as well as the UCLA Eli & Edythe Broad Center of Regenerative Medicine Flow Cytometry Core for assistance in flow cytometry. We also thank Laura Day from the Kruglyak lab at UCLA for use of and assistance on their MiSeq. We also thank Dr. Michael R. Sawaya for his guidance in CREB-CRE structural modeling.