Although the key promoter elements necessary to drive transcription in Escherichia coli have long been understood, we still cannot predict the behavior of arbitrary novel promoters, hampering our ability to characterize the myriad of sequenced regulatory architectures as well as to design novel synthetic circuits. This work builds on a beautiful recent experiment by Urtecho et al., who measured the gene expression of over 10,000 promoters spanning all possible combinations of a small set of regulatory elements. Using these data, we demonstrate that a central claim in energy matrix models of gene expression, namely that each promoter element contributes independently and additively to gene expression, contradicts experimental measurements. We propose that a key missing ingredient from such models is the avidity between the -35 and -10 RNA polymerase binding sites, and we develop what we call a refined energy matrix model that incorporates this effect. We show that the refined energy matrix model can characterize the full suite of gene expression data, and we explore several applications of this framework, namely how multivalent binding at the -35 and -10 sites can buffer RNAP kinetics against mutations and how promoters that bind overly tightly to RNA polymerase can inhibit gene expression. The success of our approach suggests that avidity represents a key physical principle governing the interaction of RNA polymerase with its promoter.
Significance Statement

Cellular behavior is ultimately governed by the genetic program encoded in a cell's DNA and by the arsenal of molecular machines that actively transcribe its genes, yet we lack the ability to predict how an arbitrary DNA sequence will perform. To that end, we analyze the performance of over 10,000 regulatory sequences and develop a model that can predict the behavior of any sequence based on its composition. By considering promoters that vary by only one or two elements, we can characterize how different components interact, providing fundamental insights into the mechanisms of transcription.