Until now, it has been reasonably assumed that specific base-pair recognition is the only mechanism controlling the specificity of transcription factor (TF)−DNA binding. Contrary to this assumption, here we show that nonspecific DNA sequences possessing certain repeat symmetries, when present outside of specific TF binding sites (TFBSs), statistically control TF−DNA binding preferences. We used high-throughput protein−DNA binding assays to measure the binding levels and free energies of binding for several human TFs to tens of thousands of short DNA sequences with varying repeat symmetries. Based on statistical mechanics modeling, we identify a new protein−DNA binding mechanism induced by DNA sequence symmetry in the absence of specific base-pair recognition, and experimentally demonstrate that this mechanism indeed governs protein−DNA binding preferences. Protein−DNA binding is an important biophysical mechanism operating in a living cell (1). This seminal work makes it possible to interpret experiments that measured how transcription factors (TFs) search for their specific target sites flanked by nonconsensus sequence elements (1-10). A specific consensus motif is a short DNA sequence, typically 6-20 base pairs (bp), that possesses an enhanced binding affinity for a particular TF. For example, the sequence CACGTG represents the specific consensus motif for the human protein Max used in this study (Fig. 1). The process of establishing specific, consensus protein−DNA binding requires the formation of a precise geometrical fit between the protein and its consensus DNA motif, accompanied by the formation of specific hydrogen and electrostatic contacts at the protein−DNA binding interface (6, 7) (Fig. 1). In addition to binding to their consensus DNA motifs, transcription factors can also bind, albeit with lower affinity, to DNA regions lacking any consensus motifs. The term "nonspecific protein−DNA binding" (6) is typically used to describe these weaker interactions.
Von Hippel and Berg suggested classifying nonspecific protein−DNA binding into two related mechanisms (6). The first mechanism includes protein binding to its mutated specific motifs that retain some residual, reduced specificity. The second mechanism is largely DNA sequence independent, and it involves electrostatic binding modulated by the overall DNA geometry (6). Despite significant experimental progress, molecular mechanisms responsible for these two types of nonspecific binding remain poorly understood, and the free energy of nonspecific protein−DNA binding has not been systematically characterized (11-14). The interplay between consensus and nonconsensus DNA sequence elements emerges as a dominant factor that governs protein−DNA binding preferences. However, this interplay is also poorly understood (15, 16). Until now, it has been reasonably assumed that specific (consensus) base-pair recognition must control the genome-wide specificity of TF−DNA binding. Contrary to this assumption, here we identify a general mechanism for protein−DNA bi...
Transcription factors (TFs) recognize specific genomic sequences to regulate complex gene expression programs. Although it is well established that TFs bind specific DNA sequences using a combination of base readout and shape recognition, some fundamental aspects of protein-DNA binding remain poorly understood (1, 2). Many DNA-binding proteins induce changes in the DNA structure outside the intrinsic B-DNA envelope. However, how the energetic cost associated with distorting DNA contributes to recognition has proven difficult to study because the distorted DNA exists in low abundance in the unbound ensemble (3-9). Here, we use a novel high-throughput assay called SaMBA (Saturation Mismatch-Binding Assay) to investigate the role of DNA conformational penalties in TF-DNA recognition. In SaMBA, mismatches are introduced to pre-induce DNA structural distortions much larger than those induced by changes in Watson-Crick sequence. Strikingly, approximately 10% of mismatches increased TF binding, and at least one mismatch was found that increased the binding affinity for each of 22 examined TFs. Mismatches also converted non-specific sites into high-affinity sites, and high-affinity sites into super-sites stronger than any known canonical binding site. Determination of high-resolution X-ray structures, combined with NMR measurements and structural analyses, revealed that many of the mismatches that increase binding induce distortions similar to those induced by protein binding, thus pre-paying some of the energetic cost to deform the DNA. Our work indicates that conformational penalties are a major determinant of protein-DNA recognition, and reveals mechanisms by which mismatches can recruit TFs and thus modulate replication and repair activities in the cell (10, 11).
Non-coding genetic variants/mutations can play functional roles in the cell by disrupting regulatory interactions between transcription factors (TFs) and their genomic target sites. For most human TFs, a myriad of DNA-binding models are available and could be used to predict the effects of DNA mutations on TF binding. However, information on the quality of these models is scarce, making it hard to evaluate the statistical significance of predicted binding changes. Here, we present QBiC-Pred, a web server for predicting quantitative TF binding changes due to nucleotide variants. QBiC-Pred uses regression models of TF binding specificity trained on high-throughput in vitro data. The training is done using ordinary least squares (OLS), and we leverage distributional results associated with OLS estimation to compute, for each predicted change in TF binding, a P-value reflecting our confidence in the predicted effect. We show that OLS models are accurate in predicting the effects of mutations on TF binding in vitro and in vivo, outperforming widely used PWM models as well as recently developed deep learning models of specificity. QBiC-Pred takes as input mutation datasets in several formats, and it allows post-processing of the results through a user-friendly web interface. QBiC-Pred is freely available at http://qbic.genome.duke.edu.
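The general idea of attaching a P-value to an OLS-predicted binding change can be illustrated with a minimal sketch. This is not the QBiC-Pred implementation: the k-mer count encoding, the k-mer length `K = 3`, the function names (`kmer_counts`, `fit_ols`, `predict_change`), the synthetic training data, and the normal approximation to the test statistic are all assumptions made for illustration. The sketch relies only on the standard OLS result that a linear combination d·β of the estimated coefficients has variance σ² d·(XᵀX)⁻¹·d.

```python
import itertools
import math

import numpy as np

K = 3  # hypothetical k-mer length, chosen only to keep the example small
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(seq):
    """Count every overlapping k-mer in seq (an assumed feature encoding)."""
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        counts[INDEX[seq[i:i + K]]] += 1
    return counts


def fit_ols(seqs, y):
    """Fit y ~ X @ beta by least squares and keep what the test needs."""
    X = np.array([kmer_counts(s) for s in seqs])
    y = np.asarray(y, dtype=float)
    # The pseudo-inverse guards against collinear k-mer count columns.
    xtx_inv = np.linalg.pinv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    dof = len(y) - np.linalg.matrix_rank(X)
    sigma2 = float(resid @ resid) / dof  # residual variance estimate
    return beta, xtx_inv, sigma2


def predict_change(model, ref, mut):
    """Return (predicted change, two-sided P-value) for ref -> mut."""
    beta, xtx_inv, sigma2 = model
    d = kmer_counts(mut) - kmer_counts(ref)  # change in the features
    delta = float(d @ beta)
    se = math.sqrt(max(sigma2 * float(d @ xtx_inv @ d), 0.0))
    if se == 0.0:  # identical feature vectors: nothing to test
        return delta, 1.0
    z = delta / se
    # Normal approximation (assumption); reasonable when n is large.
    return delta, math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic data (assumption): each "ACG" occurrence adds 1.5 to the signal.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=12)) for _ in range(3000)]
signal = np.array([1.5 * kmer_counts(s)[INDEX["ACG"]] for s in seqs])
y = signal + rng.normal(0.0, 0.5, size=len(seqs))

model = fit_ols(seqs, y)
delta, p_value = predict_change(model, "TTTACGTTTTTT", "TTTAAGTTTTTT")
print(f"predicted change = {delta:.2f}, P = {p_value:.2e}")
```

In this toy setting the variant destroys the only "ACG" in the reference sequence, so the predicted change should be strongly negative with a very small P-value, whereas a variant that leaves the feature vector unchanged returns a change of zero and a P-value of one.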
Quantitative understanding of the principles regulating nucleosome occupancy on a genome-wide level is a central issue in eukaryotic genomics. Here, we address this question using budding yeast, Saccharomyces cerevisiae, as a model organism. We perform a genome-wide computational analysis of the nonspecific transcription factor (TF)-DNA binding free-energy landscape and compare this landscape with experimentally determined nucleosome-binding preferences. We show that DNA regions with enhanced nonspecific TF-DNA binding are statistically significantly depleted of nucleosomes. We suggest therefore that the competition between TFs and histones for nonspecific binding to genomic sequences might be an important mechanism influencing nucleosome-binding preferences in vivo. We also predict that poly(dA:dT) and poly(dC:dG) tracts represent genomic elements with the strongest propensity for nonspecific TF-DNA binding, thus allowing TFs to outcompete nucleosomes at these elements. Our results suggest that nonspecific TF-DNA binding might provide a barrier for statistical positioning of nucleosomes throughout the yeast genome. We predict that the strength of this barrier increases with the concentration of DNA binding proteins in a cell. We discuss the connection of the proposed mechanism with the recently discovered pathway of active nucleosome reconstitution.
In the process of transcription elongation, RNA polymerase (RNAP) pauses at highly nonrandom positions across genomic DNA, broadly regulating transcription; however, molecular mechanisms responsible for the recognition of such pausing positions remain poorly understood. Here, using a combination of statistical mechanical modeling and high-throughput sequencing and biochemical data, we evaluate the effect of thermal fluctuations on the regulation of RNAP pausing. We demonstrate that diffusive backtracking of RNAP, which is biased by repetitive DNA sequence elements, causes transcriptional pausing. This effect stems from the increased microscopic heterogeneity of an elongation complex, and thus is entropy-dominated. This report shows a linkage between repetitive sequence elements encoded in the genome and regulation of RNAP pausing driven by thermal fluctuations. An elongation complex (EC) consists of RNAP bound to double-stranded DNA and the RNA/DNA hybrid with the 3′ end of the RNA positioned in the active site of RNAP (4, 5). As the phosphodiester bond is formed, the RNA/DNA hybrid shifts back to vacate the active site, enabling the next NTP to enter and pair with the next exposed template DNA base in a process called translocation (1). Translocation is a smooth process (6, 7), except in cases where certain DNA sequences impose intrinsic translocation barriers (1, 2, 8). This block of translocation and any inhibition of the next bond formation are causes for RNAP pausing (1, 9-11). Backtracking of RNAP along DNA stabilizes pausing by preventing a forward translocation and NMP addition to the elongating RNA (1, 8, 12). Backtracking leads to extrusion of one or more nucleotide(s) at the 3′ RNA end beyond the active site of RNAP (13-15). Some backtracked ECs are stable enough to block DNA replication (16), and thus destabilize a genome (17-19).
Prokaryotic Gre factors or eukaryotic TFIIS allow the backtracked EC to resume transcription by causing RNAP to cleave any extruded 3′ RNA from the active site (20, 21), thereby removing a potential barrier to replicating DNA polymerases (17-19). To investigate the sequence motif that causes transcriptional pausing and its distribution in vivo, we have previously performed native elongating transcript sequencing (NET-seq) (22) combined with RNase footprinting of the transcripts (RNET-seq) (10). This approach identified GNNNNNNTGCG as a representative RNAP pause-inducing element (PIE) in Escherichia coli cells. PIE is similar to pausing motifs identified by single-molecule or biochemical studies for E. coli RNAP and yeast/human RNAPII (2, 8, 23). However, the presence of this consensus DNA motif, when transcribed, does not always result in pausing, indicating that RNAP pausing is controlled by additional intrinsic or extrinsic mechanistic factors. Therefore, this fact leaves open a key question regarding the mechanism of RNAP pausing. In other areas of transcription, we have shown that certain genomic background sequences surrounding a consensus motif can modulate binding of the...