The recognition of specific DNA sequences by proteins is thought to depend on two types of mechanisms: one that involves the formation of hydrogen bonds with specific bases, primarily in the major groove, and one involving sequence-dependent deformations of the DNA helix. By comprehensively analyzing the three-dimensional structures of protein-DNA complexes, we show that the binding of arginines to narrow minor grooves is a widely used mode for protein-DNA recognition. This readout mechanism exploits the phenomenon that narrow minor grooves strongly enhance the negative electrostatic potential of the DNA. The nucleosome core particle offers a striking example of this effect. Minor groove narrowing is often associated with the presence of A-tracts, AT-rich sequences that exclude the flexible TpA step. These findings suggest that the ability to detect local variations in DNA shape and electrostatic potential is a general mechanism that enables proteins to use information in the minor groove, which otherwise offers few opportunities for the formation of base-specific hydrogen bonds, to achieve DNA binding specificity.
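The A-tract description above (AT-rich stretches that exclude the TpA step) can be illustrated with a short scan over a sequence. This is only a sketch under stated assumptions: the function name `find_a_tracts` and the minimum length of 4 bp are illustrative choices, not values taken from the abstract, and the scan looks at one strand only.

```python
import re
from typing import List, Tuple


def find_a_tracts(sequence: str, min_len: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) half-open intervals of candidate A-tracts.

    A candidate is a run of A/T bases containing no TpA step.
    min_len is an assumed threshold, not a value from the source text.
    """
    seq = sequence.upper()
    tracts = []
    # First find maximal runs made only of A and T.
    for match in re.finditer(r"[AT]+", seq):
        run = match.group()
        offset = match.start()
        seg_start = 0
        # Split each run wherever a T is followed by an A (a TpA step).
        for j in range(len(run) - 1):
            if run[j] == "T" and run[j + 1] == "A":
                if j + 1 - seg_start >= min_len:
                    tracts.append((offset + seg_start, offset + j + 1))
                seg_start = j + 1
        # Keep the final segment of the run if it is long enough.
        if len(run) - seg_start >= min_len:
            tracts.append((offset + seg_start, offset + len(run)))
    return tracts


if __name__ == "__main__":
    print(find_a_tracts("GCAAATTTGC"))
    print(find_a_tracts("GCAATTAAGC"))
```

In the second call the run `AATTAA` is split at its TpA step, so only the first four bases qualify under the assumed threshold.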
Specific interactions between proteins and DNA are fundamental to many biological processes. In this review, we provide a revised view of protein-DNA interactions that emphasizes the importance of the three-dimensional structures of both macromolecules. We divide protein-DNA interactions into two categories: those where the protein recognizes the unique chemical signatures of the DNA bases (base readout) and those where the protein recognizes a sequence-dependent DNA shape (shape readout). We further divide base readout into interactions that occur in the major groove and those that occur in the minor groove. Analogously, the readout of DNA shape is subdivided into global shape recognition, for example when the DNA helix exhibits an overall bend, and local shape recognition, for example when a base pair step is kinked or when a region of the minor groove is narrow. Based on the >1500 structures of protein-DNA complexes now available in the Protein Data Bank, we argue that individual DNA binding proteins combine multiple readout mechanisms to achieve DNA binding specificity. Specificity that distinguishes between families frequently involves base readout in the major groove, while shape readout is often exploited for higher-resolution specificity, to distinguish between members within the same DNA-binding protein family.
Members of transcription factor families typically have similar DNA binding specificities yet execute unique functions in vivo. Transcription factors often bind DNA as multiprotein complexes, raising the possibility that complex formation might modify their DNA binding specificities. To test this hypothesis, we developed an experimental and computational platform, SELEX-seq, that can be used to determine the relative affinities to any DNA sequence for any transcription factor complex. Applying this method to all eight Drosophila Hox proteins, we show that they obtain novel recognition properties when they bind DNA with the dimeric cofactor Extradenticle-Homothorax (Exd). Exd-Hox specificities group into three main classes that obey Hox gene collinearity rules, and DNA structure predictions suggest that anterior and posterior Hox proteins prefer DNA sequences with distinct minor groove topographies. Together, these data suggest that emergent DNA recognition properties revealed by interactions with cofactors contribute to transcription factor specificities in vivo.
The recognition of specific DNA-binding sites by transcription factors is a critical yet poorly understood step in the control of gene expression. Members of the Hox family of transcription factors bind DNA by making nearly identical major groove contacts via the recognition helices of their homeodomains. In vivo specificity, however, often depends on extended and unstructured regions that link Hox homeodomains to a DNA-bound cofactor, Extradenticle (Exd). Using a combination of structure determination, computational analysis, and in vitro and in vivo assays, we show that Hox proteins recognize specific Hox-Exd binding sites via residues located in these extended regions that insert into the minor groove but only when presented with the correct DNA sequence. Our results suggest that these residues, which are conserved in a paralog-specific manner, confer specificity by recognizing a sequence-dependent DNA structure instead of directly reading a specific DNA sequence.
Transcription factors (TFs) influence cell fate by interpreting the regulatory DNA within a genome. TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs, based on three-dimensional structures of protein-DNA complexes. More recently, structural views have been complemented with data from high-throughput in vitro and in vivo explorations of the DNA binding preferences of many TFs. Together, these approaches have greatly expanded our understanding of TF-DNA interactions. However, the mechanisms by which TFs select in vivo binding sites and alter gene expression remain unclear. Recent work has highlighted the many variables that influence TF-DNA binding, while demonstrating that a biophysical understanding of these many factors will be central to understanding TF function.
We present a method and web server for predicting DNA structural features in a high-throughput (HT) manner for massive sequence data. This approach provides the framework for the integration of DNA sequence and shape analyses in genome-wide studies. The HT methodology uses a sliding-window approach to mine DNA structural information obtained from Monte Carlo simulations. It requires only nucleotide sequence as input and instantly predicts multiple structural features of DNA (minor groove width, roll, propeller twist and helix twist). The results of rigorous validations of the HT predictions based on DNA structures solved by X-ray crystallography and NMR spectroscopy, hydroxyl radical cleavage data, statistical analysis and cross-validation, and molecular dynamics simulations provide strong confidence in this approach. The DNAshape web server is freely available at http://rohslab.cmb.usc.edu/DNAshape/.
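The sliding-window idea described above can be sketched as a lookup over overlapping k-mers, where each window assigns a feature value to its central position. This is a minimal illustration under stated assumptions, not the DNAshape implementation: the window length of 5, the function name `predict_feature`, and the table `TOY_TABLE` are hypothetical, and the numbers in the table are placeholders rather than real structural values.

```python
from typing import Dict, List, Optional

# Placeholder values for illustration only; these are NOT real
# minor groove widths or any other measured or simulated quantity.
TOY_TABLE: Dict[str, float] = {
    "AAAAA": 3.0,
    "AAAAT": 3.5,
}


def predict_feature(
    sequence: str,
    kmer_table: Dict[str, float],
    k: int = 5,
) -> List[Optional[float]]:
    """Assign a feature value to the central base of every k-mer window.

    Positions too close to either end to be the center of a full window,
    and windows missing from the table, are reported as None.
    k is assumed to be odd so that each window has a single central base.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    seq = sequence.upper()
    half = k // 2
    values: List[Optional[float]] = [None] * len(seq)
    # Slide the window one base at a time along the sequence.
    for center in range(half, len(seq) - half):
        window = seq[center - half : center + half + 1]
        values[center] = kmer_table.get(window)
    return values


if __name__ == "__main__":
    print(predict_feature("AAAAAT", TOY_TABLE))
```

Because each lookup depends only on the local window, the cost grows linearly with sequence length, which is what makes this style of prediction practical for large sequence sets.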
DNA sequence is a major determinant of the binding specificity of transcription factors (TFs) for their genomic targets. However, eukaryotic cells often express, at the same time, TFs with highly similar DNA binding motifs but distinct in vivo targets. Currently, it is not well understood how TFs with seemingly identical DNA motifs achieve unique specificities in vivo. Here, we used custom protein binding microarrays to analyze TF specificity for putative binding sites in their genomic sequence context. Using yeast TFs Cbf1 and Tye7 as our case study, we found that binding sites of these bHLH TFs (i.e., E-boxes) are bound differently in vitro and in vivo, depending on their genomic context. Computational analyses suggest that nucleotides outside E-box binding sites contribute to specificity by influencing the 3D structure of DNA binding sites. Thus, local shape of target sites might play a widespread role in achieving regulatory specificity within TF families.
p53 binds as a tetramer to DNA targets consisting of two decameric half-sites separated by a variable spacer. Here we present high-resolution crystal structures of complexes between p53 core-domain tetramers and DNA targets consisting of contiguous half-sites. In contrast to previously reported p53-DNA complexes that display standard Watson-Crick base pairs, the newly reported structures exhibit non-canonical Hoogsteen base-pairing geometry at the central A/T doublet of each half-site. Structural and computational analyses demonstrate that the Hoogsteen geometry distinctly modulates the B-DNA helix in terms of local shape and electrostatic potential, which, together with the contiguous DNA configuration, results in enhanced protein-DNA and protein-protein interactions compared to non-contiguous half-sites. Our results suggest a mechanism that relates spacer length to protein-DNA binding affinity. Our findings also expand the current understanding of protein-DNA recognition and establish the structural and chemical properties of Hoogsteen base pairs as the basis for a novel mode of sequence readout.