Uncovering the mechanisms that affect the binding specificity of transcription factors (TFs) is critical for understanding the principles of gene regulation. Although sequence-based models have been used successfully to predict TF binding specificities, we found that including DNA shape information in these models improved their accuracy and interpretability. Previously, we developed a method for modeling DNA binding specificities based on DNA shape features extracted from Monte Carlo (MC) simulations. Prediction accuracies of our models, however, have not yet been compared to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations. Here, we integrated DNA shape information extracted from MC or MD simulations and XRC data into predictive models of TF binding and compared their performance. Models that incorporated structural information consistently showed improved performance over sequence-based models regardless of data source. Furthermore, we derived and validated nine additional DNA shape features beyond our original set of four features. The expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width, is available in our R/Bioconductor package DNAshapeR and enables a comprehensive structural description of the double helix on a genome-wide scale.
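As a rough illustration of what adding DNA shape information to a sequence-based model can look like, the sketch below appends per-position shape values to a one-hot sequence encoding. It is a minimal sketch under stated assumptions, not the authors' method or the DNAshapeR API: the function names `one_hot` and `sequence_plus_shape` are hypothetical, and the shape values in the usage example are made-up placeholders rather than real predictions.

```python
from typing import Dict, List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Encode each nucleotide as four 0/1 indicators in the order A, C, G, T."""
    encoded: List[float] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        encoded.extend(1.0 if base == b else 0.0 for b in BASES)
    return encoded


def sequence_plus_shape(seq: str, shape: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the one-hot encoding with shape feature values.

    Hypothetical helper: `shape` maps a feature name to its values along the
    sequence. Per-base-pair features (for example minor groove width) carry
    len(seq) values; inter-base pair features (for example roll) carry
    len(seq) - 1 values, one per base-pair step.
    """
    features = one_hot(seq)
    for name in sorted(shape):  # fixed feature order keeps vectors comparable
        values = shape[name]
        if len(values) not in (len(seq), len(seq) - 1):
            raise ValueError(f"feature {name!r} has unexpected length {len(values)}")
        features.extend(float(v) for v in values)
    return features


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real shape predictions.
    example_shape = {
        "MGW": [5.0, 4.5, 4.8, 5.1],   # one value per base pair
        "Roll": [1.0, -2.0, 0.5],      # one value per base-pair step
    }
    vector = sequence_plus_shape("ACGT", example_shape)
    print(len(vector))
```

The resulting vector could then be passed to any regression or classification model; real shape values would come from a source such as the MC, MD or XRC data described above.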
Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu.
FOXA2, a member of the forkhead family of transcription factors, plays essential roles in liver development and bile acid homeostasis. In this study, we report a 2.8 Å co-crystal structure of the FOXA2 DNA-binding domain (FOXA2-DBD) bound to a DNA duplex containing a forkhead consensus binding site (GTAAACA). FOXA2-DBD adopts the canonical winged-helix fold, with helix H3 and wing 1 regions mainly mediating DNA recognition. Although the wing 2 region was not defined in the structure, isothermal titration calorimetry (ITC) assays suggested that this region was required for optimal DNA binding. Structure comparison with the FOXA3-DBD bound to DNA revealed more major groove contacts and fewer minor groove contacts in the FOXA2 structure than in the FOXA3 structure. Structure comparison with the FOXO1-DBD bound to DNA showed that different forkhead proteins could induce different DNA conformations upon binding to identical DNA sequences. Our findings provide the structural basis for FOXA2 protein binding to a consensus forkhead site and elucidate how members of the forkhead protein family bind different DNA sites.
DNAproDB (https://dnaprodb.usc.edu) is a web-based database and structural analysis tool that offers a combination of data visualization, data processing and search functionality that improves the speed and ease with which researchers can analyze, access and visualize structural data of DNA–protein complexes. In this paper, we report significant improvements made to DNAproDB since its initial release. DNAproDB now supports any DNA secondary structure from typical B-form DNA to single-stranded DNA to G-quadruplexes. We have updated the structure of our data files to support complex DNA conformations, multiple DNA–protein complexes within a DNAproDB entry and model indexing for analysis of ensemble data. Support for chemically modified residues and nucleotides has been significantly improved along with the addition of new structural features, improved structural moiety assignment and use of more sequence-based annotations. We have redesigned our report pages and search forms to support these enhancements, and the DNAproDB website has been improved to be more responsive and user-friendly. DNAproDB is now integrated with the Nucleic Acid Database, and we have increased our coverage of available Protein Data Bank entries. Our database now contains 95% of all available DNA–protein complexes, making our tools for analysis of these structures accessible to a broad community.
We present a direct atom-by-atom chemical identification of the nanostructures and defects of topological insulators (TIs) with a state-of-the-art atomic mapping technology. Combining this technique and density functional theory calculations, we identify and explain the layer chemistry evolution of Bi₂Te₃₋ₓSeₓ ternary TIs. We also reveal a long-neglected but crucially important extended defect found to be universally present in Bi₂Te₃ films, the seven-layer Bi₃Te₄ nanolamella acceptors. Intriguingly, this defect is found to locally pull down the conduction band, leading to local n-type conductivity, despite being an acceptor which pins the Fermi energy near the valence band maximum. This nanolamella may explain inconsistencies in measured conduction type as well as open up a new route to manipulate bulk carrier concentration. Our work may pave the way to more thoroughly understand and tailor the nature of the bulk, as well as secure controllable bulk states for future applications in quantum computing and dissipationless devices.