The derivation of human ES cells (hESCs) from human blastocysts represents one of the milestones in stem cell biology. The full potential of hESCs in research and clinical applications requires a detailed understanding of the genetic network that governs the unique properties of hESCs. Here, we report a genome-wide RNA interference screen to identify genes that regulate self-renewal and pluripotency properties in hESCs. Interestingly, functionally distinct complexes involved in transcriptional regulation and chromatin remodelling are among the factors identified in the screen. To understand the roles of these potential regulators of hESCs, we studied the transcription factor PRDM14 to gain new insights into its functional roles in the regulation of pluripotency. We showed that PRDM14 directly regulates the expression of the key pluripotency gene POU5F1 through its proximal enhancer. Genome-wide location profiling experiments revealed that PRDM14 colocalized extensively with other key transcription factors such as OCT4, NANOG and SOX2, indicating that PRDM14 is integrated into the core transcriptional regulatory network. More importantly, in a gain-of-function assay, we showed that PRDM14 is able to enhance the efficiency of reprogramming of human fibroblasts in conjunction with OCT4, SOX2 and KLF4. Altogether, our study uncovers a wealth of novel hESC regulators wherein PRDM14 exemplifies a key transcription factor required for the maintenance of hESC identity and the reacquisition of pluripotency in human somatic cells.
Transcription factors (TFs) influence cell fate by interpreting the regulatory DNA within a genome. TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs, based on three-dimensional structures of protein-DNA complexes. More recently, structural views have been complemented with data from high-throughput in vitro and in vivo explorations of the DNA binding preferences of many TFs. Together, these approaches have greatly expanded our understanding of TF-DNA interactions. However, the mechanisms by which TFs select in vivo binding sites and alter gene expression remain unclear. Recent work has highlighted the many variables that influence TF-DNA binding, while demonstrating that a biophysical understanding of these many factors will be central to understanding TF function.
We present a method and web server for predicting DNA structural features in a high-throughput (HT) manner for massive sequence data. This approach provides the framework for the integration of DNA sequence and shape analyses in genome-wide studies. The HT methodology uses a sliding-window approach to mine DNA structural information obtained from Monte Carlo simulations. It requires only nucleotide sequence as input and instantly predicts multiple structural features of DNA (minor groove width, roll, propeller twist and helix twist). The results of rigorous validations of the HT predictions based on DNA structures solved by X-ray crystallography and NMR spectroscopy, hydroxyl radical cleavage data, statistical analysis and cross-validation, and molecular dynamics simulations provide strong confidence in this approach. The DNAshape web server is freely available at http://rohslab.cmb.usc.edu/DNAshape/.
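The sliding-window idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the DNAshape implementation: the window length of 5, the function names, and the `toy_lookup` scoring rule are all assumptions made for the example, and the numbers it returns are placeholders rather than simulation-derived structural values.

```python
from typing import Callable, List, Optional

WINDOW = 5  # assumed window length; the abstract does not state one


def toy_lookup(window: str) -> float:
    """Hypothetical stand-in for a table of simulation-derived values.

    Returns a placeholder number that decreases with A/T content.
    It is NOT a real minor groove width or any other measured feature.
    """
    at_count = sum(base in "AT" for base in window)
    return 6.0 - 0.4 * at_count


def sliding_window_feature(
    seq: str,
    lookup: Callable[[str], float] = toy_lookup,
    window: int = WINDOW,
) -> List[Optional[float]]:
    """Assign one value per position by querying each window in turn.

    The value for a window is stored at the window's central position.
    Positions too close to either end to be the centre of a full
    window are left as None.
    """
    seq = seq.upper()
    half = window // 2
    values: List[Optional[float]] = [None] * len(seq)
    for center in range(half, len(seq) - half):
        values[center] = lookup(seq[center - half:center + half + 1])
    return values


if __name__ == "__main__":
    print(sliding_window_feature("AAAAACGCGC"))
```

Only the nucleotide sequence is needed as input, which matches the abstract's description; swapping `toy_lookup` for a real table of per-window values would be the step that carries the actual structural information.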
Somatic cells can be reprogrammed to induced pluripotent stem cells (iPSCs) with the introduction of Oct4, Sox2, Klf4, and c-Myc. Among these four factors, Oct4 is critical in inducing pluripotency because no transcription factor can substitute for Oct4, whereas Sox2, Klf4, and c-Myc can be replaced by other factors. Here we show that the orphan nuclear receptor Nr5a2 (also known as Lrh-1) can replace Oct4 in the derivation of iPSCs from mouse somatic cells, and it can also enhance reprogramming efficiency. Sumoylation mutants of Nr5a2 with enhanced transcriptional activity can further increase reprogramming efficiency. Genome-wide location analysis reveals that Nr5a2 shares many common gene targets with Sox2 and Klf4, which suggests that the transcription factor trio works in concert to mediate reprogramming. We also show that Nr5a2 works in part through activating Nanog. Together, we show that unrelated transcription factors can replace Oct4 and uncover an exogenous Oct4-free reprogramming code.
DNA binding specificities of transcription factors (TFs) are a key component of gene regulatory processes. Underlying mechanisms that explain the highly specific binding of TFs to their genomic target sites are poorly understood. A better understanding of TF−DNA binding requires the ability to quantitatively model TF binding to accessible DNA as its basic step, before additional in vivo components can be considered. Traditionally, these models were built based on nucleotide sequence. Here, we integrated 3D DNA shape information derived with a high-throughput approach into the modeling of TF binding specificities. Using support vector regression, we trained quantitative models of TF binding specificity based on protein binding microarray (PBM) data for 68 mammalian TFs. The evaluation of our models included cross-validation on specific PBM array designs, testing across different PBM array designs, and using PBM-trained models to predict relative binding affinities derived from in vitro selection combined with deep sequencing (SELEX-seq). Our results showed that shape-augmented models compared favorably to sequence-based models. Although both k-mer and DNA shape features can encode interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space. In addition, analyzing the feature weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not revealed by the DNA sequence. As such, this work combines knowledge from structural biology and genomics, and suggests a new path toward understanding TF binding and genome function.
protein−DNA recognition | statistical machine learning | support vector regression | protein binding microarray | DNA structure
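The dimensionality point made in the abstract above can be illustrated with a small sketch. The encoding here is an assumption chosen for illustration (binary indicator features for each k-mer at each position, and one numeric value per position for each shape feature); the abstract does not specify the paper's exact feature layout, and all function names are hypothetical.

```python
from typing import List, Sequence

BASES = "ACGT"


def one_hot_1mer(seq: str) -> List[int]:
    """Encode each position as four binary indicators (A, C, G, T)."""
    vec: List[int] = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def kmer_feature_count(length: int, k: int) -> int:
    """Number of binary features for position-specific k-mers.

    Assumes 4**k possible k-mers at each of (length - k + 1) start
    positions; this layout is an illustrative assumption.
    """
    return (4 ** k) * (length - k + 1)


def shape_augmented(seq: str, shape_tracks: Sequence[Sequence[float]]) -> List[float]:
    """Append per-position shape values to the 1-mer encoding.

    Each track is assumed to hold exactly one value per nucleotide.
    """
    vec: List[float] = [float(x) for x in one_hot_1mer(seq)]
    for track in shape_tracks:
        if len(track) != len(seq):
            raise ValueError("each shape track needs one value per position")
        vec.extend(float(v) for v in track)
    return vec


if __name__ == "__main__":
    site_length = 10
    kmer_total = sum(kmer_feature_count(site_length, k) for k in (1, 2, 3))
    shape_total = kmer_feature_count(site_length, 1) + 4 * site_length
    print(kmer_total, shape_total)
```

Under these assumptions, a 10-bp site encoded with 1-mers, 2-mers and 3-mers needs 40 + 144 + 512 = 696 features, whereas 1-mers plus four shape tracks need 40 + 40 = 80. The resulting vectors could then be passed to any regression model, such as the support vector regression mentioned in the abstract.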
Large-scale mapping of transcriptomes has revealed significant levels of transcriptional activity within both unannotated and annotated regions of the genome. Interestingly, many of the novel transcripts demonstrate tissue-specific expression and some level of sequence conservation across species, but most have low protein-coding potential. Here we describe progress in identifying and characterizing long noncoding RNAs and review how these transcripts interact with other biological molecules to regulate diverse cellular processes. We also preview emerging techniques that will help advance the discovery and characterization of novel transcripts. Finally, we discuss the role of long noncoding RNAs in disease and therapeutics.
Transcription factors (TFs) preferentially bind sites contained in regions of computationally predicted high nucleosomal occupancy, suggesting that nucleosomes are gatekeepers of TF binding sites. However, because of their complexity, mammalian genomes contain millions of randomly occurring, unbound TF consensus binding sites. We hypothesized that the information controlling nucleosome assembly may coincide with the information that enables TFs to bind cis-regulatory elements while ignoring randomly occurring sites. Hence, nucleosomes would selectively mask genomic sites that are contacted by TFs and are thus potentially functional. The hematopoietic TF Pu.1 maintained nucleosome depletion at macrophage-specific enhancers that displayed a broad range of nucleosome occupancy in other cell types and in reconstituted chromatin. We identified a minimal set of DNA sequence and shape features that accurately predicted both Pu.1 binding and nucleosome occupancy genome-wide. These data reveal a basic organizational principle of mammalian cis-regulatory elements whereby TF recruitment and nucleosome deposition are controlled by overlapping DNA sequence features.
Protein-DNA binding is mediated by the recognition of the chemical signatures of the DNA bases and the three-dimensional shape of the DNA molecule. Because DNA shape is a consequence of sequence, it is difficult to dissociate these modes of recognition. Here, we tease them apart in the context of Hox-DNA binding by mutating residues that, in a co-crystal structure, only recognize DNA shape. Complexes made with these mutants lose the preference to bind sequences with specific DNA shape features. Introducing shape-recognizing residues from one Hox protein to another swapped binding specificities in vitro and gene regulation in vivo. Statistical machine learning revealed that the accuracy of binding specificity predictions improves by adding shape features to a model that only depends on sequence, and feature selection identified shape features important for recognition. Thus, shape readout is a direct and independent component of binding site selection by Hox proteins.