Evaluating the specificity spectra of DNA binding molecules is a nontrivial challenge that hinders the ability to decipher gene regulatory networks or engineer molecules that act on genomes. Here we compare the DNA sequence specificities for different classes of proteins and engineered DNA binding molecules across the entire sequence space. These high-content data are visualized and interpreted using an interactive "specificity landscape" which simultaneously displays the affinity and specificity of a million-plus DNA sequences. Contrary to expectation, specificity landscapes reveal that synthetic DNA ligands match, and often surpass, the specificities of eukaryotic DNA binding proteins. The landscapes also identify differential specificity constraints imposed by diverse structural folds of natural and synthetic DNA binders. Importantly, the sequence context of a binding site significantly influences binding energetics, and utilizing the full contextual information permits greater accuracy in annotating regulatory elements within a given genome. Assigning such context-dependent binding values to every DNA sequence across the genome yields predictive genome-wide binding landscapes (genomescapes). A genomescape of a synthetic DNA binding molecule provided insight into its differential regulatory activity in cultured cells. The approach we describe will accelerate the creation of precision-tailored DNA therapeutics and uncover principles that govern sequence-specificity of DNA binding molecules.
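The idea of assigning a binding value to every sequence window along a genome can be illustrated with a minimal sketch. The abstract does not describe how the published genomescapes were computed, so everything here is an assumption made for illustration: the function names, the toy score table, and the choice to take the higher of the forward-strand and reverse-complement scores for each window.

```python
# Hypothetical sketch: score every k-mer window of a sequence from a lookup table.
# The score table and the max-over-strands rule are illustrative assumptions,
# not the method used in the paper.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement; unknown bases map to 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def genomescape(sequence, kmer_scores, k, default=0.0):
    """Return one score per k-mer window, scanning left to right.

    Each window receives the larger of its own score and the score of its
    reverse complement; windows absent from the table receive `default`.
    """
    sequence = sequence.upper()
    profile = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        forward = kmer_scores.get(window, default)
        reverse = kmer_scores.get(reverse_complement(window), default)
        profile.append(max(forward, reverse))
    return profile


# Toy score table (made-up values) and a short made-up sequence.
toy_scores = {"TGACGT": 1.0, "TGACGC": 0.4}
profile = genomescape("AATGACGTCA", toy_scores, k=6)
print(profile)
```

In this toy case the site is found once on the forward strand and once through its reverse complement, which is why two windows carry the top score.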
Summary: The control and function of RNA are governed by the specificity of RNA binding proteins. Here, we describe a method for global unbiased analysis of RNA-protein interactions that uses in vitro selection, high-throughput sequencing, and sequence-specificity landscapes. The method yields affinities for a vast array of RNAs in a single experiment, including both low- and high-affinity sites. It is reproducible and accurate. Using this approach, we analyzed members of the PUF (Pumilio and FBF) family of eukaryotic mRNA regulators. Our data identify effects of a specific protein partner on PUF-RNA interactions, reveal subsets of target sites not previously detected, and demonstrate that designer PUF proteins can precisely alter specificity. The approach described here is, in principle, broadly applicable for analysis of any molecule that binds RNA, including proteins, nucleic acids, and small molecules.
How transcription factor dimerization impacts DNA-binding specificity is poorly understood. Guided by protein dimerization properties, we examined DNA binding specificities of 270 human bZIP pairs. DNA interactomes of 80 heterodimers and 22 homodimers revealed that 72% of heterodimer motifs correspond to conjoined half-sites preferred by partnering monomers. Remarkably, the remaining motifs are composed of variably-spaced half-sites (12%) or ‘emergent’ sites (16%) that cannot be readily inferred from half-site preferences of partnering monomers. These binding sites were biochemically validated by EMSA-FRET analysis and validated in vivo by ChIP-seq data from human cell lines. Focusing on ATF3, we observed distinct cognate site preferences conferred by different bZIP partners, and demonstrated that genome-wide binding of ATF3 is best explained by considering many dimers in which it participates. Importantly, our compendium of bZIP-DNA interactomes predicted bZIP binding to 156 disease associated SNPs, of which only 20 were previously annotated with known bZIP motifs. DOI: http://dx.doi.org/10.7554/eLife.19272.001
Targeting the genome with sequence-specific synthetic molecules is a major goal at the interface of chemistry, biology, and personalized medicine. Pyrrole/imidazole based polyamides can be rationally designed to target specific DNA sequences with exquisite precision in vitro; yet, the biological outcomes are often difficult to interpret using current models of binding energetics. To directly identify the binding sites of polyamides across the genome, we designed, synthesized, and tested polyamide derivatives that enabled covalent crosslinking and localization of polyamide–DNA interaction sites in live human cells. Bioinformatic analysis of the data reveals that clustered binding sites, spanning a broad range of affinities, best predict occupancy in cells. In contrast to the prevailing paradigm of targeting single high-affinity sites, our results point to a new design principle to deploy polyamides and perhaps other synthetic molecules to effectively target desired genomic sites in vivo.
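The contrast between a single high-affinity site and a cluster of weaker sites can be made concrete with a small sketch. The abstract does not specify the bioinformatic model used, so this is only one simple way to express the idea: sum per-position binding scores over a sliding window. The function name, window size, and score values are all hypothetical.

```python
# Hypothetical sketch: sliding-window sum of per-position binding scores.
# This is an illustrative stand-in, not the analysis reported in the paper.


def windowed_occupancy(scores, window):
    """Return the sum of `scores` over each contiguous window of given length."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    sums = [total]
    for i in range(window, len(scores)):
        # Slide by one position: add the entering score, drop the leaving one.
        total += scores[i] - scores[i - window]
        sums.append(total)
    return sums


# Made-up profiles: one strong isolated site versus three moderate adjacent sites.
isolated = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
clustered = [0.5, 0.5, 0.5, 0.0, 0.0, 0.0]

print(max(windowed_occupancy(isolated, 3)))
print(max(windowed_occupancy(clustered, 3)))
```

Under this toy scoring, the cluster of moderate sites reaches a higher windowed total than the single strong site, which is the qualitative pattern the abstract describes.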
Spatial and temporal expression of genes is essential for maintaining phenotype integrity. Transcription factors (TFs) modulate expression patterns by binding to specific DNA sequences in the genome. Along with the core binding motif, the flanking sequence context can play a role in DNA–TF recognition. Here, we employ high-throughput in vitro and in silico analyses to understand the influence of sequences flanking the cognate sites on the binding of the three most prevalent eukaryotic TF families (zinc finger, homeodomain and bZIP). In vitro binding preferences of each TF toward the entire DNA sequence space were correlated with a wide range of DNA structural parameters, including DNA flexibility. Results demonstrate that conformational plasticity of flanking regions modulates binding affinity of certain TF families. DNA duplex stability and minor groove width also play an important role in DNA–TF recognition but differ in how exactly they influence the binding in each specific case. Our analyses further reveal that the structural features of preferred flanking sequences are not universal, as similar DNA-binding folds can employ distinct DNA recognition modes.
Targeting the genome with sequence-specific DNA-binding molecules is a major goal at the interface of chemistry, biology, and precision medicine. Polyamides, composed of N-methylpyrrole and N-methylimidazole monomers, are a class of synthetic molecules that can be rationally designed to “read” specific DNA sequences. However, the impact of different chromatin states on polyamide binding in live cells remains an unresolved question that impedes their deployment in vivo. Here, we use cross-linking of small molecules to isolate chromatin coupled to sequencing to map the binding of two bioactive and structurally distinct polyamides to genomes directly within live H1 human embryonic stem cells. This genome-wide view from live cells reveals that polyamide-based synthetic genome readers bind cognate sites that span a range of binding affinities. Polyamides can access cognate sites within repressive heterochromatin. The occupancy patterns suggest that polyamides could be harnessed to target loci within regions of the genome that are inaccessible to other DNA-targeting molecules.
Ligand-responsive allosteric transcription factors (aTFs) play a vital role in genetic circuits and high-throughput screening because they transduce biochemical signals into gene expression changes. Programmable control of gene expression from aTF-regulated promoters is important because different downstream effector genes function optimally at different expression levels. However, tuning gene expression of native promoters is difficult due to complex layers of homeostatic regulation encoded within them. We engineered synthetic promoters de novo by embedding operator sites with varying affinities and radically reshaped binding preferences within a minimal, constitutive Escherichia coli promoter. Multiplexed cell-based screening of promoters generated with this approach for three TetR-like aTFs yielded a rich diversity of gene expression levels, dynamic ranges and ligand sensitivities, and the promoters were 50- to 100-fold more active than their respective native promoters. Machine learning on our dataset revealed that the relative position of the core motif and the bases flanking the core motif play an important role in modulating induction response. Our generalized approach yields customizable and programmable aTF-regulated promoters for engineering cellular pathways and enables the discovery of new small molecule biosensors.
Artificial transcription factors (ATFs) are precision-tailored molecules designed to bind DNA and regulate transcription in a preprogrammed manner. Libraries of ATFs enable the high-throughput screening of gene networks that trigger cell fate decisions or phenotypic changes. We developed a genome-scale library of ATFs that display an engineered interaction domain (ID) to enable cooperative assembly and synergistic gene expression at targeted sites. We used this ATF library to screen for key regulators of the pluripotency network and discovered three combinations of ATFs capable of inducing pluripotency without exogenous expression of Oct4 (POU domain, class 5, TF 1). Cognate site identification, global transcriptional profiling, and identification of ATF binding sites reveal that the ATFs do not directly target Oct4; instead, they target distinct nodes that converge to stimulate the endogenous pluripotency network. This forward genetic approach enables cell type conversions without a priori knowledge of potential key regulators and reveals unanticipated gene network dynamics that drive cell fate choices.