The majority of CpG dinucleotides in the human genome are methylated at cytosine bases. However, active gene regulatory elements are generally hypomethylated relative to their flanking regions, and the binding of some transcription factors (TFs) is diminished by methylation of their target sequences. By analyzing 542 human TFs with methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment), we found that there are also many TFs that prefer CpG-methylated sequences. Most of these are in the extended homeodomain family. Structural analysis showed that homeodomain specificity for methylcytosine depends on direct hydrophobic interactions with the methylcytosine 5-methyl group. This study provides a systematic examination of the effect of an epigenetic DNA modification on human TF binding specificity and reveals that many developmentally important proteins display a preference for mCpG-containing sequences.
Gene expression is regulated by transcription factors (TFs), proteins that recognize short DNA sequence motifs. Such sequences are very common in the human genome, and an important determinant of the specificity of gene expression is the cooperative binding of multiple TFs to closely located motifs. However, interactions between DNA-bound TFs have not been systematically characterized. To identify TF pairs that bind cooperatively to DNA, and to characterize their spacing and orientation preferences, we have performed consecutive affinity-purification systematic evolution of ligands by exponential enrichment (CAP-SELEX) analysis of 9,400 TF-TF-DNA interactions. This analysis revealed 315 TF-TF interactions recognizing 618 heterodimeric motifs, most of which have not been previously described. The observed cooperativity occurred promiscuously between TFs from diverse structural families. Structural analysis of the TF pairs, including a novel crystal structure of MEIS1 and DLX3 bound to their identified recognition site, revealed that the interactions between the TFs were predominantly mediated by DNA. Most TF pair sites identified involved a large overlap between individual TF recognition motifs, and resulted in recognition of composite sites that were markedly different from the individual TFs' motifs. Together, our results indicate that the DNA molecule commonly plays an active role in cooperative interactions that define the gene regulatory lexicon.
Cohesin is present in almost all active enhancer regions, where it is associated with transcription factors. Cohesin frequently colocalizes with CTCF (CCCTC-binding factor), affecting genomic stability, gene expression and epigenetic homeostasis. Cohesin subunits are mutated in cancer, but CTCF/cohesin-binding sites (CBSs) in DNA have not been examined for mutations. Here we report frequent mutations at CBSs in cancers displaying a mutational signature where mutations in A•T base pairs predominate. Integration of whole-genome sequencing data from 213 colorectal cancer (CRC) samples and chromatin immunoprecipitation with exonuclease digestion (ChIP-exo) data identified frequent point mutations at CBSs. In contrast, CRCs showing an ultramutator phenotype caused by defects in the exonuclease domain of DNA polymerase ɛ (POLE) displayed significantly fewer mutations at and adjacent to CBSs. Analysis of public data showed that multiple cancer types accumulate CBS mutations. CBSs are a major mutational hotspot in the noncoding cancer genome.
During cell division, transcription factors (TFs) are removed from chromatin twice: during DNA synthesis and during condensation of chromosomes. How TFs can efficiently find their sites following these stages has been unclear. Here, we have analyzed the binding pattern of expressed TFs in human colorectal cancer cells. We find that binding of TFs is highly clustered and that the clusters are enriched in binding motifs for several major TF classes. Strikingly, almost all clusters are formed around cohesin, and loss of cohesin decreases both DNA accessibility and binding of TFs to clusters. We show that cohesin remains bound in S phase, holding the nascent sister chromatids together at the TF cluster sites. Furthermore, cohesin remains bound to the cluster sites when TFs are evicted in early M phase. These results suggest that cohesin binding functions as a cellular memory that promotes re-establishment of TF clusters after DNA replication and chromatin condensation.
SUMMARY
Charting transcription factor interactions. By using a new method, we discovered a large number of transcription factor (TF) pairs that bind cooperatively to DNA, and found that in many cases the TF pairs recognize a composite motif ('compound DNA word') that is markedly different from that expected from the individual TF motifs ('DNA words').
The problem
Individual TF recognition sequences (DNA words) are common in the human genome, so the specificity of gene expression depends on the cooperative binding of multiple TFs to sites close to each other. The critical role of particular TF combinations in cell-fate determination and development is well established [1,2]. TFs in cultured cells bind to only a subset of their potential target sites, and many of the occupied sites do not contain high-affinity TF motifs, suggesting that cooperative interactions allow TFs to bind to low-affinity sites [3-5]. Research has shown many examples where two TFs bind DNA together as cooperative complexes that form as a result of protein-protein interactions [6], DNA-facilitated protein-protein contacts, and interactions mediated by DNA [7-9]. Most cases have been discovered by studying individual TF pairs, however, so it has not been known how common such TF-TF interactions are. The lack of understanding of TF interactions is one reason why the gene regulatory code, which determines how DNA sequence defines gene-expression patterns, has remained poorly understood.
The solution
To systematically identify TF-TF interactions in the presence of DNA, we dev...
DNA can determine where and when genes are expressed, but the full set of sequence determinants that control gene expression is unknown. Here, we measured the transcriptional activity of DNA sequences that represent an ~100 times larger sequence space than the human genome using massively parallel reporter assays (MPRAs). Machine learning models revealed that transcription factors (TFs) generally act in an additive manner with weak grammar and that most enhancers increase expression from a promoter by a mechanism that does not appear to involve specific TF–TF interactions. The enhancers themselves can be classified into three types: classical, closed chromatin and chromatin dependent. We also show that few TFs are strongly active in a cell, with most activities being similar between cell types. Individual TFs can have multiple gene regulatory activities, including chromatin opening and enhancing, promoting and determining transcription start site (TSS) activity, consistent with the view that the TF binding motif is the key atomic unit of gene expression.
Point mutations in cancer have been extensively studied, but chromosomal gains and losses have been more challenging to interpret due to their unspecific nature. Here we examine the high-resolution allelic imbalance (AI) landscape in 1699 colorectal cancers, 256 of which have been whole-genome sequenced (WGSed). The imbalances pinpoint 38 genes as plausible AI targets based on previous knowledge. Unbiased CRISPR-Cas9 knockout and activation screens identified in total 79 genes within AI peaks regulating cell growth. Genetic and functional data implicate loss of TP53 as a sufficient driver of AI. The WGS highlights an influence of copy number aberrations on the rate of detected somatic point mutations. Importantly, the data reveal several associations between AI target genes, suggesting a role for a network of lineage-determining transcription factors in colorectal tumorigenesis. Overall, the results unravel the contribution of AI in colorectal cancer and provide a plausible explanation for why so few genes are commonly affected by point mutations in cancers.
The gene desert upstream of the MYC oncogene on chromosome 8q24 contains susceptibility loci for several major forms of human cancer. The region shows high conservation between human and mouse and contains multiple MYC enhancers that are activated in tumor cells. However, the role of this region in normal development has not been addressed. Here we show that a 538 kb deletion of the entire MYC upstream super-enhancer region in mice results in a 50% to 80% decrease in Myc expression in multiple tissues. The mice are viable and show no overt phenotype. However, they are resistant to tumorigenesis, and most normal cells isolated from them grow slowly in culture. These results reveal that only cells whose MYC activity is increased by serum or oncogenic driver mutations depend on the 8q24 super-enhancer region, and indicate that targeting the activity of this element is a promising strategy for cancer chemoprevention and therapy.
DOI: http://dx.doi.org/10.7554/eLife.23382.001