Summary: CpG islands (CGIs) function as promoters for approximately 60% of human genes. Most of these elements remain protected from CpG methylation, a prevalent epigenetic modification associated with transcriptional silencing. Here, we report that methylation-resistant CGI promoters are characterized by significant strand asymmetry in the distribution of guanines and cytosines (GC skew) immediately downstream from their transcription start sites. Using innovative genomics methodologies, we show that transcription through regions of GC skew leads to the formation of long R-loop structures. Furthermore, we show that GC skew and R-loop formation potential is correlated with and predictive of the unmethylated state of CGIs. Finally, we provide evidence that R-loop formation protects from DNMT3B1, the primary de novo DNA methyltransferase in early development. Altogether, these results suggest that protection from DNA methylation is a built-in characteristic of the DNA sequence of CGI promoters that is revealed by the co-transcriptional formation of R-loop structures.
The majority of CpG dinucleotides in the human genome are methylated at cytosine bases. However, active gene regulatory elements are generally hypomethylated relative to their flanking regions, and the binding of some transcription factors (TFs) is diminished by methylation of their target sequences. By analysis of 542 human TFs with methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment), we found that there are also many TFs that prefer CpG-methylated sequences. Most of these are in the extended homeodomain family. Structural analysis showed that homeodomain specificity for methylcytosine depends on direct hydrophobic interactions with the methylcytosine 5-methyl group. This study provides a systematic examination of the effect of an epigenetic DNA modification on human TF binding specificity and reveals that many developmentally important proteins display preference for mCpG-containing sequences.
R-loops are three-stranded nucleic acid structures formed upon annealing of an RNA strand to one strand of duplex DNA. We profiled R-loops using a high-resolution, strand-specific methodology in human and mouse cell types. R-loops are prevalent, collectively occupying up to 5% of mammalian genomes. R-loop formation occurs over conserved genic hotspots such as promoter and terminator regions of poly(A)-dependent genes. In most cases, R-loops occur co-transcriptionally and undergo dynamic turnover. Detailed epigenomic profiling revealed that R-loops associate with specific chromatin signatures. At promoters, R-loops associate with a hyper-accessible state characteristic of unmethylated CpG island promoters. By contrast, terminal R-loops associate with an enhancer- and insulator-like state and define a broad class of transcription terminators. Altogether, this suggests that the retention of nascent RNA transcripts at their site of expression represents an abundant, dynamic, and programmed component of the mammalian chromatin that impacts chromatin patterning and the control of gene expression.
Strand asymmetry in the distribution of guanines and cytosines, measured by GC skew, predisposes DNA sequences toward R-loop formation upon transcription. Previous work revealed that GC skew and R-loop formation associate with a core set of unmethylated CpG island (CGI) promoters in the human genome. Here, we show that GC skew can distinguish four classes of promoters, including three types of CGI promoters, each associated with unique epigenetic and gene ontology signatures. In particular, we identify a strong and a weak class of CGI promoters and show that these loci are enriched in distinct chromosomal territories reflecting the intrinsic strength of their protection against DNA methylation. Interestingly, we show that strong CGI promoters are depleted from the X chromosome while weak CGIs are enriched, a property consistent with the acquisition of DNA methylation during dosage compensation. Furthermore, we identify a third class of CGI promoters based on its unique GC skew profile and show that this gene set is enriched for Polycomb group targets. Lastly, we show that nearly 2000 genes harbor GC skew at their 3′ ends and that these genes are preferentially located in gene-dense regions and tend to be closely arranged. Genomic profiling of R-loops accordingly showed that a large proportion of genes with terminal GC skew form R-loops at their 3′ ends, consistent with a role for these structures in permitting efficient transcription termination. Altogether, we show that GC skew and R-loop formation offer significant insights into the epigenetic regulation, genomic organization, and function of human genes.
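As an illustration of the quantity these abstracts refer to, GC skew is commonly computed as (G − C) / (G + C) over a sequence window. The sketch below assumes that common definition; the function names, window size, and step are illustrative choices and are not taken from the papers summarized here.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def sliding_gc_skew(seq: str, window: int = 100, step: int = 50):
    """Return (start, skew) pairs for windows along seq.

    The window and step defaults are arbitrary illustrative values.
    A sequence shorter than the window is scored as a single window.
    """
    last_start = max(len(seq) - window, 0)
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]
```

A positive value means the scored strand carries more guanines than cytosines in that window, and a negative value means the opposite. Scoring the non-template strand downstream of a transcription start site is one way to look for the strand asymmetry described above.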
Eukaryotic transcription factors (TFs) are key determinants of gene activity, yet they bind only a fraction of their corresponding DNA sequence motifs in any given cell type. Chromatin has the potential to restrict accessibility of binding sites; however, in which context chromatin states are instructive for TF binding remains mainly unknown. To explore the contribution of DNA methylation to constrained TF binding, we mapped DNase-I-hypersensitive sites in murine stem cells in the presence and absence of DNA methylation. Methylation-restricted sites are enriched for TF motifs containing CpGs, especially for those of NRF1. In fact, the TF NRF1 occupies several thousand additional sites in the unmethylated genome, resulting in increased transcription. Restoring de novo methyltransferase activity initiates remethylation at these sites and outcompetes NRF1 binding. This suggests that binding of DNA-methylation-sensitive TFs relies on additional determinants to induce local hypomethylation. In support of this model, removal of neighbouring motifs in cis or of a TF in trans causes local hypermethylation and subsequent loss of NRF1 binding. This competition between DNA methylation and TFs in vivo reveals a case of cooperativity between TFs that acts indirectly via DNA methylation. Methylation removal by methylation-insensitive factors enables occupancy of methylation-sensitive factors, a principle that rationalizes hypomethylation of regulatory regions.
DNA methylation is considered a stable epigenetic mark, yet methylation patterns can vary during differentiation and in diseases such as cancer. Local levels of DNA methylation result from opposing enzymatic activities, the rates of which remain largely unknown. Here we developed a theoretical and experimental framework enabling us to infer methylation and demethylation rates at 860,404 CpGs in mouse embryonic stem cells. We find that enzymatic rates can vary as much as two orders of magnitude between CpGs with identical steady-state DNA methylation. Unexpectedly, de novo and maintenance methylation activity is reduced at transcription factor binding sites, while methylation turnover is elevated in transcribed gene bodies. Furthermore, we show that TET activity contributes substantially more than passive demethylation to establishing low methylation levels at distal enhancers. Taken together, our work unveils a genome-scale map of methylation kinetics, revealing highly variable and context-specific activity for the DNA methylation machinery.
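The observation that rates can differ widely between CpGs with identical steady-state methylation can be illustrated with a generic two-state, first-order model. This is an assumption made for illustration only, not necessarily the framework used in the study: methylation m changes as dm/dt = k_meth (1 − m) − k_demeth m, so the steady state depends only on the ratio of the rates while the speed of turnover depends on their sum. All names and rate values below are hypothetical.

```python
import math


def steady_state(k_meth: float, k_demeth: float) -> float:
    """Steady-state methylated fraction of the assumed two-state model."""
    return k_meth / (k_meth + k_demeth)


def methylation_at(t: float, m0: float, k_meth: float, k_demeth: float) -> float:
    """Methylated fraction at time t, starting from fraction m0.

    Closed-form solution of dm/dt = k_meth * (1 - m) - k_demeth * m.
    """
    m_inf = steady_state(k_meth, k_demeth)
    return m_inf + (m0 - m_inf) * math.exp(-(k_meth + k_demeth) * t)


# Two hypothetical CpGs whose rates differ 100-fold but share a steady state.
slow = (0.08, 0.02)
fast = (8.0, 2.0)
```

In this sketch both CpGs settle at the same methylated fraction, but the fast site approaches it far sooner after a perturbation. A snapshot of steady-state methylation alone therefore cannot distinguish the two sites.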
Regulation of transcription, replication, and cell division relies on differential protein binding to DNA and chromatin, yet it is unclear which regulatory components remain bound to compacted mitotic chromosomes. By utilizing the buoyant density of DNA–protein complexes after cross-linking, we here develop a mass spectrometry-based approach to quantify the chromatin-associated proteome at separate stages of the cell cycle. While epigenetic modifiers that promote transcription are lost from mitotic chromatin, repressive modifiers generally remain associated. Furthermore, while proteins involved in transcriptional elongation are evicted, most identified transcription factors are retained on mitotic chromatin to varying degrees, including core promoter binding proteins. This predicts conservation of the regulatory landscape on mitotic chromosomes, which we confirm by genome-wide measurements of chromatin accessibility. In summary, this work establishes an approach to study chromatin, provides a comprehensive catalog of chromatin changes during the cell cycle, and reveals the degree to which the genomic regulatory landscape is maintained through mitosis.
The DNMT3A and DNMT3B de novo DNA methyltransferases (DNMTs) are responsible for setting genomic DNA methylation patterns, a key layer of epigenetic information. Here, using an in vivo episomal methylation assay and extensive bisulfite methylation sequencing, we show that human DNMT3A and DNMT3B possess significant and distinct flanking sequence preferences for target CpG sites. Selection for high or low efficiency sites is mediated by the base composition at the −2 and +2 positions flanking the CpG site for DNMT3A, and at the −1 and +1 positions for DNMT3B. This intrinsic preference reproducibly leads to the formation of specific de novo methylation patterns characterized by up to 34-fold variations in the efficiency of DNA methylation at individual sites. Furthermore, analysis of the distribution of signature methylation hotspot and coldspot motifs suggests that DNMT flanking sequence preference has contributed to shaping the composition of CpG islands in the human genome. Our results also show that the DNMT3L stimulatory factor modulates the formation of de novo methylation patterns in two ways. First, DNMT3L selectively focuses the DNA methylation machinery on properly chromatinized DNA templates. Second, DNMT3L attenuates the impact of the intrinsic DNMT flanking sequence preference by providing a much greater boost to the methylation of poorly methylated sites, thus promoting the formation of broader and more uniform methylation patterns. This study offers insights into the manner by which DNA methylation patterns are deposited and reveals a new level of interplay between members of the de novo DNMT family.