G-rich genomic regions can form G4 DNA upon transcription or replication. We have quantified the potential for G4 DNA formation (G4P) of the 16 654 genes in the human RefSeq database, and then correlated gene function with G4P. We have found that very low and very high G4P correlate with specific functional classes of genes. Notably, tumor suppressor genes have very low G4P and proto-oncogenes have very high G4P. G4P of these genes is evenly distributed between exons and introns, and it does not reflect enrichment for CpG islands or local chromosomal environment. These results show that genomic structure undergoes selection based on gene function. Selection based on G4P could promote genomic stability (or instability) of specific classes of genes, or reflect mechanisms for global regulation of gene expression.
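As a rough illustration of how a G4P-style score might be computed, the sketch below reports the fraction of sliding windows that contain at least one G4 motif. The abstract does not state the motif definition or the windowing scheme, so everything here is an assumption: the motif pattern (four runs of at least three G separated by loops of 1 to 7 nt), the default window and step sizes, and the function name `g4_window_fraction` are all hypothetical choices and should not be read as the authors' method.

```python
import re

# Assumed motif definition (not given in the abstract): four runs of >= 3 G,
# separated by loops of 1-7 nucleotides.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def g4_window_fraction(seq, window=100, step=20):
    """Return the fraction of sliding windows containing a complete G4 motif.

    `window` and `step` are illustrative defaults, not values from the source.
    """
    seq = seq.upper()
    if len(seq) < window:
        return 0.0
    starts = range(0, len(seq) - window + 1, step)
    # Pattern.search(string, pos, endpos) restricts the match to one window.
    hits = sum(1 for s in starts if G4_MOTIF.search(seq, s, s + window))
    return hits / len(starts)
```

A score of this kind is a density, so it can be compared across genes of different lengths; a sequence shorter than one window is scored as 0.0 here purely for convenience.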
To understand how potential for G-quadruplex formation might influence regulation of gene expression, we examined the 2 kb spanning the transcription start sites (TSS) of the 18 217 human RefSeq genes, distinguishing contributions of template and nontemplate strands. Regions both upstream and downstream of the TSS are G-rich, but the downstream region displays a clear bias toward G-richness on the nontemplate strand. Upstream of the TSS, much of the G-richness and potential for G-quadruplex formation derives from the presence of well-defined canonical regulatory motifs in duplex DNA, including CpG dinucleotides which are sites of regulatory methylation, and motifs recognized by the transcription factor SP1. This challenges the notion that quadruplex formation upstream of the TSS contributes to regulation of gene expression. Downstream of the TSS, G-richness is concentrated in the first intron, and on the nontemplate strand, where polymorphic sequence elements with potential to form G-quadruplex structures and which cannot be accounted for by known regulatory motifs are found in almost 3000 (16%) of the human RefSeq genes, and are conserved through frogs. These elements could in principle be recognized either as DNA or as RNA, providing structural targets for regulation at the level of transcription or RNA processing.
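The strand distinction above can be sketched in code. Given the sequence of the nontemplate strand, a G4 motif on that strand appears as runs of G, while a motif on the template strand appears as runs of C in the same sequence. The motif pattern (four runs of at least three bases with loops of 1 to 7 nt) and the function name `strand_motif_counts` are assumptions made for illustration; the abstract does not specify how motifs were scored.

```python
import re

# Assumed motif definition (not given in the abstract): four runs of >= 3 G
# (or C, for the opposite strand) separated by loops of 1-7 nucleotides.
G_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
C_MOTIF = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def strand_motif_counts(nontemplate_seq):
    """Count non-overlapping G4 motifs on each strand.

    The input is read 5' to 3' along the nontemplate strand. C-run motifs
    in this sequence correspond to G-run motifs on the template strand.
    """
    seq = nontemplate_seq.upper()
    return {
        "nontemplate": len(G_MOTIF.findall(seq)),
        "template": len(C_MOTIF.findall(seq)),
    }
```

Comparing the two counts over a region, for example the 1 kb downstream of a TSS, gives a simple measure of strand bias. Note that `findall` counts non-overlapping matches only, so long G-rich tracts are undercounted relative to an overlapping scan.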
G4 motifs are greatly enriched near promoters, suggesting that quadruplex structures may be targets of transcriptional regulation. Here we show, by ChIP-Seq analysis of human cells, that 40% of the binding sites of the transcription-associated helicases, XPB and XPD, overlap with G4 motifs. The highly significant overlap of XPB and XPD binding sites with G4 motifs cannot be explained by GC-richness or parameters of the genomewide analysis, but instead suggests that these proteins are recruited to quadruplex structures that form in genomic DNA (G4 DNA). Biochemical analysis demonstrates that XPD is a robust G4 DNA helicase, and XPB binds to G4 DNA. XPB and XPD are enriched near the transcription start site (TSS) at 20% of genes, especially highly transcribed genes. XPB and XPD enrichment at G4 motifs characterizes specific signaling pathways and regulatory pathways associated with specific cancers. These results identify new candidate pathways for therapies targeted to quadruplexes.
The RNA Pol II transcription complex pauses just downstream of the promoter in a significant fraction of human genes. The local features of genomic structure that contribute to pausing have not been defined. Here, we show that genes that pause are more G-rich within the region flanking the transcription start site (TSS) than RefSeq genes or non-paused genes. We show that enrichment of binding motifs for common transcription factors, such as SP1, may account for G-richness upstream but not downstream of the TSS. We further show that pausing correlates with the presence of a GrIn1 element, an element bearing one or more G4 motifs at the 5′-end of the first intron, on the non-template DNA strand. These results suggest potential roles for dynamic G4 DNA and G4 RNA structures in cis-regulation of pausing, and thus genome-wide regulation of gene expression, in human cells.
Formation of G4 DNA may occur in the course of replication and transcription, and contribute to genomic instability. We have quantitated abundance of G4 motifs and potential for G4 DNA formation of the nontemplate strand of 5' exons and introns of transcripts of human genes. We find that, for all human genes, G4 motifs are enriched in 5' regions of transcripts relative to downstream regions, and in 5' regulatory regions relative to coding regions. Notably, although tumor suppressor genes are depleted and proto-oncogenes enriched in G4 motifs, abundance of G4 motifs in the 5' regions of transcripts of genes in these categories does not differ. These results support the hypothesis that G4 motifs are under selection in the human genome. They further show that for tumor suppressor genes and proto-oncogenes, independent selection determines G4 DNA potential of 5' regulatory regions of transcripts and downstream coding regions.